Understanding and extracting text within images programmatically has always been a huge challenge. If you ever wanted to moderate, curate or filter images, there was no foolproof way to do it at scale. Sure, you could hire a team to manually tag images with the required metadata and add a layer of basic automation on top. But if you have millions of images, how do you scale this and finish within a reasonable timeline?

At re:Invent 2016, Amazon introduced the Rekognition service within the AWS suite, which lets developers add high-end image analysis to their solutions. You could detect objects within a scene (mountains, faces and so on) and extract a lot of context from an image. This year, at re:Invent 2017, Amazon added a ton of new features, including text detection. This is truly groundbreaking for me as a developer: I can concentrate on building my apps and solutions while using Amazon’s industry-leading technology at very low cost (Rekognition is billed per use and has a free tier).

This surely opens up a whole new frontier for developers worldwide and introduces some very interesting use cases. I cannot overstate how big a deal this is.

Let me show how simply I can now extract text from an image. Since I usually store all my media files in S3 buckets, here is my flow:

I have an image stored in S3. The name of my bucket is ‘com.teamavengers.images’ and the name (key) of the image is ‘morning/PzcpPsYdsM5QX9xpysxVY0o111leUH0l.jpg’.

Sample image for text detection using AWS Rekognition

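Using the AWS CLI on my laptop, I can call the Rekognition ‘detect-text’ command on this image, passing the bucket and image name as the S3 object.

The API returns a JSON response whose “TextDetections” array holds one object for each piece of text detected in my image.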

Text Detection with AWS Rekognition

Pay attention to the “DetectedText” attribute of the objects. Rekognition detected the text “YOU DON’T HAVE TO BE GREAT TO START” in the first object and “BUT YOU HAVE TO START TO BE GREAT” in the second. I can simply concatenate these to extract all the text present in my image. If you look at the image I provided, you will see that AWS Rekognition was bang on!
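One thing to watch when concatenating: the DetectText response lists whole lines and also the individual words inside them (each entry has a “Type” of LINE or WORD), so joining every entry would repeat the text. Below is a minimal Node.js sketch that keeps only the LINE entries. The function name extractText, the minConfidence parameter and the Confidence numbers in the sample are illustrative assumptions, not values taken from the real response.

```javascript
// Join the LINE-level entries of a DetectText response into one string.
// WORD entries repeat the same text word by word, so they are skipped.
function extractText(response, minConfidence = 0) {
  return (response.TextDetections || [])
    .filter((d) => d.Type === 'LINE' && d.Confidence >= minConfidence)
    .map((d) => d.DetectedText)
    .join(' ');
}

// Trimmed sample shaped like a DetectText response.
// The Confidence values are placeholders, not real output.
const sampleResponse = {
  TextDetections: [
    { DetectedText: "YOU DON'T HAVE TO BE GREAT TO START", Type: 'LINE', Id: 0, Confidence: 99 },
    { DetectedText: 'BUT YOU HAVE TO START TO BE GREAT', Type: 'LINE', Id: 1, Confidence: 99 },
    { DetectedText: 'YOU', Type: 'WORD', Id: 2, ParentId: 0, Confidence: 99 },
  ],
};

console.log(extractText(sampleResponse));
```

Raising minConfidence (for example to 90) is one way to drop shaky detections before storing the text.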

Here is a script I wrote in Node.js using the AWS SDK to run the Rekognition API on any image within my S3 bucket.
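For readers who want a starting point, a minimal sketch of such a script could look like the following. It assumes the AWS SDK for JavaScript v2 (the ‘aws-sdk’ npm package), credentials already configured on the machine, and a Rekognition region that matches the bucket's region. The file name detect-text.js, the function name detectTextInS3Image and the us-east-1 fallback are assumptions made for illustration.

```javascript
// detect-text.js (hypothetical file name)
// Sketch: ask Rekognition to detect text in an image stored in S3.

// `client` is anything exposing detectText(params).promise(),
// such as `new AWS.Rekognition()` from the aws-sdk v2 package.
async function detectTextInS3Image(client, bucket, key) {
  const params = { Image: { S3Object: { Bucket: bucket, Name: key } } };
  const response = await client.detectText(params).promise();
  return response.TextDetections;
}

// Run as: node detect-text.js <bucket> <key>
const [bucketArg, keyArg] = process.argv.slice(2);
if (bucketArg && keyArg) {
  // Loaded lazily so the function above can be exercised without the SDK.
  const AWS = require('aws-sdk');
  // Assumption: fall back to us-east-1 when AWS_REGION is not set.
  const rekognition = new AWS.Rekognition({
    region: process.env.AWS_REGION || 'us-east-1',
  });
  detectTextInS3Image(rekognition, bucketArg, keyArg)
    .then((detections) => console.log(JSON.stringify(detections, null, 2)))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

With the image from this post, the call would be: node detect-text.js com.teamavengers.images morning/PzcpPsYdsM5QX9xpysxVY0o111leUH0l.jpg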


There you have it, easy as pie! I can also wrap the above code into a nice Lambda function that can be triggered automatically every time a new image is uploaded into my S3 buckets.

Like I said earlier, there are a ton of applications that can now be built using this service. Out of the box, Rekognition currently provides the following:

  1. Object, scene and activity detection
  2. Facial recognition
  3. Facial analysis
  4. Person tracking in videos
  5. Unsafe content detection
  6. Celebrity recognition
  7. Text detection in images

I am sure more and more features will be added to this service. For full documentation, see the Amazon Rekognition page on the AWS website.



