Given the huge volume of photos shared each day on Facebook and Instagram, understanding the text that appears in images is important. Because Facebook’s moderators can’t look through every post, the social media giant is working on an AI system named Rosetta that extracts text from more than a billion public Facebook and Instagram images and video frames, in a wide variety of languages, every day.
The extracted text will then be passed through a text recognition model trained to understand the text in the context of the image, which will help systems to proactively identify inappropriate or harmful content.
'We perform text extraction on an image in two independent steps: detection and recognition. In the first step, we detect rectangular regions that potentially contain text. In the second step, we perform text recognition, where, for each of the detected regions, we use a convolutional neural network (CNN) to recognize and transcribe the word in the region,' Facebook said in a blog post.
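The two-step structure Facebook describes can be sketched in a few lines of Python. This is only an illustration of the detect-then-recognize flow, not Rosetta's code: the "image" is a toy grid of characters, and `toy_detect`, `toy_recognize`, `crop` and `extract_text` are hypothetical stand-ins for the real region detector and CNN recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (row, col, height, width)
Image = List[str]                # toy "image": one string per row, ' ' is background


@dataclass
class Word:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by box out of the image."""
    row, col, height, width = box
    return [line[col:col + width] for line in image[row:row + height]]


def toy_detect(image: Image) -> List[Box]:
    """Step 1 stand-in: one box per horizontal run of non-blank cells."""
    boxes: List[Box] = []
    for r, line in enumerate(image):
        start = None
        for c, ch in enumerate(line + " "):  # trailing blank closes a final run
            if ch != " " and start is None:
                start = c
            elif ch == " " and start is not None:
                boxes.append((r, start, 1, c - start))
                start = None
    return boxes


def toy_recognize(region: Image) -> str:
    """Step 2 stand-in: a real system would run a CNN on the cropped pixels."""
    return "".join(region).strip()


def extract_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Word]:
    """Run detection first, then recognition on each detected region."""
    return [Word(box, recognize(crop(image, box))) for box in detect(image)]


image = ["SALE  50%", "         ", " TODAY   "]
words = extract_text(image, toy_detect, toy_recognize)
texts = [w.text for w in words]
```

Because the detector and recognizer are passed in as separate functions, either one can be replaced without touching the other, which is the practical meaning of the two steps being independent.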
According to Facebook, the model is not limited to English: a single unified model supports multiple languages and encodings, including Arabic and Hindi. Some of these present interesting technical challenges, such as right-to-left reading order or stacked characters.
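To see why these scripts need special handling, the short Python sketch below uses the standard `unicodedata` module to inspect two sample words. It is an illustration only and says nothing about how Rosetta handles them internally; the helper names `base_direction` and `count_marks` and the sample words are assumptions made for the example.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "neutral"  # only digits, punctuation or whitespace


def count_marks(text: str) -> int:
    """Count combining marks, which attach to (or stack on) a base letter."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


arabic = "\u0645\u0631\u062d\u0628\u0627"  # Arabic word for "hello"
hindi = "\u0939\u093f\u0902\u0926\u0940"   # the word "Hindi" in Devanagari

directions = (base_direction(arabic), base_direction(hindi))
marks_in_hindi = count_marks(hindi)
```

The Arabic word reads right to left, while the Hindi word reads left to right but has three of its five code points attached to neighbouring base letters rather than standing alone, so a recognizer cannot assume one code point per left-to-right glyph.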
Rosetta will be widely adopted by various products and teams within Facebook and Instagram. Text extracted from images will be used as a feature in various downstream machine learning models, such as those that improve the relevance and quality of photo search, automatically identify content that violates Facebook’s hate-speech policy in various languages, and improve the accuracy of photo classification in News Feed to offer more personalised content.