Published , by Chris Jarrard
While most internet users take the images accompanying online content for granted, that visual information can greatly enhance the content consumption experience and improve reading comprehension. For those who are blind or otherwise visually impaired, images without accurate captions or alt-text can hinder understanding. In an effort to make content more accessible to all users, members of Microsoft's Azure team have been developing AI systems capable of automatically adding accurate captions or alt-text to images. In many cases, these computer-generated captions are of higher quality than those written by people.
In a new post published today on its AI Blog, Microsoft details a recent AI breakthrough that will change how images are captioned. Its research teams have been hard at work refining AI recognition of novel objects and actions. Marrying the results of this research with AI-generated language forms the foundation of automated image captioning.
Training the AI model for such a task involves feeding it a dataset of hundreds of thousands of images, each accompanied by word tags rather than full captions. It is similar to how you would teach a young child through word association: a picture of an apple is fed into the model along with the tag "apple." Once the model had been sufficiently trained to recognize individual objects and actions, the team set about teaching it to compose legible sentences from its newly acquired vocabulary.
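To make the "word association" idea concrete, here is a toy sketch (not Microsoft's actual model) of the two stages described above: training pairs that carry bare tags instead of full sentences, and a simple sentence builder that turns a recognized tag set into a readable caption. All data and names here are hypothetical, for illustration only.

```python
# Toy illustration of tag-level supervision: each training example is an
# image paired with word tags, not a full caption (hypothetical data).
training_data = [
    ("img_001.jpg", {"apple", "table"}),
    ("img_002.jpg", {"dog", "ball", "running"}),
]

# A tiny, hand-picked action vocabulary for this sketch.
ACTION_TAGS = {"running", "jumping", "eating"}

def build_caption(tags):
    """Compose a legible sentence from a set of recognized tags."""
    objects = sorted(t for t in tags if t not in ACTION_TAGS)
    actions = sorted(t for t in tags if t in ACTION_TAGS)
    caption = "a photo of " + " and ".join(objects)
    if actions:
        caption += ", " + " and ".join(actions)
    return caption

for _, tags in training_data:
    print(build_caption(tags))
```

The real system learns both stages jointly from data, of course; this sketch only mirrors the split between recognizing vocabulary and composing sentences from it.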
The new model is now available as part of the Azure Cognitive Services package and will be deployed to Microsoft Word, Outlook, PowerPoint, and other applications later this year.
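For developers, captions like these are exposed through the Computer Vision "describe image" operation in Azure Cognitive Services. The sketch below shows one way to call the REST endpoint using only the Python standard library; the `v3.1` path, the `maxCandidates` parameter, and the response shape follow my understanding of the documented API, and `endpoint` and `key` are placeholders you must supply from your own Azure resource.

```python
# Hedged sketch: request candidate captions for an image URL from the
# Azure Computer Vision "describe" endpoint, then keep the best one.
import json
import urllib.request

def best_caption(captions):
    """Return the text of the highest-confidence caption.

    `captions` is a list of dicts with "text" and "confidence" keys,
    matching the shape of the service's JSON response.
    """
    return max(captions, key=lambda c: c["confidence"])["text"]

def describe_image(image_url, endpoint, key, max_candidates=3):
    """Call the describe operation and return the best caption text."""
    url = f"{endpoint}/vision/v3.1/describe?maxCandidates={max_candidates}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={
        "Ocp-Apim-Subscription-Key": key,          # your resource key
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return best_caption(result["description"]["captions"])
```

A call would look like `describe_image("https://example.com/photo.jpg", endpoint, key)`, where `endpoint` is your Cognitive Services resource URL. The confidence score lets an application decide whether an automatic caption is trustworthy enough to use as alt-text or should be flagged for human review.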