Why AI image generators face hurdles with spelling and attention to detail
Despite remarkable progress in artificial intelligence (AI), text-to-image generators continue to grapple with spelling and fine detail. AI systems like DALL-E, for example, often generate nonsensical outputs when tasked with creating a menu for a specific cuisine. Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, noted that these models struggle to structure their outputs coherently.
Deciphering AI's difficulty with details
AI models run into similar issues with details whether they generate images or text. Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute, explains that image generators rely on diffusion models, which reconstruct an image from noise. These algorithms have no inherent understanding of rules that humans consider obvious, so they often fail to render text accurately within the images they produce.
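To make that mechanism concrete, below is a minimal sketch of the reverse (denoising) loop at the core of a diffusion model. The tiny network, image size, and noise schedule are illustrative stand-ins, not the architecture of any production generator; the point is that nothing in this procedure encodes spelling or letter order.

```python
import torch
import torch.nn as nn

# A stand-in noise-prediction network. Real generators use large U-Nets or
# transformers conditioned on a text prompt; this toy version only shows the
# shape of the algorithm, not how to produce recognizable images.
class ToyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, t):
        return self.net(x)  # predicted noise (timestep ignored for brevity)

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(model, shape=(1, 3, 64, 64)):
    """Reverse diffusion: start from pure noise and denoise step by step."""
    x = torch.randn(shape)                 # the static the model starts from
    for t in reversed(range(T)):
        eps = model(x, t)                  # predict the noise in the current image
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                               # a plausible-looking image; no notion
                                           # of spelling is baked in anywhere

image = sample(ToyDenoiser())
```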
Potential remedies and ongoing challenges
Engineers can improve a model's handling of details by supplementing its training data with examples specifically designed to show how certain objects should appear. Rectifying spelling, however, is not as straightforward. Guzdial pointed out that "the English language is really complicated," which makes correct spelling a persistent hurdle for these systems.
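As a rough illustration of that first remedy, the sketch below runs a standard diffusion training step over a small set of targeted images. The stand-in network, random-tensor "dataset," and hyperparameters are hypothetical, chosen only to show how extra examples of a troublesome object would be folded into training.

```python
import torch
import torch.nn as nn

# Hypothetical targeted data: a handful of extra images of whatever object the
# model keeps getting wrong. Random tensors stand in for a real curated set.
targeted_images = torch.randn(16, 3, 64, 64)

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in denoiser
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

# Standard diffusion training step: corrupt a clean image with noise at a
# random timestep and train the network to predict that noise. Running this
# on targeted examples nudges the model toward rendering them correctly.
for epoch in range(10):
    t = torch.randint(0, T, (targeted_images.size(0),))
    noise = torch.randn_like(targeted_images)
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    noisy = torch.sqrt(ab) * targeted_images + torch.sqrt(1.0 - ab) * noise
    loss = nn.functional.mse_loss(model(noisy), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```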
Deficiencies in text generation
Some AI models, like Adobe Firefly, are programmed not to generate text at all. Given simple prompts such as "menu at a restaurant" or "billboard with an advertisement," they produce an image of a blank sheet of paper or a white billboard. However, these safeguards can be bypassed with more detailed prompts, exposing the limits of such protective measures.