Computer learns to write captions with feeling

Artificial intelligence struggles to replicate human emotional responses to stimuli such as art, but research is narrowing the gap between human and machine emotional intelligence. (photo: Pixabay)

Giving artificial intelligence (AI) a more balanced perspective of artwork during training can reduce emotional bias in computer-generated captions.

Despite incredible advances in AI, it remains a challenge to replicate the human emotional response to sensory stimuli such as sights and sounds. For example, in image captioning, an AI processes visual information and associated language to generate a natural-sounding description of the picture. But art often triggers unique feelings in the beholder that AI can’t experience, such as calm, joy, awe or fear. Researchers are turning to affective captioning to bridge the gap between human and machine emotional intelligence.

“Affective captioning goes beyond factual descriptions to the subjective experience and the emotions an image evokes,” says KAUST researcher Mohamed Elhoseiny. One obstacle to reliable affective captioning is the introduction of biases at the data collection phase. “Biases are integral to human evolution,” says colleague Youssef Mohamed. “As a result, AI training datasets annotated by humans are intrinsically biased.”

Unlike humans, machines cannot detect or scrutinize biases, so biased data simply translates into biased decisions. Elhoseiny’s team observed this effect in ArtEmis, a popular AI training dataset of image descriptions: 62 percent of its descriptions expressed positive emotions, while only 26 percent expressed negative ones. As a result, AIs trained on it generated most of their captions with a positive sentiment.
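To make the imbalance concrete, a skew of this kind can be measured by counting how many captions fall under each sentiment. The following is a minimal sketch in Python, not the researchers' code: the grouping of emotion labels into positive and negative sets, the function name `sentiment_shares`, and the toy sample are all assumptions made for illustration.

```python
from collections import Counter

# Assumed grouping of emotion labels into sentiments (illustrative only).
POSITIVE = {"amusement", "awe", "contentment", "excitement"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def sentiment_shares(emotions):
    """Return the fraction of captions labelled positive, negative, or other."""
    counts = Counter()
    for emotion in emotions:
        if emotion in POSITIVE:
            counts["positive"] += 1
        elif emotion in NEGATIVE:
            counts["negative"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero on an empty dataset.
        return {"positive": 0.0, "negative": 0.0, "other": 0.0}
    return {key: counts[key] / total for key in ("positive", "negative", "other")}


# Toy sample of per-caption emotion labels (made up, not real dataset entries).
sample = [
    "awe", "contentment", "fear", "awe",
    "something else", "sadness", "amusement", "excitement",
]
shares = sentiment_shares(sample)
print(shares)
```

A model trained on captions with shares like these sees far more positive examples than negative ones, which is the kind of skew the team set out to counterbalance.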

To tackle this, the researchers created a new dataset of more than 260,000 image descriptions, collected using a contrasting approach to counterbalance the bias. Annotators were shown a painting along with a set of 24 very similar paintings, from which they had to select the one that most closely resembled the original while eliciting the opposite emotion. They then had to explain their choice in eight or more words.
