We asked artificial intelligence to analyze a graphic novel – and found both limits and new insights

With one spouse studying the evolution of artificial and natural intelligence and the other researching the language, culture and history of Germany, imagine the discussions at our dinner table. We often experience the stereotypical clash in views between the quantifiable, measurement-based approach of natural science and the more qualitative approach of the humanities, where what matters most is how people feel about something, or how they experience or interpret it.

We decided to take a break from that pattern, to see how much each approach could help the other. Specifically, we wanted to see if aspects of artificial intelligence could turn up new ways to interpret a nonfiction graphic novel about the Holocaust. We ended up finding that some AI technologies are not yet advanced and robust enough to deliver useful insights – but simpler methods resulted in quantifiable measurements that showed a new opportunity for interpretation.

Choosing a text

A graphic novel examined by artificial intelligence. Reinhard Kleist/Self Made Hero
There is plenty of research available that analyzes large bodies of text, so we chose something more complex for our AI analysis: Reinhard Kleist’s “The Boxer,” a graphic novel based on the true story of how Hertzko “Harry” Haft survived the Nazi death camps. We wanted to identify emotions in the facial expressions of the main character displayed in the book’s illustrations, to find out if that would give us a new lens for understanding the story.

In this black-and-white cartoon, Haft tells his horrific story, in which he and other concentration camp inmates were made to box each other to the death. The story is written from Haft’s perspective; interspersed throughout the narrative are panels of flashbacks depicting Haft’s memories of important personal events.

The humanities approach would be to analyze and contextualize elements of the story, or the tale as a whole. Kleist’s graphic novel is a reinterpretation of a 2009 biographical novel by Haft’s son Allan, based on what Allan knew about his father’s experiences. Analyzing this complex set of authors’ interpretations and understandings might serve only to add another subjective layer on top of the existing ones.

From the perspective of the philosophy of science, that level of analysis would only make things more complicated. Scholars might have differing interpretations, but even if they all agreed, they would still not know if their insight was objectively true or if everyone suffered from the same illusion. Resolving the dilemma would require an experiment aimed at generating a measurement others could reproduce independently.

Reproducible interpretation of images?

Rather than interpreting the images ourselves, and subjecting them to our own biases and preconceptions, we hoped that AI could bring a more objective view. We started by scanning all the panels in the book. Then we ran the scans through Google’s vision AI, as well as Microsoft Azure’s face recognition and emotion annotation tools.

The algorithms we used to analyze “The Boxer” were previously trained by Google or Microsoft on hundreds of thousands of images already labeled with descriptions of what they depict. In this training phase, the AI systems were asked to identify what the images showed, and those answers were compared with the existing descriptions to see if the system being trained was right or wrong. The training system strengthened the elements of the underlying deep neural networks that produced correct answers, and weakened the parts that contributed to wrong answers. Both the method and the training materials – the images and annotations – are crucial to the system’s performance.
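
For readers who like to see the idea in code, here is a minimal sketch of that training loop. It is not the method Google or Microsoft use: it swaps their deep neural networks for a single-layer perceptron, the simplest model that learns by correcting its wrong answers. The two numeric features per "image" and the labels are made up purely for illustration.

```python
# Toy illustration of supervised training, NOT the vendors' actual systems.
# A perceptron stands in for a deep neural network (an assumption made
# for brevity); the data below is invented.

def predict(weights, bias, features):
    """Answer 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Compare each answer with the existing label and adjust on mistakes."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # error is 0 for a correct answer, +1 or -1 for a wrong one
            error = label - predict(weights, bias, features)
            # Weights change only when the answer was wrong: they move
            # toward the features of a missed 1 and away from those of
            # a false 1.
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias

# Hypothetical labeled examples: (features, label)
examples = [
    ([1.0, 0.0], 1),
    ([0.9, 0.2], 1),
    ([0.1, 1.0], 0),
    ([0.0, 0.8], 0),
]

weights, bias = train(examples)
predictions = [predict(weights, bias, features) for features, _ in examples]
print(predictions)
```

After a few passes over the data, the toy model labels all four made-up examples correctly. Real systems follow the same compare-and-adjust principle, with millions of adjustable values instead of three.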

Then, we turned the AI loose on the book’s images. Just like on “Family Feud,” where the show’s producers ask 100 strangers a question and count up how many choose each potential answer, our method asks an AI to determine what emotion a face is showing. This approach adds one key element often missing when subjectively interpreting content: reproducibility. Any researcher who wants to check can run the algorithm again and get the same results we did.
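
The counting step can be sketched in a few lines. This is an illustration under stated assumptions, not our actual pipeline: `classify_emotion` is a hypothetical stand-in for a call to a vision service, and the panel file names and emotion labels are invented. The sketch also assumes the classifier always gives the same answer for the same image.

```python
from collections import Counter

# Invented results standing in for what a vision service might return.
# Neither the file names nor the labels come from the real analysis.
FAKE_RESULTS = {
    "panel_001.png": "sadness",
    "panel_002.png": "anger",
    "panel_003.png": "sadness",
}

def classify_emotion(panel_name):
    """Hypothetical classifier: one emotion label per panel.

    Panels with no usable result are reported as "undetected".
    """
    return FAKE_RESULTS.get(panel_name, "undetected")

def tally_emotions(panel_names):
    """Count how many panels received each emotion label."""
    return Counter(classify_emotion(name) for name in panel_names)

panels = ["panel_001.png", "panel_002.png", "panel_003.png", "panel_004.png"]

first_run = tally_emotions(panels)
second_run = tally_emotions(panels)

# Reproducibility check: running the same analysis twice gives the same tally.
print(first_run == second_run)
print(first_run.most_common())
```

Keeping an explicit "undetected" category matters here, because it shows how often the tool returned nothing usable instead of quietly dropping those panels from the count.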

Unfortunately, we found that these AI tools are optimized for digital photographs, not scans of black-and-white drawings. That meant we did not get much reliable data about the emotions in the pictures. We were also disturbed to find that none of the algorithms identified any of the images as relating to the Holocaust or concentration camps – though human viewers would readily identify those themes. Hopefully, that is because the AIs had problems with the black-and-white images themselves, and not because of negligence or bias in their training sets or annotations.