Team: Mahsan Nourani, Emma Drobina, Brianna Richardson, Prashant Singh

As machine learning has become more widespread, the question of explainability has reared its head. Many machine learning algorithms are black boxes – that is, they take in an input and return an output, but they do not explain how they calculated that output. To address this, some researchers have proposed explainable AI: systems that generate explanations of how they reached their conclusions. But what should these explanations look like? An excessively detailed or technical explanation may only confuse human users further. Furthermore, how might existing domain knowledge affect the way users' trust in a system develops as they are shown explanations of its labels?

As part of the Research Methods class at the University of Florida, I helped pilot a study testing the effects of expert-level knowledge of entomology on user trust, perception of system accuracy, and completion time when interacting with an image classifier and its accompanying explanations. We showed 18 participants (9 entomologists and 9 novices) a series of 8 labeled images of arthropods (insects, spiders, etc.), each accompanied by an image with highlighted regions explaining the label. The label could be correct or incorrect, and the arthropod could be easy or hard to identify.

Fig. 1. An example image explanation and question shown to survey participants.

Participants were told that the label and explanation were generated by an intelligent system, but they were actually created by an entomology expert. We had three hypotheses: that entomologists would find the system less trustworthy, that entomologists would judge the system to be less accurate, and that entomologists would take more time to complete the survey.

Fig. 2. The mean and standard deviation of trust between Trial 1 and Trial 8.

After all participants completed the eight trials, we found a significant difference in both reported accuracy and trust between the novice group and the expert entomologist group. This finding has potential significance for the design of human-friendly explainable systems and could be explored further in a larger project – namely, explanations can be used to mislead non-expert populations. Users who lack the domain knowledge to judge on their own whether a decision is correct could be misled by attractive, seemingly human-interpretable images that disguise significant classification errors in the system.
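For readers curious about how such a between-group comparison might be run, here is a minimal sketch. It assumes a nonparametric Mann-Whitney U test and made-up 7-point trust ratings; the write-up does not specify which test, scale, or data we actually used, so everything below is illustrative only.

```python
# Hypothetical sketch of the novice-vs-expert trust comparison.
# The test choice and the ratings are assumptions, not the study's actual data.
import numpy as np
from scipy.stats import mannwhitneyu

# Placeholder mean trust ratings per participant (1-7 Likert scale assumed), n = 9 per group.
novice_trust = np.array([6, 5, 6, 7, 5, 6, 6, 7, 5])
expert_trust = np.array([3, 4, 2, 3, 4, 3, 2, 4, 3])

# With small samples and ordinal ratings, a nonparametric test is a reasonable choice.
stat, p_value = mannwhitneyu(novice_trust, expert_trust, alternative="two-sided")
print(f"Mann-Whitney U = {stat}, p = {p_value:.4f}")
```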

This project helped me develop my survey design skills and build my knowledge of the statistical methods needed to interpret the results. I also drew on the skills I built in my earlier project "Pay Attention!": Developing the Social Behavior of a Responsive Robot Tutor, where I deployed an online survey to a remote population. Furthermore, this project helped solidify my interest in studying human interaction with machine learning and the ethics of AI.