The urgent need to understand the inner workings of LLMs and other machine learning algorithms has given rise to a burgeoning field of research known as “eXplainable AI” (XAI), sometimes also called “Interpretable AI”. In this article we examine a different methodological strategy for addressing the alignment problem, one that does not involve the development of new architectures, explanations, or training techniques. Our intention is, first, to examine the attempt to tackle the alignment problem with the methods of psychology, a discipline that seeks to understand an even more complicated black box, namely the human brain or, as the case may be, the human mind.
If a person is asked how their brain interprets visual input, or to rate their own ability to offer accurate explanations of how ordinary objects such as bikes or zippers work, they will probably be unable to provide a cogent explanation. Over the past few decades, researchers in empirical psychology have developed a variety of methodological strategies for understanding the human black box without relying too heavily on what people say (supposing they have anything to say) about their own mental states or brain processes. The attempt to address the alignment problem by applying the methods of psychology to LLMs has recently led one author to coin the expression “machine psychology”. We are particularly interested in the use of methods commonly applied in social psychology studies as a promising strategy for addressing the alignment problem.
How to cite this article:
Araujo, Marcelo de; Nunes, José Luiz; Almeida, Guilherme de. 2024. “LLM (Large Language Models) as surrogate for multi-voice groups: from epistemic reliability to political legitimacy”. In: Oliveira, Nythamar; Tauchen, Jair (eds.), Recasting Bioethics, Neuroethics, and AI Ethics, 1st ed., vol. 1. Porto Alegre: Fênix, pp. 159–181.