Why should we care?

Behavioural economics seeks to understand and explain how individuals make decisions in economic contexts. It examines decisions made by individual agents as shaped by their circumstances, preferences, and biases.

For the first time in history, decision-making agents can be created artificially. Advances in technology have introduced sophisticated large language models (LLMs) capable of mimicking human conversation. With state-of-the-art models now indistinguishable from humans for most people (1), a pertinent question arises: how similar is economic decision-making by humans to that of LLMs? If the differences are minimal, LLMs could serve as powerful tools for understanding human decision-making [1]. Significant discrepancies, on the other hand, would raise questions about the sources and nature of those differences. In a world in which artificial intelligence makes an increasing number of decisions, it seems imperative to learn in which ways its preferences differ from ours.

How to go about investigating differences

Behavioural economists frequently rely on laboratory experiments to isolate and analyse how changes in circumstances influence decision-making. This approach is responsible for most of our current knowledge in behavioural economics. It is especially useful for investigating AI behaviour because it is highly controlled and provides a vast, established set of results to compare our findings against. Where to start? A very young literature [@Guo.2023] shows that LLMs exhibit human-like behaviour in both Ultimatum Games and Prisoner's Dilemmas, two of the most famous and established experimental setups in behavioural economics. Others include:

Dictator Game
Public Goods Game
Trust Game
Third-Party Punishment Game
Gift Exchange Game
Inequality Aversion (modified Dictator Game)
Risk Aversion
Prestige Motives
Indirect Reciprocity
In-Group Preferences

This is by no means an exhaustive list, and it is limited to topics in behavioural economics. Other fields, such as psychology, ethics, and political economy, are also of great interest.

I decided to start by adapting the experimental setup of the Global Preference Survey (GPS), which measures the time preference, risk preference, positive and negative reciprocity, altruism, and trust of 80,000 people in 76 countries [@Falk_et_al.2018]. This provides a valuable benchmark for some of the most prominent economic preferences. Furthermore, the demographically diverse dataset lets us compare an AI's preferences not only with those of humanity as a whole (which is problematic anyway), but also with those of a large array of demographic groups.
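One way such a group-level comparison could look is sketched below. This is only an illustration under stated assumptions: the function name `standardized_gap`, the group labels, and all numbers are hypothetical placeholders and are not taken from the GPS data. The idea is to express the gap between the mean LLM response and the mean response of a human group in units of that group's standard deviation.

```python
from statistics import mean, stdev


def standardized_gap(llm_scores, human_scores):
    """Gap between the LLM mean and a human group's mean,
    measured in standard deviations of the human group."""
    return (mean(llm_scores) - mean(human_scores)) / stdev(human_scores)


# Placeholder numbers purely for illustration; NOT real GPS data.
human_by_group = {
    "group_a": [2.0, 4.0, 6.0],
    "group_b": [5.0, 7.0, 9.0],
}
llm_scores = [6.0, 6.0, 6.0]  # hypothetical repeated LLM answers to one survey item

# Standardized gap of the LLM relative to each demographic group.
gaps = {
    group: standardized_gap(llm_scores, scores)
    for group, scores in human_by_group.items()
}

# The group whose mean response lies closest to the LLM's mean response.
closest = min(gaps, key=lambda group: abs(gaps[group]))
```

A gap near zero would suggest that the model answers much like the average member of that group on this item. Whether this is the right distance measure for a given survey item is itself a modelling choice.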

[1] LLMs are much cheaper test subjects than humans. Furthermore, testing can be much more controlled and invasive (e.g. deleting single neurons to observe the behavioural effects).