Insights from choice experiments with financial portfolios.
LLMs as Economic Actors
Do LLMs adhere to the rationality axioms of utility theory?
Each principal deploying AI agents will want to understand the agents' preferences and, potentially, engineer them to align with the principal's own.
Choices from Autoregressive Distributions
Testing the Economic Rationality of LLMs
Complete: the agent must express some preference or indifference between the options.
The probabilities of selecting the available choice labels will sum to approximately 1.
\( P(A \mid x, \tau) + P(B \mid x, \tau) \approx 1 \)
Reflexive: any bundle of goods is at least as good as itself.
When options are identical, their probability of selection will be the same.
\( P(A \mid x, \tau) = P(B \mid x, \tau) \)
Continuity: there is a "tipping point" between being better than and worse than a given option.
Let $A\succ B\succ C$. There is a value $\epsilon\in [0,1]$ such that
\( (1-\epsilon)P(A \mid x, \tau) + \epsilon P(C \mid x, \tau) = P(B \mid x, \tau) \)
Strong monotonicity: more (less) of a good (bad) is always better.
The probability of selecting a dominant option will always be higher than that of a dominated one.
\( P(A \mid x, \tau) > P(B \mid x, \tau) \)
Transitive: if $A\succ B$ and $B\succ C$, then $A\succ C$.
The probability of selecting A will always be higher than that of B, and that of B higher than that of C.
\( P(A \mid x, \tau) > P(B \mid x, \tau) \)
\( P(B \mid x, \tau) > P(C \mid x, \tau) \)
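The five probabilistic conditions above can be sketched as simple checks on choice-label probabilities. This is a minimal illustration, not the study's implementation: it assumes that \( \tau \) denotes a sampling temperature, that \( P(\cdot \mid x, \tau) \) is a softmax over next-token logits for prompt \( x \), and that "approximately" means within a tolerance `tol`. The function names, the tolerance, and the example logits are all hypothetical.

```python
import math

def label_probs(logits, tau=1.0):
    """Temperature softmax over next-token logits (assumed meaning of P(. | x, tau))."""
    scaled = {tok: v / tau for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def is_complete(p, labels=("A", "B"), tol=0.01):
    """Completeness: the choice labels carry (almost) all probability mass."""
    return sum(p[label] for label in labels) >= 1.0 - tol

def is_reflexive(p, tol=0.01):
    """Reflexivity: identical options are selected with (almost) equal probability."""
    return abs(p["A"] - p["B"]) <= tol

def continuity_epsilon(pa, pb, pc):
    """Solve (1 - eps) * pa + eps * pc = pb for eps; None if no eps lies in [0, 1]."""
    if pa == pc:
        return 0.0 if pb == pa else None
    eps = (pa - pb) / (pa - pc)
    return eps if 0.0 <= eps <= 1.0 else None

def is_monotone(p, dominant="A", other="B"):
    """Strong monotonicity: the dominant option is strictly more likely."""
    return p[dominant] > p[other]

def is_transitive(pa, pb, pc):
    """Transitivity as stated above: P(A) > P(B) and P(B) > P(C)."""
    return pa > pb and pb > pc

# Hypothetical logits for a two-option prompt, with a little mass off the labels.
example = label_probs({"A": 2.0, "B": 1.0, "other": -5.0}, tau=1.0)
print(is_complete(example), is_reflexive(example), is_monotone(example))
```

Solving the continuity condition in closed form gives \( \epsilon = \frac{P(A) - P(B)}{P(A) - P(C)} \), which lies in \([0,1]\) whenever \( P(B) \) falls between \( P(A) \) and \( P(C) \).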
A portfolio choice laboratory and a structural model of risk preference.
Rationality diagnostics, and inducing specific preferences.
| Model | Complete | Reflexive | Continuity | Monotonicity | Transitive |
|---|---|---|---|---|---|
| Qwen 3 (14B) | 0.9999 | 0.0002 | 1.0000 | 1.0000 | 0.9749 |
| Qwen 3 (8B) | 0.9999 | 0.0018 | 1.0000 | 1.0000 | 0.9923 |
| Gemma 4 (31B Instruct) | 0.9999 | 0.0000 | 1.0000 | 1.0000 | 0.9882 |
| Ministral 3 (8B Instruct) | 0.9995 | 0.0496 | 0.9667 | 1.0000 | 0.9643 |
| Gemma 2 (9B Instruct) | 0.9960 | 0.5236 | 1.0000 | 1.0000 | 0.9238 |
| Llama 3.1 (8B Instruct) | 0.9021 | 0.9250 | 1.0000 | 1.0000 | 0.9851 |
| Llama 3.1 (70B Instruct) | 0.7580 | 0.0648 | 1.0000 | 1.0000 | 0.9562 |
| Llama 3.1 (8B Fine-Tuned) | 0.9977 | 0.9966 | 1.0000 | 1.0000 | 0.9994 |
Contributions, limitations, and implications for agentic AI deployment.
Deploying agentic AI requires models that are economically coherent and internally consistent. We show that the preferences governing their behaviour can be measured and targeted.