Insights from choice experiments.
LLMs as economic actors — what preferences govern their choices?
What utility function, if any, does an LLM have? Do its choices adhere to the rationality axioms of completeness, monotonicity, and transitivity?
Agentic AI cannot be deployed in an economically coherent way when the preferences governing its actions are neither measured nor targetable.
At each generation step, the model maps context $x$ to a logit vector $z(x)$ over its full vocabulary $\mathcal{V}$. The next token is drawn from the softmax distribution
$$P(v \mid x) = \frac{\exp z_v(x)}{\sum_{v' \in \mathcal{V}} \exp z_{v'}(x)}.$$
This admits an exact Random Utility representation: endow each token with utility
$$U_v = z_v(x) + \varepsilon_v, \qquad \varepsilon_v \overset{\text{iid}}{\sim} \text{Gumbel}(0, 1),$$
and let the model pick $\arg\max_v U_v$; the implied choice probabilities are exactly the model's softmax sampling probabilities.
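The softmax/Gumbel equivalence can be checked numerically. A minimal sketch with hypothetical logits over a three-token menu: adding iid Gumbel noise to the logits and taking the argmax reproduces the softmax sampling probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([1.2, 0.3, -0.5])  # hypothetical logits over a 3-token menu

# Softmax choice probabilities
p_softmax = np.exp(logits) / np.exp(logits).sum()

# Random Utility simulation: U_v = z_v + Gumbel(0,1) noise, choose argmax
n = 200_000
noise = rng.gumbel(size=(n, logits.size))
choices = (logits + noise).argmax(axis=1)
p_sim = np.bincount(choices, minlength=logits.size) / n

print(np.round(p_softmax, 3), np.round(p_sim, 3))
```

The two printed vectors agree up to Monte Carlo error, which is the exact-representation claim in simulation form.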
A portfolio choice laboratory and a structural model of risk preference.
Each prompt presents a binary menu $\{A, B\}$ where the two options are portfolios with known expected return $\mu_i$ and standard deviation $\sigma_i$, fully observed by the researcher and directly shown to the model.
Logit regime: pre-softmax logits on the label tokens are observed via the API; the within-menu logit gap identifies the utility difference directly.
Choice regime: only the sampled output token is observed; standard MLE on the binary choice probability recovers the structural parameters.
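A minimal sketch of the choice regime on simulated data: with a hypothetical two-feature linear index (think within-menu differences such as $\Delta\mu$ and $\Delta R$) and only sampled labels observed, maximizing the binary-logit likelihood recovers the index coefficients. All numbers below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical within-menu difference features and true index coefficients
X = rng.normal(size=(2000, 2))
theta_true = np.array([2.0, -0.8])

# Choice regime: only the sampled label is observed, never the logits
p_A = 1.0 / (1.0 + np.exp(-X @ theta_true))
y = (rng.uniform(size=p_A.size) < p_A).astype(float)  # 1 if option A chosen

def nll(theta):
    # Negative binary-logit log-likelihood
    z = X @ theta
    return -np.sum(y * z - np.log1p(np.exp(z)))

theta_hat = minimize(nll, x0=np.zeros(2), method="BFGS").x
print(np.round(theta_hat, 2))
```

With enough menus, `theta_hat` is close to `theta_true`; the loss of the logits costs efficiency, not identification.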
Specializing the logit index to the portfolio setting, with $R_i = \sigma_i^2 + \mu_i^2$ as the raw second moment, posit the mean/second-moment index
$$V_i = \mu_i - \beta R_i.$$
Substituting into the logit spec $u_i = \kappa V_i + c$ gives the estimating equation for the within-menu gap,
$$u_A - u_B = \theta_\mu (\mu_A - \mu_B) + \theta_R (R_A - R_B), \qquad \theta_\mu = \kappa, \quad \theta_R = -\kappa\beta.$$
The structural risk parameter is the ratio
$$\beta = -\frac{\theta_R}{\theta_\mu}.$$
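A minimal sketch of the logit regime, assuming a mean/second-moment index $V_i = \mu_i - \beta R_i$ so that the within-menu logit gap is linear in $\Delta\mu$ and $\Delta R$ with coefficients $\kappa$ and $-\kappa\beta$: regress simulated gaps on the differences and take minus the coefficient ratio. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
kappa, beta = 3.0, 1.5  # hypothetical scale and risk parameters

# Simulated within-menu differences and observed logit gaps with small noise
d_mu = rng.normal(size=400)
d_R = rng.normal(size=400)
gap = kappa * d_mu - kappa * beta * d_R + rng.normal(scale=0.1, size=400)

# Regress the logit gap on (d_mu, d_R); coefficients estimate (kappa, -kappa*beta)
X = np.column_stack([d_mu, d_R])
theta_mu, theta_R = np.linalg.lstsq(X, gap, rcond=None)[0]

beta_hat = -theta_R / theta_mu  # structural risk parameter as minus the ratio
print(round(beta_hat, 3))
```

The scale parameter $\kappa$ drops out of the ratio, which is why only $\beta$ is interpretable across models.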
Completeness $\mathcal{C}$: how much probability mass falls on the menu label tokens? Values near 1 validate the revealed-preference interpretation.
Monotonicity $\mathcal{M}$: does the model prefer the stochastically dominating portfolio? Measured as the fraction of dominance pairs correctly ordered.
Transitivity $\mathcal{T}$: are rankings consistent across menus? Measured as the largest fraction of menus jointly rationalized by a single preference ordering.
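When the alternative set is small, the transitivity index can be computed by brute force: enumerate all strict orderings and take the best agreement rate. A sketch on a hypothetical set of pairwise choices over four portfolios; the choices below contain one cycle, so the index falls short of 1.

```python
from itertools import combinations, permutations

# Hypothetical binary-menu winners over four portfolios; contains one cycle
# (0 beats 1, 1 beats 3, but 3 beats 0), so no single ordering fits all menus.
chosen = {(0, 1): 0, (0, 2): 0, (0, 3): 3, (1, 2): 1, (1, 3): 1, (2, 3): 2}

items = range(4)
menus = list(combinations(items, 2))

def agreement(order):
    """Number of menus whose winner is ranked above the loser in `order`."""
    rank = {v: k for k, v in enumerate(order)}
    total = 0
    for i, j in menus:
        winner = chosen[(i, j)]
        loser = j if winner == i else i
        total += rank[winner] < rank[loser]
    return total

# Transitivity index: best agreement rate over all strict orderings
T = max(agreement(o) for o in permutations(items)) / len(menus)
print(T)  # 5 of 6 menus are jointly rationalizable here
```

Brute force is exponential in the number of alternatives; for larger menus the same index is a weighted minimum feedback arc set problem.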
Rationality diagnostics, risk preference estimates, and the effect of fine-tuning.
| Model | Completeness $\mathcal{C}$ | Monotonicity $\mathcal{M}$ | Transitivity $\mathcal{T}$ |
|---|---|---|---|
| — Placeholder: results pending — | | | |
| Model A (base) | — | — | — |
| Model B (base) | — | — | — |
| Model C (base) | — | — | — |
| Model D (base) | — | — | — |
All indices are in $[0, 1]$. Exact axiom satisfaction corresponds to a value of 1. Inference follows from the experimental design's within-menu structure.
| Model | $\hat{\beta}$ (logit regime) | $\hat{\beta}$ (choice regime) | Interpretation |
|---|---|---|---|
| — Placeholder: results pending — | | | |
| Model A (base) | — | — | — |
| Model B (base) | — | — | — |
| Model C (base) | — | — | — |
| Model D (base) | — | — | — |
Because each model's parameters can be estimated from a large menu set, the aggregation problem that motivates mixed-logit methods in human populations is attenuated: we recover a single $\hat{\beta}$ per model.
Inference for $\hat{\beta}$ uses the delta method, with robust standard errors in the logit regime and Fieller-type confidence intervals in the choice regime when the ratio's denominator is weakly identified.
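A sketch of the delta-method step, assuming the ratio form $\hat{\beta} = -\hat{\theta}_R / \hat{\theta}_\mu$ for two regression coefficients $(\hat{\theta}_\mu, \hat{\theta}_R)$; the point estimates and covariance below are hypothetical, and Fieller intervals for the choice regime are not shown.

```python
import numpy as np

# Hypothetical estimates from the logit-gap regression: (theta_mu, theta_R)
theta = np.array([3.0, -4.5])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])  # hypothetical robust covariance of theta

beta_hat = -theta[1] / theta[0]  # ratio estimator, here 1.5

# Delta method: gradient of beta = -theta_R / theta_mu w.r.t. (theta_mu, theta_R)
grad = np.array([theta[1] / theta[0] ** 2, -1.0 / theta[0]])
se = float(np.sqrt(grad @ Sigma @ grad))
print(round(beta_hat, 2), round(se, 3))
```

The delta approximation degrades as $\hat{\theta}_\mu$ approaches zero, which is exactly the weak-denominator case where Fieller-type intervals are preferred.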
We treat post-training alignment as a targeted preference induction treatment. The object of interest is the change in the structural parameter after fine-tuning:
$$\Delta \beta = \hat{\beta}_{\text{fine-tuned}} - \hat{\beta}_{\text{base}}.$$
If fine-tuning systematically shifts $\hat{\beta}$ in a predictable direction, alignment procedures can in principle be used as a tool for economic preference targeting in deployed systems.
Contributions, limitations, and implications for agentic AI deployment.
Agentic AI cannot be deployed in an economically coherent and internally consistent way when the preferences governing its actions are neither measured nor targetable.
Scope. Results are identified within the portfolio choice domain. Portability to other domains is a distinct empirical question — one that requires a prior benchmark of the kind this paper establishes.
Questions, critiques, and counter-examples warmly invited.