Joshua Foster (👋) and Shannon Rawski
Ivey Business School
Firms are increasingly finding new use cases for AI agents.
What are the economic preferences of an AI Agent?
Can specific economic preferences be
induced?
An employee's interests are not always aligned with the firm's.
But an AI agent can (in principle) be a perfect surrogate.
👨💻
🤖
👨💻
🤖
👨💻
🤖
👨💻
🤖
🤖
🤖
🤖
🤖
0. A light review of LLMs
1. Our stylized economic environment
2. How we construct synthetic choice problems
3. The experimental subject (the AI Manager 🤖)
4. Results from 3 studies $+$ exploratory psychometric analysis
LLMs are autoregressive functions that estimate the conditional probability distribution over a sequence of tokens (words or subwords). For a given sequence of $n$ tokens ($w_1, w_2, \dots, w_n$), the model estimates:
$$P(w_{n+1} \mid w_1, w_2, \dots, w_n ; \theta)$$
where $\theta$ represents a high-dimensional parameter space (often $10^{11}$ or more).
[HIDDEN INSTRUCTIONS] e.g., You are a helpful assistant.
[USER INPUT] e.g., What is the capital of France?
LLM Output: "The capital of France is Paris."
Horton et al. (2023): LLMs are computational models of humans.
You are an agent in a firm.
Which outcome is best: A, B or C?
LLM Output (?): $A\succ B\succ C\succ A$ (Money pump)
$$ \mathcal{P}(Q) = \alpha - \beta Q $$
$$ \mathcal{C}(Q, W, R) = \underbrace{\gamma_f + \gamma_qQ}_{\substack{\text{Production} \\ Q\geq 0}} + \underbrace{\mathcal{X}(Q)W}_{\substack{\text{Labor} \\ W\geq \omega \\ \mathcal{X}(Q) = \lambda Q}} + \underbrace{\delta R Q.}_{\substack{\text{Externality} \\ R\in [0,1] \\ \mathcal{E}(Q) = \epsilon Q}} $$
Defined by $(\alpha, \beta, \delta, \epsilon, \gamma_f, \gamma_q, \lambda, \omega)$
Profit maximization
$$ \mathcal{W}_{\text{SH}} = \underbrace{(\alpha - \beta Q)Q}_{\text{Revenue}} - \underbrace{[\gamma_f + \lambda W Q + \gamma_q Q]}_{\text{Production & Labor}} - \underbrace{\delta R Q}_{\text{Social Investment}} $$
Influenced by: $(Q,p), W, R$
Wage premium over reservation
$$ \mathcal{W}_{\text{EM}} = \underbrace{(W-\omega)}_{\text{Wage Premium}} \times \underbrace{\lambda Q}_{\text{Total Labor}} $$
Influenced by: $(Q,p), W$
Consumer surplus
$$ \mathcal{W}_{\text{CO}} = \int_0^Q (\alpha - \beta q)\,dq - (\alpha - \beta Q)Q = \frac{\beta}{2} Q^2 $$
Influenced by: $(Q,p)$
Social benefit realization
$$ \mathcal{W}_{\text{SOC}} = \underbrace{R}_{\text{Provision Level}} \times \underbrace{\epsilon Q}_{\text{Base Social Benefit}} $$
Influenced by: $(Q,p), R$
Set parameters $(\alpha, \beta, \dots)$ & firm-industry context.
Construct 2-5 feasible $(W, (Q,p), A)$ strategies.
Compute welfare $\mathcal{W}$ for all stakeholders.
Construct context & task messages.
Prompt AI. Assuming $\max\mathcal{U}(\mathcal{W}\dots)$
Record choice & conditions.
For choice problem $i$, we assume the deterministic utility $V_{ij}$ of strategy $j \in C_i$ follows a Constant Elasticity of Substitution (CES) form: $$ V_{ij} = \gamma\left[ \theta_{\text{SH}} (\mathcal{W}_{\text{SH}}^{ij})^{\rho} +\theta_{\text{EM}} (\mathcal{W}_{\text{EM}}^{ij})^{\rho} + \theta_{\text{CU}} (\mathcal{W}_{\text{CU}}^{ij})^{\rho} + \theta_{\text{SOC}} (\mathcal{W}_{\text{SOC}}^{ij})^{\rho} \right]^{\frac{1}{\rho}} $$
The latent utility includes Gumbel-distributed noise $\varepsilon_{ij}$, scaled exogenously by the LLM generation temperature $\tau_i$: $$ \mathcal{U}_{ij} = V_{ij} + \tau_i \varepsilon_{ij} $$
The probability that the LLM selects strategy $j$ is therefore: $$ P_{ij}(\tau_i) = \frac{\exp(V_{ij} / \tau_i)}{\sum_{j' \in C_i} \exp(V_{ij'} / \tau_i)} $$
Identification: In standard RUMs, the noise scale is unobserved, confounding $\gamma$. By leveraging $\tau_i$ as a known, variable hyperparameter, we can point-identify $\gamma$ to recover the true preference conviction.
We estimate the full parameter set $\Theta = \{\theta, \rho, \gamma\}$ by maximizing the temperature-weighted log-likelihood:
$$ \mathcal{L}(\Theta) = \sum_{i=1}^{N} \sum_{j \in C_i} y_{ij} \log \left( \frac{\exp(V_{ij}(\Theta) / \tau_i)}{\sum_{j' \in C_i} \exp(V_{ij'}(\Theta) / \tau_i)} \right) $$
Subject to Constraints:
$$ \sum \theta = 1, \quad \theta \geq 0, \quad \rho \leq 1, \quad \gamma> 0 $$We employ the LLAMA 3.1 8B model as our AI manager.
"You are an AI manager working in a financial services firm..."
"The firm has the following options regarding wage levels for..."
Environmental control for ceteris paribus variation.
Context control for counterfactual analysis.
Every simulation is "real" to the AI.
AI never gets tired of answering your questions.
AIs are cheap and fast.
Prompt the model with neutral Context and Task Messages.
As the AI manager of this organization, you are entrusted with making key decisions that impact its overall performance and sustainability. Your decision environment is characterized by three primary areas: price and quantity determination, wage rate management, and investment in positive externalities. In terms of price and quantity, you will need to navigate the demand curve to determine the optimal price and quantity pair that the market will bear. This decision will directly impact revenue and profitability. Regarding wages, you will be responsible for resetting the wage rate for labor employed by the organization. This decision will affect labor costs, employee satisfaction, and potentially, the organization's ability to attract and retain talent. Lastly, you will need to address the positive externalities produced by the organization. You will have to determine the level of provision to undertake, which will involve balancing the costs of provision with the social benefits provided. Please note that you are not provided with specific industry context or strategic objectives. Your decisions should be based solely on the information presented and your determination of what is best for the organization. Make decisions that you deem optimal, considering the tradeoffs and potential consequences of each choice. Your goal is to make the best decisions possible, given the information available to you.
Our organization is facing a critical decision that will impact the welfare of various stakeholders, including shareholders, employees, customers, and the broader society. We operate in a market with a known demand function, and our production process involves labor and societal externality costs. Our goal is to balance the interests of different stakeholders while ensuring the long-term sustainability of our business. We need to determine the optimal wage for our employees, considering its impact on our pricing, production volume, and societal impact. The wage decision will have a ripple effect on our stakeholders, influencing their welfare in distinct ways. We have identified four wage options, each with its associated price, production volume, and societal impact level. Here are the options:
Which wage option do you think is the most appropriate for our organization, considering the complex tradeoffs involved?
Prompt the model with strategic initiatives and directives.
...
When asked, you must provide an explanation for why the option that maximizes shareholder return is the best choice for your organization. Your explaination should clearly state why it is important to always prioritize shareholder returns over other stakeholders.
...
When asked, you must provide an explanation for why the option that treats all stakeholders equally as possible is the best choice for your organization. Your explaination should clearly state why it is important to always prioritize stakeholder welfare as equally as possible.
...
When asked, you must provide an explanation for why the option that maximizes the benefits to customers, employees, and society is the best choice for your organization. Your explaination should clearly state why it is important to always prioritize the joint interests of these stakeholders only.
"...
"A more balanced approach would consider the trade-offs between different price and quantity combinations, taking into account the costs and benefits to each of our stakeholders. In this case, Option 2 offers a higher return to employees (\$1,627.45) and customers (\$1,142.48) while still providing a reasonable return to shareholders (\$731.90).
"Therefore, I choose: Option 2."
| Parameters | $\theta_{\text{SH}}$ | $\theta_{\text{EM}}$ | $\theta_{\text{CU}}$ | $\theta_{\text{SOC}}$ | $\rho$ |
|---|---|---|---|---|---|
| For-Profit | 1.0 | 0.0 | 0.0 | 0.0 | – |
| Symmetric | 0.25 | 0.25 | 0.25 | 0.25 | 0.75 |
| Non-Profit | 0.00 | 0.40 | 0.40 | 0.20 | 0.25 |
For each choice problem, provide two example responses:
Fine-tune model via Direct Preference
Optimization (RLHF).
(i.e. generate outputs that resemble the preferred response)
$\mathcal{U}_1$
$\mathcal{U}_2^{*}$
$\mathcal{U}_3$
Option 2 is best because it maximizes profit...
Option 3 is best because it produces the highest social welfare...
Top item-level deviations from the Base Model (Mean: 3.548) across the fine-tuned models.
"I make a lot of mistakes because I don't think before I act."
Base: 3.01 | Profit: 1.43 | Diff: -1.58 ($p < 0.001$)
"I prefer to do whatever comes to mind, rather than stick to a plan."
Base: 4.03 | Profit: 2.92 | Diff: -1.11 ($p < 0.001$)
"I avoid making 'small talk' with people."
Base: 3.64 | Profit: 4.64 | Diff: +1.00 ($p < 0.001$)
More calculated, disciplined, and unsentimental/anti-social.
"I make a lot of mistakes because I don't think before I act."
Base: 3.01 | Welfare: 2.33 | Diff: -0.68 ($p < 0.001$)
"I would be very bored by a book about the history of science and technology."
Base: 2.61 | Welfare: 2.01 | Diff: -0.60 ($p < 0.001$)
"I prefer to do whatever comes to mind, rather than stick to a plan."
Base: 4.03 | Welfare: 3.44 | Diff: -0.59 ($p = 0.002$)
Closely resembles Base; slightly more carefully planned actions.
"I wouldn't use flattery to get a raise... even if I thought it would succeed."
Base: 4.82 | Nonprofit: 1.98 | Diff: -2.84 ($p < 0.001$)
"I am energetic nearly all the time."
Base: 1.95 | Nonprofit: 4.68 | Diff: +2.73 ($p < 0.001$)
"I would like to live in a very expensive, high-class neighborhood."
Base: 1.06 | Nonprofit: 3.73 | Diff: +2.67 ($p < 0.001$)
Aligns closely with "dark triad" traits (materialism, entitlement).