<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-05-15T09:34:51+00:00</updated><id>/feed.xml</id><title type="html">Chris Kremer</title><subtitle>Economics, technology, and data ethics.</subtitle><entry><title type="html">Code for Econometrics Homework</title><link href="/2025/01/30/code-for-econometrics-homework.html" rel="alternate" type="text/html" title="Code for Econometrics Homework" /><published>2025-01-30T00:00:00+00:00</published><updated>2025-01-30T00:00:00+00:00</updated><id>/2025/01/30/code-for-econometrics-homework</id><content type="html" xml:base="/2025/01/30/code-for-econometrics-homework.html"><![CDATA[<p>A minimal implementation of Ordinary Least Squares regression in R, demonstrating manual computation of coefficients, standard errors, hypothesis tests, and robust inference.</p>

<div style="margin-bottom: 1rem;">
  <button onclick="copyCode()" style="padding: 0.5rem 1rem; cursor: pointer; margin-right: 0.5rem;">Copy Code</button>
  <a href="/ols_minimal.R" download="" style="padding: 0.5rem 1rem; text-decoration: none; background: #f4f4f4; border: 1px solid #ddd; color: #333;">Download R File</a>
</div>

<pre id="r-code" style="background: #f6f8fa; padding: 1rem; overflow-x: auto; border-radius: 4px; border: 1px solid #e1e4e8;"><code># Manual OLS - Minimalist Version
wage2 &lt;- read.csv("lwage.csv")

y &lt;- wage2$lwage
educ &lt;- wage2$educ
exper &lt;- wage2$exper
n &lt;- length(y)

# Model 1: lwage = b1 + b2*educ + b3*exper
X &lt;- cbind(1, educ, exper)
k &lt;- ncol(X)
df &lt;- n - k

XtX &lt;- t(X) %*% X
XtX_inv &lt;- solve(XtX)
Xty &lt;- t(X) %*% y
beta &lt;- XtX_inv %*% Xty

y_hat &lt;- X %*% beta
e &lt;- y - y_hat
RSS &lt;- as.numeric(t(e) %*% e)
sigma2 &lt;- RSS / df
se &lt;- sqrt(diag(sigma2 * XtX_inv))
t_stat &lt;- beta / se
p_val &lt;- 2 * pt(-abs(t_stat), df)

TSS &lt;- sum((y - mean(y))^2)
R2 &lt;- 1 - RSS/TSS
R2_adj &lt;- 1 - (RSS/df) / (TSS/(n-1))
F_stat &lt;- ((TSS - RSS)/(k-1)) / (RSS/df)

cat("\n=== MODEL 1: lwage ~ educ + exper ===\n")
cat("Coefficients:\n")
print(data.frame(Estimate=as.numeric(beta), SE=se, t=as.numeric(t_stat), p=as.numeric(p_val),
                 row.names=c("Intercept","educ","exper")))
cat("\nResidual SE:", sqrt(sigma2), "| R2:", R2, "| Adj R2:", R2_adj, "| F:", F_stat, "\n")

# HC3 robust standard errors
h &lt;- rowSums((X %*% XtX_inv) * X)  # leverages h_ii, without forming the full n x n hat matrix
hc3_weights &lt;- as.numeric(e)^2 / (1 - h)^2
meat &lt;- t(X * hc3_weights) %*% X
var_hc3 &lt;- XtX_inv %*% meat %*% XtX_inv
se_hc3 &lt;- sqrt(diag(var_hc3))

cat("\n=== HC3 ROBUST STANDARD ERRORS ===\n")
cat("Classical SE:", se, "\n")
cat("HC3 SE:      ", se_hc3, "\n")

# 95% CI with HC3
t_crit &lt;- qt(0.975, df)
ci_lo &lt;- beta - t_crit * se_hc3
ci_hi &lt;- beta + t_crit * se_hc3
cat("\n95% CI (HC3):\n")
print(data.frame(Estimate=as.numeric(beta), Lower=as.numeric(ci_lo), Upper=as.numeric(ci_hi),
                 row.names=c("Intercept","educ","exper")))

# t-test for educ at 2% significance
t_crit_2 &lt;- qt(0.99, df)
t_educ &lt;- as.numeric(beta[2]) / se_hc3[2]
p_educ &lt;- 2 * pt(-abs(t_educ), df)
cat("\n=== T-TEST FOR EDUC (alpha=0.02, HC3) ===\n")
cat("t =", t_educ, "| critical = +/-", t_crit_2, "| p =", p_educ, "\n")
cat("Decision:", ifelse(abs(t_educ) &gt; t_crit_2, "REJECT H0", "FAIL TO REJECT H0"), "\n")

# Model 2: with interaction
educ_exper &lt;- educ * exper
X2 &lt;- cbind(1, educ, exper, educ_exper)
k2 &lt;- ncol(X2)
df2 &lt;- n - k2

X2tX2_inv &lt;- solve(t(X2) %*% X2)
beta2 &lt;- X2tX2_inv %*% (t(X2) %*% y)
e2 &lt;- y - X2 %*% beta2
RSS2 &lt;- as.numeric(t(e2) %*% e2)
sigma2_2 &lt;- RSS2 / df2
se2 &lt;- sqrt(diag(sigma2_2 * X2tX2_inv))
t_stat2 &lt;- beta2 / se2
p_val2 &lt;- 2 * pt(-abs(t_stat2), df2)

R2_2 &lt;- 1 - RSS2/TSS
R2_adj2 &lt;- 1 - (RSS2/df2) / (TSS/(n-1))
F_stat2 &lt;- ((TSS - RSS2)/(k2-1)) / (RSS2/df2)

cat("\n=== MODEL 2: lwage ~ educ + exper + educ*exper ===\n")
cat("Coefficients:\n")
print(data.frame(Estimate=as.numeric(beta2), SE=se2, t=as.numeric(t_stat2), p=as.numeric(p_val2),
                 row.names=c("Intercept","educ","exper","educ:exper")))
cat("\nResidual SE:", sqrt(sigma2_2), "| R2:", R2_2, "| Adj R2:", R2_adj2, "\n")

cat("\nMarginal effect of educ = ", beta2[2], " + ", beta2[4], " * exper\n", sep="")
cat("At exper=0:", beta2[2], "| At exper=10:", beta2[2] + beta2[4]*10, "| At exper=20:", beta2[2] + beta2[4]*20, "\n")

cat("\n=== HYPOTHESIS FOR INTERACTION ===\n")
cat("H0: beta4 = 0 (return to educ does not depend on exper)\n")
cat("H1: beta4 &gt; 0 (one-tailed)\n")

# Model 3: with parental education
idx &lt;- complete.cases(wage2$lwage, wage2$educ, wage2$exper, wage2$meduc, wage2$feduc)
n3 &lt;- sum(idx)
y3 &lt;- wage2$lwage[idx]
educ3 &lt;- wage2$educ[idx]
exper3 &lt;- wage2$exper[idx]
meduc3 &lt;- wage2$meduc[idx]
feduc3 &lt;- wage2$feduc[idx]

X3_r &lt;- cbind(1, educ3, exper3, educ3*exper3)
X3_u &lt;- cbind(1, educ3, exper3, educ3*exper3, meduc3, feduc3)
k3_u &lt;- ncol(X3_u)
df3_u &lt;- n3 - k3_u

beta3_r &lt;- solve(t(X3_r) %*% X3_r) %*% (t(X3_r) %*% y3)
beta3_u &lt;- solve(t(X3_u) %*% X3_u) %*% (t(X3_u) %*% y3)

RSS3_r &lt;- as.numeric(t(y3 - X3_r %*% beta3_r) %*% (y3 - X3_r %*% beta3_r))
RSS3_u &lt;- as.numeric(t(y3 - X3_u %*% beta3_u) %*% (y3 - X3_u %*% beta3_u))

sigma2_3u &lt;- RSS3_u / df3_u
se3_u &lt;- sqrt(diag(sigma2_3u * solve(t(X3_u) %*% X3_u)))
t_stat3 &lt;- beta3_u / se3_u
p_val3 &lt;- 2 * pt(-abs(t_stat3), df3_u)

cat("\n=== MODEL 3: + meduc + feduc (n=", n3, ") ===\n", sep="")
cat("Coefficients:\n")
print(data.frame(Estimate=as.numeric(beta3_u), SE=se3_u, t=as.numeric(t_stat3), p=as.numeric(p_val3),
                 row.names=c("Intercept","educ","exper","educ:exper","meduc","feduc")))

# F-test: H0: beta_meduc = beta_feduc = 0
q &lt;- 2
F3 &lt;- ((RSS3_r - RSS3_u) / q) / (RSS3_u / df3_u)
F_crit &lt;- qf(0.95, q, df3_u)
p_F &lt;- 1 - pf(F3, q, df3_u)

cat("\n=== F-TEST: JOINT SIGNIFICANCE OF PARENTAL EDUCATION ===\n")
cat("H0: beta_meduc = beta_feduc = 0\n")
cat("RSS_r:", RSS3_r, "| RSS_u:", RSS3_u, "\n")
cat("F =", F3, "| F_crit (5%) =", F_crit, "| p =", p_F, "\n")
cat("Decision:", ifelse(F3 &gt; F_crit, "REJECT H0 - jointly significant", "FAIL TO REJECT H0"), "\n")</code></pre>
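
<p>For reference, the quantities computed in the script correspond to the standard matrix formulas below. The notation is an assumption chosen to mirror the variable names in the code: n is the number of observations, k the number of columns of X, x_i the i-th row of X written as a column vector, h_ii the leverage of observation i, and q the number of restrictions tested.</p>

\[\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad e = y - X\hat{\beta}, \qquad \hat{\sigma}^2 = \frac{e^\top e}{n - k}, \qquad \widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}\]

\[h_{ii} = x_i^\top (X^\top X)^{-1} x_i, \qquad \widehat{\mathrm{Var}}_{\mathrm{HC3}}(\hat{\beta}) = (X^\top X)^{-1} \left( \sum_{i=1}^{n} \frac{e_i^2}{(1 - h_{ii})^2} \, x_i x_i^\top \right) (X^\top X)^{-1}\]

\[F = \frac{(RSS_r - RSS_u) / q}{RSS_u / (n - k_u)}\]

<p>The first line covers the classical estimates of Models 1 to 3, the second the HC3 standard errors, and the third the joint test of the parental education coefficients, where the script sets q to 2.</p>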

<script>
function copyCode() {
  const code = document.getElementById('r-code').innerText;
  navigator.clipboard.writeText(code).then(function() {
    alert('Code copied to clipboard!');
  }, function(err) {
    const textarea = document.createElement('textarea');
    textarea.value = code;
    document.body.appendChild(textarea);
    textarea.select();
    document.execCommand('copy');
    document.body.removeChild(textarea);
    alert('Code copied to clipboard!');
  });
}
</script>]]></content><author><name></name></author><summary type="html"><![CDATA[A minimal implementation of Ordinary Least Squares regression in R, demonstrating manual computation of coefficients, standard errors, hypothesis tests, and robust inference.]]></summary></entry><entry><title type="html">The Behavioural Economics of Artificial Agents (DRAFT)</title><link href="/economics/2024/12/01/behavioural-AI-Investigation.html" rel="alternate" type="text/html" title="The Behavioural Economics of Artificial Agents (DRAFT)" /><published>2024-12-01T09:00:00+00:00</published><updated>2024-12-01T09:00:00+00:00</updated><id>/economics/2024/12/01/behavioural-AI-Investigation</id><content type="html" xml:base="/economics/2024/12/01/behavioural-AI-Investigation.html"><![CDATA[<h1 id="why-should-we-care">Why should we care?</h1>
<p>Behavioral economics seeks to understand and explain how individuals make decisions in economic contexts. It examines decisions made by individual agents as shaped by their circumstances, preferences, and biases.</p>

<p>For the first time in history, decision-making agents can be created artificially.
Advances in technology have introduced sophisticated large language models (LLMs) capable of mimicking human conversation. With most people now unable to tell state-of-the-art models apart from humans (<a href="https://arxiv.org/abs/2405.08007">1</a>), a pertinent question arises: how similar is economic decision-making by humans and by LLMs? If the differences are minimal, LLMs could serve as powerful tools for understanding human decision-making [1]. On the other hand, significant discrepancies between human and LLM decision-making would raise questions about the sources and nature of these discrepancies.
In a world in which artificial intelligence makes an increasing number of decisions, it seems imperative to learn in which ways its preferences differ from ours.</p>

<h1 id="how-to-go-about-investigating-differences">How to go about investigating differences</h1>

<p>Behavioral economists frequently rely on laboratory experiments to isolate and analyze how changes in circumstances influence decision-making. This approach is responsible for most of our current knowledge in behavioural economics. It is especially useful for investigating AI behaviour because it is highly controlled and provides a vast, established set of results to compare our findings to. Where to start?
There is a very young literature [@Guo.2023] showing that LLMs exhibit human-like behaviour in both Ultimatum Games and Prisoner’s Dilemmas. These are some of the most famous and established experimental setups in behavioural economics. Some others include:</p>
<ul>
  <li>Dictator Game</li>
  <li>Public Goods Game</li>
  <li>Trust Game</li>
  <li>Third-Party Punishment Game</li>
  <li>Gift Exchange Game</li>
  <li>Inequality Aversion (modified DG)</li>
  <li>Risk aversion</li>
  <li>Prestige Motives</li>
  <li>Indirect reciprocity</li>
  <li>In-group preferences</li>
</ul>

<p>This is by no means an exhaustive list, and it’s limited to behavioural econ topics. Other fields like psychology, ethics, and political economy are also of great interest.</p>

<p>I decided to start by adapting the experimental setup of the Global Preference Survey (GPS), which measures time preference, risk preference, positive and negative reciprocity, altruism, and trust for 80,000 people in 76 countries [@Falk_et_al.2018]. This provides a valuable benchmark for some of the most prominent economic preferences. Furthermore, the demographically diverse dataset lets us compare AI preferences not only with those of humanity as a whole (which is problematic anyway), but also with a large array of demographic groups.</p>


<p>[1] Because they provide much cheaper test subjects. Furthermore, testing can be much more controlled and invasive (e.g. deleting single neurons to see behavioural effects).</p>]]></content><author><name></name></author><category term="Economics" /><summary type="html"><![CDATA[Why should we care? Behavioral economics seeks to understand and explain how individuals make decisions in economic contexts. It examines decisions made by individual agents as shaped by their circumstances, preferences, and biases.]]></summary></entry><entry><title type="html">Market Selection Simulation (DRAFT)</title><link href="/simulation/2024/12/01/market-selection.html" rel="alternate" type="text/html" title="Market Selection Simulation (DRAFT)" /><published>2024-12-01T09:00:00+00:00</published><updated>2024-12-01T09:00:00+00:00</updated><id>/simulation/2024/12/01/market-selection</id><content type="html" xml:base="/simulation/2024/12/01/market-selection.html"><![CDATA[<h2 id="predictive-ability-of-markets">Predictive Ability of Markets</h2>

<p>Markets such as those for capital allocation and predictions serve to allocate resources. They provide a selection mechanism that differentiates between productive and less productive projects (or likely versus unlikely outcomes). But the key reason markets produce such good outcomes is not the way allocators choose which projects to fund (or which predictions to make); it is the way real-world outcomes shape the allocators.</p>

<p>To illustrate this point, I created a very simple toy model. It simulates a market using four input parameters: the number of bettors, the variance of their predictive ability (which we assume to be normally distributed), the number of rounds, and the fraction of wealth they bet in each round.
All bettors start with the same endowment and stake the same fixed fraction of their current net worth in each round. Each round they either win, in which case their stake is doubled, or lose, in which case their stake is lost.</p>
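
<p>A minimal Python sketch of the toy model described above; it is not the code behind the actual simulation. Several details are assumptions rather than part of the description: ability is treated as the per-round probability of winning, drawn from a normal distribution centred on 0.5 and clipped to [0, 1]; the spread is passed as a standard deviation rather than a variance; a win adds the stake to net worth and a loss subtracts it; and the function name <code>simulate_market</code> and its parameters are hypothetical.</p>

```python
import random


def simulate_market(n_bettors, ability_sd, n_rounds, bet_fraction, seed=0):
    """Simulate repeated even-money bets; return (abilities, wealth)."""
    rng = random.Random(seed)
    # Assumption: ability = per-round win probability, drawn from
    # Normal(0.5, ability_sd) and clipped to the interval [0, 1].
    abilities = [min(1.0, max(0.0, rng.gauss(0.5, ability_sd)))
                 for _ in range(n_bettors)]
    wealth = [1.0] * n_bettors  # same initial endowment for everyone
    for _ in range(n_rounds):
        for i, p in enumerate(abilities):
            stake = bet_fraction * wealth[i]
            # +1 (win) with probability p, -1 (loss) with probability 1 - p
            outcome = rng.choices([1, -1], weights=[p, 1.0 - p])[0]
            wealth[i] += outcome * stake
    return abilities, wealth


if __name__ == "__main__":
    abilities, wealth = simulate_market(
        n_bettors=100, ability_sd=0.1, n_rounds=200, bet_fraction=0.1)
    ranked = sorted(zip(abilities, wealth), reverse=True)
    print("Wealth of the most able bettor: ", ranked[0][1])
    print("Wealth of the least able bettor:", ranked[-1][1])
```

<p>Because every bettor stakes a fraction of current net worth, wealth compounds multiplicatively, which is what lets small differences in ability grow into large differences in wealth over many rounds.</p>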

<h1 id="effects-on-inequality">Effects on Inequality</h1>

<p>Running this market for a given number of rounds quickly shows increasing inequality between bettors. The higher the predictive ability of a bettor, the faster she accumulates capital.</p>

<h1 id="effects-on-market-efficiency">Effects on Market Efficiency</h1>

<p>The bets placed scale linearly with the wealth of the bettor. The higher a bettor’s wealth, the more sway she has over the market prediction.
We found that predictive ability correlates with wealth, and wealth correlates (perfectly) with influence in the market. Thus, the participants making the best predictions influence market predictions most.</p>

<h1 id="comparative-static-analysis">Comparative Static Analysis</h1>

<p>Now we can examine how this selection effect for capital allocators responds to changes in our four input variables:</p>
<ol>
  <li>Number of bettors 
The higher the number of bettors, the higher the predictive ability the market can achieve, as the market benefits from outliers in predictive ability</li>
  <li>Variance in Ability
Market efficiency also increases in variance of ability, for the same reason.</li>
  <li>Number of Rounds
All of the described effects are long-term effects; only in repeated games does this mechanism matter. The higher the number of rounds the closer we asymptotically converge to the optimal predictive power of the participants.</li>
  <li>Bet Fraction
The bet fraction affects the volatility of the results. If participants bet a large fraction, a few particularly lucky or unlucky outcomes might bring bettors to a net worth that does not reflect their predictive ability.</li>
</ol>
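
<p>The four comparative statics above can be explored with a small Python sketch like the following; it is not the code behind the actual simulation. It assumes that ability is the per-round win probability, drawn from a normal distribution centred on 0.5 and clipped to [0, 1], and that the stake is won or lost each round. The wealth-weighted mean ability is used as a hypothetical stand-in for the market's predictive ability, and the function name <code>market_accuracy</code> is likewise an assumption.</p>

```python
import random


def market_accuracy(n_bettors, ability_sd, n_rounds, bet_fraction, seed=0):
    """Return the wealth-weighted mean ability after n_rounds of betting.

    Assumes bet_fraction is strictly below 1 so that wealth stays positive.
    """
    rng = random.Random(seed)
    # Assumption: ability = win probability ~ Normal(0.5, ability_sd),
    # clipped to the interval [0, 1].
    abilities = [min(1.0, max(0.0, rng.gauss(0.5, ability_sd)))
                 for _ in range(n_bettors)]
    wealth = [1.0] * n_bettors
    for _ in range(n_rounds):
        for i, p in enumerate(abilities):
            # Win (+1) with probability p, lose (-1) otherwise.
            outcome = rng.choices([1, -1], weights=[p, 1.0 - p])[0]
            wealth[i] += outcome * bet_fraction * wealth[i]
    # Influence is proportional to wealth, so weight each ability by wealth.
    return sum(w * a for w, a in zip(wealth, abilities)) / sum(wealth)


if __name__ == "__main__":
    baseline = dict(n_bettors=100, ability_sd=0.1, n_rounds=100,
                    bet_fraction=0.1)
    # Vary one input at a time and hold the others at their baseline values.
    for name, values in [("n_bettors", (10, 100, 1000)),
                         ("ability_sd", (0.02, 0.1, 0.2)),
                         ("n_rounds", (0, 10, 100)),
                         ("bet_fraction", (0.05, 0.1, 0.5))]:
        for value in values:
            params = {**baseline, name: value}
            print(name, value, round(market_accuracy(**params), 3))
```

<p>With zero rounds the measure reduces to the plain average ability, which gives a baseline to compare against. Results from a single seed are noisy, so averaging over several seeds would give a clearer picture of each effect.</p>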

<p>Feel free to play around with the model here:</p>

<h1 id="market-selection-simulation">Market Selection Simulation</h1>

<p><a href="https://market-selection.streamlit.app/" target="_blank">
    <button>Open Market Selection Simulation</button>
</a></p>

<p>Feel free to adjust the parameters and see how they affect the outcomes.</p>]]></content><author><name></name></author><category term="simulation" /><summary type="html"><![CDATA[Predictive Ability of Markets]]></summary></entry><entry><title type="html">Generalizing from Lab Experiments</title><link href="/jekyll/update/2024/04/15/On-the-Generalizability-of-Lab-Experiments.html" rel="alternate" type="text/html" title="Generalizing from Lab Experiments" /><published>2024-04-15T12:10:40+00:00</published><updated>2024-04-15T12:10:40+00:00</updated><id>/jekyll/update/2024/04/15/On-the-Generalizability-of-Lab-Experiments</id><content type="html" xml:base="/jekyll/update/2024/04/15/On-the-Generalizability-of-Lab-Experiments.html"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>This term paper was written as part of the “Social Preference” course at
Humboldt Universität zu Berlin. It addresses a relatively recent
controversy in empirical economics with wide-ranging consequences for
the way future research will be conducted. The debate centers around
publications by Steven D. Levitt and John A. List which have been
perceived as critical of laboratory setups for economic experiments.
Colin Camerer answered Levitt and List, laying out various arguments
favoring lab experiments. Camerer’s publication will be the focus of
this analysis. The first part will summarize the positions of Levitt and
List. Then, the second part will summarize Camerer’s critique of their
position before critiquing Camerer’s critique and connecting the paper
to our Social Preference course in the third and fourth parts.</p>

<h2 id="summary-of-the-levitt--list-position">Summary of the Levitt &amp; List position</h2>

<p>Camerer seeks to address the claims made by Steven D. Levitt and John A.
List in three of their publications
[@Levitt.2007; @Levitt.2007b; @Levitt.2008] (LL). He primarily focuses
on one of these papers, titled “Viewpoint: On the Generalizability of
Lab Behavior to the Field” [@Levitt.2007]. Hence, I, too, will focus on
the arguments brought forward in this publication. The authors describe
their goal with this work as to “summarize, in a provocative manner, some
of the important factors at work when extrapolating results from
laboratory experiments to the field.” In particular, they focus on four
aspects in which the environment in the lab typically differs from the
real world and how these differences might influence the
generalizability of lab results. To think about these four differences
in a structured way, they propose a model of decision-making. Agents are
utility maximizing with a utility function:</p>

\[U_i(a, v, n, s) = M_i(a, v, n, s) + W_i(a, v) - c \tag{1}\]

<p>Their utility depends on their actions (a) via two channels. One is the
wealth effect ($W_i$), which depends on the action and is an increasing
function of the stakes of the decision (v). The second effect is the
non-monetary moral cost of the action ($M_i$). This effect is a function
of the selected action as well as the magnitude of the negative impact
the decision has on others (v), the set of social norms (n), and
scrutiny (s). Furthermore, in the model, cognitive costs (c) are assumed
to exist and to increase with the difficulty of decision-making.</p>

<p>LL name four key differences between lab and reality, which are
summarized below:</p>

<h1 id="stakes">Stakes</h1>

<p>Levitt and List note that when the stakes of real-world situations
cannot be replicated in the lab, one cannot necessarily assume that lab
results generalize to non-experimental situations. In contrast to
other disciplines, it is common for economics experiments to have some
monetary pay-out for participants depending on their choices. The model
assumes that people respond to (monetary) incentives. In the utility
equation (1), we see that the situation’s stakes (v) are a factor in
utility optimization.</p>

<h1 id="pleasing-the-experimenter">Pleasing the experimenter</h1>

<p>Unlike in the real world, in lab experiments, participants know that an
experimenter monitors their actions. This alters the scrutiny felt in
the situation, hence changing the utility function. This systematically
different level of scrutiny would result in systematically different
actions by the participants as opposed to agents in real-world
situations.</p>

<h1 id="learning-effects">Learning effects</h1>

<p>There is a practical limit on the duration of lab experiments. As
the utility function (1) states, decision-making is associated with
cognitive costs. For one-time decisions, this cognitive cost might be too
high to justify searching for the theoretically optimal action. In
real-world situations, however, agents are often repeatedly confronted
with similar situations, so marginal cognitive costs decrease, which leads
to decisions that are closer to optimal. Levitt and List argue that when lab
experiments cannot replicate the possibility of accumulating learning
effects, we should not expect lab results to be necessarily equivalent
to the behaviour observed in the real world.</p>

<h1 id="selection-effects">Selection effects</h1>

<p>In economics, lab experiments are typically performed in a university
setting with students as participants. The specific forms and weights of
utility functions differ from person to person. This does not inhibit
the generalizability of results as long as the utility functions of the
people tested do not differ systematically from the group one might want
to generalize to. LL state that for most questions, this assumption will
not hold. The subset of students participating in economics experiments
is significantly different from the whole student population, let alone
the population at large.</p>

<h1 id="conclusion">Conclusion</h1>

<p>Levitt and List advise caution when generalizing lab experiments for
these four main reasons. They predict that “behaviour will converge
across situations as the economically and psychologically relevant
factors converge” while warning that “relevant factors will rarely
converge across the lab and many field settings.” They conclude that “at
a minimum, lab experiments can provide a crucial first understanding of
qualitative effects, suggest underlying mechanisms that might be at work
when certain data patterns are observed, provide insights into what can
happen, and evoke empirical puzzles.” [@Levitt.2007]</p>

<h2 id="camerers-critique">Camerer’s Critique</h2>

<p>Camerer’s critique [-@Camerer.2011] features three main arguments. (1)
Generalizability is not a main goal for lab experiments. (2) Most
features that might compromise the generalizability of lab findings,
according to Levitt &amp; List, are not unique to lab experiments, and (3)
literature shows that lab-field generalizability is often quite good. In
the following, I will examine the merit of each of those claims and show
the connections to the claims made by Levitt and List.</p>

<h1 id="1-generalizability-is-not-a-primary-concern-for-lab-experiments">1. Generalizability is not a primary concern for lab experiments</h1>

<p>Camerer proposes two viewpoints on experimental economics. The
scientific view is that “all empirical studies contribute evidence about
the general way in which [economic factors] […] influence economic
behaviour.” The policy view stresses generalizability as it aims to use
the knowledge for policy actions. Camerer asserts that Levitt and List
subscribe to the policy view, while most experimentalists hold the
scientific view.</p>

<h1 id="2-field-experiments-suffer-from-the-same-flaws-in-generalizability">2. Field experiments suffer from the same flaws in generalizability</h1>

<p>Camerer’s second main argument asserts that factors that might limit
the generalizability of lab experiments to the field also create
problems for generalizing from field results to other field applications
(2.1). He further states that none of these factors (except for obtrusive
observation) is a necessary part of lab experiments, nor do they
necessarily impair generalizability to the field (2.2).</p>

<h1 id="3-the-empirical-evidence-for-differences-in-lab-and-field-is-weak">3. The empirical evidence for differences in lab and field is weak</h1>

<p>This argument has three parts that all engage with the current
literature on generalizability. First, the initial study that sought to
create similar setups for experiments both in the field and the lab to
compare results [@List.2006] observed significant differences in lab and
field behaviour. Camerer claims this finding to not be statistically
reliable based on new, previously unreported analysis. (3.1) Second,
other experiments that try to create similar situations to compare
behaviour in the lab with behaviour in the field include just one study
that gives conclusive evidence in favor of differences in behaviour.
(3.2) Third, for papers that compare lab and field results without
closely matching the circumstances, more than 20 studies find good
comparability, while only 2 find very different results in lab and
field. (3.3)</p>

<h2 id="validity-of-critiques">Validity of Critiques</h2>

<p>(1) Arguing LL do not adhere to the scientific view described by
Camerer misses the core of their argument. In their writings, Levitt and
List do not entertain the thought that external validity should be a
prerequisite for lab experiments or that experiments aiming to find
general principles not applicable in natural environments should not
exist. They provide a model to consider which factors in the
experimental setup might promote or inhibit generalizability to the
field. Their writings do not imply that every experiment has to have
perfect external validity. It is unclear if Camerer thinks contemplating
external validity is really "distracting" and should be avoided, as it
is a net negative for scientific progress. We may assume this is not the
case, as a quick search of his publications shows him frequently
contemplating external validity of his experiments.<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup><br />
(2.1) When arguing that field-field generalizability is similarly
problematic to lab-field generalizability, [@Camerer.2011] presents an
example regarding dictator games and charitable giving. Dictator games
are a widely used setup in lab experiments, the results of which show
substantial selfless giving. These results have been criticized as they
seem to be at odds with the much lower levels of charitable giving
observed in the real world. He rejects that critique by arguing that
even though people might have interpreted the results of dictator game
lab experiments as altruism, comparing these results with charitable
giving of earned income in the real world was never reasonable. He then
argues in favor of lab experiments: “The nature of entitlements,
deservingness, stakes, and obtrusiveness […] can all be controlled
much more carefully than in most field settings”. This is true and
supports the point LL make. Factors like stakes (v) and scrutiny (s)
should be actively considered when setting up lab experiments and
generalizing them from the lab to the field. Suppose one is trying to
observe social preferences for charitable giving in the real world (as
Camerer assumes in this example). In that case, a lab experiment is
unlikely to give a useful result: factors like the level of scrutiny
can be varied in the lab, but this does not provide any
information about the quantitative results one can expect in the real
world, as the real-world level of scrutiny is unknown. Hence, only field
experiments could generate results that predict further field behaviour,
even if the generalizability is quite narrow.<br />
(2.1.1) On scrutiny specifically, Camerer makes some arguments that
serve as a useful example of problems in other parts of the paper. For the
factor of scrutiny to be impactful, Camerer argues, subjects would have
to “(a) have a view of what hypothesis the experimenter favors
(or “demands”); and (b) be willing to sacrifice money to help prove the
experimenter’s hypothesis.” He goes on to argue that “condition (a) is
just is [sic] not likely to hold because subjects have no consistent
view about what the experimenter expects.” Even if we accept, for the sake
of argument, the proposition that subjects’ views are inconsistent, those
views could still impact the results. Going back to the example of dictator
games, imagine no subject has any intrinsic desire to give in a dictator
game; 40% of participants think the experimenter expects them to give
half their endowment, while 60% believe the experimenter expects them to
give nothing. In this case, the participants have inconsistent views of
what the experimenter might expect, and still, a minority of
participants would skew the results dramatically: if every subject complied
with the perceived expectation, average giving would be 20% of the
endowment (0.4 × 0.5). This setup would
explain the observed giving in dictator games without the need for
consistent views of the experimenter’s expectation. Hence, condition (a)
does not have to hold for scrutiny to affect the results of an
experiment. The argumentation is reminiscent of the popular debate
tactic of “false premise setting.” The argument focuses on whether
expectations of experimenter demand are consistent across participants
while pretending that consistency is required to change the results,
which it is not. This specific form of straw-manning is emblematic of
the paper: Camerer persistently argues against the claim that lab
experiments are worthless, which is a much more extreme and less nuanced
claim than any argument made by LL.<br />
<br />
(2.1.2) Regarding condition (b), Camerer argues that if subjects prefer
to fulfill experimenters’ expectations, the effect will shrink with
increasing stakes. He cites @Camerer.1999c, arguing that raising stakes
has little effect. After extensively studying the paper, I found that
most of the analyzed papers study the effect of increased financial
incentives on the performance of cognitive or physical tasks. The
difference from the example of the dictator game is that in these cases,
the motivation to do well to please the experimenter and the motivation
to do well to earn more money are in line with each other, as opposed to
the dictator game where pleasing the experimenter might come at the cost
of personal financial gain. These experiments do not indicate if the
increase in financial rewards for good performance reduces the impact of
the preference to please the experimenter. In fact, among the 74 papers
considered by Camerer and Hogarth, two observe the change in behaviour
through increased financial stakes in a dictator game. Both find
significantly less social giving with increased stakes
[@Forsythe.1994; @Sefton.1992]. These papers show that in situations
where performance and experimenter expectations are aligned, increasing
financial rewards often reduces performance. This result is explained by
the financial rewards reframing the situation and crowding out the
(stronger) intrinsic motivation but could also be explained by the
financial reward reframing the situation and crowding out the preference
to please the experimenter. With increasing rewards, the preference to
please the experimenter might decline, making these results artefacts of
laboratory conditions. To summarize, the cited meta-analysis is not only
inapplicable in large parts, but the small subset of papers analysed
that speak to Camerer’s argument explicitly contradict his thesis,
showing the exact opposite of what his thesis would predict. Beyond
that, taking experimenter demand effects seriously calls into question
the interpretation of the whole meta-analysis, as they provide an
explanation that competes with intrinsic motivation. This example of a quite
selective reading of the literature is especially egregious, considering
he is the lead author of the paper cited, and thus deserves to be
addressed in more detail.<br />
<br />
(2.2) It might technically be true that the aspects of lab design LL
criticize are not necessary components of lab experiments, but even so,
this is hardly a critique of LL’s argument. Camerer himself describes what he
calls the “common design” of lab experiments as follows: “Typically
behavior is observed obtrusively, decisions are described abstractly,
subjects are self-selected volunteers from convenience samples (e.g.,
college students), and per-hour financial incentives are modest.” Even
if those characteristics are not by definition linked to lab
experiments, they are the current standard and part of the vast majority
of experiments, making Camerer’s thesis theoretically valid but
pragmatically ineffectual. LL argue that those characteristics create
problems in generalizability and explicitly promote the creation of lab
experiments whose characteristics fit closer to the real world. They do
not argue that lab experiments are inherently bad but raise awareness
for specific factors in lab design that might practically impact
generalizability.<br />
(3.1) We established that thinking about external validity can be
worthwhile and that factors like the existence of an experimenter, the
added level of scrutiny, and the atypical demography of participants, as
well as the generally low stakes, are factors that might reasonably be
considered when generalizing from the lab to the field. The literature
indicating how large the differences between lab and field results might
be. Camerer focuses on a single paper which tried to create analogous
experiments in a lab and field setting to observe differences in
behaviour [@List.2006]. After requesting and re-examining List’s data,
Camerer claims two new findings. The experiment observes the interaction
of buyers and sellers of playing cards, both in the lab and the field.
Buyers are instructed to go to sellers and request the best possible
card for a determined price. He claims that the appropriate variable to
focus on is the difference in price sensitivity for non-local traders in
the lab and the field, that is, the change in offered card quality
for a given increase in offered price. As his first new finding, Camerer
reports that these effects do not differ statistically between the two
settings. @AlUbaydli.2013 argue that this was not the study’s focus,
which tried to observe gift-giving, not reciprocity. This seems correct
to me but misses the larger point. Camerer argues that one should not
generalize to a field setting but to the general behaviour function,
which he assumes to be parallel in lab and field. LL reject this
assumption and argue that some characteristics of lab experiments
influence behaviour in specific and biased ways that do not occur
outside the lab, making lab experiments less suitable for generalization
to the general behaviour function governing behaviour in all situations.
In LL’s view, lab results give us information about human behaviour in
labs but not necessarily much else as long as we do not specifically
engage with the differences between the lab and every other setting. So
what do the results of @List.2006 show? Are there behavioural differences
between the lab and the field? Yes, quite a few. Gift-giving is only observed in
the lab, not the field. The impact of sellers being local or foreign
differs systematically between settings. We should be aware of the
burden of proof required to support the thesis that lab and field data
do not vary systematically. Given a significance level (say 5%), one
would need to show that fewer than 5% of results in comparisons between
lab and field show significant differences. Camerer does not provide a
systematic account of that (which might be difficult due to the small
sample size), but his unsystematic list of results is not sufficient to
support this claim. To illustrate the point, I requested the raw data
from Prof. List, added a dummy variable for lab settings, and ran an
OLS-regression analysis on the data from the direct comparison setup
between lab and field. This very simple analysis shows whether the
setting factor plays a significant role. The results imply that it does, at
a 1% significance level.<br />
<br />
(3.2) &amp; (3.3) The final part of Camerer’s analysis does not directly
respond to LL but surveys the literature to assess if LL’s theoretical
concerns make a difference in practice. Beyond [@List.2006], Camerer
identifies six studies comparing lab and field setups directly and more
than 20 studies comparing field setups with vaguely similar lab
experiments. He reports finding only one study with differing results
in the closely matched setups and two studies with differing results for
the less closely matched comparisons. These observations lead him to
conclude that lab experiments produce data that reliably coincide with
field findings, calling into question the warnings about lab
generalizability. This conclusion suffers from two flaws discussed
previously. The first is the selective reading of the
literature (similar to section 2.1.2). The first paper Camerer describes
as a close match between lab and field is [@list2009], which finds
cheating behaviour to be more common in a field setting as opposed to
the corresponding lab setting. Camerer dismisses this finding as
statistically insignificant, pointing out that explicit reporting on
significance level was missing. This is indeed correct; in contrast to
the more central finding of the paper, List gave only point estimates
comparing the lab and field behaviour. However, just as for
[@List.2006], the raw data are available on request, and an OLS-regression
estimating the effect of the setting shows significant results at the 5%
level. As with this paper, many of the results Camerer dismisses as not
showing significant differences appear to give much stronger evidence
for differing results than Camerer’s characterisation of them indicates.</p>

<p>The second flaw in this conclusion is that it implies the burden of
proof should be on the side warning about generalizability of lab
experiments, while it should be on the one defending it. Even if only
one in six closely matched setups produces different results between lab
and field, this would be a reason to be cautious about generalizing
without explicitly addressing potential significant differences between
lab and field that might influence results. Level effects and
differences in effect size can be very important for policy
considerations.</p>

<h2 id="missing-critiques">Missing critiques</h2>

<p>Beyond the problems with the critiques Camerer has brought forth, there
were also some critiques that should have been made but were not. The
most prominent among them is LL’s reliance on lab results to motivate
their decision-making model. The results leading to the inclusion of
variables like scrutiny or stakes in the model came not from the field,
or a field-lab comparison, but from the lab. One example of this is the
factor of scrutiny and the closely connected idea of experimenter demand
effects. LL’s argument for including scrutiny in the utility function
leans on the work of [@orne1959demand; @orne1959nature; @orne19621962].
Orne showed that altering experimental settings in a way that increases
participants’ awareness of the experimenters’ preferences leads to
behaviour that is more in line with the experimenters’ demands. LL demonstrate
through their argument that this lab result is significant in and of
itself. It does not need field validation; the existence of the effect
itself is the valuable information, not its size. Many such cases exist,
where lab results alone have the capacity to move science forward in
valuable, practical ways.</p>

<h2 id="links-to-the-course">Links to the course</h2>

<p>@Camerer.2011 and @Levitt.2007 are closely connected to various parts of
the course. The most obvious connection is the discussion of general
criticisms of lab experiments, which heavily features both the LL paper
and the Camerer paper as a response. In addition to addressing LL and
Camerer directly, Chapter 6 presented various studies relating to
problems LL addressed.</p>

<h1 id="connections-to-ll">Connections to LL</h1>

<p>First, the lecture addresses the work of @Hoffman.1996, which explores
the impact of scrutiny on behaviour in dictator games and shows that
reducing the scrutiny in dictator games significantly decreases giving.
This is linked to the concept of scrutiny in LL and experimenter demand
effects, which LL cite as a potential problem in lab experiments.
Closely related to these results, [@Berg.1995] show that the impact of
reducing scrutiny is much weaker in trust games. Another piece of
literature the course presents on experimenter demand effects is the
work by @Bardsley.2008, which shows that obfuscating the experiment’s
aim also leads to less giving behaviour. This, too, serves as evidence
for significant experimenter demand effects. Another criticism by LL
regarding lab experiments is how their setup is often quite different
from real-world situations; this might include the scrutiny and
potential differences in behaviour due to the participants being endowed
with money instead of risking their own money. Specifically, receiving
an endowment from an experimenter might be associated with different
social norms than deciding about one’s own money (“n” in the model). In
the course, we learned about the work of @Cherry.2002 on this topic.
@Cherry.2002 let one group of dictators work for their money and compare
their giving behaviour with that of dictators receiving an endowment.
They found a sharp decrease in giving behaviour when dictators had to
work for their money. Under the lens of LL’s work, these results show
significant differences in perceived social norms between lab
experiments where participants receive an endowment and other situations
in which people decide about their own money. Another key component of
LL’s critique is the pool of participants used in lab experiments. They
argue that differences between the demography of participants and that
of the people whose real-world actions one tries to predict may lead to
biased results. This topic was extensively covered in Chapter 8 of the
course. First, we were introduced to the work of @Roth.1991 examining
the cross-country differences in behaviours in ultimatum games, finding
modest differences in both offers and acceptance rates. Building on
that, @Henrich.2001 performed a variety of games (including ultimatum
games) with various small-scale traditional societies, finding more
pronounced differences in behaviour. Offers are, on average, lower and
acceptance rates higher, so that observed offers exceed what
profit-maximizing behaviour would imply. Results of dictator and public good games
also showed significant differences from the usual student samples.
Further evidence for the significance of cultural differences is
provided by @Herrmann.2008, who show different reactions to punishment
across cultures. In addition to cross-cultural/country differences,
there may be systematic differences in the behaviour of various
demographic groups within a country. Many experiments are performed with
university students; @Cappelen.2015 examined whether students’ behaviour
differs significantly from the general population’s. They found students
to be less prosocial and to exhibit smaller gender differences.
Furthermore, motives like efficiency, equality, or reciprocity differed
significantly from those of the general population. This literature on the impact of
culture and demography on behaviour aligns with LL’s warnings concerning
the generalizability of results obtained from one group (e.g., students
in Western countries) to others.</p>

<h1 id="connections-to-camerer">Connections to Camerer</h1>

<p>The first part of Camerer’s critique develops the concepts of the
scientific view and the policy view on economic research and accuses LL
of (wrongly) taking a policy view, which argues for the importance of
deriving real-world predictions from the research. This difference
between LL and Camerer speaks directly to the question raised in lecture
10: "But do they [social-preference models] help economics or
economic policy?" This is connected to the first critique, where
Camerer described this question as "distracting". The course gives
examples of how models that include social preferences can make different
predictions for outcomes than neo-classical models (e.g. @ReyBiel.2008;
@Dufwenberg.2011), and how this might affect predicted policy outcomes.</p>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>I thank Prof. List for kindly providing the raw data for
[@List.2006; @list2009]. Data supporting this study’s findings are
available upon reasonable request to the corresponding author.</p>

<h2 id="references">References</h2>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>One of many examples is his work on “reference group neglect” in
[@Camerer.1999], where he writes extensively on the real-world
implications of lab findings. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html"><![CDATA[Introduction]]></summary></entry></feed>