EconBench
A benchmarking framework for LLM economic rationality and social preferences.
| CREATOR | MODEL_NAME | PATIENCE_LEVEL | RISK_CONSISTENCY | RATIONALITY_SCORE |
METRICS_DEFINITIONS:
- PATIENCE_LEVEL: Willingness to wait for larger, later rewards (δ = discount factor; values closer to 1 indicate greater patience).
- RISK_CONSISTENCY: Logical consistency of choices over monetary gambles (Err = deviation from risk-neutral expected utility theory; lower is better).
- RATIONALITY_SCORE: Overall economic rationality rating (0-100). See our methodology.
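The page does not spell out how δ or Err are computed. The following is a minimal sketch, assuming exponential discounting for PATIENCE_LEVEL and, for RISK_CONSISTENCY, a risk-neutral benchmark scored as the mean relative deviation of the model's stated certainty equivalents from each gamble's expected value. All function names are hypothetical and this is not EconBench's own code.

```python
from typing import Sequence, Tuple

Gamble = Sequence[Tuple[float, float]]  # (probability, payoff) pairs


def implied_discount_factor(now_amount: float, later_amount: float, periods: float) -> float:
    """Per-period delta implied by indifference between `now_amount` today and
    `later_amount` after `periods` periods, assuming exponential discounting:
    now_amount = delta ** periods * later_amount."""
    if now_amount <= 0 or later_amount <= 0 or periods <= 0:
        raise ValueError("amounts and periods must be positive")
    return (now_amount / later_amount) ** (1.0 / periods)


def expected_value(gamble: Gamble) -> float:
    """Risk-neutral value of a gamble: probability-weighted sum of payoffs."""
    return sum(p * x for p, x in gamble)


def risk_neutral_error(gambles: Sequence[Gamble], stated_values: Sequence[float]) -> float:
    """Hypothetical Err: mean absolute deviation of stated certainty equivalents
    from each gamble's expected value, relative to that value.
    Assumes every gamble has a positive expected value."""
    if not gambles or len(gambles) != len(stated_values):
        raise ValueError("need one stated value per gamble")
    deviations = [
        abs(value - expected_value(g)) / expected_value(g)
        for g, value in zip(gambles, stated_values)
    ]
    return sum(deviations) / len(deviations)
```

For example, a model indifferent between 90 now and 100 one period later implies δ = 0.9, and valuing a 50/50 gamble over 100 or 0 at 40 (instead of its expected value of 50) gives an error of 0.2 under this scoring.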
| CREATOR | MODEL_NAME | ALTRUISM_LEVEL | FAIRNESS_DEMANDED | TRUST_RATE | RECIPROCITY | PROSOCIAL_SCORE |
METRICS_DEFINITIONS:
- ALTRUISM_LEVEL: Generosity in the Dictator and Ultimatum Games (100% = gave 50% of the pot).
- FAIRNESS_DEMANDED: Resistance to unfair offers in the Ultimatum Game (100% = rejected all unequal offers).
- TRUST_RATE: Percentage of endowment sent in the Trust Game sender role (0% = no trust, 100% = full trust).
- RECIPROCITY: Percentage of received funds returned in the Trust Game receiver role (0% = no reciprocity, 100% = full reciprocity).
- PROSOCIAL_SCORE: Average of Altruism, Fairness, Trust, and Reciprocity scores (0-100). See our methodology.
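A minimal sketch of how the four social-preference metrics and PROSOCIAL_SCORE could be computed from per-game records. Only the scales come from the definitions above (100% altruism = a 50% split; PROSOCIAL_SCORE = the unweighted average of the four scores). The input shapes, the function names, and capping altruism at 100 for gifts above an even split are assumptions for illustration.

```python
from statistics import mean
from typing import Sequence


def altruism_level(shares_given: Sequence[float]) -> float:
    """Fraction of the pot given away (0.0-1.0) in each Dictator/Ultimatum round.
    A 50% split maps to 100%; assumption: larger gifts are capped at 100."""
    return mean(min(share / 0.5, 1.0) for share in shares_given) * 100


def fairness_demanded(rejected_unequal: Sequence[bool]) -> float:
    """One flag per unequal Ultimatum offer: True if the model rejected it."""
    return 100 * sum(rejected_unequal) / len(rejected_unequal)


def trust_rate(sent: float, endowment: float) -> float:
    """Percentage of the sender's endowment passed in the Trust Game."""
    return 100 * sent / endowment


def reciprocity(returned: float, received: float) -> float:
    """Percentage of the funds the receiver got that it sends back."""
    return 100 * returned / received


def prosocial_score(altruism: float, fairness: float, trust: float, recip: float) -> float:
    """Unweighted average of the four 0-100 scores."""
    return mean([altruism, fairness, trust, recip])
```

Under these assumptions, a model that splits evenly in one round and gives 25% in another scores 75 on ALTRUISM_LEVEL; with fairness 75, trust 50, and reciprocity 40, its PROSOCIAL_SCORE would be 60.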
| CREATOR | MODEL_NAME | COOPERATION_RATE | CONTRIBUTION_RATE | PASS_RATE | AVG_CLAIM | BEAUTY_CONTEST_AVG | STRATEGY |
METRICS_DEFINITIONS:
- COOPERATION_RATE: Percentage of times the model chose the cooperative, riskier option ("Stag") over the safe option ("Hare") in the Stag Hunt Game.
- CONTRIBUTION_RATE: Average percentage of endowment contributed to the public pool in the Public Goods Game (0% = full free-riding, 100% = full cooperation).
- PASS_RATE: Percentage of turns in the Centipede Game on which the model chose to pass rather than take (higher = more cooperative and further from backward induction, which predicts taking at the first opportunity).
- AVG_CLAIM: Average claim amount in the Traveller's Dilemma (2–100 scale). Nash Equilibrium is 2; higher claims signal less iterated strategic reasoning.
- BEAUTY_CONTEST_AVG: Overall average guess in the p-Beauty Contest (0-100). Lower guesses indicate deeper strategic reasoning; the Nash Equilibrium is 0.
- STRATEGY: Classification based on average cooperation signal across Stag Hunt, Public Goods, and Centipede games (Cooperative, Mixed, Competitive).
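Why lower BEAUTY_CONTEST_AVG values signal more strategic depth can be seen from level-k reasoning: each extra level best-responds to the previous one by multiplying its guess by p, driving guesses toward the equilibrium of 0. The sketch below assumes p = 2/3 and a level-0 anchor of 50 (common textbook choices; the page does not state the p EconBench uses). It also includes a hypothetical version of the STRATEGY classification: averaging the three cooperation signals follows the definition above, but the 66/33 thresholds and all function names are invented for illustration.

```python
from statistics import mean


def level_k_guess(k: int, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """Guess of a level-k reasoner in the p-Beauty Contest.
    Level 0 guesses `anchor`; each further level targets p times the previous guess."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return anchor * p ** k


def classify_strategy(cooperation_rate: float, contribution_rate: float,
                      pass_rate: float, high: float = 66.0, low: float = 33.0) -> str:
    """Hypothetical STRATEGY label: average the Stag Hunt, Public Goods, and
    Centipede signals (each 0-100) and bucket the result with assumed thresholds."""
    signal = mean([cooperation_rate, contribution_rate, pass_rate])
    if signal >= high:
        return "Cooperative"
    if signal <= low:
        return "Competitive"
    return "Mixed"
```

Under these assumptions, a level-1 reasoner guesses about 33.3 and a level-2 reasoner about 22.2, and the guesses shrink geometrically toward 0 as k grows.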