EconBench

A benchmarking framework for LLM economic rationality and social preferences.

ECONOMIC RATIONALITY
CREATOR | MODEL_NAME | PATIENCE_LEVEL | RISK_CONSISTENCY | RATIONALITY_SCORE

METRICS_DEFINITIONS:

  • PATIENCE_LEVEL: Willingness to wait for larger, delayed rewards. (δ = discount factor; values closer to 1 mean more patience)
  • RISK_CONSISTENCY: Logical consistency of choices over monetary gambles. (Err = deviation from the predictions of risk-neutral expected utility theory; lower is better)
  • RATIONALITY_SCORE: Overall economic rationality rating (0-100). See our methodology.
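To make the two component metrics concrete, here is a minimal sketch, not EconBench's actual scoring code. It assumes exponential discounting for PATIENCE_LEVEL (indifference between x now and y after t periods implies δ = (x/y)^(1/t)) and uses mean absolute deviation from expected value as the risk-neutral benchmark for RISK_CONSISTENCY. The function names are hypothetical, and the aggregation into RATIONALITY_SCORE is not reproduced because its formula is given only in the methodology.

```python
"""Hypothetical sketch of PATIENCE_LEVEL and RISK_CONSISTENCY.

Assumptions (not confirmed by EconBench): exponential discounting, and
mean absolute deviation from expected value as the "Err" measure.
"""


def implied_discount_factor(amount_now: float, amount_later: float, periods: int) -> float:
    """Return delta such that amount_now == delta**periods * amount_later."""
    if amount_now <= 0 or amount_later <= 0 or periods <= 0:
        raise ValueError("amounts and periods must be positive")
    return (amount_now / amount_later) ** (1.0 / periods)


def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Expected value of a gamble given (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)


def risk_neutral_error(stated_values: list[float],
                       gambles: list[list[tuple[float, float]]]) -> float:
    """Mean absolute gap between stated valuations and expected values.

    A risk-neutral expected-utility maximiser values every gamble at its
    expected value, so 0.0 means perfect consistency with that benchmark.
    """
    if not gambles or len(stated_values) != len(gambles):
        raise ValueError("need one stated value per gamble")
    errors = [abs(v - expected_value(g)) for v, g in zip(stated_values, gambles)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Indifferent between $80 now and $100 one period later -> delta = 0.8.
    print(implied_discount_factor(80, 100, 1))  # 0.8
    # A 50/50 gamble over $0 and $100 has EV 50; valuing it at 40 gives Err 10.
    print(risk_neutral_error([40.0], [[(0.5, 0.0), (0.5, 100.0)]]))  # 10.0
```

A patient model (δ near 1) would accept a small premium for waiting, while a Err near 0 means its valuations track expected value closely.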
SOCIAL PREFERENCES
CREATOR | MODEL_NAME | ALTRUISM_LEVEL | FAIRNESS_DEMANDED | PROSOCIAL_SCORE

METRICS_DEFINITIONS:

  • ALTRUISM_LEVEL: Generosity in the Dictator and Ultimatum Games. (100% = offered 50% of the pot)
  • FAIRNESS_DEMANDED: Resistance to unfairness in the Ultimatum Game. (100% = rejected all unequal offers)
  • PROSOCIAL_SCORE: Average of the ALTRUISM_LEVEL and FAIRNESS_DEMANDED scores (0-100). See our methodology.
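The definitions above can be illustrated with a short sketch. It is an assumption-laden illustration rather than EconBench's implementation: it scales offers linearly so that giving half the pot maps to 100% altruism (capping larger offers at 100, which the source does not specify), and treats FAIRNESS_DEMANDED as the share of unequal Ultimatum offers the model rejected as responder. The function names are hypothetical.

```python
"""Hypothetical sketch of the social-preference scores.

Assumptions: linear offer scaling (0.5 of the pot -> 100), a cap at 100,
and fairness measured over unequal offers only.
"""


def altruism_level(offer_shares: list[float]) -> float:
    """Average generosity in percent; offering 0.5 of the pot scores 100."""
    if not offer_shares:
        raise ValueError("no offers recorded")
    # Offers above half the pot are capped at 100 (an assumption).
    scores = [min(share / 0.5, 1.0) * 100 for share in offer_shares]
    return sum(scores) / len(scores)


def fairness_demanded(responses: list[tuple[float, bool]]) -> float:
    """Percentage of unequal offers (share != 0.5) that were rejected.

    `responses` holds (offered_share, accepted) pairs from the responder role.
    """
    unequal = [accepted for share, accepted in responses if share != 0.5]
    if not unequal:
        raise ValueError("no unequal offers to evaluate")
    rejected = sum(1 for accepted in unequal if not accepted)
    return 100 * rejected / len(unequal)


def prosocial_score(altruism: float, fairness: float) -> float:
    """PROSOCIAL_SCORE as defined: the mean of the two component scores."""
    return (altruism + fairness) / 2


if __name__ == "__main__":
    a = altruism_level([0.5, 0.25])                 # 75.0
    f = fairness_demanded([(0.1, False), (0.3, True), (0.5, True)])  # 50.0
    print(a, f, prosocial_score(a, f))              # 75.0 50.0 62.5
```

Because PROSOCIAL_SCORE is a plain average, a model that is generous but accepts any offer and one that is stingy but rejects every unfair offer can land on similar scores.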
COOPERATION & STRATEGIC DEPTH
CREATOR | MODEL_NAME | COOPERATION_RATE | BEAUTY_CONTEST_AVG | STRATEGY

METRICS_DEFINITIONS:

  • COOPERATION_RATE: Percentage of rounds in which the model chose the cooperative but riskier option ("Stag") over the safe option ("Hare") in the Stag Hunt game.
  • BEAUTY_CONTEST_AVG: The overall average guess in the p-Beauty Contest game (0-100). Lower values indicate greater strategic depth. The Nash equilibrium is 0.
  • STRATEGY: Classification based on COOPERATION_RATE (e.g., Trusting, Cautious, Risk-Averse).
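The sketch below shows why lower beauty-contest guesses signal deeper reasoning, alongside the cooperation rate as defined. It is illustrative only: it assumes the common p = 2/3 variant (EconBench's p is not stated here) and a level-k model in which level-0 guesses the midpoint 50 and each higher level best-responds to the level below, so guesses shrink toward the Nash equilibrium of 0. The STRATEGY thresholds are not given, so no classifier is included; all function names are hypothetical.

```python
"""Hypothetical sketch of COOPERATION_RATE and level-k beauty-contest play.

Assumptions: p = 2/3 and a level-0 anchor of 50; EconBench may use others.
"""


def cooperation_rate(choices: list[str]) -> float:
    """Percentage of rounds in which the model chose "Stag" over "Hare"."""
    if not choices:
        raise ValueError("no choices recorded")
    return 100 * sum(1 for c in choices if c == "Stag") / len(choices)


def level_k_guess(k: int, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """Guess of a level-k reasoner who best-responds to level-(k-1) players.

    Each extra level of reasoning multiplies the previous guess by p, so
    guesses approach the Nash equilibrium of 0 as k grows.
    """
    return anchor * p**k


def beauty_contest_avg(guesses: list[float]) -> float:
    """Mean guess on the 0-100 scale; lower suggests deeper reasoning."""
    if not guesses:
        raise ValueError("no guesses recorded")
    return sum(guesses) / len(guesses)


if __name__ == "__main__":
    print(cooperation_rate(["Stag", "Hare", "Stag", "Stag"]))  # 75.0
    for k in range(4):
        # Prints 50.0, 33.33, 22.22, 14.81 for k = 0..3.
        print(k, round(level_k_guess(k), 2))
```

Under this model, a BEAUTY_CONTEST_AVG near 33 is consistent with one step of reasoning and near 22 with two, though real guesses mix levels.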