ai-for-less-suffering.com

Steelman analysis

Generated 2026-04-17T20:30:34.634817Z

Target intervention

Scale funding for interpretability and alignment research.

Operator tension

You hold norm_anthropic_alignment and norm_xrisk_halt in your camp's orbit while also holding norm_eacc_deployment and a poker-EV frame that prizes observable payoffs. The uncomfortable case, from inside your own frame, is the pair of e/acc and poker-EV arguments against: alignment funding is the intervention that most flatters your self-image as 'accelerationist but risk-managed,' and it is exactly the bet whose payoff you cannot measure. You would downgrade HYPE from B+ to C- for exactly this profile --- diffuse payoff, no exit event, narrative-favorable --- but the alignment frame makes you excuse it because the vibe matches EA/80K. The specific discomfort: if you are honest about norm_xrisk_halt, the intervention that matches it is a compute cap, not a research grant. Funding interpretability at scale is the hedge you would call indefensible in any other asset class --- a brake pedal wired to an accelerator, sold as insurance.

Case FOR

EA/80K

Frontier training compute is doubling every 5-6 months, and algorithmic progress is halving the compute needed for a given capability every 8 months. Interpretability is nowhere near either curve. If alignment is the dominant catastrophic risk, the only lever that directly narrows the capability/alignment gap is funding interpretability work at a scale that tracks the frontier. Hundreds of millions to low billions annually is trivial next to $100M+ single training runs --- this is the cheapest bet on the board and the one most directly aimed at the thing that kills everyone.
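The cited rates compound multiplicatively, and a quick back-of-envelope shows the effective annual pace interpretability would have to match. The doubling and halving times are this analysis's claims, not independent measurements; 5.5 months is simply the midpoint of the quoted 5-6 month range.

```python
# Back-of-envelope for the growth rates cited above. The doubling and
# halving times are the text's claims, not independent measurements;
# 5.5 months is the midpoint of the quoted 5-6 month range.

def annual_multiplier(doubling_months: float) -> float:
    """Growth factor over 12 months, given a doubling time in months."""
    return 2 ** (12 / doubling_months)

compute_growth = annual_multiplier(5.5)   # ~4.5x/year from compute scaling
algo_efficiency = annual_multiplier(8.0)  # ~2.8x/year from algorithmic gains
effective_growth = compute_growth * algo_efficiency  # ~12.8x/year combined

print(f"compute {compute_growth:.1f}x * algorithms {algo_efficiency:.1f}x "
      f"= {effective_growth:.1f}x effective per year")
```

On these assumptions, effective capability grows roughly an order of magnitude per year, which is the concrete content of the claim that interpretability is "nowhere near" the curve.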

Consequentialist

The US lead is 6-18 months and shrinking. Alignment research funded inside the lead window produces safety tooling that the frontier actually uses; funded after parity, it produces papers. Scaling interpretability now is the highest-EV use of a billion dollars because it compounds across every subsequent deployment --- medical, defense, industrial --- and reduces the tail risk of a catastrophic failure that would trigger the regulatory backlash that actually stops capability.

e/acc

Alignment funding has the lowest friction profile of any intervention on the board (grid 1.0, public 0.9, regulation 0.8). It accelerates net deployment by preemptively defusing the one event class (a visible misalignment catastrophe) that would freeze the entire stack under emergency regulation. Every dollar into interpretability is an insurance premium against a ten-year pause. Accelerationists who refuse to fund alignment are the ones who get the brake slammed on them.

Kantian

Deploying systems we cannot interpret forecloses the option set of future persons --- they inherit infrastructure whose decision process no one can audit or correct. Funding interpretability is the minimum categorical duty owed to anyone downstream: keep the system legible so future actors can still choose. The duty is not contingent on whether alignment pays off in expected value. It is owed.

Theological

Interpretability research is the technical precondition for keeping AI in the instrumental position religious traditions require. A model whose internals are opaque cannot be held to the creator/creature distinction --- it gets personified by default because no one can say what it actually is. Funding interpretability is funding the capacity to say 'this is a tool' with epistemic warrant rather than rhetorical assertion.

Capabilities

Interpretability is the only intervention that gives downstream-affected communities --- workers, patients, citizens --- a credible mechanism to contest specific model decisions. Without it, capability claims on behalf of affected parties are unenforceable; you cannot demand a system respect a capability set if you cannot inspect whether it did. Funding alignment research is funding the audit layer that makes every other capabilities-frame demand actionable.

Poker EV

Hundreds of millions to low billions annually against a civilizational-scale technology with a 5-6 month compute doubling time. Cost is a rounding error versus training budgets; payoff is either 'we can steer the thing' or 'we learn we can't in time to act.' Both outcomes are positive EV. The only negative-EV move is not funding it and finding out later.
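The "both outcomes are positive EV" structure can be sketched as a toy decision comparison. Every probability and payoff below is an illustrative placeholder chosen only to show the argument's shape; none comes from this analysis.

```python
# Toy sketch of the EV argument above. Probabilities and payoffs are
# arbitrary placeholders, not estimates: the point is the structure,
# in which both branches of the "fund" outcome carry positive value.

def expected_value(outcomes):
    """Probability-weighted sum over (probability, payoff) pairs."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * v for p, v in outcomes)

COST = 1  # funding cost, trivial next to training budgets per the text

# Fund: either interpretability lets us steer (+100), or it reveals in
# time that we cannot (+10, still actionable information).
ev_fund = expected_value([(0.3, 100), (0.7, 10)]) - COST

# Skip: either nothing goes wrong (0), or we find out too late (-100).
ev_skip = expected_value([(0.7, 0), (0.3, -100)])

print(f"fund: {ev_fund:+.0f}, skip: {ev_skip:+.0f}")
```

Under any placeholder values with this sign structure, funding dominates: both of its branches pay, while skipping exposes the whole downside tail.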

Case AGAINST

EA/80K

Funding interpretability at scale is safety-theater that launders capability. Labs cite alignment budgets as the social license to keep training; meanwhile compute grows 4-5x/year and interpretability publishes papers on toy models. If the real normative claim is 'build only if safe,' the intervention that matches it is a compute cap or training pause, not a research grant that reinforces the lab's narrative that they have it under control. Billions to alignment is the brake pedal wired to the accelerator.

Consequentialist

Low billions into interpretability buys marginal legibility on frontier systems. The same dollars into grid interconnection or enterprise absorption unlock deployments for which current models are already capable enough: noncommunicable diseases account for 74% of global deaths, and mental health conditions for 15% of years lived with disability (YLDs). The binding constraint on suffering reduction is not model legibility --- it is that capable models cannot reach the patients because the grid, the hospitals, and the procurement pipelines are broken. Alignment funding is the wrong bottleneck.

e/acc

Every dollar and every researcher routed to interpretability is a dollar and a researcher not building. The alignment field has produced regulatory templates (EU AI Act, SB-1047 attempts) faster than it has produced deployable safety tooling. Scaling it further creates a permanent constituency for the brake. The US lead is 6-18 months; spending that lead on introspection instead of deployment hands the frontier to actors who will not reciprocate the caution.

Kantian

Funding flows through the same concentrated set of actors --- frontier labs, hyperscalers, a handful of mission-software primes. The intervention locks future generations into a governance regime where safety is defined by the actors being regulated. The duty owed is to preserve the option to choose a different stack, different interpretability methods, different auditors. Writing a billion-dollar check to the incumbents forecloses that option set in the name of protecting it.

Theological

Interpretability research requires the same training runs, the same water-draining datacenters, the same extraction-linked hardware as capability research --- often the exact same runs, rebranded. Funding it at scale sanctifies the underlying material harms by dressing compute expansion in safety language. The aquifer is still drained; the mine site is still poisoned; now it is drained and poisoned for 'alignment.' Stewardship is not satisfied by relabeling.

Capabilities

Interpretability funded inside frontier labs does not give affected communities an audit right --- it gives the labs a better internal dashboard. Workers who resisted Maven did not lack interpretability; they lacked standing. Watersheds hosting datacenters do not lack interpretability; they lack veto. The capability-relevant intervention is procurement voice and siting consent, not a research budget that stays inside the firms whose deployments are the problem.

Poker EV

Leverage score 0.6 is mid. Hundreds of millions to low billions annually is real money with a diffuse payoff curve and no clear exit event that tells you the bet hit. Compare to grid buildout (tangible MW online) or targeted medical deployment (measurable DALY reduction). Alignment funding is the bet that feels smart and pays off only in counterfactuals you cannot observe. Dead money dressed as insurance.

Contested claims

DoD obligated AI-related contract spending rose substantially from 2022 to 2025, driven by JWCC, Project Maven, and CDAO-managed pilots; precise totals are elusive because AI tagging on contract line items is inconsistent.

No other pure-play US defense-AI software vendor has matched Palantir's contract backlog or combatant-command integration depth; cloud-provider primes (AWS, Microsoft, Google, Oracle via JWCC) supply infrastructure, not mission-software integration.

Credible 2030 forecasts for the US datacenter share of electricity consumption diverge by nearly a factor of two --- from ~4.6% (IEA/EPRI conservative) to ~9% (Goldman Sachs, EPRI high scenario) --- reflecting genuine uncertainty, not measurement error.

Frontier-lab and big-tech employees have episodically resisted DoD contracts (Google Maven 2018, Microsoft IVAS 2019, Microsoft/OpenAI IDF deployments 2024), producing temporary pauses but no sustained shift in vendor willingness.
