Steelman analysis
Generated 2026-04-17T20:30:34.634817Z
Target intervention
Scale funding for interpretability and alignment research.
Operator tension
You hold norm_anthropic_alignment and norm_xrisk_halt in your camp's orbit while also holding norm_eacc_deployment and a poker-EV frame that prizes observable payoffs. The uncomfortable case from inside your own frame is the e/acc-against and poker-EV-against pair: alignment funding is the intervention that most flatters your self-image as 'accelerationist but risk-managed,' and it is exactly the bet whose payoff you cannot measure. You would downgrade HYPE from B+ to C- for precisely this profile --- diffuse payoff, no exit event, narrative-favorable --- but the alignment frame makes you excuse it because the vibe matches EA/80K. The specific discomfort: if you are honest about norm_xrisk_halt, the intervention that matches it is a compute cap, not a research grant. Funding interpretability at scale is the hedge you would laugh out of the room in any other asset class --- a brake pedal wired to an accelerator, sold as insurance.
Both sides cite
- AI capability is accelerating along compute, data, and algorithmic axes.
- Amortized hardware and energy cost of flagship training runs has grown ~2.4x annually; GPT-4-class runs cost on the order of $40M-$80M (2023) and the next generation crossed $100M.
- Mental and neurological disorders are the leading cause of years-lived-with-disability (YLD) globally, accounting for roughly 15-16% of total YLDs; depression and anxiety dominate that burden.
- Training compute for frontier AI models has grown roughly 4-5x per year from 2010 through 2024, corresponding to a doubling time of about 5-6 months.
- The US currently leads China in frontier AI by roughly 6-18 months.
Case FOR
Capability is doubling every 5-6 months on compute and halving the compute floor every 8 months on algorithms. Interpretability is nowhere near either curve. If alignment is the dominant catastrophic risk, the only lever that directly narrows the capability/alignment gap is funding interpretability work at a scale that tracks the frontier. Hundreds of millions to low billions annually is trivial next to $100M+ single training runs --- this is the cheapest bet on the board and the one most directly aimed at the thing that kills everyone. (The doubling-time arithmetic is sketched after the citations below.)
- Training compute for frontier AI models has grown roughly 4-5x per year from 2010 through 2024, corresponding to a doubling time of about 5-6 months.
- Algorithmic progress roughly halves the compute required to reach a fixed language-model performance level every ~8 months.
- Amortized hardware and energy cost of flagship training runs has grown ~2.4x annually; GPT-4-class runs cost on the order of $40M-$80M (2023) and the next generation crossed $100M.
- AI capability is accelerating along compute, data, and algorithmic axes.
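The growth figures just cited convert to the quoted doubling times by plain arithmetic; a minimal sketch using only the rates in this document, no external data:

```python
import math

def doubling_time_months(annual_growth_factor: float) -> float:
    """Months for a quantity growing `annual_growth_factor` per year to double."""
    return 12.0 / math.log2(annual_growth_factor)

# Frontier training compute: 4-5x/year (cited above).
for g in (4.0, 5.0):
    print(f"{g:.0f}x/year -> doubles every {doubling_time_months(g):.1f} months")
# 4x/year -> doubles every 6.0 months; 5x/year -> every 5.2 months,
# matching the quoted 5-6 month doubling time.

# Flagship-run cost: ~2.4x/year (cited above) -> doubles every ~9.5 months.
print(f"2.4x/year -> doubles every {doubling_time_months(2.4):.1f} months")
```

The same formula run in reverse turns the ~8-month algorithmic halving into a ~2.8x annual efficiency gain (2^(12/8) ≈ 2.83).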
The US lead is 6-18 months and shrinking. Alignment research funded inside the lead window produces safety tooling that the frontier actually uses; funded after parity, it produces papers. Scaling interpretability now is the highest-EV use of a billion dollars because it compounds across every subsequent deployment --- medical, defense, industrial --- and reduces the tail risk of a catastrophic failure that would trigger the regulatory backlash that actually stops capability.
Alignment funding has the lowest friction profile of any intervention on the board --- 1.0 grid, 0.9 public, 0.8 regulation. It accelerates net deployment by preemptively defusing the one event class (a visible misalignment catastrophe) that would freeze the entire stack under emergency regulation. Every dollar into interpretability is an insurance premium against a ten-year pause. Accelerationists who refuse to fund alignment are the ones who get the brake slammed on them.
Deploying systems we cannot interpret forecloses the option set of future persons --- they inherit infrastructure whose decision process no one can audit or correct. Funding interpretability is the minimum categorical duty owed to anyone downstream: keep the system legible so future actors can still choose. The duty is not contingent on whether alignment pays off in expected value. It is owed.
Interpretability research is the technical precondition for keeping AI in the instrumental position religious traditions require. A model whose internals are opaque cannot be held to the creator/creature distinction --- it gets personified by default because no one can say what it actually is. Funding interpretability is funding the capacity to say 'this is a tool' with epistemic warrant rather than rhetorical assertion.
Interpretability is the only intervention that gives downstream-affected communities --- workers, patients, citizens --- a credible mechanism to contest specific model decisions. Without it, capability claims on behalf of affected parties are unenforceable; you cannot demand a system respect a capability set if you cannot inspect whether it did. Funding alignment research is funding the audit layer that makes every other capabilities-frame demand actionable.
Hundreds of millions to low billions annually against a civilizational-scale technology with a 5-6 month compute doubling time. Cost is a rounding error versus training budgets; payoff is either 'we can steer the thing' or 'we learn we can't in time to act.' Both outcomes are positive EV. The only negative-EV move is not funding it and finding out later.
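A toy expected-value check of that claim. Only the annual cost range comes from the text; every probability and loss figure below is an illustrative placeholder, not a sourced number:

```python
# Toy EV comparison: fund vs. don't fund. Only the cost range is from the
# text; all other figures are illustrative assumptions, not claims.
annual_cost = 1e9           # $1B/year, midpoint of "hundreds of millions to low billions"
catastrophe_loss = 1e14     # assumed dollar-equivalent downside of a misalignment catastrophe
p_catastrophe = 0.01        # assumed annual probability without the funding (placeholder)
risk_reduction = 0.05       # assumed fraction of that risk the funding removes (placeholder)

ev_fund = -annual_cost - (1 - risk_reduction) * p_catastrophe * catastrophe_loss
ev_skip = -p_catastrophe * catastrophe_loss

# Funding beats skipping iff the expected loss averted exceeds the premium.
loss_averted = risk_reduction * p_catastrophe * catastrophe_loss
print(f"EV(fund) - EV(skip) = {ev_fund - ev_skip:,.0f}")  # +49,000,000,000 on these numbers
print(f"loss averted {loss_averted:,.0f} vs premium {annual_cost:,.0f}")
```

On these placeholder numbers the premium is roughly fifty times smaller than the expected loss averted; the AGAINST case below attacks risk_reduction (whether the funding actually cuts the risk), not this arithmetic.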
Case AGAINST

Funding interpretability at scale is safety theater that launders capability. Labs cite alignment budgets as the social license to keep training; meanwhile compute grows 4-5x/year and interpretability publishes papers on toy models. If the real normative claim is 'build only if safe,' the intervention that matches it is a compute cap or training pause, not a research grant that reinforces the lab's narrative that they have it under control. Billions to alignment is the brake pedal wired to the accelerator.
Low billions into interpretability buys marginal legibility on frontier systems. The same dollars into grid interconnection or enterprise absorption unlock deployments that are already capable enough to address the disease burden behind 74% of global deaths (NCDs) and ~15% of YLDs (mental health). The binding constraint on suffering reduction is not model legibility --- it is that capable models cannot reach the patients because the grid, the hospitals, and the procurement pipelines are broken. Alignment funding is the wrong bottleneck. (A toy version of this comparison follows the citations below.)
- Enterprise and government absorption of AI capability lags the frontier by years.
- US high-voltage transmission buildout has slowed to ~1% annual circuit-mile growth.
- As of end-2023, roughly 2,600 GW of generation and storage capacity sat in US interconnection queues.
- Mental and neurological disorders are the leading cause of years-lived-with-disability (YLD) globally, accounting for roughly 15-16% of total YLDs; depression and anxiety dominate that burden.
- Non-communicable diseases (cardiovascular, cancer, chronic respiratory, diabetes) account for roughly 74% of global deaths.
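The toy version of the comparison, as promised above. The burden shares come from the cited facts; the cost-effectiveness figure is a placeholder assumption with no source:

```python
# Toy sketch of the AGAINST case's bottleneck argument. The burden shares
# (74% of deaths from NCDs, 15-16% of YLDs from mental health) come from
# the text; the $/DALY figure is an illustrative assumption only.
budget = 1e9                       # $1B, same pool either way

# Deployment route: assume a cost per DALY averted for AI-assisted health
# delivery once grid/procurement bottlenecks are cleared (placeholder).
cost_per_daly_deploy = 500.0       # assumed $/DALY averted
dalys_deploy = budget / cost_per_daly_deploy

# Alignment route: payoff is a counterfactual tail-risk reduction with no
# observable DALY denominator -- the "no exit event" problem the text names.
dalys_alignment_observable = 0.0   # nothing measurable inside the window

print(f"deployment route: ~{dalys_deploy:,.0f} DALYs averted (measurable)")
print(f"alignment route:  {dalys_alignment_observable:,.0f} DALYs observable; "
      "payoff exists only in unobserved counterfactuals")
```

The sketch makes the frame explicit rather than settling it: the FOR case answers that the unobservable counterfactual is precisely the term that dominates the EV.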
Every dollar and every researcher routed to interpretability is a dollar and a researcher not building. The alignment field has produced regulatory templates (EU AI Act, SB-1047 attempts) faster than it has produced deployable safety tooling. Scaling it further creates a permanent constituency for the brake. The US lead is 6-18 months; spending that lead on introspection instead of deployment hands the frontier to actors who will not reciprocate the caution.
Funding flows through the same concentrated set of actors --- frontier labs, hyperscalers, a handful of mission-software primes. The intervention locks future generations into a governance regime where safety is defined by the actors being regulated. The duty owed is to preserve the option to choose a different stack, different interpretability methods, different auditors. Writing a billion-dollar check to the incumbents forecloses that option set in the name of protecting it.
Interpretability research requires the same training runs, the same water-draining datacenters, the same extraction-linked hardware as capability research --- often the exact same runs, rebranded. Funding it at scale sanctifies the underlying material harms by dressing compute expansion in safety language. The aquifer is still drained; the mine site is still poisoned; now it is drained and poisoned for 'alignment.' Stewardship is not satisfied by relabeling.
Interpretability funded inside frontier labs does not give affected communities an audit right --- it gives the labs a better internal dashboard. Workers who resisted Maven did not lack interpretability; they lacked standing. Watersheds hosting datacenters do not lack interpretability; they lack veto. The capability-relevant intervention is procurement voice and siting consent, not a research budget that stays inside the firms whose deployments are the problem.
Leverage score 0.6 is mid. Hundreds of millions to low billions annually is real money with a diffuse payoff curve and no clear exit event that tells you the bet hit. Compare to grid buildout (tangible MW online) or targeted medical deployment (measurable DALY reduction). Alignment funding is the bet that feels smart and pays off only in counterfactuals you cannot observe. Dead money dressed as insurance.
Contested claims
DoD AI-related contract obligations rose substantially from 2022 to 2025, driven by JWCC, Project Maven, and CDAO-managed pilots; precise totals are hampered by inconsistent AI tagging on contract line items.
- Artificial Intelligence and National Security (CRS Report R45178), modeled_projection (weight 0.80)
  locator: AI funding appendix; DoD budget rollups
- USASpending.gov federal contract awards, direct_measurement (weight 0.85)
  locator: DoD AI-tagged obligations 2022-2025
- The Intercept coverage of Palantir contracts and DoD AI programs, journalistic_report (weight 0.55)
  locator: Investigative pieces on DoD AI pilot failures and miscategorization
- Artificial Intelligence: DoD Needs Department-Wide Guidance to Inform Acquisitions (GAO-22-105834 and follow-ups), direct_measurement (weight 0.75)
  locator: Summary findings on acquisition-pace gaps
No other pure-play US defense-AI software vendor has matched Palantir's contract backlog or combatant-command integration depth; cloud-provider primes (AWS, Microsoft, Google, Oracle via JWCC) supply infrastructure, not mission-software integration.
- (weight 0.75)
  locator: Vendor-landscape discussion
- Palantir Technologies Inc. Form 10-K Annual Report (FY 2024), primary_testimony (weight 0.60)
  locator: Competition section, Item 1
- The Intercept coverage of Palantir contracts and DoD AI programs, journalistic_report (weight 0.50)
  locator: Coverage framing Palantir as over-sold relative to internal-tool alternatives
Credible 2030 forecasts for US datacenter share of electricity consumption diverge by roughly 2x --- from ~4.6% (IEA/EPRI conservative) to ~9% (Goldman Sachs, EPRI high scenario) --- reflecting genuine uncertainty, not measurement error.
- Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption, modeled_projection (weight 0.85)
  locator: Scenario table: 4.6%-9.1% by 2030
- 2025/2026 Base Residual Auction Results, direct_measurement (weight 0.75)
  locator: 2025/2026 BRA clearing results
- Generational growth: AI, data centers and the coming US power demand surge, modeled_projection (weight 0.70)
  locator: Executive summary; 160% growth figure
- Electricity 2024 --- Analysis and Forecast to 2026, modeled_projection (weight 0.80)
  locator: Analysing Electricity Demand; data centres chapter
Frontier-lab and big-tech employees have episodically resisted DoD contracts (Google Maven 2018, Microsoft IVAS 2019, Microsoft/OpenAI IDF deployments 2024), producing temporary pauses but no sustained shift in vendor willingness.
- Google employee open letter opposing Project Maven, primary_testimony (weight 0.90)
  locator: Open letter and subsequent Google announcement
- Microsoft employee open letter opposing HoloLens/IVAS contract, primary_testimony (weight 0.85)
  locator: Employee open letter, February 2019
- Coverage of OpenAI and Microsoft AI use by Israeli military, 2024, journalistic_report (weight 0.75)
  locator: OpenAI military-use policy-change coverage, 2024
- Alex Karp public interviews and op-eds, 2023-2024, primary_testimony (weight 0.50)
  locator: Karp interviews dismissing employee resistance as inconsequential
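Each contested claim above carries typed sources with reliability weights, but the report never states how the weights combine into a claim-level score. A minimal sketch, assuming a plain arithmetic mean is intended (an assumption for illustration, not the report's actual method), using the DoD-spending claim's sources:

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    kind: str      # "direct_measurement", "modeled_projection", "journalistic_report", ...
    weight: float  # reliability weight in [0, 1], as listed above

def support_score(sources: list[Source]) -> float:
    """Assumed aggregation: arithmetic mean of source weights.
    The report lists per-source weights but never states how they combine."""
    return sum(s.weight for s in sources) / len(sources)

# The DoD contract-obligations claim's sources, transcribed from above.
dod_spending = [
    Source("CRS Report R45178", "modeled_projection", 0.80),
    Source("USASpending.gov awards", "direct_measurement", 0.85),
    Source("The Intercept coverage", "journalistic_report", 0.55),
    Source("GAO-22-105834", "direct_measurement", 0.75),
]
print(f"mean source weight: {support_score(dod_spending):.2f}")  # 0.74
```

A real scheme would plausibly also condition on evidence kind (direct_measurement over journalistic_report); that refinement is omitted here.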