ai-for-less-suffering.com

Steelman analysis

Generated 2026-04-17T20:30:34.634817Z

Target intervention

Scale funding for interpretability and alignment research.

Operator tension

You hold norm_anthropic_alignment and norm_xrisk_halt in your camp's orbit while also holding norm_eacc_deployment and a poker-EV frame that prizes observable payoffs. The uncomfortable case, from inside your own frame, is the pair of e/acc and poker-EV arguments against: alignment funding is the intervention that most flatters your self-image as 'accelerationist but risk-managed,' and it is exactly the bet whose payoff you cannot measure. You would downgrade HYPE from B+ to C- for exactly this profile --- diffuse payoff, no exit event, narrative-favorable --- but the alignment frame makes you excuse it because the vibe matches EA/80K. The specific discomfort: if you are honest about norm_xrisk_halt, the intervention that matches it is a compute cap, not a research grant. Funding interpretability at scale is the hedge you would call indefensible in any other asset class --- a brake pedal wired to an accelerator, sold as insurance.

Case FOR

EA/80K

Frontier training compute is doubling every 5-6 months, and algorithmic progress is halving the compute needed for a given capability every 8 months. Interpretability is nowhere near either curve. If alignment is the dominant catastrophic risk, the only lever that directly narrows the capability/alignment gap is funding interpretability work at a scale that tracks the frontier. Hundreds of millions to low billions annually is trivial next to $100M+ single training runs --- this is the cheapest bet on the board and the one most directly aimed at the thing that kills everyone.
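The cited rates compound multiplicatively, and a quick back-of-envelope shows the effective annual pace interpretability would have to match. The doubling and halving times are this analysis's claims, not independent measurements; 5.5 months is simply the midpoint of the quoted 5-6 month range.

```python
# Back-of-envelope for the growth rates cited above. The doubling and
# halving times are the text's claims, not independent measurements;
# 5.5 months is the midpoint of the quoted 5-6 month range.

def annual_multiplier(doubling_months: float) -> float:
    """Growth factor over 12 months, given a doubling time in months."""
    return 2 ** (12 / doubling_months)

compute_growth = annual_multiplier(5.5)   # ~4.5x/year from compute scaling
algo_efficiency = annual_multiplier(8.0)  # ~2.8x/year from algorithmic gains
effective_growth = compute_growth * algo_efficiency  # ~12.8x/year combined

print(f"compute {compute_growth:.1f}x * algorithms {algo_efficiency:.1f}x "
      f"= {effective_growth:.1f}x effective per year")
```

On these assumptions, effective capability grows roughly an order of magnitude per year, which is the concrete content of the claim that interpretability is "nowhere near" the curve.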

Consequentialist

The US lead is 6-18 months and shrinking. Alignment research funded inside the lead window produces safety tooling that the frontier actually uses; funded after parity, it produces papers. Scaling interpretability now is the highest-EV use of a billion dollars because it compounds across every subsequent deployment --- medical, defense, industrial --- and reduces the tail risk of a catastrophic failure that would trigger the regulatory backlash that actually stops capability.

e/acc

Alignment funding has the lowest friction profile of any intervention on the board (grid 1.0, public 0.9, regulation 0.8). It accelerates net deployment by preemptively defusing the one event class (a visible misalignment catastrophe) that would freeze the entire stack under emergency regulation. Every dollar into interpretability is an insurance premium against a ten-year pause. Accelerationists who refuse to fund alignment are the ones who get the brake slammed on them.

Kantian

Deploying systems we cannot interpret forecloses the option set of future persons --- they inherit infrastructure whose decision process no one can audit or correct. Funding interpretability is the minimum categorical duty owed to anyone downstream: keep the system legible so future actors can still choose. The duty is not contingent on whether alignment pays off in expected value. It is owed.

Theological

Interpretability research is the technical precondition for keeping AI in the instrumental position religious traditions require. A model whose internals are opaque cannot be held to the creator/creature distinction --- it gets personified by default because no one can say what it actually is. Funding interpretability is funding the capacity to say 'this is a tool' with epistemic warrant rather than rhetorical assertion.

Capabilities

Interpretability is the only intervention that gives downstream-affected communities --- workers, patients, citizens --- a credible mechanism to contest specific model decisions. Without it, capability claims on behalf of affected parties are unenforceable; you cannot demand a system respect a capability set if you cannot inspect whether it did. Funding alignment research is funding the audit layer that makes every other capabilities-frame demand actionable.

Poker EV

Hundreds of millions to low billions annually against a civilizational-scale technology with a 5-6 month compute doubling time. Cost is a rounding error versus training budgets; payoff is either 'we can steer the thing' or 'we learn we can't in time to act.' Both outcomes are positive EV. The only negative-EV move is not funding it and finding out later.
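The "both outcomes are positive EV" structure can be sketched as a toy decision comparison. Every probability and payoff below is an illustrative placeholder chosen only to show the argument's shape; none comes from this analysis.

```python
# Toy sketch of the EV argument above. Probabilities and payoffs are
# arbitrary placeholders, not estimates: the point is the structure,
# in which both branches of the "fund" outcome carry positive value.

def expected_value(outcomes):
    """Probability-weighted sum over (probability, payoff) pairs."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * v for p, v in outcomes)

COST = 1  # funding cost, trivial next to training budgets per the text

# Fund: either interpretability lets us steer (+100), or it reveals in
# time that we cannot (+10, still actionable information).
ev_fund = expected_value([(0.3, 100), (0.7, 10)]) - COST

# Skip: either nothing goes wrong (0), or we find out too late (-100).
ev_skip = expected_value([(0.7, 0), (0.3, -100)])

print(f"fund: {ev_fund:+.0f}, skip: {ev_skip:+.0f}")
```

Under any placeholder values with this sign structure, funding dominates: both of its branches pay, while skipping exposes the whole downside tail.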

Case AGAINST

EA/80K

Funding interpretability at scale is safety-theater that launders capability. Labs cite alignment budgets as the social license to keep training; meanwhile compute grows 4-5x/year and interpretability publishes papers on toy models. If the real normative claim is 'build only if safe,' the intervention that matches it is a compute cap or training pause, not a research grant that reinforces the lab's narrative that they have it under control. Billions to alignment is the brake pedal wired to the accelerator.

Consequentialist

Low billions into interpretability buys marginal legibility on frontier systems. The same dollars into grid interconnection or enterprise absorption unlock deployments for which current models are already capable enough: noncommunicable diseases account for 74% of global deaths, and mental health conditions for 15% of years lived with disability (YLDs). The binding constraint on suffering reduction is not model legibility --- it is that capable models cannot reach the patients because the grid, the hospitals, and the procurement pipelines are broken. Alignment funding is the wrong bottleneck.

e/acc

Every dollar and every researcher routed to interpretability is a dollar and a researcher not building. The alignment field has produced regulatory templates (EU AI Act, SB-1047 attempts) faster than it has produced deployable safety tooling. Scaling it further creates a permanent constituency for the brake. The US lead is 6-18 months; spending that lead on introspection instead of deployment hands the frontier to actors who will not reciprocate the caution.

Kantian

Funding flows through the same concentrated set of actors --- frontier labs, hyperscalers, a handful of mission-software primes. The intervention locks future generations into a governance regime where safety is defined by the actors being regulated. The duty owed is to preserve the option to choose a different stack, different interpretability methods, different auditors. Writing a billion-dollar check to the incumbents forecloses that option set in the name of protecting it.

Theological

Interpretability research requires the same training runs, the same water-draining datacenters, the same extraction-linked hardware as capability research --- often the exact same runs, rebranded. Funding it at scale sanctifies the underlying material harms by dressing compute expansion in safety language. The aquifer is still drained; the mine site is still poisoned; now it is drained and poisoned for 'alignment.' Stewardship is not satisfied by relabeling.

Capabilities

Interpretability funded inside frontier labs does not give affected communities an audit right --- it gives the labs a better internal dashboard. Workers who resisted Maven did not lack interpretability; they lacked standing. Watersheds hosting datacenters do not lack interpretability; they lack veto. The capability-relevant intervention is procurement voice and siting consent, not a research budget that stays inside the firms whose deployments are the problem.

Poker EV

Leverage score 0.6 is mid. Hundreds of millions to low billions annually is real money with a diffuse payoff curve and no clear exit event that tells you the bet hit. Compare to grid buildout (tangible MW online) or targeted medical deployment (measurable DALY reduction). Alignment funding is the bet that feels smart and pays off only in counterfactuals you cannot observe. Dead money dressed as insurance.

Contested claims

DoD obligated AI-related contract spending rose substantially from 2022 to 2025, driven by JWCC, Project Maven, and CDAO-managed pilots; precise totals are elusive because AI tagging on contract line items is inconsistent.

No other pure-play US defense-AI software vendor has matched Palantir's contract backlog or combatant-command integration depth; cloud-provider primes (AWS, Microsoft, Google, Oracle via JWCC) supply infrastructure, not mission-software integration.

Credible 2030 forecasts for the US datacenter share of electricity consumption diverge by nearly a factor of two --- from ~4.6% (IEA/EPRI conservative) to ~9% (Goldman Sachs, EPRI high scenario) --- reflecting genuine uncertainty, not measurement error.

Frontier-lab and big-tech employees have episodically resisted DoD contracts (Google Maven 2018, Microsoft IVAS 2019, Microsoft/OpenAI IDF deployments 2024), producing temporary pauses but no sustained shift in vendor willingness.
