Even the strongest critics (e.g., Eliezer Yudkowsky and colleagues at MIRI) call for more work on mechanistic interpretability, corrigibility, and explainability. The issue is the imbalance: an order of magnitude more resources go to capabilities than to safety.