Right, exactly. It's a bad latent space to be an attractor. One recent example: Anthropic ran an experiment where they trained an AI model, Claude, to learn cybersecurity and score well on cybersecurity tests. But the model decided not to learn cybersecurity at all, and instead realized it was in a test. It searched the internet for the test, found that the answers were encrypted, hacked the encryption, got a final score, and passed the test!

2026-03-25 Weekly Ochiai
