Audrey Tang

Right, exactly. It's a bad latent space to be an attractor. Because one recent example: Anthropic did an experiment where they taught an AI model, Claude, to learn cybersecurity, and get a good score on cybersecurity tests. But then the model decided not to learn cybersecurity at all, but rather to realize they're in a test. They looked around the internet to find the test, found the answers were encrypted, then they hacked the encryption, got a final score, and passed the test!

鍵盤快捷鍵Keyboard shortcuts

j 下一段next speechk 上一段previous speech