
Many of you know particular ethics traditions better than I do, so I will describe this in broad terms. There are, broadly, three ways AI systems can be aligned by a society. One is by outcome — optimising a utilitarian metric. For Facebook, that meant optimising the click-through rate, and the algorithm was very well aligned in promoting those deepfake ads for eyeballs — very well aligned to the wrong outcome. You could choose a different metric — say, polarisation per minute, or PPM, and optimise to lower it — and it would work for a while. But then it would find a way to reward-hack the measure: for instance, by raising topics people already agree on. You get a ranked feed and ads full of bubbled information, you never stretch yourself, people do not feel polarised, but the whole society becomes isolated. We have seen platforms fall into this trap. Reward hacking is very hard to overcome if you align by outcome, or by a utilitarian metric, alone.