“There's this thing that we sometimes call reinforcement learning from community ...”

There's this thing that we sometimes call reinforcement learning from community feedback, as opposed to just reinforcement learning from human feedback, which might rely on a smaller, biased set of non-representative people. In the case of Community Notes, that would mean directly training the model to write notes that are maximally likely to be found helpful by a simulated set of raters who have typically disagreed in the past.

2026-04-14 How Community Notes Reduce Viral Misinformation
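
To make the idea in the quote concrete, here is a minimal sketch of what a reward over simulated, disagreeing raters could look like. Everything in it is an assumption for illustration, not something described in the talk: the `SimulatedRater` type, the keyword-based toy raters, the group labels, and the choice to score a note by its least-convinced group are all hypothetical. Picking the best of a few candidate notes stands in for an actual reinforcement learning update.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class SimulatedRater:
    """Hypothetical stand-in for a learned model of one rater."""
    group: str                      # assumed viewpoint cluster label
    rate: Callable[[str], float]    # predicted helpfulness in [0, 1]


def make_toy_rater(group: str, liked_terms: FrozenSet[str]) -> SimulatedRater:
    """Toy rater: score is the fraction of its liked terms found in the note."""
    def rate(note: str) -> float:
        words = set(re.findall(r"[a-z]+", note.lower()))
        return len(words & liked_terms) / len(liked_terms)
    return SimulatedRater(group, rate)


def bridging_reward(note: str, raters: List[SimulatedRater]) -> float:
    """Mean predicted helpfulness within the least-convinced group.

    Assumed design: a note scores high only when every group of
    raters, including groups that usually disagree, rates it helpful.
    """
    by_group: Dict[str, List[float]] = {}
    for rater in raters:
        by_group.setdefault(rater.group, []).append(rater.rate(note))
    return min(mean(scores) for scores in by_group.values())


# Two toy groups that only partly agree on what makes a note helpful.
raters = [
    make_toy_rater("group_a", frozenset({"source", "data"})),
    make_toy_rater("group_b", frozenset({"source", "context"})),
]
candidates = [
    "Here is the data",
    "Here is the source and the data",
    "Here is the source, the data, and the context",
]

# Best-of-n selection as a stand-in for a policy update on this reward.
best = max(candidates, key=lambda note: bridging_reward(note, raters))
print(best)
```

Taking the minimum across groups is only one way to express "helpful to people who usually disagree"; a real system would presumably learn the rater models from past ratings rather than hard-code them.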
