Jay Baxter

There's this thing that we sometimes call reinforcement learning from community feedback, as opposed to just reinforcement learning from human feedback, which might rely on a smaller, biased set of non-representative people. In the case of Community Notes, it would mean directly training the model to write notes that are maximally likely to be found helpful by a simulated set of raters who have typically disagreed in the past.
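
To make the idea concrete, here is a minimal sketch of what such a reward signal could look like. It is a toy illustration, not the Community Notes implementation: the `SimulatedRater` class, the one-dimensional `viewpoint` factor, the logistic rating model, and the `note_quality` and `note_slant` parameters are all hypothetical names and assumptions introduced here. The reward is the lower of the two camps' average predicted helpfulness, so a note scores well only if raters who usually disagree would both find it helpful.

```python
import math
from dataclasses import dataclass


@dataclass
class SimulatedRater:
    # Hypothetical 1-D viewpoint factor: negative and positive values
    # stand for camps that have typically disagreed in the past.
    viewpoint: float


def predicted_helpful(rater: SimulatedRater, note_quality: float, note_slant: float) -> float:
    """Toy rating model (an assumption): quality raises the rating for everyone,
    while slant raises it for same-side raters and lowers it for the other side."""
    logit = note_quality + note_slant * rater.viewpoint
    return 1.0 / (1.0 + math.exp(-logit))


def bridging_reward(raters: list[SimulatedRater], note_quality: float, note_slant: float) -> float:
    """Reward = the lower of the two camps' mean predicted helpfulness."""
    left = [r for r in raters if r.viewpoint < 0]
    right = [r for r in raters if r.viewpoint >= 0]
    if not left or not right:
        raise ValueError("need simulated raters on both sides of the viewpoint axis")

    def mean_helpful(group: list[SimulatedRater]) -> float:
        return sum(predicted_helpful(r, note_quality, note_slant) for r in group) / len(group)

    return min(mean_helpful(left), mean_helpful(right))


raters = [SimulatedRater(-1.0), SimulatedRater(-0.5), SimulatedRater(0.5), SimulatedRater(1.0)]

# Same quality, different slant: the one-sided note pleases one camp only.
neutral_reward = bridging_reward(raters, note_quality=1.0, note_slant=0.0)
slanted_reward = bridging_reward(raters, note_quality=1.0, note_slant=2.0)
print(neutral_reward > slanted_reward)
```

In a reinforcement learning setup, a scalar like `bridging_reward` would play the role of the reward that the note-writing model is optimized against; in practice the simulated raters would presumably be fitted to historical rating data rather than hand-specified as they are here.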
