22 Aug

DeepMind Wants to Take Humans Out of RLHF

The algorithm alternates between two steps: a Grow step, in which the current policy generates synthetic training data, and an Improve step, in which the policy is optimised on a filtered subset of that data.
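To make the Grow/Improve alternation concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: the categorical "policy" (a dict of output probabilities), the `reward_fn`, the reward thresholds used for filtering, and every function name below are assumptions made purely for illustration. A real system would sample text from a language model and fine-tune it, rather than refit a frequency table.

```python
import random
from collections import Counter


def grow(policy, n_samples, rng):
    """Grow step: sample a synthetic dataset from the current policy."""
    outputs = list(policy)
    weights = [policy[o] for o in outputs]
    return rng.choices(outputs, weights=weights, k=n_samples)


def improve(samples, reward_fn, threshold, fallback):
    """Improve step: keep samples whose reward meets the threshold,
    then refit the policy to the empirical frequencies of what was kept."""
    kept = [s for s in samples if reward_fn(s) >= threshold]
    if not kept:
        # Nothing passed the filter, so keep the previous policy unchanged.
        return fallback
    counts = Counter(kept)
    total = len(kept)
    return {o: c / total for o, c in counts.items()}


def grow_improve_loop(policy, reward_fn, thresholds, n_samples=1000, seed=0):
    """Alternate Grow and Improve, once per (hypothetical) filter threshold."""
    rng = random.Random(seed)
    for threshold in thresholds:
        data = grow(policy, n_samples, rng)
        policy = improve(data, reward_fn, threshold, fallback=policy)
    return policy


def expected_reward(policy, reward_fn):
    """Average reward of the policy's outputs, weighted by probability."""
    return sum(p * reward_fn(o) for o, p in policy.items())


def toy_reward(output):
    """Hypothetical reward: larger integers score higher, scaled to [0, 1]."""
    return output / 9


# Toy setup: ten possible outputs, starting from a uniform policy.
uniform_policy = {o: 0.1 for o in range(10)}
final_policy = grow_improve_loop(uniform_policy, toy_reward, thresholds=[0.3, 0.6])
print(expected_reward(uniform_policy, toy_reward), expected_reward(final_policy, toy_reward))
```

In this toy, each pass discards low-reward samples before refitting, so the policy's probability mass shifts toward higher-reward outputs without any human labelling in the loop. The rising threshold schedule is a design choice of the sketch, not something the teaser above specifies.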