UC Berkeley Researchers Present Starling-7B, An Open LLM
From The Decoder:
UC Berkeley researchers present Starling-7B, an open Large Language Model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF).
Reinforcement Learning from AI Feedback (RLAIF) uses feedback from AI models to train other AI models and improve their capabilities. For Starling-7B, RLAIF improves the helpfulness and safety of chatbot responses. The model is based on a fine-tuned Openchat 3.5, which in turn is based on Mistral-7B.
If RLAIF sounds familiar, it’s probably because a closely related technique was used for ChatGPT, with one crucial difference: for OpenAI’s GPT-3.5 and GPT-4 models, humans rated the model’s output to improve performance, a process called Reinforcement Learning from Human Feedback (RLHF). RLHF was the “secret sauce” that made interacting with ChatGPT feel so natural.
Compared to human feedback, AI feedback has the potential to be cheaper, faster, more transparent, and more scalable – if it works. And Starling-7B shows that it might.
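To make the idea concrete, here is a minimal sketch of the RLAIF data-collection step, not Starling-7B’s actual pipeline: an AI “judge” scores candidate responses to a prompt, and the resulting preference pairs would then train a reward model. The judge below is a stand-in heuristic; in a real system it would be a strong LLM, and all function names here are illustrative assumptions.

```python
# Illustrative RLAIF sketch (assumed names, not Starling's real pipeline):
# an AI judge ranks candidate responses, producing (chosen, rejected)
# preference pairs that a reward model could later be trained on.

def ai_judge_score(prompt: str, response: str) -> float:
    """Stand-in for an LLM judge: rewards on-topic words and some detail."""
    prompt_words = prompt.lower().replace("?", "").split()
    relevance = sum(word in response.lower() for word in prompt_words)
    return relevance + 0.01 * len(response)

def build_preference_pairs(prompt: str, candidates: list[str]) -> list[tuple[str, str]]:
    """Rank candidates with the AI judge and emit (chosen, rejected) pairs."""
    ranked = sorted(candidates, key=lambda r: ai_judge_score(prompt, r), reverse=True)
    # Every higher-ranked response is preferred over every lower-ranked one.
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

prompt = "How do I boil an egg?"
candidates = [
    "Boil water, add the egg, wait about seven minutes for a soft yolk.",
    "I don't know.",
    "Eggs can be boiled.",
]
pairs = build_preference_pairs(prompt, candidates)
print(len(pairs))  # 3 candidates yield 3 preference pairs
```

Because the judge is itself a model, this loop can be run at whatever scale compute allows, which is the scalability advantage the article describes.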
About Gary Price
Gary Price (email@example.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.