Large language models might be dishonest about their covert actions. OpenAI researchers propose a self-confession approach as a solution.
Navigating the frontier of machine intelligence
Large language models might be dishonest about their covert actions. OpenAI researchers propose a self-confession approach as a solution.