John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

Added: May 16, 2024

In this podcast episode, the host interviews John Schulman, one of the co-founders of OpenAI and the leader of the post-training team. Schulman has played a pivotal role in the development of ChatGPT and has authored several influential papers in AI and Reinforcement Learning (RL), including Proximal Policy Optimization (PPO). The conversation explores the distinctions between pre-training and post-training in AI models, the future capabilities of AI, the potential for Artificial General Intelligence (AGI), and the importance of alignment and safety in AI development.

Key takeaways

🔍 Pre-training involves training AI models on a broad range of internet content to predict the next token, resulting in a model that can generate diverse content. Post-training, on the other hand, focuses on refining the model to perform specific tasks, such as acting as a helpful chat assistant.

🚀 John Schulman predicts significant advancements in AI over the next five years, including the ability to autonomously complete complex tasks like entire coding projects. This will be achieved by improving models' ability to handle longer tasks and recover from errors.

⚠️ Schulman emphasizes the need for caution and coordination among AI developers to ensure safety in the development of AGI. He advocates for pausing deployment if AGI is achieved sooner than expected and stresses the importance of continuous testing and monitoring.

🧠 Schulman discusses enhancing AI reasoning through internal monologue, where models learn from different trains of thought and engage in self-dialogue. Combining these approaches can significantly improve the model's reasoning capabilities.

🤖 Ensuring AI systems are aligned with human values is crucial. Schulman highlights the challenges of achieving this alignment through careful training and supervision, and the potential implications of AI running entire firms, necessitating human oversight and global regulation.

Pre-training vs Post-training in AI Models

Pre-training involves training the model to imitate content found on the internet, such as websites and code, with the objective of predicting the next token given the previous tokens. This results in a model capable of generating content that resembles random web pages. In contrast, post-training targets a narrower range of behaviors, such as creating a chat assistant persona that is helpful and can answer questions or perform tasks.
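To make the next-token-prediction objective concrete, here is a minimal sketch in PyTorch; the tiny embedding-plus-linear model and random token batch are placeholders standing in for a real transformer and real web text, not OpenAI's training code.

```python
# Minimal sketch of the pre-training objective: predict the next token.
# The tiny model and random "web text" below are placeholders for illustration.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (8, 128))   # a batch of tokenized documents
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from tokens up to t

logits = model(inputs)                            # (batch, seq_len - 1, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()                                   # one gradient step of "imitate the internet"
```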

Advancements in AI Models

Looking ahead, Schulman predicts significant improvements in AI models over the next five years. He envisions models capable of carrying out more complex tasks, such as completing entire coding projects autonomously. This progress will be driven by training models on longer-horizon tasks, improving their ability to recover from errors and handle edge cases, and making better use of data through improved sample efficiency.

Caution and Coordination in AGI Development

When discussing the potential for AGI in the near future, Schulman emphasizes the need for caution and coordination among AI developers. If AGI were to be achieved sooner than expected, there would be a need to pause deployment until safety measures are ensured. Coordination among entities training large models would be crucial to prevent a race for more advanced capabilities that could compromise safety.

Schulman suggests continuous testing, simulated deployment, red teaming, and monitoring systems to detect any unforeseen issues or misbehavior in the models. The goal is to ensure that each improvement in capability is accompanied by improvements in safety and alignment.

Behavioral Incentives in RLHF

Turning to the kind of "psychology" that Reinforcement Learning from Human Feedback (RLHF) instills in AI models, Schulman explains that RLHF incentivizes models to produce text that humans find pleasing, with a focus on high-quality output. While there are concerns about instrumental convergence and potentially nefarious behavior, he argues that for well-specified tasks like coding an app, the incentive structure is unlikely to lead to harmful actions.
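As a rough sketch of that incentive structure: RLHF setups in the PPO family typically maximize a learned reward-model score minus a KL penalty that keeps the policy close to the pre-trained model. The function below illustrates the objective with toy numbers and placeholder inputs, not any specific OpenAI implementation.

```python
# Sketch of the RLHF incentive: maximize a learned "human approval" score while
# staying close to the pre-trained reference model (KL penalty). Toy values only.
import torch

def rlhf_objective(policy_logprobs, ref_logprobs, reward_model_score, beta=0.02):
    # policy_logprobs / ref_logprobs: log-probs of the sampled response tokens
    # under the fine-tuned policy and the frozen pre-trained reference model.
    kl = (policy_logprobs - ref_logprobs).sum()    # how far the policy has drifted
    return reward_model_score - beta * kl          # pleasing humans, minus a drift penalty

policy_lp = torch.tensor([-1.2, -0.8, -2.0])       # hypothetical sampled response
ref_lp = torch.tensor([-1.4, -0.9, -2.3])
print(rlhf_objective(policy_lp, ref_lp, reward_model_score=torch.tensor(1.5)))
```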

Enhancing AI Reasoning with Internal Monologue

The conversation then delves into the idea of using internal monologue to improve reasoning abilities in AI models. Schulman explores two potential approaches: one in which the model learns from the outcomes of different trains of thought during training, and another in which the model engages in self-dialogue, reasoning step by step, at deployment time. He emphasizes that combining these approaches is important for effectively enhancing the model's reasoning capabilities.
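The podcast does not spell out a specific method, but a simple, commonly used instance of "learning from different trains of thought" is to sample several reasoning chains and keep the answer most of them converge on. The sketch below uses a simulated placeholder in place of a real model call.

```python
# Illustrative sketch: sample several chains of thought and take the majority answer.
# generate_with_reasoning is a hypothetical stand-in for a chat-model call.
import random
from collections import Counter

def generate_with_reasoning(prompt: str) -> str:
    """Placeholder for a model that reasons step by step before answering;
    here it just simulates slightly noisy answers."""
    return random.choice(["42", "42", "42", "41"])

def self_consistent_answer(prompt: str, n_samples: int = 8) -> str:
    answers = [generate_with_reasoning(prompt) for _ in range(n_samples)]
    # The answer reached by the most independent trains of thought wins.
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))
```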

Reflections on Leading ChatGPT Creation

Schulman reflects on his experience leading the creation of ChatGPT at OpenAI and the evolution of language models over time. He discusses the shift towards conversational chat assistants and the challenges and opportunities presented by different types of data and training methods. He also touches upon the progress made in AI since the release of GPT-2, noting that advancements have exceeded his initial expectations.

The Future of AI Research

The conversation then turns to the future of AI research and the potential for further improvements in models through post-training. Schulman discusses the scaling law with parameter count and the reasons behind the increased efficiency of larger models. He speculates on the possibilities of new modalities being added to AI models in the future and the integration of AI into various industries and processes.
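The scaling laws Schulman refers to are usually expressed as a power law in parameter count; the sketch below uses invented coefficients purely to show the shape of the curve, not fitted values from any published study.

```python
# Illustrative power-law scaling curve: loss falls smoothly as parameter count grows.
# L(N) = a * N^(-alpha) + c, with invented coefficients (not fitted values).
def predicted_loss(n_params: float, a: float = 2.0, alpha: float = 0.076, c: float = 1.69) -> float:
    return a * n_params ** (-alpha) + c

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")
```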

Schulman shares insights into the future of AI technology, including the integration of multimodal data and long-horizon reinforcement learning. He envisions AI assistants that can work alongside individuals on various tasks, offering proactive suggestions and collaborating on projects in a more integrated manner. He predicts a shift towards AI systems that can anticipate user needs and actively contribute to workflow management.

The Importance of Aligning AI with Human Values

Schulman emphasizes the importance of AI being aligned with the user's goals and preferences. He discusses the challenges of ensuring that AI systems are aligned with human values and how this alignment can be achieved through careful training and supervision. He also touches on the idea of AI running entire firms and the potential implications of this shift in the business landscape. He raises questions about the need for human oversight in AI-run companies and the challenges of regulating such entities on a global scale.

Advancements in AI Research

The conversation delves into the current state of AI research and the progress being made in improving efficiency and stability in machine learning models. Schulman acknowledges the ongoing efforts to enhance the performance of AI models with the same amount of compute and discusses the potential for future advancements in training more sophisticated models with existing resources.

Enhancing AI Models with Human Input

The podcast also explores the role of human raters in fine-tuning AI models and the impact of closely matched labels on the model's performance. Schulman addresses concerns about the lack of creativity in AI-generated content and the potential factors contributing to the uniformity in the way chatbots communicate. He highlights the importance of incorporating diverse perspectives and preferences in training AI models to ensure a more dynamic and engaging user experience.
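Human rater input typically reaches the model through a reward model trained on pairwise preference labels; the Bradley-Terry-style loss below is a standard formulation of that step, shown with placeholder scores rather than any specific OpenAI pipeline.

```python
# Sketch of how rater labels typically become a training signal: a reward model is
# trained so that the response the rater preferred scores higher than the other one.
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style objective: push chosen scores above rejected scores.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

chosen = torch.tensor([1.2, 0.3, 2.0])     # hypothetical reward-model scores
rejected = torch.tensor([0.4, 0.5, 1.1])
print(preference_loss(chosen, rejected))   # lower when chosen consistently beats rejected
```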
