Modern AI models demonstrate impressive capabilities in natural language processing and text generation. However, according to Meta’s Chief AI Scientist Yann LeCun, they still lack the abilities of memory, thinking, planning, and reasoning that are characteristic of humans. These models only imitate such skills. LeCun believes it will take at least 10 years and the development of a new approach—“world models”—to overcome this barrier.
Earlier this year, OpenAI introduced a new feature for its AI chatbot, ChatGPT, called “memory,” which allows the AI to “remember” previous interactions with users. In addition, the company released a new generation of AI models, o1, which displays the word “thinking” on the screen while generating responses. OpenAI claims that its new models are capable of complex reasoning. LeCun argues that they merely create the illusion of complex cognitive processes and that real understanding of the world is still absent from these AI systems. These innovations may look like a significant step toward Artificial General Intelligence (AGI), but LeCun pushes back on that reading. In a recent speech at the Hudson Forum, he suggested that the optimism of figures like Elon Musk and Shane Legg, co-founder of Google DeepMind, may be premature. In LeCun’s view, creating human-level AI could take decades, despite predictions of its imminent arrival.
LeCun emphasizes that to build AI capable of understanding the surrounding world, machines must not only store information but also possess intuition, common sense, and the ability to plan and reason. “Today’s AI systems, despite the claims of the most enthusiastic advocates, are not capable of any of these actions,” LeCun said. The reason is simple: large language models (LLMs) operate by predicting the next token (usually a few letters or a short word), while current AI models for images and videos predict the next pixel. In other words, LLMs are one-dimensional predictors, and image and video models are two-dimensional predictors. While these models have made great strides in prediction within their own dimensions, they don’t truly understand the three-dimensional world that humans perceive. As a result, modern AI systems are unable to perform simple tasks that most people can.
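To make “predicting the next token” concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how any production LLM works (real models use neural networks over subword tokens); the function names and the sample sentence are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, split into whole-word "tokens" for simplicity.
corpus = "the robot cleans the room and the robot charges".split()
model = train_bigram(corpus)

next_after_the = predict_next(model, "the")        # "robot" follows "the" most often
next_after_last = predict_next(model, "charges")   # never followed by anything: None
```

The predictor only ever looks along one axis, the sequence of tokens, which is the sense in which the article calls LLMs one-dimensional predictors.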
LeCun compares AI capabilities with human learning: by the age of 10, a child can clean up after themselves, and by 17, they can learn to drive a car. Both skills are acquired in just a few hours or days. By contrast, even the most advanced AI systems, trained on thousands or millions of hours of data, are still unable to reliably perform such simple actions in the physical world. To address this issue, LeCun proposes developing world models: mental models of how the world behaves, able to perceive the surrounding environment and predict changes in three-dimensional space. According to him, these models represent a new type of AI architecture. As he describes it, you can imagine a sequence of actions, and your world model will allow you to predict what impact that sequence will have on the world.
Part of the advantage of this approach is that world models can process significantly more data than LLMs. This also makes them computationally expensive, which is why cloud providers are rushing to collaborate with AI companies. World models are a broad concept currently being pursued by several research labs, and the term is quickly becoming a new buzzword for attracting venture capital. A group of prominent AI researchers, including Fei-Fei Li and Justin Johnson, recently raised $230 million for their startup, World Labs. Li, often called the “godmother of AI,” and her team also believe that world models will lead to significantly smarter AI systems. OpenAI is also calling its yet-to-be-released video generator, Sora, a world model, but has not revealed details.
LeCun presented the idea of using world models to create human-level AI in his 2022 paper on objective-driven AI, though he notes that the concept itself is over 60 years old. In short, a world model is given basic representations of the environment (such as a video of an untidy room) along with memory. Based on this data, the model predicts the state of the surrounding world. It is then given specific goals, including a desired state (e.g., a clean room), and constraints are set to prevent potential harm to humans while achieving the goal (e.g., “while cleaning the room, do not harm a person”). The world model then finds the optimal sequence of actions to accomplish these goals.
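The loop described above (predict the next state, score it against a goal, rule out unsafe states, pick the best sequence of actions) can be sketched in a few lines of Python. This is only an illustration under heavy assumptions, not LeCun’s architecture: the room is a hypothetical 3×3 grid, the “world model” is a hand-written rule rather than something learned from video, memory is left out, and the planner simply tries every action sequence. All names here are invented for the example.

```python
from itertools import product
from typing import FrozenSet, NamedTuple, Optional, Tuple

Cell = Tuple[int, int]  # (x, y) position on the grid

class State(NamedTuple):
    robot: Cell             # where the robot is
    mess: FrozenSet[Cell]   # cells that still contain clutter

GRID = 3                    # hypothetical 3x3 room
PERSON: Cell = (1, 1)       # hypothetical: a person stands in the middle
ACTIONS = ("up", "down", "left", "right", "clean")
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def predict(state: State, action: str) -> State:
    """Toy world model: the state that would result from taking `action`."""
    if action == "clean":
        return State(state.robot, state.mess - {state.robot})
    dx, dy = MOVES[action]
    x = min(max(state.robot[0] + dx, 0), GRID - 1)  # walls stop the robot
    y = min(max(state.robot[1] + dy, 0), GRID - 1)
    return State((x, y), state.mess)

def violates_constraint(state: State) -> bool:
    """Safety constraint: the robot must never enter the person's cell."""
    return state.robot == PERSON

def cost(state: State) -> int:
    """Distance from the goal: how many cells are still messy (0 = clean room)."""
    return len(state.mess)

def rollout(start: State, actions) -> Optional[State]:
    """Imagine a whole action sequence; return None if it ever becomes unsafe."""
    state = start
    for action in actions:
        state = predict(state, action)
        if violates_constraint(state):
            return None
    return state

def plan(start: State, max_horizon: int = 6) -> Optional[Tuple[str, ...]]:
    """Brute-force search for the shortest safe sequence that reaches the goal."""
    for horizon in range(max_horizon + 1):
        for actions in product(ACTIONS, repeat=horizon):
            final = rollout(start, actions)
            if final is not None and cost(final) == 0:
                return actions
    return None

start = State(robot=(0, 1), mess=frozenset({(2, 1)}))
best_plan = plan(start)
```

Here the shortest route to the clutter runs straight through the person’s cell, so the planner has to settle for a five-step detour around it. The exhaustive search is used only because the example is tiny; it says nothing about how a real system would search over actions.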
World models are a promising concept, but according to LeCun, significant progress in their implementation has yet to be made. Many extremely complex challenges need to be addressed to move beyond the current state of AI, and in his opinion, doing so is far more difficult than it appears at first glance.