Tesla AI Day 2021 Review — Part 4: Why Tesla Won’t Have an Autonomous Humanoid Robot in 2022


Elon Musk didn’t hesitate to promise Tesla, one of his many companies, will have the prototype of a general-purpose humanoid robot before the end of 2022. But he won’t be able to fulfill that promise.

The AI Day presentation was technical enough to be obscure for most viewers. The press barely echoed the amazing feats of engineering and the genuine leaps forward in AI algorithms and datasets. Instead, it focused on the robot. Optimus, as Tesla calls it, will be a general-purpose humanoid robot integrating the best of AI software with the best of robotics hardware.

Musk explained how Tesla’s vertical integration would allow it to transfer the technology used for its self-driving cars into a humanoid body.

Then he said: “We’ll probably have a prototype sometime next year.”

This is the story of why the promise of an intelligent autonomous robot is the craziest of all the promises Elon Musk has made about Tesla and why it’s highly unlikely they will fulfill it on time.

The ambition of fully self-driving technology

Tesla was founded in 2003. Elon Musk joined the following year as one of its very first investors and is today recognized as a co-founder. He saw the potential of battery-powered electric cars, and the possibility of eventually making them affordable for the general public.

It is well-known that Musk loves to go for the big challenges: Reusable rockets and going to Mars, electric and self-driving cars, artificial intelligence, and neurotechnology… Tesla was no exception. Betting on a newborn car company whose goal was to revolutionize the automotive industry was nothing short of ambitious.

Because Tesla is a pioneer in electric motors, battery research, and autonomous driving, it’d be reasonable to think of it as a car company. But Musk doesn’t think so, as he said during the event: “Tesla is arguably the world’s biggest robotics company.” It makes cars, but it does so much more. And a key piece of that is the Autopilot system, the cornerstone of full self-driving (FSD) cars.

Musk’s first mention of autonomous cars was in 2013, when he compared planes and cars and argued “we should have [autopilot] in cars.” In 2014, Autopilot only permitted “semi-autonomous drive and parking capabilities,” which, although a clear precursor of FSD, was nowhere near it. But in March 2015, just a few months before Tesla revealed its commitment to making cars fully autonomous, Musk downplayed the difficulty of the challenge:

“I don’t think we have to worry about autonomous cars because it’s a sort of a narrow form of AI. It’s not something I think is very difficult. […] I almost view it like a solved problem.”

Underestimating the complexity of a problem is a recurring pattern in AI. The field’s founding fathers fell into the same trap when they tried to predict when we’d achieve artificial general intelligence (AGI). Marvin Minsky, arguably the most famous of them, was convinced it was attainable “within a generation.”

FSD was more difficult than it seemed, and Musk started moving the goalposts every time a deadline wasn’t met, although he always insisted that Tesla was ahead of its competitors in the race to FSD technology.

In October 2016, Musk said he expected “full self-driving capability” to be ready by the end of 2017: “I feel pretty good about the goal of a demonstration drive of full autonomy all the way from LA to New York.” In November 2018, he said, “I think we’ll get to full self-driving next year, as a generalized solution.” In January 2020, he said FSD would be “feature complete” — which means the system has every functionality, but without the guarantee that it works well — by the end of the year.

Each time, the predictions fell short. Even today, while a few thousand Teslas are running a beta version of FSD, Autopilot still has important limitations. The driver has to keep their hands on the steering wheel (otherwise, Autopilot turns off), and the system doesn’t seem to work under every circumstance.

Achieving full autonomy is way harder than it seems. Musk denied this until he experienced it. A few months ago he publicly acknowledged the difficulty of the challenge in a tweet.

He pointed out the exact reason why FSD is so hard: real-world AI.

Reality is AI’s Achilles heel

AI started as a quest to solve intelligence using the then-newfound power of computers. Although the ultimate goal was to make machines similar to us, humans and machines were always fundamentally different: We’re real and live in the physical world whereas AIs live in virtual worlds that are simpler versions of reality. They have no idea what the world looks like.

AlphaZero is a great example of this. If you play chess against it, it’ll always win. But despite its overwhelming mastery, AlphaZero doesn’t know what a pawn is, that the bishop moves diagonally or that the objective is to checkmate the opponent’s king. It only knows about ones and zeroes.

The difference between these AIs and humans is so profound that it’s regularly raised as one of the toughest criticisms of current AI paradigms. Physicist Ragnar Fjelland argues that “as long as computers do not grow up, belong to a culture, and act in the world, they will never acquire human-like intelligence.”

But not all AI systems are like AlphaZero. An FSD car isn’t virtual. It is part of the branch of real-world AI Musk was referring to. Teslas have bodies, live in the world, perceive it, and interact with it. Because real-world AI needs to take into account the world around it, it’s considerably harder to build than virtual AI; reality is as complex as it gets.

Building a car that can perceive static or moving, near or far objects and act accordingly, taking into account the position, direction, speed, and decisions of itself and every other agent at once, is extremely complex.

And it is the same for Optimus, which Elon Musk promised would be in the prototype phase “sometime” in 2022. Like an FSD car, the robot will also need to perceive and interact with the real world, but there’s a key distinction: Behaving “human” is way harder than driving a car.

Let me summarize all this for you.

Virtual AIs that show notable intelligence for some narrow tasks (AlphaZero, GPT-3) are state-of-the-art right now. Real-world AI that can perceive and act in its surroundings (FSD cars) is way harder to achieve. But designing, creating, and deploying real-world human-like AI that can interact with the world as we do (Optimus) is the hardest AI challenge we can take on.

Tesla AI Day: A friendly dancing robot

After more than an hour of technical presentation, a robot walked onto the stage and began dancing a fast sequence of moves to the rhythm of some dubstep music. It wasn’t intended to regain the attention of the attendees, but to introduce Optimus, Tesla’s general-purpose humanoid robot. After the “robot” left the stage, Musk took the microphone and, in case anyone was in doubt, made it clear: “That was not real,” he said. But “[It] will be real.”

After the obvious publicity stunt (which worked very well), he proceeded to give a high-level presentation on Optimus’ significance and specifications. Tesla, he argued, is well-suited to work on this project because of its strong focus on autonomy, supercomputing, “neural nets to recognize the world,” and real-world AI in general, including sensors and actuators.

He didn’t want to reveal much about the project. (Probably not for fear of competitors stealing ideas, but because they hadn’t done anything yet.) What we know is that Optimus will be 5’8” and 125 lb (~1.73 m and ~57 kg), and its main purpose will be to “eliminate dangerous, repetitive, and boring tasks.” Musk also added, amidst laughter, that people could outrun it and “most likely overpower it.”

To create Optimus they’ll take advantage of all the technology developed for autonomous cars: Autopilot cameras, FSD hardware, simulations, supercomputer training… A robot of this nature would be able to take commands in natural language (pretty much like we talk to our fellow human beings) and carry them out without explicit formal instructions.

It’ll be able to follow orders like: “Pick up that bolt, and attach it to the car with that wrench,” or “go to the store and get me the following groceries.” But, more impactfully, it’d also be able to follow this order: “Go find a sample of gray basaltic rock and take it to the base. On your way there, take a picture of Earth.” Musk didn’t explicitly say anything about Mars, but a group of 100 Optimus robots would be the perfect Martian workforce.

One key detail that may have gone unnoticed is that there’s a good reason for Optimus to have a humanoid form. “The world is built by humans, for humans.” We made our world to best serve our needs. Whatever task Tesla wants the robot to take over from us, the best choice is to make it as similar to us as possible.

Now, let me tell you why this project, if it became reality soon, would probably be one of the most relevant in AI and robotics in a long while: There is no government or corporation in the world that could create a humanoid autonomous robot right now, not by a long shot. If Tesla (or anyone else) achieves this, even 5–10 years from now, it’d be one of the most important milestones in the field since its origins.

There are good reasons why no one else has ever built — or promised to soon build — a robot of this kind. It is hard. And it is hard for several reasons, which, separately, would be enough to qualify this goal as the most ambitious that has ever been attempted in the history of AI.

Why Tesla won’t be able to deliver on its promise

I want to make it clear that Tesla hasn’t promised to build a robot that faithfully emulates a human. They never said anything about consciousness, or AGI-level intelligence, or our same sensorimotor capacities. But even factoring all that out, they are in for some daunting challenges.

To illustrate this, I’ll use the example of a Martian mission. Let’s imagine Optimus has received the order to find a sample of gray basaltic rock and take it back to the base. I’ll answer two questions: which features Optimus would need to succeed at this task, and which challenges it would face that FSD cars don’t.

We live in a multimodal world

Although FSD cars are a form of real-world AI, there are important constraints on the way they perceive the world. Musk described Tesla cars as “semi-sentient,” but he was overstating their competence. First, FSD cars can only “see” the world, not hear, smell, or touch it. Second, their main purpose is to “avoid everything,” which implies they don’t need to physically interact with anything except the road under their wheels.

There’s so much more to the world than seeing and avoiding objects. The world is multimodal. This means events and objects produce different kinds of information: electromagnetic (vision), mechanical (hearing), chemical (smell, taste)… An FSD car only captures a tiny fraction of all that information: the visible-light spectrum. For the car, the rest might as well not exist.

In contrast, humans can perceive colors, textures, flavors, odors, temperature, pressure… Our brain is multisensory. Our perceptual systems capture the multimodal nature of the world and our brain integrates it into a single representation of reality. When you eat an apple you can see its reddish tone, taste its sweetness, smell its fragrance, and feel its soft touch. All that is present at the same time.

Optimus isn’t meant to taste or smell, but, at the very least, it’ll need vision, tactile and haptic (pressure) sensors, proprioception (the ability to perceive the movement and position of limbs with respect to the rest of the body), and a representation of its body to know the extent to which it can take actions. In short, it’ll need to be much more “human,” perceptually speaking, than an FSD car.

If we order the robot to pick up a rock from the ground to inspect it, it’ll need to detect and recognize it. It’ll need to get closer to it and extend its arm and hand to reach it. It’ll need to touch the rock with its fingers and lift it from the ground, applying enough pressure to hold it between the fingers, but not so much that it breaks. All that while it keeps a sense of where its hands, arms, legs, and head are relative to the rock and the ground.

The perceptual requirements for a humanoid autonomous robot are hardly comparable to those of FSD cars. We don’t usually realize just how complex our perceptual systems are because we take them for granted. Designing them is another story.

The world fights to catch our attention

An FSD car doesn’t need attention in the human sense. Everything that’s perceived by the car’s cameras is processed in the built-in visual neural network. The car doesn’t actively decide where to attend. In contrast, human perceptual systems take in too much information to process at once. The brain uses attention to decide which events or objects get preference. Optimus will need to navigate the perceptual space the same way.

The combined power of multimodal perception and attention would give Optimus a very good sense of the complexity of the world while, at the same time, allowing it to make decisions based only on the most crucial and pressing information.

But how can Optimus learn which percepts deserve preference? How can it decide to look left or right to search for the rock? How can it decide to fix its attention on the rock, the hand, or the feet while walking back to the base? The neural mechanisms of attention are very intricate and not yet fully understood. How could Tesla design an artificial brain in such a way that attention to a myriad of distinct percepts is correctly assigned?

In the case of FSD cars, search algorithms are an alternative to attention. The goals are often very well defined — get from this point to that point — and so following a sequence of possible action steps is straightforward. In the case of open-ended and ill-defined goals, it gets increasingly difficult.
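To make the contrast concrete, here is a minimal sketch of what a well-defined goal looks like to a search algorithm. It is not Tesla’s planner: the grid, the `shortest_path` function, and the four-direction move set are all hypothetical, chosen only to show that once the start, the goal, and the allowed actions are fixed, finding a sequence of steps is a textbook problem (here, breadth-first search).

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search on a grid; cells equal to 1 are obstacles.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to rebuild the route.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# Toy map (an assumption for illustration): 0 is free space, 1 is blocked.
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = shortest_path(grid, (0, 0), (2, 0))
print(path)
```

Nothing in this sketch has to decide what is worth looking at: the map and the goal are handed to it. An order like “find a gray basaltic rock” supplies neither.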

Planning, deciding, acting

Planning what to do next is “easy” when the options are turning the steering wheel, accelerating, or braking, and the goal is “get to the destination and don’t crash.” But when the options are virtually limitless, not so much. This is what a fraction of Optimus’ decision-making process could look like when searching for a gray basaltic rock on the surface of Mars:

“How many steps should I take, and in which direction? Is this rock gray, blue, or purple? Should I find a smaller rock? Maybe a larger one? Should I keep my eyes on the rock so it doesn’t fall from my hand, or should I keep them on my feet so I don’t trip over another rock? Should I go slowly so I use less energy, or should I go faster so I get to the base sooner?…”

Even the simplest order reveals the incredible number of choices we unconsciously make at all times. Something as simple as making a coffee, which many of us do every morning, has been proposed as a test of AGI-level intelligence. We humans evaluate the options at our disposal in terms of their likelihood of success and the value they would provide. When goals are ambiguous and uncertain, the calculations become less precise and we enter the realm of intuition. But can a robot have intuition?
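Weighing each option’s chance of success against the value it would provide can be written down as a tiny expected-value calculation. The sketch below is purely illustrative: the option names, probabilities, and values are invented for this example, and `best_option` is a hypothetical helper, not anything Tesla has described.

```python
def best_option(options):
    """Return the option with the highest expected value,
    computed as probability of success times value."""
    return max(options, key=lambda o: o["p_success"] * o["value"])

# Invented numbers for three ways of getting a rock sample.
options = [
    {"name": "pick up nearby small rock", "p_success": 0.9, "value": 5.0},
    {"name": "walk to distant outcrop", "p_success": 0.5, "value": 12.0},
    {"name": "climb into crater", "p_success": 0.2, "value": 20.0},
]

choice = best_option(options)
print(choice["name"])  # walk to distant outcrop (0.5 * 12.0 = 6.0 is the largest)
```

The arithmetic is the trivial part. The hard part is that nobody hands the robot these probabilities and values; it would have to estimate them itself, for options it also has to come up with on its own.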

And then there’s the question of how to execute the plan. A car can only move in 2D, within the limits of the road, and either forward or backward. A humanoid robot has far more degrees of freedom, and the number of possible body configurations is many orders of magnitude larger. A 3D environment, no boundaries on where it can walk, run, or jump, in either direction or distance, and a flexible body (the head, trunk, limbs, and fingers can move with respect to both the world and each other in an innumerable number of combinations) all require a degree of engineering only evolution has accomplished.
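A back-of-the-envelope count shows how quickly the space of possible actions grows with the number of things that can move independently. The figures below are assumptions made only for illustration: three controls for a car (steering, accelerating, braking), thirty joints for a humanoid, and ten discrete settings for each. They are not Tesla specifications.

```python
def configurations(degrees_of_freedom, settings_per_dof=10):
    """Number of distinct combinations when each independent control
    or joint can take one of a fixed number of discrete settings."""
    return settings_per_dof ** degrees_of_freedom

car = configurations(3)        # assumed: steering, accelerator, brake
humanoid = configurations(30)  # assumed joint count, for illustration only

print(f"car: {car:.0e} combinations")
print(f"humanoid: {humanoid:.0e} combinations")
```

Under these assumptions the car has a thousand combinations to choose from at each instant, while the humanoid has 10^30. Real controllers don’t enumerate them, but the gap gives a sense of why moving a body is a different class of problem from steering a car.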

Boston Dynamics has spent the last three decades trying to build robots that can move around in difficult terrain, or run and jump like an animal or human. The best they’ve got so far are human-like robots that can do backflips, although somewhat clumsily.

The robots keep their balance and compensate for the inertia of their movements, but they lack the intelligence Tesla wants for Optimus. Boston Dynamics has achieved in thirty years a single aspect of what Optimus will need to master to fulfill Tesla’s promise in a year.

What makes us human — higher cognition

So far I’ve described Optimus’ sensorimotor features (including the related processes of attention and decision-making) but there are others it’d need just to accomplish the aforementioned task. The three most critical are language, causal reasoning, and commonsense reasoning.

In the case of language, the necessity is obvious because we’d want to give spoken orders to Optimus without the need for explicit instructions. GPT-3, which I mentioned earlier, is one of the most successful language models. It has mastered the form and structure of language to such a degree that it can compose poetry, write essays, answer questions, and even talk about itself. It seems we’re done in terms of language skills.

But nothing could be further from the truth. Experts have found important flaws in GPT-3, and the main reason is that it lacks contact with the real world, the great weakness of virtual AIs. It doesn’t have access to pragmatics and contextual information. If I say: “Go find a gray basaltic rock,” Optimus would need to know what a rock is, what gray means, and how to differentiate basaltic from scoria rocks. Do we know how to imbue real-world AI with such language abilities? Not yet.

Causal reasoning is the ability to understand that some events contribute to producing other events. For instance, if there are clouds in the sky and it starts to rain, we know that clouds cause rain and not vice versa. If Optimus is looking for gray basaltic rocks, it’d be useful to know these rocks tend to be generated by volcanoes. Instead of trying to find the rocks by searching the ground, it could look for the closest volcano on the horizon.

Commonsense reasoning is present in daily situations. We’re constantly applying knowledge that’s shared by all people. If we’re cooking, we know the pan is hot and we shouldn’t touch it. If it’s raining, we’d get wet outside unless we carry an umbrella. If a car is coming fast, we shouldn’t cross the road.

If Optimus finds a deep hole in the ground where there are plenty of gray basaltic rocks, it should be able to realize how difficult it may be to climb back up once it has gone down, and so that it may be better to find the rock elsewhere. This type of scenario is perfectly plausible and would require a combination of causal and commonsense reasoning to recognize the risk.

Reasoning has a very special place in AI because knowledge was the main source of intelligence for symbolic AI, the leading paradigm before machine learning took over, which is now dismissed almost entirely by the AI community. Still, some think AI will need to combine knowledge-based and data-based approaches if we want it to get past the current bottlenecks in deep learning.

Optimus may need a compromise between symbolic AI and deep learning, but there’s no proven way yet to build these hybrid systems, although some people are working on them.


