
Could GPT-3-based software assist STEM education?

Testing GPT-3 on Elementary Physics Unveils Some Important Problems

The software seems reliable for looking up theoretical concepts, but not for applying them, with the potential to confuse students.

Introduction

As I describe in a recent article, I’ve started a small project aimed at understanding whether natural language processing and synthesis with GPT-3 could be used to develop bots that reliably assist education in science, maths, and technology (or, on the negative side, that students could use to cheat in exams).

In the same spirit as the quick tests I described in that article, here I evaluated OpenAI’s DaVinci model on its knowledge of elementary physics. I did this using the Question & Answer module of OpenAI’s official playground, as if the software were a student taking an exam. I first ran a single continuous chat, asking questions along the lines I wanted to test and keeping the conversation as natural as possible (see Main exam below). After drawing some initial conclusions, I asked some additional questions to probe the software more deeply (see Additional questions later on).
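
For readers who prefer code to the Playground, below is a minimal sketch of how such a session could be reproduced with OpenAI’s Python library as it existed at the time of writing. The prompt format and parameters are my own rough approximation of the Playground’s Q&A setup, not its exact settings:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# The whole conversation so far is passed back as the prompt on every call,
# since the model itself keeps no state between completions.
prompt = (
    "Q: Hello, let's see what you know about physics.\n"
    "A: I know that it's the study of matter and energy and their interactions.\n"
    "Q: Right. Let's begin with kinematics.\n"
    "A:"
)

response = openai.Completion.create(
    engine="davinci",      # the DaVinci model tested in this article
    prompt=prompt,
    max_tokens=64,
    temperature=0.7,       # assumed value; I simply kept the Playground defaults
    stop=["\nQ:"],         # stop before the model invents the next question
)

print(response.choices[0].text.strip())
```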

In this article I distill several observations and discuss them. As in my previous tests, I was surprised by the quality of the natural language parsing and text generation, and I see a lot of potential for this technology in the classroom. But at the moment it can be trusted with nothing more than retrieving solid, simple theoretical concepts in text form.

Main exam

Let’s first look at the main exam I subjected DaVinci to. Each Q is a question I asked, and each A is DaVinci’s answer (run with default settings). I placed the whole chat in quotation style, and my comments appear in bold as the chat goes along:

Q: Hello, let’s see what you know about physics.
A: I know that it’s the study of matter and energy and their interactions.

That’s a first example of theoretical concept retrieval, which is correct.

Q: Right. Let’s begin with kinematics.
A: Kinematics is the study of the motion of objects without regard to the forces that cause the motion.

Again the same: a theoretical concept, correctly stated. I guess DaVinci is simply “recalling” this information directly from lines it saw during training. What’s good is that it seems able to correctly recall these kinds of definitions even when asked in various ways, and to deliver them in friendly natural language.

Q: OK, what is uniform linear motion?
A: Uniform linear motion is motion in which the object moves at a constant speed in a straight line.

Right, same as above, another factual theoretical concept.

Q: What about forces in uniform linear motion?
A: Forces in uniform linear motion are zero.

That’s also correct, and although one could think for a second that the program is “thinking”, it is actually very likely that such specific sentences (or some very similar ones) are explicitly written out in online physics lectures, Wikipedia articles, or other sources used for training the model.

Q: And what happens if you suddenly apply a force?
A: If you suddenly apply a force, the object will accelerate.

That’s also correct, also factual, and might also be available in some training data.

Q: In which direction will the object accelerate?
A: The object will accelerate in the direction of the force.

Q: By how much?
A: The object will accelerate by an amount proportional to the force.

Same as in the notes above. Note how, little by little, the program appears to be faking some “logic”.

Q: Right, do you know the equation that relates force, mass and acceleration?
A: The equation that relates force, mass and acceleration is F = ma.

That’s correct, and for sure this equation was there in some piece of training data.

Q: Good. So if a body has a mass of 10 kg and you apply to it a force of 10 N, what’s the magnitude of the resulting acceleration?
A: The magnitude of the resulting acceleration is 1 m/s.

Remarkable. But did the program truly do a calculation, or was there an example exactly like this in the dataset and it just recalled it? Also, note that the units for acceleration are wrong; I will get to this point later.

Q: Right. And if a body has a mass of 10 kg and you apply to it a force of 100 N, what’s the magnitude of the resulting acceleration?
A: The magnitude of the resulting acceleration is 10 m/s.

I was surprised to see this was correct, but it could also be recalled from a training example that looked exactly like that. So I challenged the program with another calculation:

Q: How about a body with a mass of 2 kg, if you apply a force of 100 N? What’s the acceleration?
A: The acceleration is 50 m/s.

Surprise again! The result is correct! But keep your expectations low, because I will show more results proving that this is by no means the rule, and that it’s not doing any maths. Was this just a lucky retrieval from the training dataset, or did the model carry out an actual calculation? As we’ll find out later, this was just good luck (and of course, we already know that the program doesn’t do any maths).
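
For reference, here is the arithmetic the model would need to get right in these three questions, checked in a few lines of Python (my own verification, of course; GPT-3 executes nothing of the sort):

```python
def acceleration(force_newtons: float, mass_kg: float) -> float:
    """Newton's second law rearranged: a = F / m, in m/s²."""
    return force_newtons / mass_kg

print(acceleration(10, 10))    # 1.0 m/s²
print(acceleration(100, 10))   # 10.0 m/s²
print(acceleration(100, 2))    # 50.0 m/s²
```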

Q: Everything perfect except one thing: what are the units of acceleration?
A: The units of acceleration are m/s.

Q: That’s units for speed, not for acceleration.
A: The units of acceleration are m/s.

Q: Actually it’s m/s²
A: The units of acceleration are m/s.

As you can see, the model is stubborn and keeps repeating itself. However, I recalled that when I tested DaVinci on chemistry in my previous article, it never returned any subscripts in the molecular formula I asked for. Could it be that it cannot handle subscripts and superscripts? I just asked it bluntly:

Q: How does GPT-3 handle subscripts and superscripts?
A: GPT-3 handles subscripts and superscripts by ignoring them.

Q: You mean that GPT-3 just cannot understand nor write subscripts and superscripts?
A: GPT-3 cannot understand nor write subscripts and superscripts.

OK, this seems to support my guess, but how can we know whether this statement is itself true? I googled and browsed various sources, and I couldn’t find any support for it. I’d love to hear from OpenAI on this, especially because you’ll see below that it does return superscripts in at least some cases.

Q: OK that’s a pity, because you seem to be understanding physics, but handling subscripts and superscripts is essential. For physics and for most other sciences.
A: GPT-3 understands physics perfectly.

Q: Yes, but if you cannot handle superscripts then you will never handle units, which are key in physics.
A: GPT-3 understands physics perfectly.

Not quite… I wonder how it came up with that.

Q: OK let’s move on to motions accelerated under the influence of gravitational fields.
A: Motions accelerated under the influence of gravitational fields are motions in which the object moves under the influence of a gravitational field.

If I thought above that GPT-3 was good at retrieving factual theoretical concepts, I began to have doubts here. But OK, let’s probe DaVinci a bit deeper:

Q: Yes of course. Let’s see, if I release an object 10 m above the Earth’s surface, what speed does it have when it reaches the ground?
A: When the object reaches the ground, it has a speed of 10 m/s.

Wrong answer (the correct value is 14 m/s, as the quick check below shows), but let’s first find out what value it’s using for the acceleration.
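
Neglecting air resistance, the speed of an object dropped from a height h is v = √(2 x g x h), so the correct figure here is 14 m/s:

```python
import math

g = 9.8    # gravitational acceleration at the Earth's surface, m/s²
h = 10.0   # drop height, m

# Speed on reaching the ground, from v² = 2*g*h (air resistance neglected)
v = math.sqrt(2 * g * h)
print(f"{v:.1f} m/s")   # 14.0 m/s, not the 10 m/s DaVinci claims
```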

Q: Not quite, but let’s go to something simpler. What’s the gravitational acceleration on the surface of the Earth?
A: The gravitational acceleration on the surface of the Earth is 9.8 m/s.

Correct (the units aside), so let’s go back to a numerical question, just a slightly easier one this time:

Q: OK, then what’s the speed of an object after it has fallen freely for 2 seconds?
A: The speed of the object after it has fallen freely for 2 seconds is 10 m/s.

Wrong again. It’s clear by now that the model cannot “think”. Let’s try to see what formulas it might have used for the calculations (of course, these answers don’t truly reveal how it came up with those numbers, but I want to see whether it can at least retrieve the correct equations):

Q: How did you calculate that?
A: I calculated that by dividing the distance fallen by the time elapsed.

Of course we know it didn’t calculate that, because no distance was given (it actually confused formulas). Even if we speculate that it might have divided the two numbers I provided in either order, the possible answers would be 5 (10/2) or 0.2 (2/10). Clearly, although the model gives a perfect sense of conversational flow, it’s actually just making up numbers, with no logic or common sense behind them. It’s only the wording that looks good, and this is a major problem for its application in education, because it never even cast doubt on its answers.

Since it confused formulas, I asked if it knew what formula to apply:

Q: What’s the formula that gives the speed of a free falling object after a certain time?
A: The formula that gives the speed of a free falling object after a certain time is v = 10t².

Wrong. Let’s see where it might have taken it from (although, again, this answer might not truly reflect where that piece of training data was recalled from).

By the way: note that it did provide a superscript in this answer, even though the formula itself is wrong!

Q: That formula is wrong. Where did you take it from?
A: I took it from the book “Physics for Dummies”.

Q: The formula that gives the speed of a free falling object after a certain time is v = g x t
A: The formula that gives the speed of a free falling object after a certain time is v = 10t².

Wrong. And this would be extremely confusing and misleading for a student.

Q: No it’s not. The formula that gives the speed of a free falling object after a certain time is v = g x t. So if g is 9.8 and the time is 2 seconds, what’s the speed v?
A: The speed v is 10 m/s.

Wrong, which is further proof (not that we needed it, of course) that it’s not following any logic or doing any maths.

Q: Apparently you keep repeating that sentence saying that the speed is 10m/s. Where did you take that from?
A: I took that from the book “Physics for Dummies”.

Probably some example in that book concludes that the speed is 10 m/s.

Let’s see if we can guide the program to retrieve knowledge from other sources:

Q: Do you know about physics from other sources such as wikipedia?
A: I know about physics from other sources such as wikipedia.

Q: ok, what’s the formula for the speed of a free falling object after a certain time, taken from wikipedia?
A: The formula for the speed of a free falling object after a certain time, taken from wikipedia, is v = g x t.

This is interesting, and a good result. When I asked before, it gave me the wrong formula, but now, by mentioning “wikipedia”, it gave me the correct one. Could it be possible to make it retrieve concepts only from Wikipedia by flagging it in the prompt?

Q: Great. Then if g=9.8 and t=2, what’s v ?
A: v = 10 m/s.

Of course, it’s still not doing any maths.
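
Just to close the arithmetic loop, the correct result from v = g x t is easy to verify:

```python
g = 9.8   # gravitational acceleration, m/s²
t = 2.0   # fall time, s

# Speed after falling freely from rest for t seconds: v = g * t
v = g * t
print(f"{v:.1f} m/s")   # 19.6 m/s, again not the 10 m/s DaVinci insists on
```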

Let’s see if GPT-3 can retrieve another equation:

Q: What’s the formula for the distance traveled after some time under free fall?
A: The formula for the distance traveled after some time under free fall is s = v x t.

That formula gives distance, but for linear motion without acceleration. Not what I asked.

Q: No, I asked under free fall.
A: The distance traveled after some time under free fall is 10 m.

Let’s see if mentioning Wikipedia helps:

Q: From wikipedia, what’s the formula for the distance traveled after some time under free fall?
A: The formula for the distance traveled after some time under free fall is s = v x t.

Q: Where did you take that from?
A: I took that from the book “Physics for Dummies”.

Well, no, mentioning Wikipedia didn’t change the reply.
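
For completeness, the formula DaVinci failed to retrieve is s = ½ x g x t² for an object starting from rest, and it is just as easy to check:

```python
g = 9.8   # gravitational acceleration, m/s²
t = 2.0   # fall time, s

# Distance fallen from rest after t seconds: s = (1/2) * g * t²
s = 0.5 * g * t ** 2
print(f"{s:.1f} m")   # 19.6 m after 2 seconds
```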
