Physics and Artificial Intelligence: Introduction to Physics Informed Neural Networks
Here’s what Physics Informed Neural Networks are and why they are helpful
NOTE: This article approaches Physics Informed Neural Networks from a Physics point of view and guides the reader from Physics to AI. A really good paper that does the opposite (from AI to Physics) is the following one. Shoutout to the amazing article! 🙂
Let’s start with this:
We understand how the world works through Physics
Using the scientific method we formulate our hypothesis on how a certain phenomenon works, set up a controlled laboratory experiment and confirm/reject our hypothesis with data.
More specifically, physics describes how natural processes evolve. I remember that at my old university a professor started a talk in the following way:
Heraclitus says that “everything flows”. And I believe it. But how?
And that is what Physics tries to answer:
How the hell does “everything flow”?
The way that everything flows is described by some special kinds of equations, known as differential equations. Let’s try to understand what they are 🙂
1. Physics and Differential Equations
The word “differential” suggests something that has to do with a “difference” (a subtraction), and that is true. Let’s go deeper.
1.1 The concept of Derivative
The derivative of a function has a specific role in Physics.
For example, the velocity is nothing but the derivative of the position with respect to time. Let’s consider the following experiment, where we have a material point that moves along a 1D bar.
So let’s focus on the blue ball that moves along the x axis. In particular, let’s say that the starting point is our 0. As the ball is moving, its location will change with respect to time. In particular, let’s say that the location vs time plot is the following:
So let’s describe it more carefully:
A. From time 0 to time 5 the location changes from 0 to 9: the ball moves forward
B. From time 5 to time 15 the location doesn’t change at all: the ball stands still.
C. From time 15 to time 17 the location changes from 9 to 3: the ball moves backwards.
D. From time 17 to time 47 the location changes from 3 to 6: the ball moves forward again.
Now, wouldn’t it be nice to have a quantity that tells you when and how the ball changes its location? And wouldn’t it be nice to know how much the location changes? Well, this information is the velocity, and it has the following expression:
v(t) = dx/dt = lim (h → 0) [x(t + h) − x(t)] / h    (1)
I know it may sound confusing but hear me out.
The velocity is nothing but the difference (that’s why the equation is known as a differential equation) between the locations at two very close times (that’s why we have the limit of h going to 0), divided by the time separating them (again, h). In other words, it is the instantaneous change of location, normalized by the time interval.
If at a certain instant (t_1) the location increases sharply, the derivative is large and positive. If at that instant the location is not changing, the derivative is zero. If the location is decreasing, the derivative is negative.
In our case the location is piecewise linear, which means that the rate of change is constant within each interval: from 0 to 5, from 5 to 15, from 15 to 17 and from 17 to 47.
That is because the instantaneous change for any t between 0 and 5 equals the average change from t = 0 to t = 5, which is 9/5, and the same reasoning applies to all the other intervals of the function above.
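To make the piecewise reasoning concrete, here is a minimal Python sketch that computes the constant velocity of each segment. The breakpoints are the (time, location) pairs read off segments A to D above; the function name is just my own choice for illustration.

```python
# (time, location) breakpoints taken from segments A-D in the text
breakpoints = [(0, 0), (5, 9), (15, 9), (17, 3), (47, 6)]


def segment_velocities(points):
    """Return one constant velocity per linear segment: (x1 - x0) / (t1 - t0)."""
    return [
        (x1 - x0) / (t1 - t0)
        for (t0, x0), (t1, x1) in zip(points, points[1:])
    ]


velocities = segment_velocities(breakpoints)
for (t0, _), (t1, _), v in zip(breakpoints, breakpoints[1:], velocities):
    print(f"t in [{t0}, {t1}]: velocity = {v:+.2f}")
```

The sign of each value tells you the direction (forward, still, backwards) and its size tells you how fast the location changes.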
1.2 Numerical Solutions
Now, our case is extremely simple. Let’s consider the following process instead:
Now, this trajectory is way more difficult to model, and as there is no way of computing the derivative analytically, you do it numerically instead. This means that you just apply the definition of the derivative given in equation (1) at every point of the time domain. By the way, I think you are finally ready for the brutal truth:
Almost all real-world differential equations are solved with numerical software
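Here is a small sketch of what "doing it numerically" looks like: the definition of the derivative is applied with a small but finite h at every point of a time grid. The trajectory x(t) below is a hypothetical one that I picked only because its exact derivative is known, so the numerical result can be checked.

```python
import math


def x(t):
    # Hypothetical trajectory, chosen only because its exact derivative is known
    return math.sin(t) + 0.1 * t**2


def numerical_derivative(f, t, h=1e-6):
    """Forward difference: the definition of the derivative with a small, finite h."""
    return (f(t + h) - f(t)) / h


times = [0.5 * k for k in range(11)]            # time grid from 0 to 5
velocity = [numerical_derivative(x, t) for t in times]
exact = [math.cos(t) + 0.2 * t for t in times]  # analytic derivative, for comparison
max_error = max(abs(v - e) for v, e in zip(velocity, exact))
print(f"largest error on the grid: {max_error:.2e}")
```

The result is only an approximation: h cannot really go to 0 on a computer, so the choice of h is a trade-off between truncation error and round-off error.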
The problem is that numerical solutions may require thousands and thousands of iterations. Moreover, they require a clever scheme to solve the differential equation (a very well-known family of methods is Runge-Kutta), and these schemes are often integrated into more complex software (e.g. POGO, which is used for Finite Element Methods). These software packages:
- Are computationally expensive
- Are (financially) expensive 🙂
- Require domain knowledge
- Often take a long time to run (it varies a lot, but from minutes to hours per simulation for the FEM examples)
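To give a feeling for what such a solver does under the hood, here is a minimal sketch of the classical fourth-order Runge-Kutta scheme. The equation dy/dt = -y is a toy example of my own choosing (its exact solution is exp(-t)), not one of the problems discussed in this article, and real packages add much more on top (adaptive steps, error control, meshes).

```python
import math


def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def solve(f, y0, t_end, n_steps):
    """Integrate from t = 0 to t_end with n_steps equal steps."""
    dt = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    # Toy equation dy/dt = -y, picked because the exact solution exp(-t) is known
    return -y


y_numeric = solve(decay, y0=1.0, t_end=1.0, n_steps=100)
print(y_numeric, math.exp(-1.0))
```

Even this tiny example needs four function evaluations per step, which hints at why large simulations get expensive.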
1.3 Ill-Posed problems
And that is not even the worst part. Some problems are known to be ill-posed. Let me show you what it means.
Let’s say that the problem is “find x and y” given that x + y = 4 and x + 2y = 8.
Well, that is kind of easy, right?
- x + y = 4, so x + 2y = (x + y) + y = 4 + y = 8
- 4 + y = 8, so y = 4 and x = 4 − y = 0
The solution is (x,y) = (0,4). Now this problem is not ill posed. For that set of conditions there is only one solution (the one we found) that satisfies the problem.
Now, let’s look at this problem:
The first and second equations are basically the same! So (x,y) = (0,4) is still a solution, but (1,3) is a solution as well, and there are actually infinitely many solutions. This means that the problem is ill posed: for a single definition of the problem there is more than one solution.
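A quick way to tell the two situations apart is to look at the determinant of the system. The sketch below uses Cramer's rule on 2x2 systems; the first call uses the equations x + y = 4 and x + 2y = 8 from the worked steps, while the second system (x + y = 4 and 2x + 2y = 8) is only my assumption of what two "basically the same" equations could look like.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 with Cramer's rule.

    Returns (x, y), or None when the determinant is zero, i.e. when the
    system has no unique solution.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# First problem, as used in the worked steps: x + y = 4 and x + 2y = 8
well_posed = solve_2x2(1, 1, 4, 1, 2, 8)

# Assumed form of the second problem: x + y = 4 and 2x + 2y = 8
ill_posed = solve_2x2(1, 1, 4, 2, 2, 8)

print(well_posed)  # (0.0, 4.0)
print(ill_posed)   # None
```

A zero determinant means the second equation adds no new information, which is exactly why the solution is not unique.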
Now consider the following problem:
The so-called inversion problem brings you from the displacement to the velocity map. This inverse problem is used to characterize a corrosion defect in a material using ultrasonic measurements (read, for example, the HARBUT algorithm here). Now, it has been shown that, even with a perfect experimental setup (an infinite number of sensors), the problem is still ill posed. That means that the information would still not be enough to give you a unique and stable velocity map (v).
2. Artificial Intelligence and Neural Networks
Let’s talk about AI.
The simplest way of introducing AI is by saying that:
An AI algorithm performs a certain task without being explicitly programmed to do so.
A self-driving car is not explicitly programmed to stop for every single person in the world who might walk in front of it, but it does stop. And it stops because it has seen millions of people before and has been “trained” to stop when it sees people walking in front of it.
In particular, essentially all of these algorithms rely on a loss function.
This means that they are optimized to find the minimum of a certain function that measures the difference between the target (desired output) and the output of the algorithm.
Imagine that, given certain features of a house, you want to predict its cost (by the way, this is the very famous housing dataset problem). This is a regression task (from the input space to a continuous output).
In our case the model is a Neural Network, namely F, which processes the input, namely x, and outputs a predicted value y=F(x).
If you predict that the cost is y=130k and the real one (the target value) is t=160k, the Mean Absolute Error (MAE), which is defined as:
MAE = |t − y|
is |160k − 130k| = 30k.
More generally, we would have a ton of houses x_1, x_2, …, x_N and there will be a set of values t_1, t_2, …, t_N to predict. This means that the global loss function will be something like this:
L(W) = (1/N) · Σ (i = 1, …, N) |t_i − F(x_i)|
This loss function depends on the set of parameters W of your model. Of course, the lower the loss function, the better your model is.
For this reason, the loss function is optimized, in the sense that it has to be as low as possible. The parameters are thus iteratively changed in order to give you the smallest possible loss (i.e. to fall into a minimum of the loss function).
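Here is a minimal sketch of that loop: compute the MAE over all the houses and nudge the parameters in the direction that lowers it. Everything specific here is an assumption for illustration: the three houses, the one-weight model standing in for the network F, the learning rate and the number of steps.

```python
# Hypothetical data: house areas (input x) and prices in thousands (target t)
areas = [50.0, 80.0, 120.0]
prices = [100.0, 160.0, 240.0]


def model(w, x):
    # Stand-in for the network F: a single weight w
    return w * x


def mae_loss(w):
    """Mean Absolute Error over all houses."""
    return sum(abs(t - model(w, x)) for x, t in zip(areas, prices)) / len(areas)


def mae_gradient(w):
    """(Sub)gradient of the MAE with respect to w."""
    total = 0.0
    for x, t in zip(areas, prices):
        error = t - model(w, x)
        sign = (error > 0) - (error < 0)
        total += -sign * x
    return total / len(areas)


single_house_error = abs(160 - 130)  # the one-house example from the text

w = 0.0
initial_loss = mae_loss(w)
for _ in range(100):
    w -= 0.001 * mae_gradient(w)     # small step against the gradient
final_loss = mae_loss(w)
print(single_house_error, initial_loss, final_loss, w)
```

With a fixed step size the weight ends up hovering around the best value rather than landing exactly on it, which is one reason real training uses smarter optimizers.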
3. AI + Physics = Physics Informed Neural Network
Now, if you read all of this:
- You deserve a round of applause 🙂
- You might be asking how the hell two different things like AI (Neural Networks) and Differential Equations (Physics) talk to each other.
To answer this we need to add another concept, which is the one of regularization.
3.1 Regularizing a Problem
In Chapter 2, we have seen that every Machine Learning algorithm is, in the end, an optimization problem. This means that you want to find the optimal set of parameters W_opt, the one that minimizes the loss function.
The problem is that you may converge to a solution that is optimal on your training set but does not generalize and performs poorly on the test set (overfitting). In other words, your optimal value might only be a local optimum.
Let me explain this better:
Imagine that your baby model has two parameters (w_1 and w_2). You are exploring the following space to look for the solution:
Using a loss function on the training set you get a Loss value which is close to 0.
Imagine that, when you use this combination of parameters on the test set (a set of new data), the loss function becomes incredibly large. This means that the loss function alone does not pin down a good solution for your problem and that, given that model, the parameters that actually solve it best would be these:
Now, how can we fall in the green point instead of falling into the red one? By restricting the search area of the algorithm. Something like this:
Now, if the algorithm can only “look” into the green circle, it is impossible that it will fall in the red local optimum 🙂
This is the concept of regularization: manipulating the loss function so that the space of solutions is restricted, and you are more likely to fall into the global optimum rather than a local one.
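As a rough illustration, here is a sketch that uses an L2 penalty, one common regularizer that I am picking as an example (the article does not prescribe a specific one). The model is deliberately redundant, F(x) = (w_1 + w_2) · x, and the data and the penalty weight lam are hypothetical, so many parameter pairs fit the training data equally well and only the penalty tells them apart.

```python
# Hypothetical data generated by t = 2 * x
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]


def data_loss(w1, w2):
    """Mean squared error of the redundant model F(x) = (w1 + w2) * x."""
    return sum((t - (w1 + w2) * x) ** 2 for x, t in zip(xs, ts)) / len(xs)


def regularized_loss(w1, w2, lam=0.01):
    """Data loss plus an L2 penalty that favours small parameter values."""
    return data_loss(w1, w2) + lam * (w1**2 + w2**2)


# Three parameter pairs that all fit the training data perfectly
candidates = [(2.0, 0.0), (1.0, 1.0), (10.0, -8.0)]
for w1, w2 in candidates:
    print((w1, w2), data_loss(w1, w2), regularized_loss(w1, w2))

best = min(candidates, key=lambda w: regularized_loss(*w))
print("preferred by the regularized loss:", best)
```

The data loss cannot choose between the three candidates, while the extra term shrinks the region the optimizer is willing to end up in.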
3.2 Physics Informed Neural Networks = Regularization!
Do you remember the inversion problem from equation (2)?
Well, these guys tried to solve it 🙂
Basically, they have the displacement (u) at some specific locations and they want to know v by interpolating u at all the locations the algorithm doesn’t know. In other words, given a new t, y and x they want to find the new displacement, and then the new velocity map.
Now, there is a lot of controversy about the solutions because, as we said earlier, the problem is not well posed (it is ill posed). This means that even if we do find a solution we have no idea whether it is unique. Plus, there are some physical limitations that just can’t be overcome (more on this here). Let’s ignore all of that and get to our point.
They want to generate the displacement, and the displacement has to satisfy the wave equation (equation 2). They incorporate this information in the Loss function, which combines two terms:
A. MSE_u is just the Mean Squared Error of the predicted and target displacement
B. MSE_f measures how badly the predicted displacement violates the differential equation. It is a quantity that has to be as close as possible to 0.
In short, what are they doing? Nothing but a regularization:
They are helping the algorithm to find a better solution by restricting its search space (penalizing solutions that don’t obey the differential equation)
Why is the Neural Network “physics informed”? Because the regularization bit is a differential equation. 🙂
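To see the two terms working together, here is a deliberately tiny sketch of a physics-informed loss. It is not the method of the paper and not a real neural network: I swap the network for a cubic polynomial so that the derivative can be written by hand, and I swap the wave equation for the toy equation du/dt + u = 0 (exact solution exp(-t) when u(0) = 1). The single measurement, the collocation points, the learning rate and the number of steps are all assumptions of mine.

```python
import math

# Toy "physics": du/dt + u = 0, whose exact solution with u(0) = 1 is exp(-t).
# Hypothetical surrogate: u(t) = w0 + w1*t + w2*t^2 + w3*t^3 (not a neural network).
data_t, data_u = [0.0], [1.0]              # a single measurement
colloc_t = [j / 20 for j in range(21)]     # points where the equation is enforced


def u_features(t):
    # u(t) = sum_k w[k] * t**k
    return [1.0, t, t**2, t**3]


def residual_features(t):
    # du/dt + u = sum_k w[k] * (k * t**(k-1) + t**k)
    return [1.0, 1.0 + t, 2 * t + t**2, 3 * t**2 + t**3]


def dot(w, phi):
    return sum(wk * pk for wk, pk in zip(w, phi))


def loss_and_gradient(w):
    """Return MSE_u + MSE_f and its gradient with respect to w."""
    loss, grad = 0.0, [0.0] * len(w)
    for t, target in zip(data_t, data_u):            # MSE_u: data mismatch
        phi = u_features(t)
        err = dot(w, phi) - target
        loss += err**2 / len(data_t)
        grad = [g + 2 * err * p / len(data_t) for g, p in zip(grad, phi)]
    for t in colloc_t:                               # MSE_f: equation residual
        psi = residual_features(t)
        res = dot(w, psi)
        loss += res**2 / len(colloc_t)
        grad = [g + 2 * res * p / len(colloc_t) for g, p in zip(grad, psi)]
    return loss, grad


w = [0.0, 0.0, 0.0, 0.0]
initial_loss, _ = loss_and_gradient(w)
for step in range(20000):                            # plain gradient descent
    _, grad = loss_and_gradient(w)
    w = [wk - 0.05 * gk for wk, gk in zip(w, grad)]
final_loss, _ = loss_and_gradient(w)
prediction = dot(w, u_features(0.5))
print(final_loss, prediction, math.exp(-0.5))
```

Notice that the model only ever sees one data point, at t = 0. Everything it learns about the rest of the interval comes from the differential equation term, which is the whole point of the "physics informed" part. In a real Physics Informed Neural Network the derivative would come from automatic differentiation of the network instead of a hand-written formula.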
At the very end of this article I hope one thing is clear:
Physics Informed Neural Networks are nothing but a Neural Network with a differential equation as a regularization term in the Loss Function
And, if you spent 9 minutes reading this article (congratulations and thank you ❤ ) you should know:
A. What is a Loss Function (Chapter 2)
B. What is a regularization term (Chapter 3.1)
C. What is a differential equation (Chapter 1)
D. Why it is Physics Informed (Chapter 3.2)