Work in Progress ...
Problem Intro
You have an untrained neural net that spits out a gravity scalar potential \( U \) in response to a 3D cartesian position
The goal is to figure out gravitational acceleration from the gravity scalar potential
Turns out the gravitational acceleration \( \mathbf{a}(\mathbf{x}) \) is obtained as the negative gradient of the potential:
\[ \mathbf{a}(\mathbf{x}) = -\nabla U(\mathbf{x}) \]
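To make that concrete in Jax before any training enters the picture, here is a minimal sketch where a hand-written point-mass potential stands in for the network (both the potential function and mu are purely illustrative, not the paper's setup):

import jax
import jax.numpy as jnp

# Stand-in for the untrained network: a point-mass potential U(x) = -mu / |x|
# (purely illustrative; the real thing is a neural net taking x and returning a scalar)
def potential(x, mu=1.0):
    return -mu / jnp.linalg.norm(x)

# a(x) = -grad U(x), courtesy of automatic differentiation
acceleration = lambda x: -jax.grad(potential)(x)

x = jnp.array([1.0, 2.0, 2.0])
print(acceleration(x))  # points back toward the origin, as gravity should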
At some point we want to train the neural net, so inevitably we'll get a dataset of 3D cartesian positions mapped to their respective acceleration values
Hang on, why bother with a NN spitting out a scalar potential when it can just spit out the 3 acceleration values directly?
"Most obvious is the fact that the network is now trained with the knowledge that there is a rela- tionship between the accelerations it produces via automatic differentiation and the more fundamental scalar potential. Second, because the network is learning a representation of the potential rather than the accelerations, all three acceleration vector components of the training data are now being used to constrain a single scalar function. This is a much more efficient regression for the network, learning a single potential rather than being forced to learn three separate acceleration features"
And I advise you, dear reader, to call it out for what it is .... a bullshit claim till proven right
So let's look at the other possible reasons. The first on the list is the loss function in the paper
The loss function is taken from this paper: Physics-informed neural networks for gravity field modeling of small bodies
The loss in question is here:
\[ \mathcal{L}(\theta) = \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \mathbf{a}_{i,\text{true}} - (- \nabla U(\mathbf{x}_{i})) \right\|^2}_{\text{acceleration error}} + \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \nabla^2 U(\mathbf{x}_i) \right\|^2}_{\text{Laplacian penalty}} + \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \nabla \times \nabla U(\mathbf{x}_i) \right\|^2}_{\text{curl penalty}} \]
The first term probably makes sense given it is your basic MSE with the true acceleration values. As for the rest ...
The paper's contribution is that, since gravity is a conservative force, the loss function can be significantly enhanced with terms encoding additional dynamical properties
The scalar potential learned by the network must also obey these additional physics properties:
- \( \nabla^2 U = 0 \)
- \( \nabla \times \mathbf{a} = 0 \)
So the cool bit is that by including these terms, we ensure the model being learnt is more in line with the physics of things, which in this case means the force of gravity being modelled is consistent with gravity being a conservative force and the physics that comes along with that fact
Now let's think. Is there anything about the loss that would stop the neural net from just spitting out 3 acceleration values directly? Let's give it a shot
\[ \mathcal{L}(\theta) = \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \mathbf{a}_{i,\text{true}} - \mathbf{a}_{i,\text{predicted}} \right\|^2}_{\text{acceleration error}} + \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \nabla \cdot \mathbf{a}_{i,\text{predicted}} \right\|^2}_{\text{Laplacian penalty}} + \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f} \left\| \nabla \times \mathbf{a}_{i,\text{predicted}} \right\|^2}_{\text{curl penalty}} \]
The above is keeping in mind that \( \nabla^2 U = \nabla \cdot \nabla U \), and since \( \mathbf{a} = -\nabla U \), the Laplacian penalty on \( U \) is just a divergence penalty on the predicted acceleration (up to sign). So clearly it's not the loss function ... at least not in form
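Just to sanity-check that identity numerically, here is a throwaway scalar function standing in for \( U \) (the function itself is arbitrary, and the Jax machinery used here is unpacked in the next section):

import jax
import jax.numpy as jnp

def U(p):                              # arbitrary smooth scalar field standing in for the potential
    x, y, z = p[0], p[1], p[2]
    return x**2 * y + jnp.sin(z)

a = lambda p: -jax.grad(U)(p)          # acceleration implied by U

p = jnp.array([0.3, -1.2, 0.7])
lap_U = jnp.trace(jax.hessian(U)(p))   # laplacian of U, i.e. divergence of grad U
div_a = jnp.trace(jax.jacobian(a)(p))  # divergence of a
print(lap_U, div_a)                    # equal in magnitude, opposite in sign: div a = -laplacian U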
So maybe, just maybe, it's the regularity of what's being learnt
So just to get (WIP)
Calculating the Laplacian and the Curl of the Scalar Potential field in Jax
There are no Jax primitives to calculate these directly. To get going, a refresher in Vector Calculus is in order
Gradient
An operator that takes in a scalar valued (differentiable) function (of several variables!) and produces a vector valued function
A more precise way of expressing the domain and range of the function \( f \) and the gradient of the function \( \nabla f \) is:
\[ f: \mathbb{R}^n \rightarrow \mathbb{R}, \quad \nabla f: \mathbb{R}^n \rightarrow \mathbb{R}^n \]
And now, how the gradient is computed at a point \( p = (x, y, z) \) (Note: we're using just 3 variables but that is not a constraint in general):
\[ \nabla f(p) = \frac{\partial f(p)}{\partial x}\mathbf{i} + \frac{\partial f(p)}{\partial y}\mathbf{j} + \frac{\partial f(p)}{\partial z}\mathbf{k} \]
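As a tiny worked example (the function here is made up purely for illustration), jax.grad returns exactly this vector of partials:

import jax
import jax.numpy as jnp

def f(p):                      # f(x, y, z) = x*y + z^2
    x, y, z = p[0], p[1], p[2]
    return x * y + z**2

p = jnp.array([1.0, 2.0, 3.0])
print(jax.grad(f)(p))          # (df/dx, df/dy, df/dz) = (y, x, 2z) -> [2. 1. 6.]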
Jacobian: Generalization of the Gradient
While the input to a gradient is a scalar function, at some point it makes sense to ask if there is an equivalent when the input is a vector valued function
For a function \( \mathbf{F} : \mathbb{R}^n \rightarrow \mathbb{R}^m \), the Jacobian is an \( m \times n \) matrix whose \( (i, j) \)-th entry is \( \partial F_i / \partial x_j \)
To make things more concrete, let's say we have a function \( \mathbf{F} : \mathbb{R}^3 \rightarrow \mathbb{R}^3 \) and takes in \( \mathbf{p} = ( x, y, z) \in \mathbb{R}^{3} \) as input and produces \( (f_{0}(\mathbf{p}), f_{1}(\mathbf{p}), f_{2}(\mathbf{p})) \in \mathbb{R}^{3} \)
The Jacobian is:
\[ J_{\mathbf{F}}(\mathbf{p}) = \begin{bmatrix} \frac{\partial f_0}{\partial x} & \frac{\partial f_0}{\partial y} & \frac{\partial f_0}{\partial z} \\ \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} & \frac{\partial f_1}{\partial z} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} & \frac{\partial f_2}{\partial z} \end{bmatrix} \]
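In Jax terms (again with a made-up \( \mathbf{F} \), chosen just for illustration), jax.jacobian hands back exactly this 3×3 matrix:

import jax
import jax.numpy as jnp

def F(p):                          # F(x, y, z) = (x*y, y*z, x*z)
    x, y, z = p[0], p[1], p[2]
    return jnp.array([x * y, y * z, x * z])

p = jnp.array([1.0, 2.0, 3.0])
print(jax.jacobian(F)(p))          # row i holds the partials of F_i:
                                   # [[y, x, 0],
                                   #  [0, z, y],
                                   #  [z, 0, x]] evaluated at p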
Divergence
The gradient maps a scalar function to a vector field (a vector field is pretty much a vector valued function [there's a bit more to this]), and the Jacobian maps a vector function to a matrix field (pretty much a matrix valued function); the divergence goes the other way and maps a vector field to a scalar valued function.
Computationally that means vector in, scalar out
For a vector field \( \mathbf{F} : \mathbb{R}^n \rightarrow \mathbb{R}^n \), the divergence is defined as the dot product of the del operator with the vector field:
\[ \nabla \cdot \mathbf{F} = \sum_{i=0}^{n-1} \frac{\partial F_i}{\partial x_i} \]
More concretely, if \( \mathbf{F}(x, y, z) = (F_0, F_1, F_2) \), then:
\[ \nabla \cdot \mathbf{F} = \frac{\partial F_0}{\partial x} + \frac{\partial F_1}{\partial y} + \frac{\partial F_2}{\partial z} \]
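There is no divergence primitive in Jax, but since the divergence is just the sum of the diagonal of the Jacobian, taking the trace works (same toy \( \mathbf{F} \) as above, purely illustrative):

import jax
import jax.numpy as jnp

def F(p):                              # F(x, y, z) = (x*y, y*z, x*z)
    x, y, z = p[0], p[1], p[2]
    return jnp.array([x * y, y * z, x * z])

p = jnp.array([1.0, 2.0, 3.0])
div_F = jnp.trace(jax.jacobian(F)(p))  # dF0/dx + dF1/dy + dF2/dz = y + z + x
print(div_F)                           # 6.0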
Curl
The curl is along the same lines as the divergence in that it operates on a vector field. Something interesting to note about this operator is that it doesn't generalize beyond 3D space. The mechanical reasoning for this is evident if you see how the curl of a vector field is calculated
For a vector field \( \mathbf{F}(x, y, z) = (F_0, F_1, F_2) \), the curl is defined as the cross product of the del operator with \( \mathbf{F} \):
\[ \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_0 & F_1 & F_2 \end{vmatrix} \]
Expanded out, this becomes:
\[ \nabla \times \mathbf{F} = \left( \frac{\partial F_2}{\partial y} - \frac{\partial F_1}{\partial z}\right) \mathbf{i} + \left( \frac{\partial F_0}{\partial z} - \frac{\partial F_2}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_1}{\partial x} - \frac{\partial F_0}{\partial y} \right)\mathbf{k} \]
Curiously the result is a new vector field representing the axis and intensity of local rotation of the input vector field.
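Since there's no curl primitive in Jax either, one way to get it (a sketch, not the only way) is to read the required partials off the Jacobian, using the fact that its \( (i, j) \) entry is \( \partial F_i / \partial x_j \). The test field here is a made-up pure rotation about the z axis:

import jax
import jax.numpy as jnp

def F(p):                               # F(x, y, z) = (-y, x, 0), rotation about the z axis
    x, y, z = p[0], p[1], p[2]
    return jnp.array([-y, x, 0.0])

def curl(F, p):
    J = jax.jacobian(F)(p)              # J[i, j] = dF_i / dx_j
    return jnp.array([J[2, 1] - J[1, 2],
                      J[0, 2] - J[2, 0],
                      J[1, 0] - J[0, 1]])

p = jnp.array([1.0, 2.0, 3.0])
print(curl(F, p))                       # [0. 0. 2.] -> axis of rotation is z, intensity 2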
Laplacian
(Parroting the wikipedia article on the same) The Laplacian is a differential operator that can be thought of as the divergence of the gradient of a scalar valued function.
For a scalar-valued function \( f : \mathbb{R}^n \rightarrow \mathbb{R} \), the Laplacian is defined as:
\[ \Delta f = \nabla \cdot \nabla f = \nabla^2 f = \sum_{i=0}^{n-1} \frac{\partial^2 f}{\partial x_i^2} \]
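In Jax this is literally the divergence of the gradient, i.e. the trace of the Jacobian of jax.grad (toy function chosen for illustration):

import jax
import jax.numpy as jnp

def f(p):                                        # f(x, y, z) = x^2 + y^2 + z^2
    return jnp.sum(p**2)

p = jnp.array([1.0, 2.0, 3.0])
lap_f = jnp.trace(jax.jacobian(jax.grad(f))(p))  # d2f/dx2 + d2f/dy2 + d2f/dz2 = 2 + 2 + 2
print(lap_f)                                     # 6.0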
Hessian
The mother lode when it comes to computing everything we need, and this will be made more evident shortly.
The Hessian is a square matrix of second-order partial derivatives of a scalar-valued function
To put it another way, the Hessian of a function \( f \) is the Jacobian matrix of the gradient of the function \( f \), that is: \( \mathbf{H}(f(\mathbf{x})) = \mathbf{J}( \nabla f(\mathbf{x})) \).
For a scalar function \( f : \mathbb{R}^n \rightarrow \mathbb{R} \), the Hessian matrix is:
\[ H_f(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]_{i,j=1}^{n} \]
So if \( f \) is a function of three variables i.e. \( f(x, y, z) \), then:
\[ H_f(x, y, z) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \end{bmatrix} \]
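And in Jax (same illustrative \( f(x, y, z) = xy + z^2 \) as in the gradient example), jax.hessian gives this matrix directly, and composing jax.jacobian with jax.grad gives the same thing, matching the definition above:

import jax
import jax.numpy as jnp

def f(p):                                # f(x, y, z) = x*y + z^2
    x, y, z = p[0], p[1], p[2]
    return x * y + z**2

p = jnp.array([1.0, 2.0, 3.0])
print(jax.hessian(f)(p))                 # [[0. 1. 0.]
                                         #  [1. 0. 0.]
                                         #  [0. 0. 2.]]
print(jax.jacobian(jax.grad(f))(p))      # same matrix: Hessian = Jacobian of the gradient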
Jax primitives to calculate these:
- Gradient: jax.grad(f) computes the gradient of a scalar valued function. This is well documented. This is machine learning 101, no questions here
- Jacobian: jax.jacobian(F) computes the full Jacobian matrix of a vector-valued function. Again, another well documented function, though not as well known as calculating the gradient
- Divergence: No direct Jax primitive, but it can be computed as the trace of the Jacobian: jax.jacobian(F)(x).trace() when \( F : \mathbb{R}^n \rightarrow \mathbb{R}^n \)
- Curl: No direct Jax primitive. It must be manually computed using partial derivatives and the determinant form in 3D
- Laplacian: No direct Jax primitive, but it can be computed as the trace of the Hessian: jax.hessian(f)(x).trace() for \( f : \mathbb{R}^n \rightarrow \mathbb{R} \)
- Hessian: jax.hessian(f) computes the full matrix of second derivatives for a scalar function. The documentation exists here
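Putting those pieces together for the scalar potential itself, here is a minimal sketch of the two physics terms from the loss (the Laplacian of \( U \) and the curl of \( \nabla U \)), again with a hand-written point-mass potential standing in for the network:

import jax
import jax.numpy as jnp

# Stand-in for the network: a point-mass potential, purely illustrative
def U(x, mu=1.0):
    return -mu / jnp.linalg.norm(x)

grad_U = jax.grad(U)                     # gradient of U, so the acceleration is -grad_U(x)

def laplacian_U(x):
    return jnp.trace(jax.hessian(U)(x))  # laplacian of U as the trace of the Hessian

def curl_grad_U(x):
    J = jax.jacobian(grad_U)(x)          # J[i, j] = d(grad U)_i / dx_j
    return jnp.array([J[2, 1] - J[1, 2],
                      J[0, 2] - J[2, 0],
                      J[1, 0] - J[0, 1]])

x = jnp.array([1.0, 2.0, 2.0])
print(laplacian_U(x))                    # ~0 away from the mass
print(curl_grad_U(x))                    # ~0, gradients of scalar fields are curl-free

Batching these over the \( N_f \) training points (jax.vmap does the job) and squaring the norms gives the Laplacian and curl penalty terms from the loss above.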