Generate a Complete 3D Scene Under Arbitrary Lighting Conditions from a Set of Input Images




This new method generates a complete 3D scene from a set of input images and lets you choose the scene's lighting, all with very limited computation cost and impressive results compared to previous approaches.

Image via: P. P. Srinivasan et al., “NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis”

NeRV, or Neural Reflectance and Visibility Fields for Relighting and View Synthesis, is a method that produces a 3D representation of a scene and can render it under arbitrary lighting conditions. It only needs a set of images of the scene as input to generate novel viewpoints under any chosen lighting conditions!

Image via: P. P. Srinivasan et al., “NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis”

This is an extremely complicated task: to simulate the lighting of a scene, we need to compute the visibility between each point and each light source, and each of these computations requires a full evaluation of a neural network. This means we would need millions of network evaluations per scene just to compute its lighting, which would be impractical.

Each of these neural networks takes a 3D location as input and outputs the volume density, surface normal, material parameters, distance to the first surface intersection in any direction, and visibility of the external environment in any direction. Using this information, they apply volume rendering techniques to turn these values into an image. In essence, they train a network to map 3D locations of a scene to a continuous field of volume density and color. This rendering function is differentiable, so they can train their model by using the residual between synthesized and ground-truth observed images as the error value that determines how much to update the network's weights at each training iteration.
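To make that last step concrete, here is a minimal sketch of NeRF-style differentiable volume rendering in PyTorch. It follows the standard alpha-compositing formulation rather than the authors' actual code, and the function and variable names are my own:

```python
import torch

def composite_ray(densities, colors, deltas):
    """Standard NeRF-style volume rendering along one ray.

    densities: (n,)   non-negative volume density at each sample
    colors:    (n, 3) RGB radiance at each sample
    deltas:    (n,)   distance between consecutive samples
    """
    alpha = 1.0 - torch.exp(-densities * deltas)        # opacity of each segment
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)   # accumulated transmittance
    trans = torch.cat([torch.ones(1), trans[:-1]])      # light reaching sample i
    weights = alpha * trans                             # per-sample contribution
    return (weights[:, None] * colors).sum(dim=0)       # composited pixel color

# Because compositing is differentiable, training reduces to a photometric loss
# (hypothetical MLP names for illustration):
# pred = composite_ray(density_mlp(pts), color_mlp(pts, view_dir), deltas)
# loss = torch.nn.functional.mse_loss(pred, ground_truth_pixel)
```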

Multi-layer perceptrons used in NeRF. Image via: P. P. Srinivasan et al., “NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis”

These neural networks are simple multi-layer perceptrons (MLPs), adapted from the authors' previous NeRF paper, with a final activation layer that outputs the emitted RGB radiance at a given position and viewing direction.
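As a rough illustration of what such an MLP looks like, here is a toy PyTorch version mapping an encoded 3D position and viewing direction to a density and an RGB radiance through a final sigmoid activation. The layer widths, depth, and encoding frequencies are illustrative choices, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TinyRadianceMLP(nn.Module):
    """Toy NeRF-style MLP: (3D position, view direction) -> (density, RGB).
    All sizes here are illustrative only."""

    def __init__(self, pos_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 + 3 * 2 * pos_freqs          # raw xyz + sin/cos encodings
        self.pos_freqs = pos_freqs
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # final activation -> RGB
        )

    def encode(self, x):
        # NeRF-style positional encoding of the 3D location
        feats = [x]
        for i in range(self.pos_freqs):
            feats += [torch.sin((2 ** i) * x), torch.cos((2 ** i) * x)]
        return torch.cat(feats, dim=-1)

    def forward(self, xyz, view_dir):
        h = self.trunk(self.encode(xyz))
        density = self.density_head(h)                          # non-negative
        rgb = self.rgb_head(torch.cat([h, view_dir], dim=-1))   # in [0, 1]
        return density, rgb
```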

Taking one point in the center of the figure as an example, they start by estimating the blue arrow, which is the termination depth of a ray leaving that location in a given direction. From these termination depths, they can compute the direct illumination shading and estimate the one-bounce indirect illumination arriving at a point from any direction. This simplification is performed by their “visibility” MLP. This particular multi-layer perceptron takes any location and emits an approximation of the environment-lighting visibility along any direction specified in the inputs, as well as an approximation of the expected termination depth. In the figure, the black dots represent one evaluation of a neural network for volume density at a position, the red arrows represent an evaluation of the visibility network along a direction, and the blue arrows represent an evaluation of the visibility network for the expected termination depth of a ray at that position along a direction.
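The following sketch shows the shape of such a visibility network: a small MLP that takes a 3D point and a direction and returns an approximate visibility and expected termination depth in a single query, instead of marching density samples along the whole ray. Names and sizes are hypothetical:

```python
import torch
import torch.nn as nn

class VisibilityMLP(nn.Module):
    """Toy version of NeRV's visibility field: (3D point, direction) ->
    (environment-light visibility in [0, 1], expected termination depth).
    Sizes are illustrative; the paper's network differs."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, point, direction):
        out = self.net(torch.cat([point, direction], dim=-1))
        visibility = torch.sigmoid(out[..., :1])                  # red arrows
        term_depth = torch.nn.functional.softplus(out[..., 1:])   # blue arrows
        return visibility, term_depth

# One query replaces marching n density samples along the secondary ray:
# vis, depth = visibility_mlp(surface_point, light_direction)
```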

Complexity comparison. Image via: P. P. Srinivasan et al., “NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis”

The power of this method is easy to see in this comparison. Coming back to the simplified diagram, NeRV's computational complexity is visibly much lower: the many red and blue rays that would each require a full ray march are approximated by single queries to their networks. The same holds numerically, where n is the number of samples along each ray, l is the number of light sources, and d is the number of sampled indirect-illumination directions. The brute-force complexity shown on the bottom left would amount to millions of computations in total for a scene; NeRV reduces this to roughly the number of sample points plus the number of light sources times the number of sampled illumination directions.
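To make the scaling argument concrete, here is a toy back-of-the-envelope comparison of per-point evaluation counts. The sample counts are invented purely for illustration; the exact complexity expressions are given in the paper's figure:

```python
# Hypothetical numbers chosen only to illustrate the scaling, not taken
# from the paper.
n = 128   # density samples marched along each secondary ray
l = 8     # light sources
d = 64    # sampled indirect-illumination directions

brute_force = n * (l + d)   # march a full ray per light / indirect direction
nerv = l + d                # one visibility-network query per direction

print(f"per shading point: brute force ~{brute_force} evaluations, "
      f"NeRV ~{nerv} queries")   # 9216 vs 72 in this toy setting
```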

This way, NeRV outputs the volume density as well as the full incident illumination surrounding the scene, giving them a complete 3D scene and the ability to choose its lighting, all with very limited computation cost and impressive results compared to previous approaches.

Of course, this was just an introduction to this new NeRV paper. I strongly invite you to read the paper for a deeper technical understanding; it is the first link in the references. Their code is unfortunately not available at this time, but a link to their project page, where the code will be linked, is in the references as well.

Watch the video to see NeRV examples!


