Industry News, Avvir Academy
April 10, 2023

NeRF: The Next Wave in the Future of 3D Reality Reconstruction

By Adam Cisler - Senior Solutions Engineer

The next wave in the future of 3D reality reconstruction is here. It is being refined and crafted at universities like Stanford and Berkeley, Google is using it for hyper-realistic 3D renderings within Google Maps experiences, and many startups and reality capture companies are working to incorporate this fledgling technology into their products. Invented just three years ago, in the shadow of the pandemic in 2020, Neural Radiance Fields will completely alter the Architecture, Engineering & Construction world and proliferate within real estate firms globally at an exponential pace. While generative AI has been getting all the hype and attention recently, AI advancements of equal importance are occurring in other verticals. Neural Radiance Fields, or NeRFs for short, are one of those advancements in AI and 3D reality reconstruction technology that you should know about.

What is a Neural Radiance Field and why should you care?

There are many explainer posts out there that will do a better job of walking you through the technical aspects of what NeRF is and how to use it; I will provide links at the end of this post so you can continue researching. For brevity, though, I will attempt to break it down into an easy-to-digest concept. To do this I first need to describe photogrammetry, one of the current leading techniques (besides point cloud or laser scanning) for 3D reconstruction. At the bottom of this post you will find a link to an earlier thought leadership piece from Avvir on the subject if you'd like an in-depth look at some shortcomings of photogrammetry (1).

Photogrammetry is a technique that has been used since the early days of photography to extract 3D geometry from 2D photographs. It applies geometric analysis to multiple images of the same space or object, taken from different angles, to extrapolate a 3D form. This was functionally important in mapping terrain in WWI, for example, and is in use today in drone software for capturing topography.
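To make the geometric idea concrete, below is a minimal sketch (in Python, with NumPy) of the triangulation step at the heart of photogrammetry: recovering one 3D point from its pixel locations in two photos taken from known camera positions. The camera matrices and coordinates here are toy values for illustration only, not output from any real photogrammetry pipeline.

```python
# A minimal sketch of the core geometric step in photogrammetry: triangulating
# one 3D point from its projections in two photos with known camera matrices.
# All values below are illustrative, not data from any real survey.
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 : 3x4 camera projection matrices (intrinsics times pose).
    uv1, uv2 : (u, v) pixel coordinates of the same point in each image.
    """
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the last right singular vector of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two simple cameras looking at the scene from different positions (toy values).
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=float)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # first camera at the origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # second camera offset 1 m

point = np.array([0.2, -0.1, 5.0, 1.0])           # ground-truth 3D point (homogeneous)
uv1 = (P1 @ point)[:2] / (P1 @ point)[2]          # its projection in image 1
uv2 = (P2 @ point)[:2] / (P2 @ point)[2]          # its projection in image 2
print(triangulate(P1, P2, uv1, uv2))              # recovers approximately [0.2, -0.1, 5.0]
```

Real photogrammetry software repeats this across thousands of matched features, and first has to estimate the camera positions themselves, which is where much of the fragility described below comes from.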

There are many applications that use a mobile phone camera and algorithms to create 3D models from multiple images, and the recent inclusion of a miniaturized lidar scanner has made 3D reality reconstruction even more accessible: as accessible as the phone in your pocket, you can now capture 3D scenes or objects. At the moment, however, there are limitations to the current photogrammetry process and the algorithms that recreate the scenes. To support this I am providing a great link (2) to an episode of the podcast "Computer Vision Decoded" by EveryPoint, in which Matt Tancik, a PhD student in the computer science and electrical engineering department at UC Berkeley and the creator of Nerfstudio (3), discusses whether NeRF will replace photogrammetry.

The most error-prone part of photogrammetry is the need for stable photos taken at prescribed increments, which are then analyzed with a geometric algorithm to recreate the scene. Errors come from the process or sequence of the photographs, glass or reflective surfaces, motion blur, photo quality, and several other factors, all of which make photogrammetry less reliable for reality reconstruction than lidar point cloud scanning.

NeRF does something similar in that it analyzes photos, or even video, to extract 3D information from a scene or object, but it is aided by more powerful neural networks and AI trained to reproduce the scene more accurately, without the limitations of photogrammetry. It achieves this by using the neural network to describe what is happening in each voxel, or 3D pixel, of a given scene or object, rather than just trying to define the surface or face of the object in the picture.

I actually think the summary Bing-GPT gave me on the differences is worth including here, as it breaks things down simply and should impress upon you the difference between the two and why NeRF is going to be so important:

Question I posed to Bing-GPT on 3/20/23:

In a few sentences, how are neural radiance fields or NeRF different from regular photogrammetry?

Response: Neural radiance fields (NeRF) and photogrammetry are both methods for creating 3D models from 2D images. However, they differ in how they represent and render the 3D scene.

Photogrammetry uses geometric algorithms to reconstruct a 3D mesh of points or polygons from a set of overlapping images. It requires precise camera calibration and alignment, as well as consistent lighting conditions. Photogrammetry can produce high-quality 3D models, but it can also suffer from noise, artifacts and missing details.

NeRF uses neural networks (a neural network is a series of algorithms that tries to find patterns in data by imitating how the human brain works) to learn a continuous function that maps a 3D point and viewing direction to a color and density value. Since it is using a neural network, it does not require explicit camera calibration or alignment, and it can handle varying lighting conditions by modeling shadows. NeRF can produce photo-realistic images from novel viewpoints, but it can also be slow to train and render.

According to some recent research papers (4), NeRF can outperform photogrammetry in terms of accuracy and realism for some scenes, especially those with complex geometry or illumination. However, NeRF is still an active area of research and has some limitations, such as difficulty in handling dynamic objects or large-scale scenes.
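That continuous mapping from a 3D point and viewing direction to a color and density value is easier to picture with a toy example. Below is a heavily simplified sketch in Python (using PyTorch) of a NeRF-style network and the volume rendering step that blends its predictions along a camera ray into a single pixel. The layer sizes, sampling scheme, and scene bounds are illustrative placeholders rather than the architecture from the original paper, and the network here is untrained; real implementations add positional encoding and far more careful sampling.

```python
# A toy, untrained sketch of the NeRF idea: a small network that predicts a
# color and density for any 3D point and viewing direction, plus the volume
# rendering step that blends samples along one camera ray into a pixel.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # Input: 3D position (x, y, z) and viewing direction (dx, dy, dz).
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                      # outputs: RGB color + density
        )

    def forward(self, xyz, direction):
        out = self.mlp(torch.cat([xyz, direction], dim=-1))
        rgb = torch.sigmoid(out[..., :3])              # color constrained to [0, 1]
        sigma = torch.relu(out[..., 3])                # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, n_samples=64, near=0.1, far=6.0):
    """Blend the color and density at sample points along one ray into a pixel."""
    t = torch.linspace(near, far, n_samples)           # sample depths along the ray
    points = origin + t[:, None] * direction           # 3D sample positions
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)            # opacity contributed by each sample
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                             # how much each sample shows through
    return (weights[:, None] * rgb).sum(dim=0)          # final pixel color

# One ray through the (untrained) field; training would compare this rendered
# pixel against the real photo's pixel and back-propagate the difference.
model = TinyNeRF()
pixel = render_ray(model, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))
print(pixel)
```

Training amounts to rendering pixels this way for rays taken from the real photographs and nudging the network until the rendered colors match; once trained, the same network can render the scene from viewpoints that were never photographed, which is exactly the "novel viewpoints" capability mentioned above.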

Some more history and context:

Since the invention of photography in the early 19th century, people have been trying to capture reality and reproduce it. It is an understatement to say a LOT of advancement has taken place in the intervening 200 years. These inventions came in waves, literally and figuratively, washing over our society and culture and forever transforming how we experienced and recorded the world. Sound recording became possible in the mid 19th century with the invention of the phonograph, which allowed one to hear a concert for the first time without having to be present, much as the first photographs allowed us to see accurate representations of people and places without being with them in person. Late in the 19th century, the demonstration of electromagnetic waves in 1888 led to the invention of radio, which could capture realistic sound, transmit it across great distances, and reproduce it faithfully. At the same time, the invention of the moving picture and our expanding understanding of waves through the continued study of radio led to even more transformative inventions. The invention of television would eventually give rise to computers and our current digital technologies. The last 30 years alone have seen the great digitization of all aspects of our lives, and that digitization has found its way into every aspect of the construction industry.

In the last 20 years I have personally gone from learning to read blueprints and building plans on paper to learning to create plans digitally and then in 3D modeling. I have gone from hand layout using a tape measure and chalk line to using a surveyor's total station for layout, and more recently have had robots perform layout for me on active job sites. In this same timeframe, advancements were quietly being made in artificial intelligence, which now lurks beneath the surface of our everyday experience.

Homing in on advancements in reality reconstruction:

It's here that I would like to dig deeper into the main topic of this piece: reality capture. All of that foreshadowing was in the service of telling the story of rapid change in the construction industry and how reality capture is fueling, and poised to accelerate, that change.

For all the talk of construction lagging in innovation, a paradigm shift is occurring, and the network effects of all this underlying change have yet to catch up. I am convinced the technological shift is happening, and using the lens of reality capture as a single technology, I will demonstrate that shift and how it will accelerate and proliferate even faster in the coming years.

As I mentioned, I learned building layout as a carpenter's apprentice in the early 2000s and was fortunate enough to get onto a layout crew in high-rise construction in Seattle. I learned how to perform layout with a total station and was tasked on occasion with going out with a particular type of total station (a reflectorless Topcon, for those interested) that could shoot a laser and record the point in space where the laser hit an object. This was valuable for precisely capturing the as-built location of building elements such as column corners or the four corners of a wall. It was my first experience, beyond photography and video, of capturing reality, and it was on a jobsite that I first learned the value of being able to accurately record and analyze the real world to compare against a building plan. The underlying concept of coordinate systems would be part of my life from then on, and the Cartesian coordinates of those early experiences would grow to become the grounding truth for the voxels in my new volumetric understanding of the physical world.

The handful of points captured were transferred into the modeling software of the time and compared against the model. This way of capturing as-built conditions was time consuming, highly specialized, and tedious. The need to recalibrate the total station with each move to tie the points into the project's coordinate system, along with the complexity of the devices themselves, made the approach neither inclusive nor accessible, and its use was selective.

Miniaturization and the effect of time on the cost of and access to scanning:

Laser scanning graduated from the novelty of the large and expensive devices of the nineties and began shrinking and becoming more user friendly. Industrious and innovative construction companies began deploying it across many more workflows, to great effect in preventing rework and verifying work in place. The cost continues to come down and the technology continues to improve, to the point where a handheld scanning device can cover more ground more quickly without suffering too much precision loss. Over the last few decades those advancements have led to a more informed construction site, one that uses design models and reality analysis in the form of point clouds to give insights where previously it could not.

It also enabled the beginnings of scan-to-BIM, which takes a point cloud and turns it into a working 3D model of the scanned location; this is very important in tenant improvements or wherever new work ties into existing structure. Even with these advancements, however, the process is time consuming and resource heavy in its current state. A recent paper (5), linked at the end of this article, by Shun Hachisuka of the Department of Civil and Environmental Engineering at Stanford University, shows a novel process for using NeRF to recreate scenes as a BIM model, a NeRF-to-BIM workflow. It is a very exciting process to watch, as scan-to-BIM has been slow to produce accessible, reliable applications.
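As a hedged illustration of the general direction (not the specific pipeline in the Stanford paper), one common way to pull explicit geometry out of a trained radiance field is to sample the learned density on a regular grid and run marching cubes over it to get a mesh, which downstream scan-to-BIM style tooling could then work from. The sketch below assumes a trained_model with the same interface as the toy network shown earlier; the bounds, resolution, and density threshold are placeholder values.

```python
# A sketch of extracting an explicit mesh from a trained NeRF-style density
# field by sampling it on a grid and running marching cubes. Illustrative only.
import numpy as np
import torch
from skimage import measure  # pip install scikit-image

def density_to_mesh(trained_model, bounds=(-2.0, 2.0), resolution=128, threshold=25.0):
    """Query the learned density on a regular grid and extract an isosurface mesh."""
    lo, hi = bounds
    axis = torch.linspace(lo, hi, resolution)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
    points = grid.reshape(-1, 3)
    with torch.no_grad():
        # Density does not depend on viewing direction, so any fixed direction works here.
        _, sigma = trained_model(points, torch.zeros_like(points))
    sigma = sigma.reshape(resolution, resolution, resolution).numpy()
    verts, faces, _, _ = measure.marching_cubes(sigma, level=threshold)
    # Rescale vertices from grid indices back to world coordinates.
    verts = verts / (resolution - 1) * (hi - lo) + lo
    return verts, faces
```

Turning such a mesh into actual BIM elements such as walls, columns, and ducts is the genuinely hard part, and that is the gap the NeRF-to-BIM research is working to close.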

The promise of miniaturization comes even closer to the end user with the miniature lidar scanner Apple introduced in 2020, and the ability to capture 3D scenes has never been more within reach for the everyday person.

For construction, however, point clouds cannot yet take full advantage of the miniature lidar in phones and tablets, as the first generation is not powerful or accurate enough to be a reliable tool. Point clouds in general are held back by the density of the data, the complexity of the rendering required to make them more accessible, and the cost of the robust units required for accurate, precise analysis against 3D BIM models.
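To give a sense of the data-density problem, here is a tiny sketch using Open3D, a widely used open-source point cloud library: a modest terrestrial scan can contain tens of millions of points, and a common first step is simply thinning the cloud with voxel downsampling before any analysis or comparison against a model. The file name is a placeholder, not a real dataset.

```python
# A small illustration of managing point cloud density with voxel downsampling.
import open3d as o3d

pcd = o3d.io.read_point_cloud("jobsite_scan.ply")          # placeholder path
print(f"raw points: {len(pcd.points):,}")

# Keep one representative point per 2 cm voxel to make the cloud manageable.
downsampled = pcd.voxel_down_sample(voxel_size=0.02)
print(f"after 2 cm voxel downsampling: {len(downsampled.points):,}")
```

Even after downsampling, the result is still an unstructured set of points rather than the surfaces and objects a comparison against a BIM model ultimately needs.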

Implications and possibilities of a technology which can remove the barrier of scanners altogether:

It's here that Neural Radiance Fields have the power to amplify the construction industry's reality capture workflows. The ubiquity of cameras and the accessibility of NeRF, which is iterating in pace with advancements in robotics, drone technology, and AI, give it fertile ground to take root. The proliferation of cameras in the pockets of field personnel, fixed stationary jobsite cameras, and autonomous ground and aerial drones for reality capture will completely revolutionize the capture, reconstruction, and analysis capabilities of the future jobsite. As the algorithms for reality recreation are refined to allow photorealistic, lossless reconstructed models of reality, this will unlock the full possibilities of digital twins and virtual exploration, analysis, and collaboration.

Conclusion: 

If we can capture the reality of the jobsite in accurate 3D reproduction on a daily, hourly, or continuous basis using just the phone in our pocket or the cameras already deployed on projects, imagine the real-time data feedback loops project teams could receive and act upon! This ability is quickly approaching, and Avvir is actively investigating how to make NeRF work for our clients once the technology is reliable and accurate enough for deviation detection and progress tracking. It will greatly increase the accessibility and reach of automated construction risk analysis and provide untold value to companies that are intimidated by the complexity of laser scanning. It also has the power to amplify the existing workflows of companies already laser scanning and using reality analysis tools.

Once issues with NeRF such as processing time and the accuracy of larger areas and objects are solved, its proliferation will help provide an accurate, clear, real-time view of construction and eliminate even more wasted time and resources, so we can focus on solving problems rather than finding them. We aspire to reach a place where most issues are prevented altogether through proactive workflows that leverage reality capture and autonomous construction risk analysis. NeRF may just help us reach this potential sooner than previously thought possible.

  1. From the Avvir Academy archives: Are Photos Accurate Enough to Provide Meaningful Deviation Analysis?

  2. Video discussion with the Nerfstudio creator and Berkeley PhD student on the possibility of NeRF replacing photogrammetry: Will NeRFs Replace Photogrammetry? - Computer Vision Decoded Ep. 6

  3. Get started with Nerfstudio here: https://docs.nerf.studio/en/latest/

  4. "Sat-NeRF" article referenced by Bing-GPT: https://arxiv.org/abs/2203.08896

  5. Shun Hachisuka, Department of Civil and Environmental Engineering, Stanford University: http://cs230.stanford.edu/projects_fall_2022/reports/31.pdf

Additional Reading

Great NeRF explanation article: https://medium.com/swlh/nerf-neural-radiance-fields-79531da37734
