Pioneered by researchers at UC Berkeley, UC San Diego and Google, NeRFs (neural radiance fields) first popped up on our radar in March 2022, when Nvidia revealed its super-quick "Instant NeRF" demo at the GTC Developer Conference.
In practice, you walk around a scene taking photos or video with a smartphone, then upload the results to a service where a neural network takes that sequence of real-world images and uses it to build a 3D model of your subject and its surroundings.
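For the curious, the key idea in the original NeRF work is that the trained network can report a density and a colour for any point in space, and each pixel's colour is produced by blending samples taken along the camera ray through that pixel. The snippet below is a minimal sketch of that blending step only. There's no neural network or training here; the density and colour values are made-up stand-ins for what a trained model would return, and the function name `render_ray` is our own invention, not any service's API.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray(densities: Sequence[float],
               colors: Sequence[Color],
               deltas: Sequence[float]) -> Color:
    """Composite samples along one camera ray, NeRF-style (illustrative sketch).

    densities: volume density (sigma) at each sample, ordered front to back
    colors:    RGB colour at each sample
    deltas:    distance between consecutive samples along the ray
    """
    transmittance = 1.0          # fraction of light not yet blocked by earlier samples
    out: List[float] = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this short segment
        weight = transmittance * alpha           # how much this sample shows up in the pixel
        for i in range(3):
            out[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha             # light left over for samples further back
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    # Hypothetical ray: two samples of empty space, then a dense red surface
    # with a dense green surface hidden behind it.
    pixel = render_ray(
        densities=[0.0, 0.0, 50.0, 50.0],
        colors=[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        deltas=[0.1, 0.1, 0.1, 0.1],
    )
    print(pixel)
```

Run on that made-up ray, the pixel comes out almost pure red: the empty space contributes nothing, and the red surface absorbs nearly all the light before it can reach the green one behind it. Training a NeRF boils down to adjusting the network until pixels rendered this way match your photos.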
You can then do all sorts of things with this model, from making wild flying camera shots to creating photorealistic 3D assets and environments for video games, VR experiences and a range of other applications.
The resulting videos can be absolutely bonkers, as exemplified by this extraordinary physics-bending Hilux video, created (and cleverly edited) by Japan’s Arata Fukoe. This, remember, is a one-man production.
Last month, Luma Labs crashed the price of this kind of service down to US$1 a scene, through an API that takes “video walkthroughs of objects or scenes, looking outside in, from 2-3 levels,” and outputs “an interactive 3D scene that can be embedded directly, coarse textured models to build interactions on in traditional 3D pipelines, and pre-rendered 360 images and videos.”
Creator Karen X Cheng used Luma Labs' AI to create the monster dolly zoom effect in the video below. Click through to the Twitter thread to see how it was captured and edited.
Others are pairing 3D NeRF captures with separate generative AI services, taking what might be a rather drab-looking photorealistic scene capture and fancying it up with more impressive-looking textures. Creator Bilawal Sidhu calls this "reskinning the real world," and while the results can still be a little janky, we have no doubt that before long it'll be seamlessly integrated into a system that lets you edit and add to a NeRF scene using natural-language prompts.
🖼️ Ok so “reskinning” the Real World with 3D Capture + Generative AI continues to be a blast ✨
🌐 Reality capture techniques like photogrammetry and NeRFs allow you to capture the spaces, places and objects you care about — creating a growing library of assets you can pull on… pic.twitter.com/IFctTlLLfo
— Bilawal Sidhu (@bilawalsidhu) April 23, 2023
Meanwhile, Google's Zip-NeRF project is still at the research stage, but recent progress has it running some 22 times faster than the earlier mip-NeRF model while producing between 8% and 76% fewer errors. The results look absolutely spectacular, and will have real estate agents drooling.
UC Berkeley, for its part, has combined a NeRF modeling network with a language model to create Language Embedded Radiance Fields, or LERF, which lets you search for particular items in a 3D scene using natural language.
LERF teaser: Language Embedded Radiance Fields
Eventually, the LERF team hopes to develop this tech into something that'll help robots use machine vision and AI to complete tasks. A robot verbally tasked with cleaning up some spilled coffee grounds, for example, might go into your kitchen and use this technology, combined with a GPT-style prompt generation routine, to identify and locate a range of task-relevant items in the room, from dustpans and brushes to trash cans, sinks, cabinets, drawers, cleaning sprays, vacuum cleaners and more.
Another project sees a team of Japanese developers working on a way to render NeRF-generated 3D scenes in real time in Unreal Engine. The team says it's already achieving more than 60 fps on an Nvidia RTX 3070 graphics card, with minimal memory consumption. In effect, this is an early glimpse of a future in which you can wave your smartphone around a given environment and turn it into a video game level.
NeRF real-time rendering on Unreal Engine
And if all the above wasn't enough to bake your noodle golden brown, take a look at the HOSNeRF project out of the National University of Singapore. We can't begin to understand how, but this team has managed to develop a NeRF system that not only generates 3D models of people and entire environments, but can capture and re-render dynamic action as well.
So you can film somebody walking around and doing something, picking up objects and putting them down as they please, and HOSNeRF will spit it back out as a photorealistic 3D action scene that you can view from any angle.
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
Good grief. Once this makes it through to a service, you’ll be able to reconstruct real-world action scenes as dynamic models you can walk through in VR, complete with moving 3D models of people and objects that you’ll potentially be able to interact with using video game physics engines. The mind boggles.
It seems every obscure corner of the AI world has seen a truly shocking rate of progress in recent months. As all these wildly innovative technologies begin to converge, it's harder than ever to picture what life will look like in five or 10 years.
Source: Luma Labs, among others