CoDAC: Compressive Depth Acquisition Camera
CoDAC is a new time-of-flight based range measurement system for acquiring depth maps of piecewise-planar scenes with high spatial resolution using a single, omnidirectional, time-resolved photodetector and no scanning components. In contrast with the 2D laser scanning used in LIDAR systems and low-resolution 2D sensor arrays used in TOF cameras, CoDAC demonstrates that it is possible to build a non-scanning range acquisition system with high spatial resolution using only a standard, commercially-available, low-cost photodetector and spatial light modulator.
Exploiting Sparsity in Time-of-Flight Range Acquisition Using a Single Time-Resolved Sensor
CoDAC: A Compressive Depth Acquisition Camera Framework
Compressive Depth Map Acquisition Using a Single Photon-Counting Detector: Parametric Signal Processing Meets Sparsity
CoDAC: Compressive Depth Acquisition using a Single Time-resolved Sensor
1. What is a depth map and why is it useful?
A depth map of a scene is similar to a digital photograph of the scene, except that each pixel value represents the distance between the imaging device and the corresponding scene point. Depth maps are usually displayed in grayscale, with white corresponding to the closest scene points and black corresponding to the farthest scene points.
Depth maps have a variety of applications. They provide useful 3D scene information that cannot be recovered from a digital photograph of the scene. One of the most popular current uses of depth maps is in gaming, where the Microsoft Kinect system has enabled real-time depth map capture and several interesting applications have been built around it.
See the Wikipedia article on depth maps for more details.
2. What is the time-of-flight (TOF) principle?
Since light travels at a known speed, distances can be measured by the time elapsed from the emission of a pulse of light until the detection of a reflected return.
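A minimal sketch of this relationship, assuming light propagates at its vacuum speed and that the measured time covers the full round trip to the scene point and back. The function names are illustrative only.

```python
# Speed of light in vacuum (m/s); propagation in air is close enough for a sketch.
SPEED_OF_LIGHT = 299_792_458.0


def distance_from_round_trip(elapsed_seconds: float) -> float:
    """Return the one-way distance (m) for a measured round-trip time (s)."""
    # The pulse travels to the scene point and back, hence the division by 2.
    return SPEED_OF_LIGHT * elapsed_seconds / 2.0


def round_trip_from_distance(distance_m: float) -> float:
    """Return the round-trip time (s) for a scene point at the given distance (m)."""
    return 2.0 * distance_m / SPEED_OF_LIGHT
```

For example, a return detected 20 nanoseconds after emission corresponds to a scene point roughly 3 meters away.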
3. How does a depth camera work?
Several competing techniques are used to capture scene depth. Stereo cameras use a pair of traditional cameras to compute depth through stereo disparity. Light detection and ranging (LIDAR), or laser scanning, produces depth maps by raster scanning the scene with a laser and measuring distance using either TOF or triangulation. TOF cameras avoid the need for raster scanning by using multiple sensors. They work at video frame rates and provide the high range resolution required for many applications. In a conventional TOF camera, an array of pulsed LEDs transmits an omnidirectional pulse of light toward the scene of interest. The light returning from the scene is optically focused onto a 2D array of range-sensing pixels. By measuring the time it takes for the transmitted pulse to arrive at each pixel, the TOF camera produces a depth map of the scene. The Microsoft Kinect system works on a completely different principle; it uses the distortion of a speckle pattern projected on the scene to recover depth.
4. How does the compressive depth acquisition camera work?
Our technique, CoDAC, is based on the TOF distance measurement principle. In contrast to a regular TOF camera, CoDAC uses a pulsed light source to illuminate a spatial light modulator. The spatial light modulator blocks some of the light so that the scene is illuminated with a randomly-chosen checkerboard pattern. All the light reflected from the scene is focused onto a single, time-resolved photodetector. The photodetector produces an electrical signal, which is in turn sampled and stored. This process of illumination, integration and sampling is repeated for a small number of randomly-chosen binary patterns. Finally, the time samples collected through this acquisition are computationally processed using a parametric signal processing framework to reconstruct the scene depth map. (See the video above for an animation.)
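The acquisition loop described above can be simulated with a toy forward model. This is only a sketch under strong simplifying assumptions that are not from the original work: every scene point has unit reflectance, the pulse and detector are ideal impulses, there is no noise, and each pixel contributes a single return at its round-trip delay. All function names and parameter values are hypothetical.

```python
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def random_binary_pattern(rows, cols, rng):
    """One random checkerboard pattern: 1 = pixel illuminated, 0 = blocked."""
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def detector_samples(depth_map, pattern, sample_period_s, num_samples):
    """Idealized single-detector signal: one unit impulse per lit pixel,
    placed at the time bin nearest to that pixel's round-trip delay."""
    samples = [0.0] * num_samples
    for depth_row, pattern_row in zip(depth_map, pattern):
        for depth_m, is_lit in zip(depth_row, pattern_row):
            if not is_lit:
                continue
            delay_s = 2.0 * depth_m / SPEED_OF_LIGHT
            index = int(round(delay_s / sample_period_s))
            if index < num_samples:
                samples[index] += 1.0
    return samples


def acquire(depth_map, num_patterns, sample_period_s, num_samples, seed=0):
    """Repeat illumination, integration and sampling for several patterns."""
    rng = random.Random(seed)
    rows, cols = len(depth_map), len(depth_map[0])
    measurements = []
    for _ in range(num_patterns):
        pattern = random_binary_pattern(rows, cols, rng)
        samples = detector_samples(depth_map, pattern, sample_period_s, num_samples)
        measurements.append((pattern, samples))
    return measurements
```

In this toy model, each measurement records how many lit pixels fall at each depth but not where they are; it is the combination of many patterns with a scene model that allows the spatial layout to be recovered.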
5. How can CoDAC capture high spatial resolution using a single sensor?
The spatial resolution in the depth map is the product of spatially-patterned illumination and exploitation of the compressibility of the depth map of a typical scene. It is well known that photographs are compressible. Depth maps are generally even more compressible. Prior to this work, compressibility had not been used to reduce the cost and complexity of range acquisition.
6. What are the major assumptions in this work?
The modeling is based on assuming the scene is composed entirely of piecewise planar facets. This is a reasonable approximation for many scenes. Extending our method to curved objects is the subject of ongoing work.
7. What are the challenges in making CoDAC work?
There are several challenges to making CoDAC work. The light signal measured at the photodetector is a superposition of the time-shifted and attenuated returns corresponding to the different points in the scene. Because all of these returns are summed at a single detector, spatial resolution is completely lost in any one measurement. The most important challenge is that the measurements do not give linear combinations of scene depths: the measured signal parameters encode the quantities of interest (distances to the various scene points) nonlinearly, so standard compressed sensing techniques do not apply. Since we integrate all the reflected light from the scene, this nonlinearity worsens with the number of scene points that are simultaneously illuminated. Without a novel approach to interpreting and processing the measurements, little useful information can be extracted from them.
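The nonlinearity can be seen in a toy example. The sketch below assumes an idealized model (unit-amplitude impulse returns, no noise, a hypothetical 1 nanosecond sample period) that is far simpler than the actual system: the detector signal is linear in the returns that are added together, but depth enters only through the time shift of each return.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def ideal_return(depths_m, sample_period_s, num_samples):
    """Sum of unit impulses, one per illuminated point, at its round-trip delay."""
    samples = [0.0] * num_samples
    for depth_m in depths_m:
        index = int(round(2.0 * depth_m / SPEED_OF_LIGHT / sample_period_s))
        if index < num_samples:
            samples[index] += 1.0
    return samples


def peak_index(samples):
    """Index of the largest sample."""
    return samples.index(max(samples))


near = ideal_return([1.0], 1e-9, 32)       # one scene point at 1 m
far = ideal_return([2.0], 1e-9, 32)        # the same point moved to 2 m
both = ideal_return([1.0, 2.0], 1e-9, 32)  # two points lit at once

# Linear in the returns: lighting both points adds their signals.
superposed = [a + b for a, b in zip(near, far)]

# Not linear in depth: doubling the depth moves the return in time,
# it does not double the signal.
doubled_signal = [2.0 * s for s in near]
```

Here `both` equals `superposed`, while `far` differs from `doubled_signal`: the return at 2 m appears in a later time bin rather than as a larger value in the same bin.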
8. What is a summary of the acquisition process and reconstruction algorithm?
In Step 1, we illuminate the scene with an omnidirectional light pulse and use parametric deconvolution to solve for the depth ranges present in the scene. In Step 2, we use the range information obtained from Step 1 along with the measurements obtained from patterned illumination to recover the spatial resolution of the scene. We accomplish this reconstruction by using a convex optimization algorithm that exploits sparsity of the Laplacian of the depth map of a typical scene.
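To illustrate why the Laplacian of a piecewise-planar depth map is sparse, the sketch below builds a hypothetical two-facet scene and applies a standard 5-point discrete Laplacian. It shows only the sparsity property that Step 2 exploits, not the convex optimization itself; the scene, grid size and tolerance are assumptions chosen for illustration.

```python
def two_facet_scene(size):
    """Hypothetical scene: a tilted plane on the left, a flat wall on the right."""
    scene = []
    for r in range(size):
        row = []
        for c in range(size):
            if c < size // 2:
                row.append(1.0 + 0.01 * c + 0.02 * r)  # tilted planar facet
            else:
                row.append(3.0)                        # fronto-parallel facet
        scene.append(row)
    return scene


def discrete_laplacian(depth_map):
    """5-point Laplacian evaluated on interior pixels only."""
    rows, cols = len(depth_map), len(depth_map[0])
    lap = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap[r - 1][c - 1] = (
                depth_map[r - 1][c] + depth_map[r + 1][c]
                + depth_map[r][c - 1] + depth_map[r][c + 1]
                - 4.0 * depth_map[r][c]
            )
    return lap


def count_nonzero(grid, tol=1e-9):
    """Count entries whose magnitude exceeds a small numerical tolerance."""
    return sum(1 for row in grid for value in row if abs(value) > tol)


scene = two_facet_scene(16)
laplacian = discrete_laplacian(scene)
nonzero = count_nonzero(laplacian)
total = sum(len(row) for row in laplacian)
```

Within each planar facet the Laplacian vanishes, so the only nonzero entries lie along the boundary between the two facets (28 of the 196 interior pixels in this example).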
9. What are limitations of this technique?
The main limitation of our framework is inapplicability to scenes with curvilinear objects, which would require extensions of the current mathematical model. Another limitation is that a periodic light source creates a wrap-around error as it does in other TOF devices. For scenes in which surfaces have high reflectance or texture variations, availability of a traditional 2D image prior to our data acquisition allows for improved depth map reconstruction as discussed in our paper.
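The wrap-around error can be sketched as follows, assuming the unambiguous range is set by the pulse repetition period, so that returns from beyond c / (2 f) are attributed to the following pulse. The 10 MHz repetition rate used in the example is a hypothetical value, not a CoDAC specification.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def unambiguous_range_m(repetition_rate_hz: float) -> float:
    """Largest distance whose return arrives before the next pulse is emitted."""
    return SPEED_OF_LIGHT / (2.0 * repetition_rate_hz)


def apparent_distance_m(true_distance_m: float, repetition_rate_hz: float) -> float:
    """Distance reported when returns are folded into one repetition period."""
    return true_distance_m % unambiguous_range_m(repetition_rate_hz)
```

With a 10 MHz repetition rate the unambiguous range is about 15 meters, so a surface at 20 meters would be reported at about 5 meters.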
10. What are advantages of this technique/device and how does it compare with existing TOF-based range sensing techniques?
In laser scanning, spatial resolution is limited by the scanning time. TOF cameras do not provide high spatial resolution because they rely on a low-resolution 2D array of range-sensing pixels. CoDAC is a single-sensor, high spatial resolution depth camera that works by exploiting the sparsity of natural scene structure.
11. What is the range resolution and spatial resolution of the CoDAC system?
We have demonstrated sub-centimeter range resolution in our experiments. This is significantly better than the fundamental limit of about 10 cm that would arise from using a detector with a 0.7 nanosecond rise time if we were not using parametric signal modeling. The improvement in range resolution comes from the parametric modeling and deconvolution in our framework. We refer the reader to our publications for complete details and analysis.
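The figure of about 10 cm is consistent with a back-of-the-envelope calculation. The sketch below assumes that, without parametric modeling, two returns can be separated only if their round-trip delays differ by at least the detector rise time; this is an illustrative assumption, and the publications give the actual analysis.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rise_time_limited_resolution_m(rise_time_s: float) -> float:
    """Range resolution (m) if returns must differ by one rise time in round-trip delay."""
    # Halved because a change in depth changes the round-trip path by twice as much.
    return SPEED_OF_LIGHT * rise_time_s / 2.0


resolution_m = rise_time_limited_resolution_m(0.7e-9)  # roughly 0.105 m
```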
12. How can you make CoDAC practical for use in mobile devices?
Our calculations and proof-of-concept experiments suggest that it is possible to implement our technique using commercial off-the-shelf hardware. We require a light source with megahertz modulation bandwidth, as is common for optical communications diodes. The spatial projection is easily implemented using a digital micromirror device; these are already commercially used in handheld pico-projectors. Our sensing comprises a single photodetector coupled with an ADC with megahertz sampling bandwidth; this is common in many commercial imaging systems. Also, all of the different CoDAC components are available in compact form factors suitable for handheld and mobile devices.
13. How much power does CoDAC consume?
The main source of power consumption in active TOF systems is optical illumination. Our method is well-suited to adapting the power consumption to the depth range of interest and the frame rate required for user interaction; this trade-off is discussed in our paper. Our technique identifies the depth ranges present in the scene before generating spatial resolution. If something is in close range, the optical output can be automatically lowered to save power. Applications that require high spatial and temporal resolution, like gesture tracking, are typically used at close range.
Devices like the iPhone already have bright light-emitting diode (LED) flashes built in for photography. Some of these commercially available LEDs can also be pulsed at megahertz rates, which is well within our desired specifications. LEDs are the best choice for an illumination source because they are very bright, pulse adequately fast, and consume little power. The fact that the LEDs are pulsed means that they are "on" for a very short duration during acquisition; to be precise, during a one-second acquisition time, the LEDs are on for 50 milliseconds. Our proof-of-concept experiments are not battery powered, but we are working on a prototype that uses LEDs and will demonstrate our low power claim.
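The effect of pulsing on power can be sketched from the numbers above: 50 milliseconds of "on" time in a one-second acquisition is a 5% duty cycle. The 1 W peak power below is a hypothetical value used only to make the arithmetic concrete; it is not a measured CoDAC figure.

```python
def duty_cycle(on_time_s: float, acquisition_time_s: float) -> float:
    """Fraction of the acquisition during which the LEDs emit light."""
    return on_time_s / acquisition_time_s


def average_power_w(peak_power_w: float, on_time_s: float, acquisition_time_s: float) -> float:
    """Average illumination power, assuming constant power while the LEDs are on."""
    return peak_power_w * duty_cycle(on_time_s, acquisition_time_s)


led_duty = duty_cycle(0.050, 1.0)                 # 0.05, i.e. 5%
example_average = average_power_w(1.0, 0.050, 1.0)  # hypothetical 1 W peak
```

Under these assumptions, an LED drawing 1 W while on would average about 50 milliwatts over the acquisition.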
14. Does CoDAC work under all illumination conditions?
Yes. CoDAC works under all lighting conditions because it uses its own light source for illumination. Moreover, since we use the entire time profile for processing, we are able to reject the low-frequency components; this makes the system robust against ambient illumination, including bright lights.
The team thanks Jeff Shapiro for invaluable technical guidance throughout the project. Thanks to Daniel Weller and the rest of the STIR group for their feedback on the presentation of the work. Thanks to Chris Schmandt and the Speech+Mobility group at the Media Lab for their encouragement and support.
This material is based upon work supported in part by the National Science Foundation under Grant No. 0643836 and by the DARPA InPho program through the US Army Research Office award W911-NF-10-1-0404. This work was also supported in part by NSF Major Research Instrumentation Program, Grant PHY-0959057, under the American Recovery and Reinvestment Act of 2009. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Send inquiries to Professor Vivek Goyal at firstname.lastname@example.org or call +1.617.324.0367.
© 2012 Massachusetts Institute of Technology