When we come across depth-sensing cameras, we are usually told that depth is detected using infrared cameras, and the topic is closed. In this blog post I would like to go a step further and describe in more detail how depth sensing actually works, down to the roots. So sit tight while I go deep into it. The same idea applies to depth sensors that project an infrared pattern, like Kinect, Project Tango, PrimeSense and others. As an example I will use Kinect.
Kinect has three main parts: an infrared projector, an RGB camera and an infrared camera. For now, let's set the RGB camera aside and concentrate on the other two. Here is what happens when you turn on Kinect to capture a depth image:
Step 1: Power goes to the infrared projector, which shoots an irregular pattern of dots into the room. The light is infrared (the infrared band spans wavelengths from roughly 700 nm to 1 mm), so the pattern is invisible to humans. Kinect has a diffractive optical element: the infrared light generated at the projector is diffracted by it to form the irregular pattern of dots. To visualize these dots, use a night vision camera and you can see the amazing pattern formed on your subject.
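To get a feel for why the pattern is irregular, here is a minimal Python sketch. It does not reproduce the real Kinect pattern, which comes from the diffractive optical element. The names `make_dot_pattern` and `window_is_unique`, the grid size and the dot density are all my own assumptions for illustration. The point is that in an irregular pattern a small neighbourhood of dots tends to look different from every other neighbourhood, which is what lets us tell later which part of the pattern we are looking at.

```python
import random

def make_dot_pattern(width, height, dot_fraction=0.1, seed=0):
    """Return a height x width grid of 0/1 values; 1 marks a projected dot.

    This is a toy pseudo-random pattern, not the actual Kinect pattern.
    """
    rng = random.Random(seed)  # fixed seed so the pattern is repeatable
    return [[1 if rng.random() < dot_fraction else 0 for _ in range(width)]
            for _ in range(height)]

def window_is_unique(row, start, size):
    """Check that row[start:start+size] appears nowhere else in the row."""
    target = row[start:start + size]
    return all(row[i:i + size] != target
               for i in range(len(row) - size + 1) if i != start)

pattern = make_dot_pattern(64, 4)
# Print the first row, with '*' for a dot and '.' for no dot.
print("".join("*" if v else "." for v in pattern[0]))
```

A regular grid of dots would fail the `window_is_unique` check almost everywhere, because every neighbourhood looks the same.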
Step 2: Now we have a pattern projected on our room. Next comes the IR camera, which is capable of recording infrared light. Like the RGB cameras we see everywhere, it is built around a CMOS sensor, but it is set up to detect the infrared light bounced off our subjects rather than visible light. So we can capture the intensities of the infrared light using this camera. The resulting image looks like a black and white image. It is captured from the camera's viewpoint and contains the same dot pattern we projected, as it appears on the scene.
Step 3: This is the most crucial part, where depth is calculated for every pixel in the scene. We take the IR camera image that we recorded and the irregular pattern of dots that we projected initially. If you compare the two, there will be an offset, or disparity, between the projected pattern and the recorded image, because of the separation between the projector and the camera and because of the distance of the objects. Then we use triangulation to calculate the final distance of each object. I will go deeper into how the triangulation works soon. For now, this is the patent Kinect is using to calculate the triangulation.
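Until then, here is a rough Python sketch of Step 3 on a single image row. It is not the method from the patent. It uses plain block matching to find the disparity and the textbook triangulation formula depth = focal length x baseline / disparity, assuming an idealized, rectified projector and camera pair. The function names, the toy pattern, the focal length of 580 pixels and the baseline of 0.075 m are illustrative assumptions, not measured Kinect values.

```python
def find_disparity(reference_row, observed_row, x, window=9, max_shift=20):
    """Return the pixel shift that best aligns the reference window
    centred at x with the observed row (sum of absolute differences)."""
    half = window // 2
    patch = reference_row[x - half:x + half + 1]
    best_shift, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        start = x - half + shift
        candidate = observed_row[start:start + window]
        if len(candidate) < window:
            break  # ran off the end of the row
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate depth in metres; zero disparity means 'infinitely far'."""
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px

# Toy irregular pattern: dots at 0, 1, 3, 6, 10, 15, ... (gaps grow by one).
reference_row = [0] * 70
position, gap = 0, 1
while position < len(reference_row):
    reference_row[position] = 1
    position += gap
    gap += 1

# Pretend the camera sees the same pattern shifted 7 pixels to the right.
observed_row = [0] * 7 + reference_row[:-7]

disparity = find_disparity(reference_row, observed_row, x=12)
print(disparity)  # 7
print(round(depth_from_disparity(disparity, 580.0, 0.075), 2))  # 6.21
```

Notice that a larger disparity gives a smaller depth, so nearby objects shift the dots more than faraway ones. A real sensor repeats this search for every pixel and has to deal with noise, occlusions and sub-pixel shifts, which this sketch ignores.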