About Stereo Camera (Stereo Vision)

In ADAS (Advanced Driver Assistance Systems), a stereo camera (stereo vision) is a device used for driving support functions such as automatic braking and lane marking recognition, by measuring the distance to vehicles ahead from the acquired images.

In recent years it has been installed in mass-produced vehicles and has come to serve as a sensor for directly controlling the car. This page explains the role, structure, principle, calculation algorithms, and image processing software (OpenCV) of stereo cameras. We also introduce ZMP's stereo camera unit products that can be used for research and development.

The following are the contents of this page. Please use them as a guide when reading.

1. What is a stereo camera

A stereo camera is a camera that photographs an object simultaneously from two different directions using two cameras (a twin-lens camera), in the same way that a person views an object with two eyes. From the positions of corresponding pixels in the two images, it can measure information in the depth direction.
Stereo cameras are also sometimes called stereo vision, stereo camera systems, stereo camera units, stereo digital cameras, and so on.

Fig.1 Stereo camera RoboVision2s

2. Features

A stereo camera is a type of distance sensor, alongside infrared and millimeter-wave radar, and is a stereo digital camera that applies the same principle of triangulation that people use when viewing objects.

Therefore, like the human eyes, the images photographed by the two cameras (the stereo pair) can be viewed stereoscopically, including depth.
Its main characteristics are:

1. A stereo camera can detect objects at any distance
→ As long as the calculation algorithm can match the object, the distance to it can be measured, and recognition processing can make use of the image information.
2. A stereo camera has high distance accuracy for nearby objects
→ As follows from the depth calculation described later, measurement accuracy is higher for nearby objects.
3. The absolute distance from the camera can be acquired by 3D measurement
→ Since distance measurement is performed using two cameras, the absolute distance can be measured, for example "a few meters from the camera".
4. The detection range can be expanded by combining multiple cameras
→ By using multiple cameras, distance measurement and image recognition of the entire surroundings become possible, reducing the blind spots of a moving body.
5. Robustness to environmental changes such as nighttime, snow, and heavy rain
→ Image processing outputs stable range-finding results even in rain, snow, or nighttime driving.

3. Use

In the automotive field, stereo cameras are mounted on vehicles and used as in-vehicle sensors for advanced driver assistance systems (ADAS). In other industries they are used as sensors that recognize people and obstacles around construction machinery, and as stationary sensors for tasks such as road surface inspection.

Stereo camera systems are also used for security and surveillance, event analysis, and customer flow analysis in stores, as well as for recognizing the surroundings of industrial robots, because they can distinguish even overlapping people.

In industry, stereo cameras are built into the products of a wide range of Japanese manufacturers as an alternative to object recognition by the human eye.

In addition, ZMP's autonomous driving development vehicle RoboCar carries obstacle recognition and following functions based on stereo images, and we conduct research and development on sensor fusion.

4. Principle

To realize measurement with a stereo camera, the distance to the detected object must be calculated.
Here we explain how a stereo camera measures distance.

The distance to an object is calculated using the principle of triangulation, as shown in the figure.
This requires the difference between the imaging positions (the parallax, or disparity) when the same object is captured by the two cameras.
To calculate the disparity, the pixels that captured the same object must be identified in both camera images; this process is called stereo matching.

4-1. What is stereo matching?

Stereo matching is a method of estimating disparity by matching each part of one image against the other.
Disparity is the difference between the positions of corresponding parts in the two images. Once the disparity of each part of the image has been estimated by stereo matching, the distance can be calculated by the principle of triangulation.

4-2. What is triangulation

Triangulation is a method of measuring the distance to a remote point using the properties of triangles.

Specifically, it is a geometric surveying method that determines the position of a point by measuring the angles from known points at both ends of a baseline to the target point. If the exact distance between the two endpoints is known, the distance to a point away from them can be found once the angles toward it are measured, by the property that "a triangle is determined when one side and the angles at both of its ends are known."
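The "one side and its two end angles" property can be sketched numerically, using the law of sines (a minimal plain-Python illustration with made-up values, not a surveying implementation):

```python
import math

def triangulate_distance(baseline, angle_left, angle_right):
    """Perpendicular distance from the baseline to a target point, given
    the baseline length and the two interior angles (in radians) measured
    at each end of the baseline toward the target."""
    # The third angle of the triangle (interior angles sum to pi).
    angle_target = math.pi - angle_left - angle_right
    # Law of sines: the side opposite angle_right relates to the baseline.
    side_left = baseline * math.sin(angle_right) / math.sin(angle_target)
    # Height of the triangle above the baseline = distance to the target.
    return side_left * math.sin(angle_left)

# Example: 1 m baseline, both angles 60 degrees -> equilateral triangle,
# so the distance is the triangle's height, sqrt(3)/2 ~ 0.866 m.
d = triangulate_distance(1.0, math.radians(60), math.radians(60))
print(round(d, 3))  # 0.866
```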

Fig. 2 Triangulation calculation image
In a stereo camera, the baseline length, the focal length, and the difference between the imaging positions (the disparity) are known, so the distance to the target object can be calculated from the triangulation formula Z = f·B/d (focal length f, baseline length B, disparity d).

5. Distance Measurement

One method of measuring the disparity between the left and right images is block matching, which we introduce here.
In this method, attention is paid to a certain point in one image (the left image), a rectangular block of several pixels around that point is taken, and the position whose block correlates best with it is searched for in the other image.
Assuming the left and right images are free of distortion, vertical misalignment, and optical axis deviation, and differ only by a purely horizontal translation, the same object should appear at the same Y coordinate in both images. The search for the best-correlating block can therefore be restricted to shifts in the X direction along the same row, and the shift (deviation) with the highest correlation is taken as the disparity value.
Figure 3. Image of block matching
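The block matching just described can be sketched as follows (a minimal NumPy illustration using a SAD cost; the window size, disparity range, and synthetic images are arbitrary choices, not the parameters of any particular product):

```python
import numpy as np

def block_match_row(left, right, y, x, block=5, max_disp=16):
    """Find the disparity of pixel (x, y) in the left image by sliding a
    block x block SAD window along the same row of the right image.
    Assumes rectified images: the match lies at (x - d, y) with d >= 0."""
    h = block // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
    best_d, best_cost = 0, None
    for d in range(0, max_disp + 1):
        if x - d - h < 0:  # window would leave the image
            break
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()  # Sum of Absolute Differences
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic check: the right image is the left image shifted 4 px to the
# left, so interior pixels should report a disparity of 4.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(20, 40), dtype=np.uint8)
right = np.roll(left, -4, axis=1)
print(block_match_row(left, right, y=10, x=20))  # expected: 4
```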

5-1. On calculation algorithms

Distance measurement with a stereo camera is divided into several calculation steps, which are briefly introduced below.
1. Preprocessing: distortion correction (calibration), normalization of image luminance values, etc.
2. Rectification: image transformation to make matching efficient
3. Matching: estimating disparity by matching
4. Triangulation: converting the disparity map to distance from the geometric arrangement of the cameras

The distance is calculated by measuring the disparity through the above steps.

5-1-1 About distortion correction (calibration)

As preprocessing for the image processing, the camera distortion is corrected. Because the camera lens introduces distortion, the radial and tangential distortion of the lens are removed mathematically.
Image before distortion correction (left) and image after distortion correction (right)
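A minimal sketch of the radial part of such a correction, in the style of the Brown-Conrady model commonly used in camera calibration (the coefficients and the point are made up; real calibration also handles tangential terms):

```python
def distort_point(x, y, k1, k2):
    """Apply the radial part of a Brown-Conrady lens model to a point in
    normalized image coordinates (origin at the principal point)."""
    r2 = x * x + y * y
    scale = 1 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

def undistort_point(xd, yd, k1, k2, iters=10):
    """Invert the radial model by fixed-point iteration: repeatedly divide
    the distorted point by the scale estimated at the current guess."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        scale = 1 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y

# Distort a point, then recover it with the iterative inverse.
xd, yd = distort_point(0.3, 0.2, k1=-0.2, k2=0.05)
xu, yu = undistort_point(xd, yd, k1=-0.2, k2=0.05)
print(round(xu, 6), round(yu, 6))  # ~ 0.3 0.2
```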

5-1-2 About rectification processing

In the rectification process, the two measured images are transformed, using the calibrated geometry of the cameras, so that corresponding points have the same row coordinates.
This ensures that the two image planes lie on the same plane and that the image rows are exactly aligned.

This process is essential for improving the processing efficiency of stereo vision (stereo cameras), because it reduces the matching step from a two-dimensional search problem to a one-dimensional search along a row. Rectification of stereo images is often used as preprocessing for disparity calculation and anaglyph image creation.
Image before rectification
Image after rectification

5-1-3 About Matching

Matching estimates disparity by comparing each part of the two images, where disparity is the difference between the positions of corresponding parts.

Once the disparity of each part of the image has been estimated by stereo matching, the distance can be calculated by the principle of triangulation. Matching performs a stereo corresponding-point search (finding the same point in the two different camera images), and various matching algorithms exist for this search.

OpenCV, the computer vision programming library described below, implements a fast and effective block matching stereo algorithm: a window is slid over the rectified images, and the position that minimizes the sum of absolute differences between the windows (SAD: Sum of Absolute Differences) is chosen as the match.

The block matching stereo corresponding-point search algorithm described above consists of three steps:

1. Prefiltering to normalize the brightness of the image and emphasize texture.
2. Searching for corresponding points along horizontal epipolar lines using the SAD window.
3. Postfiltering to eliminate bad correspondences.

In the prefiltering phase, the input images are normalized to even out brightness and emphasize texture, in order to make matching efficient.

Next, corresponding points are searched for by sliding the SAD window over the disparity search range from the reference pixel. For each feature in the left camera image, the best match is sought along the corresponding row of the right camera image.

After rectification, each image row is an epipolar line, so the matching location in the right camera image can be assumed to lie in the same row (same y coordinate) as in the left camera image. Also, since the cameras are mounted in parallel, a point with zero disparity appears at the same x coordinate (x0), and points with larger disparity appear further to the left in the right image. (See the figure below.)

In postfiltering, outliers and bad correspondences are removed, for example by a left-right consistency check that verifies the left and right disparity values agree.
5-1-3-1 What is an epipolar line?
The epipolar lines that appeared frequently above are lines connecting geometric points in epipolar geometry, the geometry of photographing three-dimensional space with two cameras (stereo vision).

Epipolar geometry is the geometry that helps recover 3D depth information from images viewed from two different positions and find correspondences between the images.
Fig. Epipolar geometry diagram
As premises for explaining epipolar geometry:

· A point P in three-dimensional space is projected (perspective projection) onto the projection planes (Left view, Right view) of the two cameras.
· Ol and Or are the projection centers of the two cameras.
· Points pl and pr are the projections of P on each projection plane.

With these definitions, we proceed with the explanation.

■ Epipoles
Since the two cameras are at different positions, if each camera can see the other, it is projected at el and er respectively. These points are called epipoles (epipolar points).
· el, er and Ol, Or lie on the same straight line in three-dimensional space.


■ Epipolar Lines
An epipolar line is a line drawn on a projection plane from a projection point to the corresponding epipole.
The straight line Ol–P projects to a single point pl on the left camera's projection plane, while in the right camera it projects to the line er–pr through pr; this line er–pr is the epipolar line of pl. (Likewise, the line el–pl is the epipolar line for the left camera.)
Each epipolar line is uniquely determined by the three-dimensional position of the point P, and all epipolar lines pass through the epipole (el, er in the figure).
· Conversely, every straight line passing through the epipole is an epipolar line.

■ Epipolar Plane
· The plane passing through the three points P, Ol, Or is called the epipolar plane.
· The intersection of the epipolar plane with each projection plane coincides with the epipolar line. (The epipole lies on the epipolar line.)

■ Epipolar Constraints
When the positional relationship between the two cameras is known, the following holds:
· Given the projection pl of the point P in the left camera, the epipolar line er–pr in the right camera is determined, and the projection pr of P in the right camera must lie somewhere on this line. This is called the epipolar constraint.
· In other words, if the two cameras capture the same point, its projections must lie on each other's epipolar lines.
· Therefore, to find where a point seen by one camera appears in the other camera, it suffices to search along the epipolar line, which saves a considerable amount of computation.
· If the correspondence is correct and the positions of pl and pr are known, the position of the point P in three-dimensional space can be determined.
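For a rectified stereo pair, the epipolar constraint can be checked numerically. A minimal sketch (the fundamental matrix below is the standard one for a pure horizontal translation between cameras, for which the constraint reduces to "same row"; the pixel coordinates are made up):

```python
import numpy as np

# Fundamental matrix of a rectified pair: the skew-symmetric matrix of the
# baseline direction [1, 0, 0]. The constraint x_r^T F x_l = 0 then reads
# y_l - y_r = 0, i.e. corresponding points share a row.
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

def epipolar_residual(pl, pr):
    """x_r^T F x_l for homogeneous pixel coordinates; 0 means pr lies on
    the epipolar line of pl."""
    xl = np.array([pl[0], pl[1], 1.0])
    xr = np.array([pr[0], pr[1], 1.0])
    return float(xr @ F @ xl)

# The epipolar line of pl = (120, 80) is l = F @ xl = (0, -1, 80): y = 80.
print(epipolar_residual((120, 80), (100, 80)))  # 0.0 -> on the epipolar line
print(epipolar_residual((120, 80), (100, 90)))  # -10.0 -> violates the constraint
```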
5-1-4 About Triangulation
If the geometric placement of the cameras is known, the disparity map is converted to distance by the principle of triangulation. The depth Z can be expressed by the formula below; the calculation is explained using the following figure.
As shown in the figure, assume a stereo camera of two cameras that has been corrected by distortion correction and rectification. The two image planes then lie exactly on the same plane, with exactly parallel optical axes (the optical axis is the ray passing from the projection center O through the principal point c) and the same focal length f.
Assume further that the principal points cx_left and cx_right have been calibrated to the same pixel coordinates in the left and right images, that every row of pixels in one camera aligns exactly with the corresponding row in the other, and that a real-world point P appears in both the left and right views with horizontal coordinates xl and xr.

The disparity is then defined as d = xl − xr, and the depth Z follows from the principle of triangulation (similar triangles):

Z = f·B/d

where B is the baseline length (the distance between the two projection centers).

From the above equation, depth is inversely proportional to disparity. When the disparity is close to 0 (a distant object), a small change in disparity causes a large change in depth, whereas when the disparity is large (a nearby object), a small difference in disparity has little influence on the depth. For this reason, a stereo camera system achieves especially high resolution for objects relatively close to the camera.
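This inverse relation can be illustrated numerically with Z = f·B/d (the focal length below is an illustrative value; the 220 mm baseline mirrors the RoboVision 2 series figure mentioned later, but these are not product specifications):

```python
# Depth from disparity: Z = f * B / d, with focal length f in pixels and
# baseline B in meters (illustrative values).
f = 1400.0   # focal length [px]
B = 0.22     # baseline [m], e.g. a 220 mm baseline

def depth(d):
    return f * B / d

# Depth change caused by a one-pixel disparity error at various distances:
# the error grows roughly with Z^2 / (f * B), so far objects suffer most.
for d in (64, 16, 4):
    z = depth(d)
    err = depth(d - 1) - z  # depth gained if disparity is off by one pixel
    print(f"d={d:3d} px  Z={z:6.2f} m  error per px = {err:.2f} m")
```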

5-2. Calculation processing unit

Image processing is typically performed on a PC (personal computer) using an NVIDIA GPU (Graphics Processing Unit), well known for image processing, or a CPU (Central Processing Unit) such as an Intel Core i7 or i5.

Since the GPU is specialized for image processing, it can process more efficiently than the CPU. For more narrowly defined tasks, image processing is sometimes done with an FPGA (Field Programmable Gate Array); Altera, acquired by Intel, is a well-known FPGA vendor.

An FPGA is an LSI whose circuit configuration can be designed by programming; because the circuit is configured in software, the configuration inside the chip can be changed after manufacture. FPGAs can be advantageous in terms of cost, development period, and so on.

5-3. About image processing programming (OpenCV)

There are various kinds of image processing programming, but OpenCV (official name: Open Source Computer Vision Library) is an open-source computer vision library. The library is written in C and C++ and implements the various functions required for processing images and video on a computer; since it is distributed under the BSD license, it can be used not only for academic purposes but also commercially.
In addition, because it is multi-platform, it is used in a wide range of settings.

OpenCV provides C/C++, Java, Python, and MATLAB interfaces usable in various environments, with functions for image processing, image analysis, and machine learning. Supported platforms include POSIX-compliant Unix-like OSes such as macOS and FreeBSD, as well as Linux, Windows, Android, and iOS.

Applying such a general-purpose image processing programming environment makes it easy to build stereo vision systems and embedded implementations of stereo vision algorithms, shortening development time and reducing the price of the developed product.

6. Distance accuracy

The accuracy of the distance measurement by the stereo camera is influenced by several factors:

6-1. Camera mounting position

For accurate measurement, the baseline length (the distance between the two cameras) and the mounting positions of the cameras are important. The camera positions can be constrained mechanically using jigs and the like, and deviations can be suppressed in software by calibrating after installation. Because the mounting positions are affected by temperature, vibration, and so on, deviations arise over time, and the accuracy of the disparity information of the photographed images, and hence of the distance information, may deteriorate. Automatic calibration is one countermeasure for maintaining this accuracy.

6-2. Lens distortion

A camera lens looks like a beautifully drawn curved surface, but each one is manufactured with individual variation. This can distort the image, so that matching fails and the disparity cannot be calculated. It is common to counteract the image distortion with software correction so that the stereo matching process can accurately measure the distance from the camera to the object.

6-3. Resolution of the lens

The resolution of a lens expresses how much information it can render per unit area. For example, for a subject of parallel dark lines on a white background, the lines are said to be resolved as long as adjacent lines remain clearly distinguishable; when the lines blur together and can no longer be distinguished, their density has exceeded the resolving power of the lens. Blurring by the lens lowers the accuracy of matching between the left and right images, so the accuracy of distance measurement decreases (measurement error increases).

6-4. Resolution of sensor

The resolution of the image sensor follows the same idea as the lens: what matters is how finely the image formed on a unit area of the sensor can be converted into data. Since the sensor is an aggregate of pixels, the pixel density per unit area is an index of its resolving performance; depending on the image processing load and the required precision, a higher pixel density and a larger number of pixels are desirable.

7. Image processing

The software features of the automotive stereo camera RoboVision 2s include a WDR (Wide Dynamic Range) function that can capture clear images even in mixed light and dark scenes by combining images of different brightness, as well as other stereo camera image processing. ZMP also provides algorithms for object detection and Virtual Tilt Stereo.

7-1. Wide Dynamic Range (WDR) function

WDR (Wide Dynamic Range) is a function that produces images of moderate brightness by processing a dark image and a bright image so as to brighten the dark areas and darken the bright areas.
For example, when a bright outdoor scene such as a tunnel exit is photographed from inside the dark tunnel, a problem arises: on a screen where light and dark areas are mixed, exposing for the dark areas blows out the bright areas so that the image cannot be confirmed.
When shooting with a camera equipped with wide dynamic range, the bright and dark areas are captured separately and then combined, so that vivid images are recorded in both.
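The combining step can be sketched as a simple exposure fusion, one common way to realize a wide dynamic range (a minimal NumPy illustration; the weighting function and pixel values are made up and this is not ZMP's implementation):

```python
import numpy as np

def fuse_exposures(dark, bright):
    """Blend a short-exposure (dark) and a long-exposure (bright) frame,
    weighting each pixel by how well exposed it is (closeness to mid-gray).
    Both inputs are float arrays in [0, 1]."""
    def well_exposedness(img):
        # Gaussian around 0.5: the highest weight goes to mid-tones.
        return np.exp(-((img - 0.5) ** 2) / (2 * 0.2 ** 2)) + 1e-6
    w_d, w_b = well_exposedness(dark), well_exposedness(bright)
    return (w_d * dark + w_b * bright) / (w_d + w_b)

# A scene with a very dark and a very bright region, captured twice:
dark = np.array([[0.02, 0.45]])    # short exposure: shadows crushed
bright = np.array([[0.40, 0.98]])  # long exposure: highlights blown
fused = fuse_exposures(dark, bright)
print(np.round(fused, 3))  # each pixel follows its better-exposed frame
```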

7-2. Object detection algorithm

The object detection algorithm calculates the ground surface (road surface height) in the measured image from the specified mounting attitude of the camera, judges that an object is present in regions where the point cloud (matching points) clusters above the road surface height, and outputs the width, height, and position of the object relative to the camera. Because detection uses the captured depth information, even a pedestrian and a bicycle that overlap can be detected as separate objects. We currently also offer software with an improved algorithm that detects at 30 fps.
Object detection software screen

7-3. Virtual Tilt Stereo algorithm

Virtual Tilt Stereo is an algorithm developed to detect vehicles and obstacles on the road with stereo cameras in autonomous driving and ADAS development.
This requires detecting the road surface, and thus improving on conventional stereo camera detection methods. Accurate road surface detection is normally difficult because an on-vehicle camera faces forward, with its optical axis nearly parallel to the road surface, so high-precision disparity cannot be obtained for the road.

The Virtual Tilt Stereo algorithm rotates the image about the principal point (nodal point) of the camera using panorama image synthesis techniques, turning the optical axis downward so that the road surface is measured as if from above; this improves road surface measurement accuracy.

As a result, the accuracy of detecting irregularities on the road surface is improved, and objects that have height relative to the road surface, such as curbs at the road shoulder and ahead of the vehicle, can now be detected.
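The underlying image warp, rotating the view about the projection center as in panorama synthesis, can be sketched with a homography H = K·R·K⁻¹ (a generic illustration with made-up intrinsics and tilt angle, not ZMP's actual Virtual Tilt Stereo implementation):

```python
import numpy as np

# Rotating a camera about its projection center maps the image by the
# homography H = K R K^-1 (the basis of panorama stitching). Tilting the
# optical axis by an angle t about the x-axis is then a pure image warp.
f, cx, cy = 1400.0, 640.0, 360.0  # illustrative intrinsics [px]
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1.0]])

t = np.radians(10)  # tilt the virtual camera by 10 degrees about x
R = np.array([[1, 0, 0],
              [0, np.cos(t), -np.sin(t)],
              [0, np.sin(t),  np.cos(t)]])
H = K @ R @ np.linalg.inv(K)

def warp(px, py):
    """Map a pixel of the original image into the virtually tilted view."""
    p = H @ np.array([px, py, 1.0])
    return p[0] / p[2], p[1] / p[2]

# The principal point stays on the vertical centerline and moves
# vertically by f * tan(t) (about 247 px here).
x, y = warp(cx, cy)
print(round(x, 2), round(y, 2))
```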

8. Related products

ZMP sells the RoboVision 2s stereo camera, which can be used for development as an in-vehicle camera with a dedicated mounting jig (car mount, adapter), together with optional products that use the algorithms above. We also introduce RoboVision 3 (a quad camera with two stereo pairs) as the latest stereo camera model.

8.1 RoboVision 2s

ZMP's RoboVision 2 series is a general-purpose stereo camera for development, with a baseline length of 220 mm, a Sony CMOS image sensor, and a USB 3.0 interface, that makes image measurement easy; ZMP designs and manufactures everything from the image acquisition board to the software. Software for the PC, including a stereo image viewer, is included, so evaluation is possible immediately. An SDK is also provided for development, enabling users to build the applications and software they need.
By connecting it via USB to a PC with the standard application installed and launching the software, you can specify the measurement frame rate, check the image (angle of view) like an electronic viewfinder, and take measurements. Compared with the earlier RoboVision 2, the RoboVision 2s adds an optical low-pass filter that reduces the influence of moiré and false colors.
Stereo camera unit
RoboVision®2s
A stereo camera unit and development kit that can be easily introduced into research and development; disparity and distance measurement are easy to perform via the SDK and API.
RoboVision®2s
SSD package
A package capable of about 4 hours of continuous image measurement at 30 fps with the included two-SSD set

8.2 RoboVision2s object detection package

RoboVision 2s object detection package
Comes with software (an application) that can display the size and position coordinates of detected objects in real time

8.3 RoboVision 2s CarTrack package

RoboVision®2s CarTrack package
Software capable of tracking nearby objects, detecting relative speed, and CAN output

8.4 Stereo camera unit for Autonomous Driving RoboVision 3

The stereo camera unit RoboVision 3 is the newest quad camera, able to measure high-quality images with the Sony IMX390 image sensor. It can also be supplied fitted with a body (housing) according to your request. Check the product page for details.
The latest stereo camera system
RoboVision®3
A high-resolution stereo camera capable of distance sensing up to 150 m with a 100° horizontal field of view

8.5 360 ° global surrounding image measurement (Surround Robo Vision)

Surround RoboVision is a new stereo camera unit that, on request, can be mounted on a vehicle using a dedicated mount and can measure the entire circumference. For details of its functions, check the product page below.
Surround RoboVision®
As a surround monitoring sensor for autonomous driving, it can photograph 360° of the surroundings with high sensitivity
