What SLAM is and how it's leveraged at Sama to boost efficiency in 3D annotation workflows.
As the autonomous driving industry continuously evolves and 3D computer vision undergoes rapid development, the need for accurate and efficiently annotated 3D data is on the rise. A key technology we have applied at Sama to this end is Simultaneous Localization and Mapping (SLAM).

This blog post first presents a high-level overview of SLAM, as it is essential for our interested readers to start with some basic background and connect the dots as we go. Then, more importantly, we are going to focus on making the connection between the powerful SLAM technology and the 3D annotation challenges we face at Sama. As you will see, we are going to dive deep into a key technique of SLAM: point cloud frame matching. When executed well, this technique enables the alignment and aggregation of point cloud maps where objects such as cars, pedestrians, and trees are clearly defined and easily visualized by our expert annotators. Finally, we are eager to share some compelling results from our internal A/B testing experiments, where we implemented cutting-edge SLAM technology to boost the efficiency of our 3D annotation platform.

It is worth noting that while SLAM covers a broad scope, in the context of this blog post we are specifically interested in the frame matching part, as it is the most relevant for Sama's annotation use cases. Now, let's start the journey.
Imagine a world where robots can build an accurate map of their surroundings while simultaneously locating themselves within that map. How amazing is that? This is exactly the process of SLAM. SLAM answers two key questions in the field of robotics: given the robot's pose, i.e., its position and orientation, estimate the map of the environment; and given the map, estimate the robot's pose. Hence, SLAM is often approached as a classical "chicken and egg" problem: how do we address these questions when neither the map nor the pose is initially given? Many SLAM solutions resolve this by using algorithms that solve for both unknowns at the same time and converge to an optimal solution in an iterative fashion.

SLAM finds its usefulness in a wide range of applications, such as virtual reality headsets, robotic vacuum cleaners, and autonomous driving. It enables safer and more intelligent navigation in new environments and the generation of maps for secondary tasks, such as room layouts or road measurements.
Traditionally, the common approach to the SLAM challenge has been 2D visual SLAM: using cameras to capture color data from the environment through RGB images. As the landscape of 3D lidar evolves, 3D SLAM is becoming more and more prevalent, especially for outdoor navigation. Compared to cameras, lidars produce highly accurate measurements and capture richer data such as intensity, angle, and distance information. Nowadays, most 3D outdoor algorithms can be more reliable than and outperform their 2D counterparts, especially under challenging outdoor illumination conditions.
When robotics was in its infancy, statistical methods played a central role in allowing mobile robots to estimate their location and map their environment. In particular, Bayesian techniques such as the Kalman filter and its variants were the traditional paradigms for SLAM. Over time, SLAM evolved from purely statistical methods towards feature-based optimization approaches. As a result, modern SLAM pipelines are usually composed of multiple components. One of the core components is the front-end odometry, which performs frame matching and robot localization estimation. Meanwhile, other components such as loop closure and bundle adjustment typically perform finer-grained refinements in an offline fashion.
At Sama, we have identified a specific area of SLAM that aligns with our objectives of accurate and efficient 3D data annotation: lidar odometry, and more specifically, lidar frame-to-frame matching. One essential method that serves as the foundation for many lidar odometry algorithms is the Iterative Closest Point (ICP) algorithm.

At its core, ICP consists of two steps. The first step finds corresponding points between two point clouds. The second step computes the transformation that minimizes the sum of distances between all correspondence pairs. This two-step process repeats iteratively until a convergence criterion is met. A sample iterative convergence is shown in Figure 1, and a minimal code sketch of the idea follows below.
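To make the two-step loop concrete, here is a minimal sketch of point-to-point ICP in Python, using only numpy and scipy. It is a simplified illustration of the general technique under our own assumptions (the function names, parameters, and convergence test are ours), not Sama's production algorithm.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(source, target, init=np.eye(4), max_iter=50, tol=1e-6):
    """Align `source` (N x 3) to `target` (M x 3); returns a 4x4 transform."""
    T = init.copy()
    src = source @ T[:3, :3].T + T[:3, 3]
    tree = cKDTree(target)             # reused for nearest-neighbour lookups
    prev_err = np.inf
    for _ in range(max_iter):
        # Step 1: correspondences = nearest target point for every source point
        dist, idx = tree.query(src)
        # Step 2: rigid transform minimizing summed squared distances
        R, t = best_fit_transform(src, target[idx])
        src = src @ R.T + t
        T_inc = np.eye(4); T_inc[:3, :3] = R; T_inc[:3, 3] = t
        T = T_inc @ T                  # accumulate the incremental transform
        err = dist.mean()
        if abs(prev_err - err) < tol:  # convergence criterion
            break
        prev_err = err
    return T
```

In practice, production lidar odometry adds feature selection, outlier rejection, and robust cost functions on top of this basic loop, but the correspondence-then-minimize structure stays the same.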
Figure 1: Visualization of ICP convergence iterations (left: initial, right: converged). Once convergence is achieved, the two point clouds align and their 3D features are co-located. Image source: https://www.cs.cmu.edu/~halismai/cicp/.

Among the various approaches built on the concept of ICP, Lidar Odometry and Mapping (LOAM) stands out as a particularly innovative and impactful work. LOAM and its derivatives, e.g. V-LOAM and FLOAM, quickly became the leading technologies delivering top performance on the KITTI odometry benchmark. While LOAM-based algorithms have proven to be powerful, they often necessitate manual customization and parameter adjustment to adapt to specific data and sensor types. Without these customization procedures, valuable data can be filtered out as noise and downstream performance can suffer.
SLAM and related techniques, such as frame-to-frame matching and ICP, can be applied at Sama to reduce annotation effort, boost accuracy, and increase efficiency. To start, we use the sequence of lidar scans provided by the client, assumed to be in the lidar coordinate frame (i.e., the (0, 0, 0) coordinate in each point cloud is the center position of the lidar sensor). Then, we apply a custom ICP algorithm that outputs the transformation of each point cloud such that static objects are aligned across all scans in the sequence (see Figure 2). As a result, static objects require only a single label across the entire sequence, and dynamic objects benefit from more accurate interpolation across frames.
Figure 2: Left: sample bird's-eye view of 10 naively aggregated scans from a KITTI sequence. The parked vehicles have smeared points due to inaccurate sensor poses. Right: the same scans aggregated after obtaining refined pose estimates from Sama's ICP scan alignment.

Further, accurately aligned scans enable the construction of more informative, aggregated point cloud maps, as shown in Figure 3. The environmental context is particularly helpful for our associates to quickly and confidently identify all objects in the scene, boosting cuboid and segmentation labeling efficiency. A minimal sketch of this aggregation step is shown below.
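As a rough illustration of how refined poses turn individual scans into one aggregated map, the sketch below chains frame-to-frame transforms (such as those returned by the `icp` sketch above) into per-scan world poses and stacks the transformed points. The function name, variable names, and the assumption that each scan is an N x 3 numpy array are ours, not part of Sama's pipeline.

```python
import numpy as np

def aggregate_scans(scans, frame_to_frame_transforms):
    """Stack scans into the coordinate frame of the first scan.

    scans: list of (N_i x 3) arrays, each in its own lidar frame.
    frame_to_frame_transforms: list of 4x4 matrices, where transform i maps
    scan i+1 into the frame of scan i (e.g. icp(scans[i+1], scans[i])).
    """
    world_pose = np.eye(4)           # pose of the current scan in the map frame
    aggregated = [scans[0]]
    for scan, T in zip(scans[1:], frame_to_frame_transforms):
        world_pose = world_pose @ T  # chain relative poses into an absolute one
        pts = scan @ world_pose[:3, :3].T + world_pose[:3, 3]
        aggregated.append(pts)
    return np.vstack(aggregated)     # static objects now overlap across scans
```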
Figure 3: Sample aggregate point cloud. Image from https://isprs-archives.copernicus.org/articles/XLIII-B1-2020/515/2020/isprs-archives-XLIII-B1-2020-515-2020.pdf.

Sama's custom ICP algorithm is robust to diverse datasets and requires minimal custom parameter adjustment, making it easy to deploy and adapt to our various 3D annotation workflows. Furthermore, it is able to utilize Inertial Measurement Unit (IMU) pose estimates when clients provide them. These serve as initial estimates of the transformations that align scan points for static objects and tend to result in more accurately aligned aggregated scans, as sketched below.
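To show how an IMU pose prior can seed frame matching, the fragment below simply passes it as the initial transform to the hypothetical `icp` function sketched earlier; a good initial guess typically lets ICP converge in fewer iterations and avoids poor local minima. This is an assumed interface for illustration, not Sama's actual API.

```python
# imu_prior: 4x4 relative pose of scan k in the frame of scan k-1,
# derived from client-provided IMU measurements (assumed available here).
T_refined = icp(source=scan_k, target=scan_k_minus_1, init=imu_prior)
```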
Through multiple experiments on sequences of real-world lidar data, we have found 5-25% decreases in labeling time when using the ICP scan-alignment framework described above. It also significantly increased cuboid labeling accuracy, particularly for static objects. Specifically, we found:
Now, are you curious about what our associates had to say about their experience working with the aligned lidar scans? There is a consensus in their feedback: they all feel that our ICP-based scan-alignment framework has improved their annotation experience. In their own words:
Quantitatively, our experiments show that our alignment framework brings improvements in both efficiency and accuracy. At the same time, the qualitative feedback we received further confirms its positive impact. Together, these results demonstrate the potential of Sama's scan-alignment technology for lidar-based 3D annotation workflows.
We hope to have provided you with a high-level overview of SLAM, as well as its applications for boosting accuracy and efficiency in 3D lidar data labeling. Experimental results on real vehicle lidar data demonstrate the efficacy of this approach, both quantitatively, with a decrease in labeling time, and qualitatively, with praise from Sama annotation associates. We look forward to more applications of SLAM and invite you to contact our sales team to discuss potential use cases in your data labeling lifecycle.