Scalable Data Processing for Connected Cars

Automakers are facing unprecedented challenges in developing self-driving and connected vehicles. Replacing the decision-making of a human driver requires the intake and categorisation of an immense amount of data. For this reason, automakers need scalable solutions for collecting and processing all this data.

Data Processing At The Vehicle Level

To truly understand how much data a connected car creates at any given time, we can look at a breakdown from Stephan Heinrich of Lucid Motors in a presentation given in 2017. In it, he broke down the amount of data that each sensor type (radar, LIDAR, cameras, ultrasonics) produces per second. Since each vehicle can carry a varying number of sensors, the range in the total amount of data is quite large, spanning from 1 to 19 TB of data every hour. Over the years, many other autonomous driving providers have cited figures in this range.

This means that, at the very least, each car in a fleet processes 1,000 GB of data every hour. In England, the average driver spends 597 hours driving annually. That means each car would collect between 597 and 11,343 TB of data per year. This is an absurd amount of data to try to handle.
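To make that range concrete, here is a minimal back-of-the-envelope sketch of the calculation. The hourly figures are the estimates cited above, not measurements from any specific vehicle.

```python
# Back-of-the-envelope estimate of annual data volume per vehicle.
# Figures come from the estimates cited above; real values vary by sensor suite.

HOURS_DRIVEN_PER_YEAR = 597                 # average annual driving time in England
TB_PER_HOUR_LOW, TB_PER_HOUR_HIGH = 1, 19   # per-vehicle sensor output range

low = HOURS_DRIVEN_PER_YEAR * TB_PER_HOUR_LOW     # 597 TB
high = HOURS_DRIVEN_PER_YEAR * TB_PER_HOUR_HIGH   # 11,343 TB (roughly 11 PB)

print(f"Per-vehicle data per year: {low:,} to {high:,} TB")
```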

How It’s Handled

If you look at most artificial intelligence currently available to the public, it functions in a similar manner when it comes to collecting data and responding. When prompted with a task or question, the device sends the query to a centralised cloud computing platform that does the actual work and then sends the result back to the device to execute the task or answer the question. Running something like your own personal ChatGPT on the device itself would require storing the model locally and beefy hardware.

This type of setup is simply not possible for self-driving systems. The time it would take to send gigabytes of data every second to the cloud and receive a response would be far longer than the average reaction time of the human driver the system is intended to replace.
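As a rough illustration, consider how long it would take just to upload one second of sensor output over a cellular link. The bandwidth, sensor output, and reaction-time figures below are illustrative assumptions, not measurements of any particular network or vehicle.

```python
# Rough sketch of why shipping raw sensor data to the cloud is too slow.
# All figures here are illustrative assumptions for the sake of the comparison.

sensor_output_gb_per_s = 3        # assumed mid-range raw sensor output (GB/s)
uplink_mbit_per_s = 100           # assumed cellular uplink bandwidth (Mbit/s)
human_reaction_time_s = 0.25      # commonly cited driver reaction time (s)

uplink_gb_per_s = uplink_mbit_per_s / 8 / 1000    # Mbit/s -> GB/s
upload_time_s = sensor_output_gb_per_s / uplink_gb_per_s

print(f"Uploading one second of sensor data: ~{upload_time_s:.0f} s")
print(f"Human reaction time: {human_reaction_time_s} s")
# Even before any cloud-side processing or the return trip, the upload alone
# takes orders of magnitude longer than a human driver needs to react.
```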

As such, automakers have turned to edge computing to deal with this data workload. It's a simple but powerful concept: perform most of the computation in the vehicle itself. However, this requires that each vehicle on the road carry its own powerful processing unit. Computational power was never a priority for cars over most of their history, but now they have to rival high-end computers.

Whether automakers design their computation units in-house or purchase them from established semiconductor companies like Nvidia, the increase in processing power has been rapid. In the mid-2010s, most units could reach 2.5-8 TOPS (trillions of operations per second). That was good enough for low-level autonomy (L1, L2) but nowhere near powerful enough for L3 and beyond. Now we have units capable of several hundred TOPS, and Nvidia is preparing to release a unit capable of 2,000 TOPS, designed for use in L5 autonomy.
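To put those TOPS figures in perspective, here is a rough sketch of the compute budget available per camera frame. The frame rate and the figure used for a current-generation unit are illustrative assumptions; the other numbers are the ones cited above.

```python
# Rough per-frame compute budget across accelerator generations.
# Frame rate and the "current unit" value are illustrative assumptions.

FRAME_RATE_HZ = 30   # assumed camera frame rate

units_tops = {
    "mid-2010s unit": 8,             # top of the 2.5-8 TOPS range cited above
    "current unit": 250,             # illustrative "several hundred TOPS" value
    "announced 2,000 TOPS unit": 2000,
}

for name, tops in units_tops.items():
    ops_per_frame = tops * 1e12 / FRAME_RATE_HZ
    print(f"{name}: ~{ops_per_frame:.1e} operations available per frame")
```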

Data Processing At The System Level

With this setup, connected cars should be able to perform many of their tasks without needing to rely on instruction from an external server. However, automakers still have the tall task of parsing all the data they collect to train the models that each vehicle will use.

While edge computing allows automakers to sidestep the issue of transferring massive amounts of data in milliseconds, this data is still vital for training their systems. Each day, a fleet of vehicles can produce hundreds to thousands of TB of data that needs to be collected and processed.

The general process of collecting and processing the data is as follows; a minimal code sketch of these stages follows the list.

  • Collection. We’ve already touched on this: data is sent from all the relevant sensors so that analysts have a full view of the car’s location, speed, what it sees, how far away everything is, and so on. While most data analysis would quickly look to cut out anything extraneous, with a self-driving car everything is important; no one wants their system to become faulty and cause an accident just because someone decided specific data could be ignored.
  • Preprocessing. Raw data is preprocessed to facilitate annotation and compatibility with the training models. Data from all the sensors is synchronised, and sensor data is converted into easier-to-parse formats. At this point, unnecessary data can be filtered out (e.g., footage of a car sitting in a parking spot, which provides no useful data for driving situations).
  • Annotation. The success of training data rests on how well it is annotated. At this point, objects in the data are annotated, describing what everything in a given frame represents, such as cars, pedestrians, road signs, lane markings, and so on. The accuracy of these annotations is critical, as they tell the machine learning algorithms what everything is.
  • Validation. The annotations are scrutinised once more, checking for any errors or inconsistencies. Once the data passes, it can be added to the training set, hopefully improving the automaker’s self-driving system.
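Here is the minimal pipeline sketch promised above. Every function, class, and field name is a hypothetical placeholder for illustration, not a reference to any automaker’s actual tooling.

```python
# Minimal sketch of the four-stage pipeline described above.
# All names are hypothetical placeholders, not real tooling.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    timestamp: float
    sensor_data: dict               # raw readings keyed by sensor name
    labels: Optional[list] = None   # filled in by the annotation stage

def collect(raw_logs):
    # Collection: pull every relevant sensor reading off the vehicle logs.
    return [Frame(timestamp=t, sensor_data=readings) for t, readings in raw_logs]

def preprocess(frames):
    # Preprocessing: synchronise sensors, convert formats, and drop useless
    # frames (e.g. long stretches where the car is parked).
    return [f for f in frames if f.sensor_data.get("speed", 0) > 0]

def annotate(frames, model):
    # Annotation: label cars, pedestrians, road signs, lane markings, etc.
    for f in frames:
        f.labels = model.predict(f.sensor_data)
    return frames

def validate(frames):
    # Validation: keep only frames whose labels pass basic consistency checks.
    return [f for f in frames if f.labels]
```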

How It’s Handled

For large corporations, storage costs are not prohibitive, so storing all these terabytes of data isn’t too large a concern for automakers, especially once the data is processed and extraneous material is discarded.

The main roadblock then becomes the actual processing of large amounts of data. As such, many companies have turned to machine learning to perform the annotations. Show a neural network what a stop sign looks like enough times, and it’ll start to pick one out on its own. This significantly speeds up the process for automakers. However, humans are still needed to review the work, particularly in edge cases where the algorithms are presented with scenes they don’t have much data on.
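A minimal sketch of what that model-assisted annotation loop might look like, with low-confidence detections routed to a human reviewer. The detector interface, confidence threshold, and label format are assumptions for illustration.

```python
# Model-assisted annotation with human review for low-confidence detections.
# The detector, threshold, and label format are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.9   # below this, a human reviewer checks the frame

def auto_annotate(frames, detector):
    auto_labelled, needs_review = [], []
    for frame in frames:
        detections = detector.predict(frame)   # e.g. [("stop_sign", 0.97), ...]
        if detections and all(conf >= CONFIDENCE_THRESHOLD for _, conf in detections):
            auto_labelled.append((frame, detections))
        else:
            # Rare or ambiguous scenes (edge cases) go to a human annotator.
            needs_review.append((frame, detections))
    return auto_labelled, needs_review
```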

Additionally, to supplement real-world data, automakers also use simulated situations generated from real-world data, entirely skipping the initial data collection step and allowing them to simulate specific situations that their system might have issues with.
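A minimal sketch of how a recorded scenario might be varied to produce synthetic training situations; the scenario fields and ranges are illustrative assumptions, not any automaker’s actual simulation format.

```python
# Deriving synthetic scenarios from one recorded situation by perturbing its
# parameters. Field names and ranges are illustrative assumptions.

import random

def vary_scenario(recorded, n_variants=10, seed=0):
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        variant = dict(recorded)
        # Jitter the conditions around the situation the system struggled with.
        variant["ego_speed_kph"] = recorded["ego_speed_kph"] * rng.uniform(0.8, 1.2)
        variant["pedestrian_offset_m"] = recorded["pedestrian_offset_m"] + rng.uniform(-1.0, 1.0)
        variant["visibility_m"] = recorded["visibility_m"] * rng.uniform(0.5, 1.0)
        variants.append(variant)
    return variants
```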

At the end of the day, all this reliance on data has significantly changed the organisational makeup of automakers, especially compared to what they were 15-20 years ago. With a much greater reliance on computer scientists, UI/UX designers, and data analysts, the average automaker is starting to look more and more like the average tech company. However, this is necessary to handle the monumental task of collecting and analysing so much data.