
Revolutionizing Autonomy: CNNs in Self-Driving Cars

Author(s): Cristian Rodríguez

Originally published on Towards AI.

Photo by Erik Mclean on Unsplash

This article uses the convolutional neural network (CNN) approach to implement a self-driving car by predicting the steering wheel angle from the input images of three front cameras mounted at the car's center, left, and right. The model architecture was developed by NVIDIA for its autonomous navigation system, DAVE-2. The CNN is tested on the Udacity simulator.

Introduction

Self-driving cars can sense their surroundings and move independently through traffic and other obstacles without human input [1]. Autonomous vehicles (AVs) have become a leading industry in almost every part of the world. Over the years, faster and more capable vehicles have been produced, but as the number of cars on the road has grown, so, unfortunately, has the number of accidents. In most cases, accidents are the fault of the human driver, so human drivers could, in theory, be replaced by self-driving cars [2]. Many relevant companies, such as Waymo, Zoox, NVIDIA, Continental, and Uber, are developing this technology. With this type of car, the safety, security, and efficiency of automotive transportation are increased, and human errors can be mitigated [1].

Like humans, AVs rely on various sensor technologies to perceive the environment and make logical decisions based on the gathered information. The most common types of AV sensors are RADAR, LiDAR, ultrasonic sensors, cameras, and global navigation systems [3]. The Advanced Driver Assistance System (ADAS) framework is a six-tiered system that categorizes the different levels of autonomy, ranging from vehicles that are solely human-driven to those that are entirely autonomous, as shown in Figure 1.

Figure 1. Levels of Autonomy. [3]

Convolutional Neural Networks (CNNs)

The first work on modern CNNs occurred in the 1990s, inspired by the neocognitron. Yann LeCun et al., in their paper "Gradient-Based Learning Applied to Document Recognition," demonstrated that a CNN model that aggregates simple features into progressively more complicated ones can be successfully used for handwritten character recognition [4].

A CNN is a neural network with one or more convolutional layers, used mainly for image processing, classification, segmentation, and other autocorrelated data. Before the adoption of CNNs, most pattern recognition tasks were performed in two stages: a hand-designed feature extractor followed by a classifier. With the advancement of CNNs, features can be learned automatically from training examples, surpassing human performance on standard datasets [5]. The CNN approach is compelling for image recognition tasks because the convolution operation captures the 2D nature of images. Also, using convolution kernels to scan an entire image requires relatively few parameters to learn compared to the total number of operations [5].

Dataset

To obtain the data, the training mode of the Udacity simulator [6] was used, driving the vehicle manually on the first track for four laps in one direction and four more laps in the opposite direction. The data log is saved in a CSV file and contains the paths to the images, saved in a folder, as well as the steering wheel angle, throttle, reverse, and speed. The steering angles come pre-scaled by a factor of 1/25, so they lie between -1 and 1. The data consisted of 6563 center, left, and right JPG images, for a total of 19689 examples. The photos were 320 pixels wide by 160 pixels high. An example of the left/center/right images taken at a single time step is shown below.

Figure 2. Left, Center, and Right Camera Images Example. [Image by Author]
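The driving log can be loaded directly with pandas. A minimal sketch, assuming the Udacity-style driving_log.csv layout (the simulator writes no header row, so the column names below are assumptions):

```python
import pandas as pd

# Assumed column order of the Udacity driving log: one row per time step,
# with three camera image paths followed by the recorded controls.
columns = ["center", "left", "right", "steering", "throttle", "reverse", "speed"]
log = pd.read_csv("driving_log.csv", names=columns)

# Steering angles are already scaled by 1/25, so they lie in [-1, 1].
print(log["steering"].describe())

# 6563 rows x 3 cameras = 19689 training examples before augmentation.
print(len(log), "rows,", len(log) * 3, "camera images")
```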
Data Augmentation

Augmentation helps extract as much information from the data as possible. Four augmentation techniques were used to increase the number of images the model sees during training, which reduces the tendency to overfit. The techniques are described below, followed by an illustrative code sketch:

Brightness reduction: the brightness is changed to simulate day and night conditions.
Left and right camera images: the left and right camera images are used to simulate the effect of the car wandering off to the side and recovering.
Horizontal and vertical shifts: the camera images are shifted horizontally to simulate different positions of the car on the road, and vertically to simulate driving up or down a slope.
Flipping: since left and right turns are not evenly represented in the training data, image flipping was essential for model generalization.

The following images show examples of the data augmentation applied.

Figure 3. Brightness Reduction Example. [Image by Author]
Figure 4. Horizontal and Vertical Shifts Example. [Image by Author]
Figure 5. Image Flipping Example. [Image by Author]
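The article does not include the implementation, but the four techniques can be sketched with OpenCV and NumPy roughly as follows. The helper names, the brightness and shift ranges, and the 0.2 steering correction for the side cameras are all illustrative assumptions, not values from the article:

```python
import cv2
import numpy as np

def load_image(path):
    # OpenCV reads BGR; convert to RGB (log paths may carry stray spaces).
    return cv2.cvtColor(cv2.imread(path.strip()), cv2.COLOR_BGR2RGB)

def choose_camera(row, correction=0.2):
    # Randomly pick the center, left, or right camera; shift the angle by an
    # assumed +/-0.2 for the side cameras to simulate drifting and recovering.
    choice = np.random.randint(3)
    if choice == 1:
        return load_image(row["left"]), row["steering"] + correction
    if choice == 2:
        return load_image(row["right"]), row["steering"] - correction
    return load_image(row["center"]), row["steering"]

def random_brightness(image):
    # Scale the V channel in HSV to simulate day and night lighting.
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float64)
    hsv[:, :, 2] *= 0.25 + np.random.uniform()  # assumed range
    hsv[:, :, 2] = np.clip(hsv[:, :, 2], 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def random_shift(image, steering, x_range=100, y_range=40):
    # Translate horizontally (position on the road) and vertically (slope),
    # correcting the steering angle in proportion to the horizontal shift.
    tx = x_range * (np.random.uniform() - 0.5)
    ty = y_range * (np.random.uniform() - 0.5)
    steering = steering + (tx / x_range) * 0.4  # assumed correction rate
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    h, w = image.shape[:2]
    return cv2.warpAffine(image, M, (w, h)), steering

def random_flip(image, steering):
    # Mirror half of the images to balance left and right turns.
    if np.random.uniform() < 0.5:
        return cv2.flip(image, 1), -steering
    return image, steering
```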
Data Preprocessing

After augmentation, data preprocessing was applied to format the images before model training. Preprocessing aims to improve the quality of the images so that they can be analyzed more effectively. The techniques used are described below, followed by an illustrative code sketch:

Cropping: the bottom 25 and top 40 pixels of each image were cropped to remove the front of the car and most of the sky above the horizon.
RGB to YUV: the images were converted from RGB to YUV, which is more robust to illumination changes and better suited to shape detection.
Resizing: to be consistent with the NVIDIA model, all images were resized to 66 x 200.
Normalization: the pixel values were divided by 255 so that they lie between 0 and 1.

The following figure shows an example of the image preprocessing applied.

Figure 6. Image After Preprocessing Example. [Image by Author]
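These four steps map onto a few OpenCV calls. A minimal sketch, where the crop margins and the 66 x 200 target size come from the article and everything else is an assumption:

```python
import cv2

def preprocess(image):
    # Crop the top 40 rows (sky) and bottom 25 rows (hood of the car).
    image = image[40:-25, :, :]
    # Convert to YUV, the colour space used by the NVIDIA pipeline.
    image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
    # Resize to the 66 x 200 input expected by the NVIDIA architecture;
    # note that cv2.resize takes (width, height).
    image = cv2.resize(image, (200, 66))
    # Scale pixel values to [0, 1].
    return image / 255.0
```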
Data Generator

Since thousands of new training instances are derived from each original image, generating and storing all of this data on disk is impractical. Therefore, a Keras generator was used to read the original data from the log file, augment it on the fly, and feed it to the model, producing new images for each batch.
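Combining the hypothetical helpers from the sketches above, such a generator might look like the following; the batch size and the call to model.fit are assumptions:

```python
import numpy as np

def batch_generator(log, batch_size=64):
    # Build each batch on the fly from random log rows, so augmented images
    # never need to be written to disk. choose_camera, the augmentation
    # helpers, and preprocess are the sketches from the previous sections.
    while True:
        images = np.empty((batch_size, 66, 200, 3))
        angles = np.empty(batch_size)
        for i in range(batch_size):
            row = log.iloc[np.random.randint(len(log))]
            image, angle = choose_camera(row)
            image = random_brightness(image)
            image, angle = random_shift(image, angle)
            image, angle = random_flip(image, angle)
            images[i] = preprocess(image)
            angles[i] = angle
        yield images, angles

# Keras consumes the generator directly, e.g.:
# model.fit(batch_generator(log), steps_per_epoch=300, epochs=10)
```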
Model Proposal

As previously mentioned, a CNN model was used. The model architecture, inspired by the NVIDIA model used in its […]
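The article is truncated at this point, but the NVIDIA DAVE-2 architecture it references is documented in [5]: five convolutional layers followed by three fully connected layers and a single steering output. A Keras sketch of that published layout (the ELU activations, optimizer, and loss are common reimplementation choices, not taken from the article):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

model = Sequential([
    # Inputs are the preprocessed 66 x 200 YUV images.
    # Five convolutional layers, as in the published DAVE-2 design.
    Conv2D(24, (5, 5), strides=(2, 2), activation="elu",
           input_shape=(66, 200, 3)),
    Conv2D(36, (5, 5), strides=(2, 2), activation="elu"),
    Conv2D(48, (5, 5), strides=(2, 2), activation="elu"),
    Conv2D(64, (3, 3), activation="elu"),
    Conv2D(64, (3, 3), activation="elu"),
    Flatten(),
    # Three fully connected layers leading to the single steering output.
    Dense(100, activation="elu"),
    Dense(50, activation="elu"),
    Dense(10, activation="elu"),
    Dense(1),  # predicted steering angle in [-1, 1]
])

# Mean squared error against the recorded angles; Adam is an assumed choice.
model.compile(optimizer="adam", loss="mse")
```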
