Spencer Karofsky

Traffic Sign Classification
Using Convolutional Neural Networks

April 2023-July 2023

GitHub Repository Link

Overview

To develop my skills in deep learning and computer vision, I trained a Convolutional Neural Network to classify eight common traffic signs: Stop, Yield, Do Not Enter, No U-Turn, No Left Turn, No Right Turn, One Way (Left), and One Way (Right).

First, I built a custom dataset by creating miniature, freestanding traffic signs and taking dozens of pictures of each sign. I then applied OpenCV transformations to each image to augment and artificially enlarge the dataset.

python

    
    # Assumes WIDTH and HEIGHT hold the image dimensions in pixels.
    import cv2
    import numpy as np

    # augmentImage applies random rotation, translation, contrast, and
    # brightness changes to one image; calling it repeatedly on each base
    # image enlarges the dataset.
    def augmentImage(img):
        # Rotation
        # Set a random rotation angle that follows a normal distribution
        center = (WIDTH / 2, HEIGHT / 2)
        angle = int(np.random.normal(0, 10))  # standard deviation of 10 degrees
        rot_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)  # negative angles are fine
        img = cv2.warpAffine(img, rot_matrix, (WIDTH, HEIGHT))

        # Translation -- shift amounts follow a normal distribution
        shift_x = np.random.normal(0, 5)  # ~68% of shifts fall between -5 and 5 px
        shift_y = np.random.normal(0, 5)  # ~95% of shifts fall between -10 and 10 px
        M = np.float32([[1, 0, shift_x], [0, 1, shift_y]])  # transformation matrix
        img = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

        # Contrast adjustment: rescale pixel values by a random factor
        contrast_factor = np.random.normal(1.5, 0.5)
        img = cv2.convertScaleAbs(img, alpha=contrast_factor, beta=0)

        # Brightness adjustment: add a random offset to pixel values
        # (beta shifts brightness; reusing alpha would just rescale contrast again)
        brightness_offset = np.random.normal(0, 25)
        img = cv2.convertScaleAbs(img, alpha=1.0, beta=brightness_offset)

        return img
    
    

Here's a visualization of the function applied to a base image (top left):

Visualization of the augmentImage function. The top left image is the base image, and the rest are augmented.
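
A figure like this can be generated with a small matplotlib grid. Here's a minimal sketch, assuming base_img holds one of the photographed signs and augmentImage is defined as above (the 3x3 grid size is illustrative):

python

    import cv2
    import matplotlib.pyplot as plt

    # Illustrative sketch: show the base image followed by augmented copies.
    # base_img is an assumed variable holding one BGR image from the dataset.
    fig, axes = plt.subplots(3, 3, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        img = base_img if i == 0 else augmentImage(base_img.copy())
        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV stores BGR
        ax.set_title('base' if i == 0 else f'augmented {i}', fontsize=8)
        ax.axis('off')
    plt.tight_layout()
    plt.show()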

I trained the model using the ResNet-50 architecture. I chose this architecture experimentally, as it yielded the best accuracy of all the architectures I tested over numerous runs.

Graphical comparison of four popular CNN architectures. The ResNet-50 architecture was the best-performing CNN architecture as it exceeded 90% accuracy during the 8th and 9th epochs.
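
The comparison itself amounts to training each candidate under identical settings and recording its best validation accuracy. Here's a minimal sketch of that loop; the four candidate architectures and the training settings are assumptions for illustration (X_train, y_train, X_val, y_val, WIDTH, and HEIGHT come from the training script shown below):

python

    import tensorflow as tf

    # Illustrative sketch: train several Keras architectures with identical
    # settings and record the best validation accuracy of each.
    candidates = {
        'ResNet50': tf.keras.applications.ResNet50,
        'VGG16': tf.keras.applications.VGG16,
        'InceptionV3': tf.keras.applications.InceptionV3,
        'MobileNetV2': tf.keras.applications.MobileNetV2,
    }
    results = {}
    for name, build in candidates.items():
        model = build(include_top=True, weights=None,
                      input_shape=(WIDTH, HEIGHT, 3), classes=8)
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                            validation_data=(X_val, y_val), verbose=0)
        results[name] = max(history.history['val_accuracy'])
    print(results)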

During the initial stages of training, validation accuracy peaked at only 60-80%. After experimenting with numerous architectures and tuning the hyperparameters, I reached 94.04% accuracy on the validation set and 95.24% on the test set.

Visual representation of the model's predictions on the dataset. Correct predictions are represented by black text, while incorrect predictions are represented by red text.

Expanded view of the model's predictions on the entire test set (168 images). The model correctly classified 95% of the images.
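
A figure like this can be produced by titling each test image with its predicted class and coloring the title by correctness. A minimal sketch (the 4x4 grid and the CLASS_NAMES list are illustrative; cnn is the trained model from the script below):

python

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative sketch: plot test images, titling each with its predicted
    # class -- black when correct, red when wrong. Assumes one-hot labels and
    # a hypothetical CLASS_NAMES list mapping class indices to sign names.
    preds = np.argmax(cnn.predict(X_test), axis=1)
    truth = np.argmax(y_test, axis=1)
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):
        ax.imshow(cv2.cvtColor(X_test[i].astype('uint8'), cv2.COLOR_BGR2RGB))
        color = 'black' if preds[i] == truth[i] else 'red'
        ax.set_title(CLASS_NAMES[preds[i]], color=color, fontsize=8)
        ax.axis('off')
    plt.tight_layout()
    plt.show()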

Here's the code that trains the model. It divides the shuffled dataset into training, validation, and test sets, then trains the model using the ResNet-50 architecture, stopping early once an epoch exceeds 90% validation accuracy.

python

    
    import pickle

    import tensorflow as tf

    # Divide the shuffled data into training, validation, and test sets
    # images are X; labels are y (assumed one-hot encoded)
    TRAIN_SIZE = 0.8   # fraction of the dataset used for training
    VAL_SIZE = 0.15    # fraction of the dataset used for validation

    train_size = round(len(images) * TRAIN_SIZE)
    val_size = round(len(images) * VAL_SIZE)

    # Split data into training, validation, and test sets
    X_train = images[:train_size]
    y_train = labels[:train_size]
    X_val = images[train_size:train_size + val_size]
    y_val = labels[train_size:train_size + val_size]
    X_test = images[train_size + val_size:]
    y_test = labels[train_size + val_size:]

    # Save the test set to a pickle file so predict.py can evaluate it later
    test_set = {
        'X_test': X_test,
        'y_test': y_test
    }

    with open("test_data.pkl", "wb") as f:
        pickle.dump(test_set, f)

    '''
    CONVOLUTIONAL NEURAL NETWORK
    '''

    # Instantiate the ResNet-50 model: random weights, 8 output classes.
    # include_top=True keeps the classification head, whose final softmax
    # means the model outputs probabilities rather than logits.
    cnn = tf.keras.applications.resnet.ResNet50(include_top=True, weights=None,
                                                input_shape=(WIDTH, HEIGHT, 3),
                                                classes=8)

    # Compile the model (from_logits=False since the output is already softmaxed)
    cnn.compile(optimizer='adam',
                loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                metrics=['accuracy'])

    # Stop training once val_accuracy reaches 90%. A small custom callback is
    # used because EarlyStopping's baseline/patience semantics stop on a
    # *failure* to improve, not on exceeding a threshold.
    class StopAtValAccuracy(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            if logs and logs.get('val_accuracy', 0) >= 0.9:
                print(f"\nReached 90% validation accuracy at epoch {epoch + 1}; stopping.")
                self.model.stop_training = True

    # Fit the CNN to the training data
    cnn.fit(X_train, y_train, epochs=10, batch_size=32,
            validation_data=(X_val, y_val), callbacks=[StopAtValAccuracy()])

    # Evaluate the model on the validation set
    loss, accuracy = cnn.evaluate(X_val, y_val)
    
    

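Since the test set is pickled for predict.py, the final evaluation can be reconstructed along these lines. A minimal sketch; the post doesn't show predict.py itself, so the model-loading step (and the model.h5 filename) is a hypothetical:

python

    import pickle

    import tensorflow as tf

    # Illustrative sketch of predict.py: reload the held-out test set and the
    # trained model, then measure test accuracy. 'model.h5' is a hypothetical
    # filename for a model saved with cnn.save(...) after training.
    with open("test_data.pkl", "rb") as f:
        test_set = pickle.load(f)

    cnn = tf.keras.models.load_model("model.h5")
    loss, accuracy = cnn.evaluate(test_set['X_test'], test_set['y_test'])
    print(f"Test accuracy: {accuracy:.2%}")
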
Here's a more detailed writeup for the project: