Deep learning is a technology driving advancements in AI, but how does it work? In this blog post we will start with its roots in neural networks, inspired by how the human brain processes information, and break down how deep learning enables machines to learn and make decisions with minimal human intervention.
What is a Neural Network?
First, we need to look at biology. The term neural network originates there, describing the intricate network of neurons in the brain that work together to perform complex tasks like decision-making and memory. In 1943, neurophysiologist Warren McCulloch and logician Walter Pitts introduced a computational model mimicking how neurons process and transmit information, laying the foundation for modern artificial neural networks.
Basics of Neural Networks in Biology
Neurons:
Neurons are brain cells with three main parts:
- Dendrites: Receive signals from other neurons.
- Cell Body (Soma): Processes incoming signals and decides whether to pass them on.
- Axon: Transmits signals to other neurons via synapses.
Signal Transmission:
Neurons communicate using electrical impulses and chemical neurotransmitters. When enough input triggers a neuron, it fires an action potential, sending signals to other neurons.
Learning:
Neural networks in the brain adapt through synaptic plasticity: connections strengthen or weaken depending on how often they are used, and this is what enables learning and memory.
The McCulloch-Pitts model captured this process with computational neurons that sum their inputs, compare the result to a threshold, and output a signal. It inspired artificial neural networks, which simulate this learning and adaptability, though they remain far less complex than their biological counterparts.
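To make this concrete, here is a minimal sketch of a McCulloch-Pitts style neuron in Python; the binary inputs and threshold value are illustrative choices.

```python
def mcculloch_pitts_neuron(inputs, threshold):
    """Fire (output 1) if the sum of binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# With a threshold of 2 and two inputs, the neuron behaves like a logical AND:
print(mcculloch_pitts_neuron([1, 1], threshold=2))  # 1 (fires)
print(mcculloch_pitts_neuron([1, 0], threshold=2))  # 0 (stays silent)
```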
What is Deep Learning?
A neural network like the one discussed above that has many layers is called a deep neural network, and this is where the term deep learning originates. Research into human biology thus laid the groundwork for modern deep learning, where multi-layered networks enable machines to analyze complex patterns and make decisions with remarkable accuracy.
The rapid progress of deep learning in recent years can be attributed to several key factors. One of the most significant is the sheer amount of data now available for training large, complex neural networks. Deep learning thrives on vast amounts of data (don’t we all?), and only in the last couple of decades have we had access to this scale of information. Data from computers, connected devices, and sensors, combined with the efforts of researchers to organize and label it effectively, has made large-scale neural network training possible.
Another reason for this progress is the advancement of computational power. With modern hardware, especially cloud-enabled hardware, we can now support much deeper and more complex neural network architectures than ever before. This has enabled breakthroughs that were previously unattainable simply because the technology did not exist.
On top of this, researchers have refined the algorithms themselves, addressing many of the inherent limitations in neural network architecture. These improvements have unlocked new potential for neural networks to handle complex tasks more efficiently and effectively.
Right now, deep learning is everywhere, powering applications we use daily. For example, image recognition systems can automatically identify and tag friends in photos you upload to social media, or even on your device (if you have an iPhone, you can select a person in your Photos and it will list every photo you have taken of them). Similarly, neural machine translation lets apps translate seamlessly between multiple languages. These advancements showcase how far deep learning has come and its growing influence on our daily lives.
Artificial Neuron Types
Perceptron: A perceptron is a simple model for binary classification. It takes a set of inputs (x), multiplies each by a weight (w), sums them up, and passes the result through a threshold function. If the sum is greater than 0, the output is 1; if it’s less than 0, the output is -1. Essentially, the perceptron is a basic linear model that combines inputs with weights and uses a threshold to make predictions.
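A minimal sketch of that prediction rule; the weights, example points, and the bias term (a common addition not mentioned above) are all illustrative.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Weighted sum plus bias, thresholded at zero: output 1 or -1."""
    z = np.dot(w, x) + b
    return 1 if z > 0 else -1

# Illustrative parameters: this neuron separates points either side of x1 + x2 = 1.
w = np.array([1.0, 1.0])
b = -1.0
print(perceptron_predict(np.array([2.0, 0.5]), w, b))  # 1
print(perceptron_predict(np.array([0.2, 0.3]), w, b))  # -1
```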
Logistic Regression: Logistic regression is a simple model used for binary classification. Like a perceptron, it combines inputs (x) with weights (w) to calculate a weighted sum, usually called z. However, it then applies an activation function, specifically a sigmoid function*, which outputs the probability that the target y equals 1. If this probability is greater than 0.5, the prediction is 1; otherwise, it’s 0. Logistic regression also uses this probability to calculate a loss, and the goal of training is to adjust the weights to minimize that loss.
* A sigmoid function is a mathematical function that converts any input value into an output between 0 and 1, making it useful for representing probabilities in models like logistic regression.
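Putting the sigmoid and the 0.5 cutoff together, here is a minimal sketch of logistic regression’s forward pass; the parameters are illustrative, not learned values.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_predict(x, w, b):
    """Return P(y = 1 | x) and the hard 0/1 prediction at the 0.5 cutoff."""
    p = sigmoid(np.dot(w, x) + b)
    return p, int(p > 0.5)

# Illustrative parameters, not learned values.
w = np.array([0.8, -0.4])
b = 0.1
prob, label = logistic_predict(np.array([1.5, 0.5]), w, b)
print(f"P(y=1) = {prob:.3f}, prediction = {label}")
```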
Training a Neuron
Training a neuron involves finding the right weights that minimize the cost function. Since neural networks often use nonlinear activation functions like the sigmoid, we can’t solve for the weights directly. Instead, we use gradient descent, an iterative process (sketched in code below):
- Start with random weights and calculate the prediction (y) using the current data point (forward propagation).
- Compare the prediction to the actual value to calculate the cost and its gradient.
- Update the weights by moving them in the opposite direction of the gradient, scaled by a learning rate.
- Repeat this process for all data points in the dataset until the cost is minimized.
The learning rate controls how big each step is: too small and training is slow; too large and the process may never converge. By fine-tuning the learning rate, the model learns effectively and finds the weights that minimize the cost.
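Here is a minimal sketch of this per-data-point (stochastic) loop for the logistic neuron above; the dataset, learning rate, and epoch count are all illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, y, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent for a single logistic neuron (sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # start with small random weights
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(np.dot(w, xi) + b)   # forward propagation
            grad = p - yi                    # gradient of the log loss w.r.t. z
            w -= lr * grad * xi              # step against the gradient
            b -= lr * grad
    return w, b

# Tiny illustrative dataset: y = 1 when the first feature is large.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]])
y = np.array([0, 0, 1, 1])
w, b = train_sgd(X, y)
print(sigmoid(X @ w + b).round(2))  # probabilities approach [0, 0, 1, 1]
```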
An alternative to this stochastic gradient descent (one update per data point) is batch gradient descent, where we use the entire dataset to calculate the gradient and update the weights at each step. This method is efficient because it leverages vectorized operations, making it computationally fast for smaller datasets. However, for very large datasets, batch gradient descent can become impractical due to high computational requirements.
To balance efficiency and scalability, we can use mini-batch gradient descent, which splits the dataset into smaller subsets (batches) and performs gradient updates on these batches. This approach combines the computational efficiency of batch gradient descent with the practicality of handling large datasets. Mini-batch gradient descent is widely used for training neural networks because it works well with large datasets while keeping computations manageable and efficient.
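A minimal sketch of mini-batch gradient descent for the same logistic neuron; the batch size and shuffling scheme are illustrative choices.

```python
import numpy as np

def train_minibatch(X, y, lr=0.1, epochs=100, batch_size=2, seed=0):
    """Mini-batch gradient descent for a single logistic neuron (sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                    # shuffle so batches vary per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))   # vectorized forward pass
            grad = p - yb
            w -= lr * (Xb.T @ grad) / len(idx)        # average gradient over the batch
            b -= lr * grad.mean()
    return w, b
```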
Neurons to Neural Networks
Artificial neurons like the perceptron are limited to solving problems with linear decision boundaries. So how do we get from training single neurons to a neural network? By combining multiple perceptrons into layers, we can create networks capable of modeling more complex, non-linear relationships.
In a neural network, the input layer passes data to one or more hidden layers, where artificial neurons compute weighted sums of their inputs and apply an activation function (e.g., sigmoid or ReLU). These activation functions allow the network to capture non-linear patterns in the data.
The outputs of one layer feed into the next, and the final output layer produces predictions. For binary classification, this might be a single output with a threshold; for multiclass problems, the output layer assigns a score to each class, selecting the highest score as the prediction.
By stacking layers, neural networks can approximate complex functions. For example, a network with a two-neuron hidden layer can combine those neurons’ outputs in the output layer to model a non-linear decision boundary, as the sketch below shows.
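XOR is the classic case: no single perceptron can separate its outputs, but two hidden neurons can. The weights below are hand-picked for illustration, not learned.

```python
import numpy as np

def step(z):
    """Hard threshold activation: 1 if z > 0, else 0."""
    return (z > 0).astype(int)

def xor_network(x1, x2):
    """Hand-crafted two-neuron hidden layer that computes XOR (illustrative weights)."""
    h_or  = step(x1 + x2 - 0.5)       # hidden neuron 1: fires if either input is on
    h_and = step(x1 + x2 - 1.5)       # hidden neuron 2: fires only if both are on
    return step(h_or - h_and - 0.5)   # output: OR but not AND, i.e. XOR

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
print(xor_network(x1, x2))  # [0 1 1 0]: a boundary no single perceptron can draw
```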
The flexibility of neural networks lies in their ability to use non-linear activation functions and multiple layers to approximate relationships far beyond the capacity of a single artificial neuron. This layered structure enables them to tackle tasks like image classification or multi-class predictions effectively.
Training Neural Networks
Training a neural network builds on the process used for a single artificial neuron but extends it to multiple layers. Each layer has its own weights that need updating, and we calculate these updates using a method called backpropagation. One training iteration looks like this (a code sketch follows the steps):
- Forward Propagation: Input data flows through the network layer by layer. Each layer calculates outputs by applying weights, biases, and an activation function until we reach the final output.
- Calculate Cost: The predicted output is compared to the actual value, and the cost is computed.
- Backpropagation: Starting from the output layer, we work backward, calculating gradients (using the chain rule) for each layer’s weights based on their contribution to the total cost.
- Weight Updates: Gradients are used to adjust weights via gradient descent. The process is repeated with more data until the network converges to minimize the cost.
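Here is a minimal numpy sketch of these four steps for a one-hidden-layer sigmoid network. The architecture, learning rate, and epoch count are illustrative, and it reuses the XOR data from earlier, now learned rather than hand-wired.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, y, hidden=4, lr=1.0, epochs=5000, seed=0):
    """Backpropagation for a one-hidden-layer sigmoid network (sketch)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    n = len(X)
    for _ in range(epochs):
        # Forward propagation, layer by layer.
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backpropagation: apply the chain rule, starting at the output layer.
        d_out = p - y                         # gradient of the log loss w.r.t. output z
        d_hid = (d_out @ W2.T) * h * (1 - h)  # push the error back through the hidden layer
        # Weight updates via gradient descent, averaged over the dataset.
        W2 -= lr * (h.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, W2, b2

# The XOR data from earlier, now learned rather than hand-wired.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, W2, b2 = train_backprop(X, y)
p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(p.round(2).ravel())  # probabilities should approach [0, 1, 1, 0]
```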
Designing a neural network involves decisions like the number of layers, units per layer, activation functions, and learning rate. For large datasets, we often use pre-trained models through transfer learning, where most of the network is already trained, and we fine-tune the final layers for our specific task. This saves time and computational effort while leveraging previous work.
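As a brief illustration of that last point, here is how transfer learning might look with PyTorch and torchvision (an assumption on my part; any framework with pre-trained models works similarly, and the three-class head is made up for the example):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone (downloads ImageNet weights on first use).
model = models.resnet18(weights="DEFAULT")

# Freeze everything that is already trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer to match our task, e.g. 3 classes (illustrative).
model.fc = nn.Linear(model.fc.in_features, 3)

# Only the new layer's weights are passed to the optimizer for fine-tuning.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
```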
Summary
Deep learning builds on the foundation of neural networks, mimicking how the human brain processes information to solve complex problems. From simple artificial neurons to intricate multi-layered architectures, neural networks have evolved into powerful tools for tasks like image recognition, language translation, and much more.
Training these models involves forward propagation to make predictions, backpropagation to calculate gradients, and weight updates using gradient descent. With techniques like mini-batch gradient descent and transfer learning, we can handle large datasets and adapt pre-trained models for specific tasks efficiently.
The flexibility and scalability of deep learning have made it an essential technology driving advancements in AI, powering applications we interact with daily. While the journey from biology to machine learning has been amazing, the potential of deep learning continues to expand as computational power and research advance.
Additional Reading
An Introduction to Machine Learning
McCulloch-Pitts Neuron — Mankind’s First Mathematical Model Of A Biological Neuron