From the course: Deep Learning with Python: Optimizing Deep Learning Models
Parameters versus hyperparameters
- [Instructor] In machine learning and deep learning, parameters and hyperparameters are fundamental concepts that play distinct roles in model design, training, and optimization. Parameters are the internal variables of a model that are learned from the training data during the training process. These are the values that the model adjusts to fit the data and make accurate predictions. Parameters are not set manually. Instead, they're optimized by the learning algorithm to minimize the loss function, which measures the difference between the model's predictions and the actual data.

For example, consider a simple linear regression model that predicts house prices based on square footage. The model can be represented by the equation Y = WX + B, where Y is the predicted house price, X is the input feature (square footage), W is the weight or slope of the line, and B is the bias, also known as the Y intercept. In this equation, W and B are the parameters. During training, the model adjusts W and B to minimize the loss, which represents the difference between the predicted prices and the actual prices in the training dataset.

Now let's look at a slightly more complex example involving a neural network. Suppose we're building a neural network to recognize handwritten digits. The training dataset contains 70,000 images of handwritten digits ranging in value from zero to nine. Each image is 28 by 28 pixels, resulting in 784 input features per image. Our neural network could have an architecture similar to the one shown here: an input layer with 784 neurons, one for each pixel; two hidden layers with 512 neurons and 128 neurons; and an output layer with 10 neurons, one for each digit class from zero to nine. In this network, the parameters consist of the weights and biases. Each connection between neurons has an associated weight. Between the input layer and the first hidden layer, there are 784 times 512 weights. Between the first hidden layer and the second hidden layer, there are 512 times 128 weights. Finally, between the second hidden layer and the output layer, there are 128 times 10 weights. Each neuron in the hidden and output layers also has a bias term, so there are 512 biases in the first hidden layer, 128 biases in the second hidden layer, and 10 biases in the output layer. These weights and biases are adjusted during training using an optimization algorithm like stochastic gradient descent. The goal is to find optimal values for these parameters that minimize the loss function.

Now that we know what parameters are, let's discuss hyperparameters. Hyperparameters are external configurations set before the training process begins. Unlike model parameters, which are learned directly from the training data during training, hyperparameters govern the behavior of the training algorithm and the overall model architecture, but are not themselves learned from the data. They often require experimentation and tuning to achieve optimal performance.

We can also think of the difference between parameters and hyperparameters in the context of building a house. Parameters are like the materials used in construction: bricks, cement, wood. The quality and arrangement of these materials determine the strength and stability of the house. Hyperparameters are like the architectural blueprint: design decisions such as the number of rooms, the layout, and the type of foundation. These decisions guide how the house is built and are determined before construction begins.
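To make the linear regression example concrete, here is a minimal sketch of how W and B could be learned with gradient descent. The square footage values, prices, learning rate, and step count are all illustrative assumptions, not figures from the course.

```python
import numpy as np

# Toy data: square footage (x) and sale price (y); values are made up.
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([200_000.0, 290_000.0, 410_000.0, 500_000.0])

x = x / x.max()  # scale the feature so gradient descent is stable

# Parameters: start anywhere; training adjusts them to fit the data.
w, b = 0.0, 0.0
learning_rate = 0.1  # a hyperparameter, chosen before training

for step in range(5000):
    y_pred = w * x + b               # the model: Y = WX + B
    error = y_pred - y
    grad_w = 2 * np.mean(error * x)  # gradient of mean squared error w.r.t. w
    grad_b = 2 * np.mean(error)      # gradient of mean squared error w.r.t. b
    w -= learning_rate * grad_w      # the learning algorithm, not a person,
    b -= learning_rate * grad_b      # sets the parameter values

print(f"learned parameters: w={w:,.0f}, b={b:,.0f}")
```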
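The weight and bias counts above are easy to verify with a few lines of arithmetic. This sketch simply tallies the connections and bias terms for the 784-512-128-10 architecture just described:

```python
# Count parameters in the 784 -> 512 -> 128 -> 10 network described above.
layer_sizes = [784, 512, 128, 10]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # one weight per connection between the layers
    biases = n_out          # one bias per neuron in the receiving layer
    total += weights + biases
    print(f"{n_in:>4} -> {n_out:<3}: {weights:>7,} weights + {biases:>3} biases")

print(f"total learnable parameters: {total:,}")
```

Running this shows the network has 468,874 learnable parameters in total, every one of which is adjusted by the optimizer during training.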
Examples of hyperparameters in deep learning include the learning rate, which determines the step size during weight updates; the batch size, which defines how many samples are used to compute the gradient in each iteration; and the number of epochs, which dictates how many times the model iterates over the entire dataset during training. In the next video, we'll explore some of the other key hyperparameters used in deep learning.
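Here is a brief sketch of where those three hyperparameters typically appear in code. It assumes a TensorFlow/Keras setup (the course's framework of choice may differ) and reuses the digit-recognition architecture from earlier; the specific values are illustrative defaults, not tuned choices.

```python
import tensorflow as tf

# Hyperparameters: set by us before training begins. These values are
# illustrative defaults, not tuned recommendations.
LEARNING_RATE = 0.01  # step size for each weight update
BATCH_SIZE = 32       # samples used to compute each gradient
EPOCHS = 10           # full passes over the training dataset

# The 784 -> 512 -> 128 -> 10 architecture described earlier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# MNIST is the classic 70,000-image handwritten digit dataset; flatten
# each 28x28 image into 784 features and scale pixel values to [0, 1].
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS)
```

Note that model.summary() would report the same 468,874 parameters counted earlier; changing LEARNING_RATE, BATCH_SIZE, or EPOCHS changes how those parameters are learned, not how many there are.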