From the course: Introduction to Large Language Models
What are parameters?
- [Instructor] When talking about large language models, we almost always reference the size of the model, or the parameter count. GPT-3 is a 175 billion parameter model. Meta's largest Llama 2 model has 70 billion parameters. But what do we mean by parameters? The parameters in a neural network are the variables that the model learns during the training process. They get adjusted during training because, for a given input, you want to minimize the difference between the predicted output and the actual output.

Let me give you an example. This is a visual representation of a neural network, and you can see that the architecture has layers. A node is represented by a circle in this graphic. It receives input from other nodes, processes it, and then passes the output on to other nodes. Nodes represent neurons, and a collection of nodes, or neurons, is known as a neural network. The input layer has three nodes, the hidden layers have four nodes each, and the output layer has one node.

This is a fully connected network, so going from the input layer on the left, there is a line from every node in one column, or layer, to all the nodes in the next layer. This type of neural network usually makes up a small part of most large language model architectures. Each of the lines connecting one node to all the other nodes is an edge, and the edges represent weights. The network learns by adjusting these weights, and we can easily calculate the number of weights by multiplying the number of nodes in the left layer by the number of nodes in the right layer. But just having the weights associated with the connections won't give us an accurate result when trying to fit, or train, our model with data. We'll also need an offset to adjust any output, and this is known as the bias term. There will be one bias term for each node on the right, so in this example, there will be four in total. 
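The counting rule just described, one weight per edge plus one bias per node on the right, can be sketched in a few lines of Python. This is just an illustration of the arithmetic, not code from the course:

```python
def layer_params(n_in, n_out):
    """Parameters in one fully connected layer:
    one weight per edge (n_in * n_out) plus one bias per output node."""
    weights = n_in * n_out  # every node on the left connects to every node on the right
    biases = n_out          # one bias term per node on the right
    return weights + biases

# Input layer (3 nodes) to the first hidden layer (4 nodes):
print(layer_params(3, 4))  # 3 * 4 weights + 4 biases = 16
```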
This means that the number of parameters in this first part of the network is three times four, plus four, which is a total of 16 parameters. We apply exactly the same logic to the next part of the network, the middle section, where we have four nodes on the left and four nodes on the right. That's four times four, plus four for the bias terms, for a total of 20. And finally, we have four times one, plus one, for the final part of the neural network. So if we sum up all of the parameters across this neural network, we get 41 parameters, and we say that this is a 41 parameter model. Now, to give you a sense of perspective, most large language models have several billion parameters. A large language model would also have different types of layers with different components; I've just shown you how you'd calculate the parameter count for a fully connected network.

All right, so we've learned that parameters in a neural network are the variables that the model learns during the training process, and that they get adjusted during this process. We've also looked at how to calculate the parameter count for a fully connected network. Go ahead and Google neural network and you should see a whole lot of images that look similar to this one. Now try and calculate the parameter count for some of these neural networks.
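If you want to check your answers for other diagrams, the whole calculation can be written as a short Python function that walks through consecutive pairs of layer sizes. This is a sketch for the exercise, not part of the course material:

```python
def count_params(layer_sizes):
    """Total parameters in a fully connected network, given the node
    count of each layer, listed from input to output."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights plus biases for this layer pair
    return total

# The network from the example: 3 input nodes, two hidden layers of 4, 1 output node
print(count_params([3, 4, 4, 1]))  # 16 + 20 + 5 = 41
```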