From the course: Deep Learning with Python: Optimizing Deep Learning Models

Common loss functions in deep learning

- [Instructor] In machine learning, a loss function is a mathematical function that quantifies the error, or difference, between the predicted outputs of a model and the actual target values in the training data. In deep learning, loss functions serve as the foundation for training neural networks: they provide the feedback, or error signal, that the optimization process needs to update the model's parameters, which are the weights and the biases. By minimizing the value of the loss function, the model learns to make predictions that are increasingly accurate over time. Selecting an appropriate loss function is crucial, because it directly influences how a model learns and performs on specific tasks, such as regression, binary classification, or multi-class classification.

For regression tasks, where the goal is to predict continuous values, the mean squared error (MSE) loss function is a common choice. MSE calculates the average of the squared differences between the predicted values and the true values. Mathematically, it is expressed as shown here, where y_i represents the true values of the dependent variable in the training data, ŷ_i represents the predicted values of the dependent variable, and n is the number of samples. Squaring the differences ensures that the loss is always positive and penalizes larger errors more heavily. While MSE is widely used, it can be sensitive to outliers, as large deviations contribute disproportionately to the loss. An alternative is the mean absolute error (MAE), which computes the average of the absolute differences between the predicted values and the true values. MAE is more robust to outliers, but may converge more slowly than MSE during training.

For binary classification problems, where the output represents one of two possible classes, for example, zero or one, the binary cross-entropy loss function is commonly used. This loss function measures the difference between the predicted probabilities and the actual binary labels.
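As a concrete illustration (not shown in the video itself), the two regression losses described above can be sketched in a few lines of NumPy; the function and variable names here are my own:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: MSE = (1/n) * sum((y_i - y_hat_i)^2)
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean absolute error: MAE = (1/n) * sum(|y_i - y_hat_i|)
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

print(mse(y_true, y_pred))  # (0.25 + 0.0 + 2.25) / 3 ≈ 0.833
print(mae(y_true, y_pred))  # (0.5 + 0.0 + 1.5) / 3 ≈ 0.667
```

Note how the single larger error (2.5 predicted as 4.0) dominates MSE far more than MAE; this is the outlier sensitivity mentioned above.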
Mathematically, it is defined as shown here, where y_i are the true binary labels, zero or one, and ŷ_i are the predicted probabilities of the positive class. Binary cross-entropy encourages a model to produce probabilities close to one for the positive class and close to zero for the negative class. This loss function is particularly well suited for tasks like spam detection, medical diagnosis, or fraud detection, where the outputs are the probabilities of either true or false, yes or no, or one or zero.

For multi-class classification problems, where the goal is to assign an input to one of several possible classes, the categorical cross-entropy loss is widely used. Similar to binary cross-entropy, this loss function compares the predicted probability distribution over all classes with the actual class labels. Mathematically, it is defined as shown here, where n is the number of samples, K is the number of classes, y_ij is a binary indicator for whether sample i belongs to class j, and ŷ_ij is the predicted probability of sample i being in class j. Categorical cross-entropy is particularly effective for tasks like image classification, where the model predicts one class out of many possible categories. Categorical cross-entropy assumes that the values of the dependent variable are encoded as one-hot vectors. However, in situations where these values are encoded as integers, a simplified version of categorical cross-entropy, known as sparse categorical cross-entropy, can be used instead.

In addition to the commonly used loss functions introduced here, advanced deep learning tasks often require specialized loss functions tailored to their unique challenges and objectives. For instance, in object detection or segmentation tasks, the intersection over union (IoU) loss or Dice loss is used to evaluate the overlap between predicted and ground-truth bounding boxes or masks.
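The cross-entropy losses just described can also be sketched in NumPy, assuming the predicted probabilities are already available (in a real network they would come from a sigmoid or softmax output layer); the names below are illustrative:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # BCE = -(1/n) * sum(y_i*log(y_hat_i) + (1 - y_i)*log(1 - y_hat_i))
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true_onehot, y_pred, eps=1e-12):
    # CCE = -(1/n) * sum_i sum_j y_ij * log(y_hat_ij), targets one-hot (n, K)
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=1))

def sparse_categorical_cross_entropy(y_true_int, y_pred, eps=1e-12):
    # Same loss, but with integer class labels instead of one-hot vectors
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.log(y_pred[np.arange(len(y_true_int)), y_true_int]))

probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
onehot = np.array([[0, 1, 0],
                   [1, 0, 0]])
labels = np.array([1, 0])  # same targets, integer-encoded

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.8])))
print(categorical_cross_entropy(onehot, probs))         # ≈ 0.290
print(sparse_categorical_cross_entropy(labels, probs))  # same value
```

The last two calls return identical values, which is the point made above: sparse categorical cross-entropy is the same loss, just computed directly from integer labels.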
For sequence-to-sequence tasks like machine translation, a sequence loss is often employed to handle variable-length predictions. Choosing the right loss function is essential for ensuring that a deep learning model learns effectively on the given task. The loss function serves as the driving force for model training, guiding the optimization process to minimize error and maximize predictive accuracy.
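To tie this together, one way to encode the task-to-loss pairings from this video is a simple lookup table. The helper function below is hypothetical, but the string values are the standard Keras loss identifiers, which can be passed directly to `model.compile(loss=...)`:

```python
# Hypothetical mapping from task type to a standard Keras loss identifier
TASK_TO_LOSS = {
    "regression": "mse",                                      # mean squared error
    "binary_classification": "binary_crossentropy",           # two classes
    "multiclass_onehot": "categorical_crossentropy",          # one-hot targets
    "multiclass_integer": "sparse_categorical_crossentropy",  # integer targets
}

def pick_loss(task: str) -> str:
    """Return the Keras loss identifier for a given task type."""
    try:
        return TASK_TO_LOSS[task]
    except KeyError:
        raise ValueError(f"Unknown task type: {task!r}")

print(pick_loss("multiclass_integer"))  # sparse_categorical_crossentropy
```

In a Keras workflow the result would be used as, for example, `model.compile(optimizer="adam", loss=pick_loss("binary_classification"))`.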
