From the course: Learning Graph Neural Networks

Exercise: Mini batches of data

- [Instructor] Let's explore another dataset available from the PyG datasets library. This dataset is the proteins dataset. Again, this is a benchmark dataset in the field of graph-based machine learning. This dataset is particularly used for tasks involving graph classification, so not node classification, but graph classification. It's part of the TU dataset collection, which includes several other datasets for classification tasks. Now, the dataset consists of a collection of protein structures, each represented as a graph. These protein structures can be classified as enzymes or non-enzymes. Nodes represent the amino acids, and two nodes are connected by an edge if they're less than six angstroms apart. Edges are just spatial adjacencies between these amino acids. We instantiate the TUDataset, specifying root as proteins and name as PROTEINS_full to download the full proteins dataset. Notice it's downloaded as a ZIP file and made available to us. Now, if you look at the length of the…
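The edge rule mentioned in the transcript (two amino acids are connected if they are less than six angstroms apart) can be illustrated with a small standalone sketch. This is not how the dataset was actually built or how PyG loads it; the function name `build_edges` and the coordinates are hypothetical, chosen only to show the distance-threshold idea.

```python
from itertools import combinations
from math import dist

# Threshold taken from the transcript: connect residues closer than 6 angstroms.
CUTOFF_ANGSTROMS = 6.0


def build_edges(coords, cutoff=CUTOFF_ANGSTROMS):
    """Return undirected edges (i, j) with i < j for points closer than cutoff.

    Hypothetical helper for illustration; not part of PyG.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) < cutoff  # strict "less than", as in the transcript
    ]


# Made-up 3D positions (in angstroms) for four amino acids.
residues = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (20.0, 0.0, 0.0),
]

edges = build_edges(residues)
print(edges)  # the first three residues are mutually connected; the fourth is isolated
```

In this toy example the fourth residue lies far from the others, so it ends up with no edges, which is the sense in which edges are "just spatial adjacencies."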
