From the course: Introduction to Modern Data Engineering with Snowflake

Loading data using Snowflake CLI

Another powerful way of ingesting data into Snowflake is using Snowflake's official command line interface, the Snowflake CLI. This is especially useful when you want to automate the loading of data using a script, and when you want to integrate that automation into CI/CD pipelines. We'll cover data pipeline automation in a later module, and we'll cover DevOps in a different course intended to be the follow-up to this one. For now, we'll focus on using the Snowflake CLI to load data into your Snowflake account. Later, when you learn about DevOps with Snowflake, you'll know how to integrate what you learn here into CI/CD pipelines.

Let's walk through the process of using the Snowflake CLI to load data into Snowflake together. In an earlier video, we installed the Snowflake CLI, so we can dive right in. If you haven't yet installed it, you can pause the video and install it by following the instructions in the corresponding video in the first module.

The first thing you'll need to do is configure your connection to Snowflake. When you installed the Snowflake CLI, a file named config.toml was created for you. This file is where you'll configure and manage connections to Snowflake. Let's start by opening VS Code. Next, let's open the config.toml file. On my Mac, this file lives in a folder called .snowflake, which is created in my home directory. If you're on a different operating system, it'll be in a different directory. If you're having trouble finding the file, check out Snowflake's official documentation on the Snowflake CLI for guidance. Also, if for some reason this file doesn't exist even though you installed the Snowflake CLI, you can run snow --help, or any other Snowflake CLI command, and the file will be created for you automatically.

Okay. I'll go ahead and open the file by typing code ~/.snowflake/config.toml. Here, I'll configure our connection to Snowflake. Let's name our connection modern_data_engineering_snowflake. You can name your connection whatever you want; just be sure the section header starts with "connections" followed by the name you want to give your connection. Next, I'll specify the account identifier, username, password, and a couple of other parameters. Technically, only the account identifier, user, and password parameters are required to connect, but you can specify the other parameters, like database, schema, and role, if you want your connection to maintain a specific Snowflake context.

To find your account identifier, navigate to your Snowflake account, click on your account details at the bottom left, navigate to Account, hover over the account you're currently logged into, and click the Copy Account Identifier button. Navigate back to the file and paste it in, surrounded by double quotes. Then update the period to a dash: the copied identifier separates the organization name and account name with a period, but the config file expects a dash between them. This is a critical step that is often overlooked, so double-check that you did it. Next, add your username and password. Set warehouse to COMPUTE_WH, database to LOAD_DATA, schema to PUBLIC, and role to ACCOUNTADMIN. Finally, at the top of the file, I'll specify that this should be the default connection used when executing any Snowflake CLI commands. I'll do this by typing default_connection_name = "modern_data_engineering_snowflake".

Okay. Save the file. Now let's quickly sanity-check ourselves by testing the connection. Type snow connection test. If the connection is correctly configured, you should see output like this. Note the value "OK" for the field "Status." Okay.
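To recap, here's a minimal sketch of what the finished config.toml might look like. The account identifier, username, and password values below are placeholders; substitute your own.

    # ~/.snowflake/config.toml -- placeholder values, substitute your own
    default_connection_name = "modern_data_engineering_snowflake"

    [connections.modern_data_engineering_snowflake]
    account = "MYORG-MYACCOUNT"    # copied identifier, with the period changed to a dash
    user = "my_username"
    password = "my_password"
    warehouse = "COMPUTE_WH"
    database = "LOAD_DATA"
    schema = "PUBLIC"
    role = "ACCOUNTADMIN"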
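And here's the sanity check from the terminal. The --connection flag is optional once a default connection is set; it's shown here in case you've defined more than one connection in your config.toml.

    # Test the default connection defined in config.toml
    snow connection test

    # Or test a specific named connection
    snow connection test --connection modern_data_engineering_snowflake

If the connection is configured correctly, the output includes a Status field with the value OK.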
So with our connection configured, we can use the Snowflake CLI to load data into Snowflake. We'll prepare the data, create a stage, then load the data into Snowflake. The first thing you'll need to do is prepare the data. Using the terminal, navigate to the module-2 folder in the repo. We'll use the Snowflake CLI to load the sample_orders.csv file into our Snowflake environment. We don't need to do anything to this data; it's ready to load.

Next, let's create a stage within Snowflake. This stage is where we'll upload the sample_orders.csv file. Then we'll load the data from the stage into the LOAD_DATA database. You can use the web interface to create the stage, or you can use the Snowflake CLI. I'll demonstrate how to do this using the Snowflake CLI. Type snow stage create snowflake_cli_stage. If successful, you should see output similar to this.

Next, we'll use the snow stage copy command to load the sample_orders.csv file into Snowflake. Let's type snow stage copy sample_orders.csv @snowflake_cli_stage. This command uploads sample_orders.csv to the snowflake_cli_stage stage that we just created. If successful, you should see output like this. Note the status field that says "UPLOADED." Let's run a quick sanity check. Type snow stage list-files @snowflake_cli_stage. We can also navigate to our Snowflake account and confirm that the stage was created and that the file was uploaded. Okay. That looks great.

Let's load the data from the file into a table now. We could always run some SQL in a SQL worksheet to do this, but let's use the Snowflake CLI instead. The file named load_from_cli_stage.sql within the module-2 folder contains the SQL that will run to load the data. We'll execute this file directly from the command line using the Snowflake CLI. First, let's add the file to the stage. Run snow stage copy load_from_cli_stage.sql @snowflake_cli_stage. Now let's execute the file. Run snow stage execute @snowflake_cli_stage/load_from_cli_stage.sql. Okay. Excellent. I see a success message in the terminal, and if I navigate to my Snowflake account, I can confirm that the table was indeed created and that the data was loaded into it.

That was pretty cool. Feel free to browse the SQL in the load_from_cli_stage.sql file. You were exposed to some of this type of SQL in an earlier video when we loaded data via the web interface, but we haven't covered the concepts in detail just yet. That's okay; we'll cover them in the next video. For now, it's important for you to come away from this exercise knowing that you can load data into your Snowflake environment entirely using the Snowflake CLI, which makes it an invaluable tool for data ingestion, especially when you want to automate the loading of data using a script or integrate that automation into CI/CD pipelines. You'll find a recap of the full command sequence below. Coming up, we'll explore one of the most common and powerful commands for batch ingestion of data into Snowflake: the COPY INTO command.
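Here's the full command sequence from this exercise in one place, assuming you're running it from the module-2 folder of the course repo with the default connection configured as above:

    # Create a named stage (in the LOAD_DATA.PUBLIC schema from our connection)
    snow stage create snowflake_cli_stage

    # Upload the data file and the SQL script to the stage
    snow stage copy sample_orders.csv @snowflake_cli_stage
    snow stage copy load_from_cli_stage.sql @snowflake_cli_stage

    # Sanity check: confirm both files landed on the stage
    snow stage list-files @snowflake_cli_stage

    # Execute the SQL script directly from the stage
    snow stage execute @snowflake_cli_stage/load_from_cli_stage.sql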
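And if you're curious what a loading script like load_from_cli_stage.sql might contain, here's a minimal, hypothetical sketch. The real script ships in the course repo, the table columns below are invented for illustration, and we'll cover the COPY INTO command properly in the next video.

    -- Hypothetical sketch of a stage-to-table load; the real script lives in
    -- the course repo, and these column names are invented for illustration.
    CREATE OR REPLACE TABLE sample_orders (
        order_id   INTEGER,
        order_date DATE,
        amount     NUMBER(10, 2)
    );

    -- Load the staged CSV into the table, skipping the header row
    COPY INTO sample_orders
        FROM @snowflake_cli_stage/sample_orders.csv
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);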
