From the course: Introduction to Modern Data Engineering with Snowflake
Build a really simple data pipeline in Snowflake - Snowflake Tutorial
Let's build a very simple data pipeline with Snowflake. This pipeline won't be anything highly complex or sophisticated, and that's intentional. The point of this exercise is for you to immediately get hands-on with Snowflake and to get high-level exposure to the ingestion, transformation, and delivery framework.

A quick note before we start building. If you've built data pipelines before using technologies other than Snowflake, what we'll do in this video may come off as introductory. You're welcome to skip to the next video, but I recommend following along, as you'll get a sense of the Snowflake-specific patterns we'll use throughout the course. And who knows, you may also learn something new. If you're brand new to data engineering, you should absolutely follow along, and don't worry: although I'll explain at a high level what I'm doing, you're not expected to understand every single aspect of it. Again, the point is for you to get some basic hands-on experience using Snowflake.

Okay, so here's the scenario. You're a data engineer tasked with building a data pipeline that can deliver insights for a couple of economic trends for the United States: the Consumer Price Index, also known as CPI, and annual wages. You have access to raw economic data, and you want to use this data to extract and share these insights. So here's what we'll do: we'll load the data into Snowflake, perform some transformations against the data, and deliver our final insights via an app. Follow along with me to build this simple pipeline. Now's a good time to stop the video and make sure you're logged into your Snowflake account.

Let's start by loading the data. We're going to get economic data from Snowflake Marketplace. This data contains information about many different economic metrics for many countries across long periods of time, which makes it the perfect data for delivering our annual wages and CPI insights. To load the data, start by navigating to Data Products on the left-hand side of the interface and click on Marketplace. You'll notice that I'm now in the Snowflake Marketplace, where I can find all sorts of data listings, some free, some paid, from all sorts of providers. Let's search for the data we need. At the top, search for Cybersyn Economic Essentials. The first result should be labeled Financial and Economic Essentials. Click on it. You'll be taken to the dataset's landing page, where you can learn more about the dataset and the type of data it contains. On the right, click Get. A modal should appear, giving you the option to configure the name and roles. We don't need to do any of that for now. Just click Get within the modal, and just like that, the data is loaded into the account and ready to use. In this modal, you can click Done.

Next, create a new SQL worksheet. You can do this by clicking on Projects on the left-hand side; the Worksheets section within should already be highlighted. At the top right, create a new SQL worksheet. This will create and open a new SQL worksheet for you, and this worksheet is where we'll add the SQL code that performs the operations and transformations needed to extract the insights. We won't write the SQL from scratch. Again, the point of this exercise is for you to get exposure to the ingestion, transformation, and delivery framework, not to focus on SQL syntax. In the GitHub repository for this course, navigate to the module-1 folder and open the transformations.sql file. Copy everything within the file and paste it into the SQL worksheet.
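If you're curious what kind of SQL lives in transformations.sql before you run it, here's a minimal sketch of one such statement: creating one of the insight tables in a new WAGES_CPI database from the marketplace data. This is not the actual course file; the source database, table, and column names below are assumptions for illustration only, and the real names come from the Financial and Economic Essentials listing and transformations.sql itself.

```sql
-- Minimal sketch only. The source object and column names below are assumed,
-- not the actual names used by the marketplace listing or transformations.sql.
CREATE DATABASE IF NOT EXISTS WAGES_CPI;
CREATE SCHEMA IF NOT EXISTS WAGES_CPI.DATA;

-- Derive a monthly CPI table for the USA from a hypothetical raw timeseries table
-- in the shared marketplace database.
CREATE OR REPLACE TABLE WAGES_CPI.DATA.MONTHLY_CPI_USA AS
SELECT
    date,
    value AS cpi
FROM economic_essentials.raw.cpi_timeseries   -- hypothetical source table name
WHERE country = 'USA'
ORDER BY date;
```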
You don't need to understand the SQL or the transformations at the moment. For now, all you need to know is that this code will create two tables for us, each with an insight. At the top right, click on the dropdown next to the play button and click Run All. This will run all of the operations in the file, starting at the top. Give it about a minute or so for the operations to complete.

Okay, great. Once the file has been executed, you should see some results in the results pane, and we can also explore what was created. On the left, click on the three dots and click Refresh. You should now see a database called WAGES_CPI. Click on it to expand it, click on the DATA schema, and then click on Tables. You should see two tables here: one called ANNUAL_WAGES_CPI_USA and another called MONTHLY_CPI_USA. These are the two tables the SQL code created for us, and each contains an insight and trend we want to deliver. In fact, if you hover over ANNUAL_WAGES_CPI_USA and click the button to open the table details in a new tab, you'll be able to quickly browse the details behind the table. For example, you can browse the table data by clicking on Data Preview, and you can quickly see that this table has average annual wages and CPI data for the USA for the years between 2012 and 2022. Excellent. You can do the same for the other table if you're interested in browsing its details.

Now, if we wanted to, we could stop here and share the tables with users in our Snowflake account. This would allow others to analyze this data, but let's instead take it a step further by sharing these insights via an app. Navigate to Projects and click on Streamlit. Create a new Streamlit app and give it a name. I'll name the app "WAGES_CPI_USA_APP," but you can name your app whatever you want. Next, set the app location so that it lives in the WAGES_CPI database and DATA schema, and set the app warehouse to COMPUTE_WH; it may already be set to that. Click Create. This creates a Streamlit app within my Snowflake account, and I can use this app to easily visualize this data and share it with my teammates.

The first thing you'll see is a sample app that spins up. Pretty cool. Let's remove all of this Python code and replace it with the code for our application, which will visualize the annual wages and CPI trends for the USA. Navigate to the streamlit.py file in the module-1 folder of the repo, copy the entire file, and paste it into the code editor within Snowflake. Click Run. You should now see the data visualized in a chart, and the app is only about 60 lines of code. How amazing is that? So now we have an app that visualizes CPI on a monthly basis and the average annual wages and CPI on a yearly basis. This app makes it a lot easier to parse and quickly understand the insights we obtained from the data we loaded.
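To give you a feel for what that app code looks like, here's a minimal sketch of a Streamlit in Snowflake app in the spirit of the course's streamlit.py. It isn't the actual course file, and the column names used below (YEAR, AVG_ANNUAL_WAGE, CPI) are assumptions; check the table in Data Preview for the real names before adapting it.

```python
# Minimal sketch of a Streamlit in Snowflake app, not the course's streamlit.py.
# Column names below (YEAR, AVG_ANNUAL_WAGE, CPI) are assumptions for illustration.
import streamlit as st
from snowflake.snowpark.context import get_active_session

# Streamlit in Snowflake provides an already-authenticated Snowpark session.
session = get_active_session()

st.title("Average Annual Wages and CPI, USA")

# Pull the table created by the SQL transformations into a pandas DataFrame.
wages_cpi = session.table("WAGES_CPI.DATA.ANNUAL_WAGES_CPI_USA").to_pandas()

st.subheader("Annual wages vs. CPI by year")
st.line_chart(wages_cpi, x="YEAR", y=["AVG_ANNUAL_WAGE", "CPI"])

st.subheader("Underlying data")
st.dataframe(wages_cpi)
```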
And now for the grand finale: let's share the app with our team or other users. Click on Share, and you'll see that you can add roles within your Snowflake account that should have access to the app, or grab a handy link to the app to share. That's it. That's a super simple, tiny data pipeline.

Let's quickly recap what you did. You loaded economic data from Snowflake Marketplace into your account. You performed the transformations needed to extract insights for CPI and annual wages. You wrote those results to new tables, and you visualized and delivered the insights using a Streamlit in Snowflake app, either for you to consume or to share more broadly. And you did all of that in a matter of minutes.

Now again, this was a really simple and tiny data pipeline; there's so much more we can learn and do. Here are some things to think about as we dive deeper in the next module. First, every step we took was performed manually. Could there be ways of automating parts of this pipeline so that it isn't so manual? Second, must our transformations be performed in SQL? If not, what other options are out there? And Streamlit is incredible, but are there other options for sharing data and insights? These are just some of the types of questions we'll address throughout the course.

The most important part of this exercise is that you got hands-on with Snowflake, and you hit every part of the framework: ingestion, transformation, and delivery. This should inspire you and give you confidence that, with a little more practice, you can build more complex data pipelines and perform more sophisticated data engineering. We're not stopping here, of course. In fact, the rest of this course will be about continuously leveling up. With that, let's dive a lot deeper and start learning all about data ingestion with Snowflake.