Road to Snowflake SnowPro Core Certification: Data Loading
Eighth Chapter: Data Loading
--
Hello everybody. In this chapter, we are going to study the different ways to copy data into Snowflake tables, either with bulk load or with continuous load using Snowpipe.
Remember that all the chapters of the course can be found at the following link.
DATA LOADING
When we load the data into Snowflake, we usually perform the following steps:
- Source system → Snowflake Stage → Snowflake Table.
A source system (your application, or any machine that generates data, such as a sensor) produces data that is sent to a stage, for example an AWS S3 bucket. Once the data is in a stage, we copy it into Snowflake tables. There are two different ways to do this, Bulk Load and Continuous Load; let’s study the differences.
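As a minimal sketch of this flow using SnowSQL (the file path, stage name, and table name are hypothetical), a local file is first uploaded to an internal stage and then copied into a table:

```sql
-- Upload a local CSV file to a hypothetical internal stage.
-- PUT automatically gzip-compresses the file on upload.
PUT file:///tmp/events.csv @my_stage;

-- Copy the staged (now compressed) file into an existing table.
COPY INTO events
  FROM @my_stage/events.csv.gz
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```

Both commands run inside a Snowflake session; the COPY step is the one that consumes warehouse credits.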
BULK LOAD
Bulk load is the process of loading batches of data from files already available in a stage into Snowflake tables. We use the COPY command to do it (it consumes credits from the virtual warehouse that runs it). It supports simple data transformations while loading, such as column reordering, column omission, and casting. Let’s see how the COPY INTO command works.
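To sketch those transformations (table, stage, and column names are hypothetical), a COPY can select from the staged file, reordering, omitting, and casting columns on the way in:

```sql
-- Load only two of the file's columns, reordered and cast.
-- $1, $2, $3 refer to the columns of the staged CSV file by position.
COPY INTO customers (signup_date, name)
  FROM (
    SELECT t.$3::DATE,   -- cast the third file column to DATE
           t.$1          -- load the first file column as-is
    FROM @my_stage/customers.csv.gz t
  )
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```

Here the second file column is omitted entirely, and the remaining columns are mapped to the target table in a different order than they appear in the file.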
COPY INTO
Using this command, you can load data from staged files into an existing table. It works with any stage (named internal or external stages, table stages, and user stages) and even directly from an external location such as an S3 bucket or Azure container. Some factors affect the loading time, like the physical location of the stage, GZIP compression efficiency (files uploaded to internal stages are automatically compressed using gzip unless compression is explicitly disabled), or the number and types of columns.
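To illustrate the different sources (all table names, stage names, buckets, and credentials below are placeholders), COPY INTO accepts each kind of stage reference as well as an external location:

```sql
-- From a named stage (internal or external):
COPY INTO my_table FROM @my_stage;

-- From the table stage that belongs to my_table:
COPY INTO my_table FROM @%my_table;

-- From the current user's stage:
COPY INTO my_table FROM @~/staged_files/;

-- Directly from an external location (credentials are placeholders):
COPY INTO my_table
  FROM 's3://my-bucket/path/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
  FILE_FORMAT = (TYPE = CSV);
```

Note the prefixes: `@` for named stages, `@%` for table stages, and `@~` for the user stage.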