Road to Snowflake SnowPro Core Certification: Stages & Storage Integration
Seventh Chapter: Stages & Storage Integration
Stages in Snowflake specify where data files are stored (staged) so that we can load the data in the files into a table. It is where the files are before moving them to Snowflake tables. There are two types of stages, internal and external stages; let’s see them.
Remember that all the chapters from the course can be found in the following link.
EXTERNAL STAGES
External Stages, or Named External Stages, reference data files stored outside Snowflake. Named Stages are Database Objects. The following Cloud Storage Services are supported (where the files to be copied to Snowflake tables will be located):
- Amazon S3 Buckets for AWS.
- Google Cloud Storage Buckets for Google Cloud.
- Microsoft Azure Containers for Azure.
As we can see, the files are in AWS S3, and we’ll copy them into a Snowflake table using the COPY command that we will study in the next chapter.
INTERNAL STAGES
Stores data files internally within Snowflake. There are the following internal stages:
- User (Represented with “@~”) → Each user has a Snowflake personal stage allocated to them by default for storing files, and no one can access them except the user it belongs to. You can use it when a single user only accesses his files, but he will copy them into several tables. You cannot either see them in the Snowflake interface or drop/alter them; you can only access them using the CLI (SnowSQL). As we can see in the following picture, the data that we copy to the table remains in Snowflake; we don’t get it from a Cloud Provider.
In the first example, we are copying the file “myCSV.csv” from the user’s local directory (we will see in the next chapter how to use the PUT command) to his user’s internal Snowflake stage. Using the second command, we are copying data from the User Internal Stage of the user to a table called “mytable”. It’s important to realize that we use two commands, one to move the data from local to Snowflake and another one to copy from the internal user stage to the Snowflake table, as we can see in the following example:
PUT file://c:/myData/myCSV.csv @~;
COPY INTO MYTABLE FROM @~staged file_format=(format_name = 'my_csv_format');
- Table (Represented with “@%”) → Each table has a Snowflake stage allocated to it by default for storing files. This stage is a convenient option if your files must be accessible to multiple users and they only need to be in a single table. You cannot see them in the Snowflake interface; only access them using the CLI. In the following example, you are copying a file from the user’s local directory to the table stage of a table called “myTable”:
PUT file://c:/myData/myCSV.csv @%myTable
- Named Internal Stage → Cloud Storage Location managed by Snowflake. Named stages are database objects, both external and internal ones. If you plan to stage data files that will be loaded only by you, or will be loaded only into a single table, then you may prefer to use your user stage or the table stage. Named Internal stages can be either permanent or temporary.
STAGE METADATA
Snowflake automatically generates metadata for files in internal or external stages. This metadata is “stored” in the following virtual columns:
- METADATA$FILENAME → Name of the staged data file the current row belongs to. It includes the path to the data file in the stage.
- METADATA$FILE_ROW_NUMBER → Row number for each record in the container staged data file.
In the following example, we can see how to query the metadata from the files in “@mystage1”:
STORAGE INTEGRATION
A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of allowed or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure). This option will enable users to avoid supplying credentials when creating stages or when loading or unloading data.
Storage integrations are objects at the Account Level, NO Database Level. They can support several stages. In the following example, we can see how to create it for Amazon S3:
TYPICAL EXAM QUESTIONS
What are the different stages available in Snowflake
- User
- Table
- Named internal
- Named external
- Account
- Schema
Solution: 1, 2, 3, 4
Which are the two metadata columns for staged files?
- METADATA$FILENAME
- METADATA$FILEFORMAT
- METADATA$FILE_ROW_NUMBER
Solution: 1, 3
Which character identifies a table stage?
- “@”
- “%”
- “/@”
- “@%”
Solution: 4
Which character identifies a user stage?
- “@~”
- “~”
- “@”
Solution: 1
When staging uncompressed files in a Snowflake stage, are the files automatically compressed using gzip unless compression is explicitly disabled?
- True
- False
Solution: 1.
Which Snowflake object stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of allowed or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure)?
- Storage Schema
- Storage Integration
- User Stage
Solution: 2
Can a single storage integration support multiple external stages?
- True
- False
Solution: 1
You have two types of named stages, one is an external stage, and the other one is an internal stage. Will external stages always require a cloud storage provider?
- True
- False
Solution: 1
Thanks for Reading!
If you like my work and want to support me…
- The BEST way is to follow me on Medium here.
- Feel free to clap if this post is helpful for you! :)
- More Snowflake SnowPro Core practice exam questions? Find them at FullCertified.com!