Road to Snowflake SnowPro Core Certification: File Formats & Sequences
Chapter 19: Other Snowflake Objects. File Formats & Sequences
As we saw in the chapter on Snowflake architecture, there were several types of objects in Snowflake. Tables, views, stages, pipes, stored procedures, UDFs… We have seen all of them during this course, although we missed two file formats and sequences. In this chapter, we will see these two new objects!
Remember that all the chapters from the course can be found at the following link:
FILE FORMATS
A named file format object describes and stores the format information required to load data into Snowflake tables. You can specify different parameters, for example, the file’s delimiter, if you want to skip the header or not, etc.
Snowflake supports both Structured and Semi-Structured Data, so just as an example, you can store JSON files in tables. Let’s see the differences between them:
Structured Data
- CSV → You can Load and Unload files in CSV format. It’s the fastest file format to load data.
Semi-structured Data
Semi-structured data is saved as Variant type in Snowflake tables, with a maximum limit size of 16MB, and it can be queried using JSON notation. You can store arrays, objects… Non-native values, such as dates and timestamps, are stored as strings when loaded into a VARIANT column.
- JSON → It’s used for both loading & unloading data.
- Parquet → Binary format used for both loading & unloading data.
- XML → You can only load data in Snowflake using the XML format.
- Avro → Binary format used to load data.
- ORC → Binary format used to load data.
You can use FLATTEN to convert semi-structured data to a relational representation. It takes a Variant, Object, or Array column and produces a lateral view.
SEQUENCES
We use sequences to generate unique numbers across sessions and statements, including concurrent statements. You can use them to generate values for a primary key or any column that requires a unique value. They have an initial value and an interval.
You can access sequences in queries as expressions. The function “nextval”, will generate a set of distinct values. I don’t think the syntax will appear in the Snowflake SnowPro Core exam, but let’s make an example. Having the previous sequence “PEOPLE_SEQ” this is how we can increase it:
INSERT INTO PEOPLE (ID, NAME) VALUES
(PEOPLE_SEQ.nextval, "Gonzalo"),
(PEOPLE_SEQ.nextval, "Nacho"),
(PEOPLE_SEQ.nextval, "Megan"),
(PEOPLE_SEQ.nextval, "Angel")
We created a sequence with an initial value of 1 and an interval of 2. So the first value will be 1, and the second one will be 3. The result is shown in the next table:
+--------+---------+
| ID | NAME |
+--------+---------+
| 1 | Gonzalo |
+--------+---------+
| 3 | Nacho |
+--------+---------+
| 5 | Megan |
+--------+---------+
| 7 | Angel |
+--------+---------+
We can also use the “Default” statement when creating the tables, increasing the sequence automatically.
CREATE OR REPLACE TABLE PEOPLE
(
ID NUMBER DEFAULT PEOPLE_SEQ.nextval,
NAME VARCHAR(50)
)
-------------
INSERT INTO PEOPLE (NAME) VALUES
("Gonzalo"),
("Nacho"),
("Megan"),
("Angel")
This will generate the same result as before.
TYPICAL EXAM QUESTIONS
Does Snowflake allow only the load of structure data?
- True
- False
Solution: 2.
Which of the following file formats are supported by Snowflake?
- CSV
- XML
- TXT
- Avro
Solution: 1, 2, 4.
How will you store JSON data in Snowflake?
- Using a column of the JSON type
- Using a column of the VARCHAR type
- Using a column of the VARIANT type
- Using a column of the NULL type
Solution: 3.
Which of the following object types are stored within a schema?
- Tables
- Views
- File Formats
- UDFs
- Roles
- Users
- Sequences
Solution: 1, 2, 3, 4, 7.
You have the following data in a variant column from the table “myTable”. How can you query the favorite technology that Gonzalo uses?
{
"name": "Gonzalo",
"country": "Spain",
"favouriteTechnology": Snowflake,
"hobbies": [
{
"name": "soccer",
"since": "2000",
},{
"name": "music",
"since": "2005",
},{
"name": "technology",
"since": "1996",
}
]
}
- SELECT favouriteTechnology FROM myTable;
- SELECT src:favouriteTechnology FROM myTable;
- SELECT src:$favouriteTechnology FROM myTable;
- SELECT CONVERT_JSON(src:favouriteTechnology) FROM myTable;
Solution: 2. In the Snowflake SnowPro Core exam, they won’t probably ask you about syntax; that’s why I didn’t explain it before. But it’s really useful to know this command.
Which table function allows you to convert semi-structured data to a relational representation?
- FLATTEN
- CHECK_JSON
- PARSE_JSON
Solution: 1.
What file format provides the fastest load performance?
- Parquet
- JSON
- Avro
- CSV
Solution: 4.
Thanks for Reading!
If you like my work and want to support me…
- The BEST way is to follow me on Medium here.
- Feel free to clap if this post is helpful for you! :)
- More Snowflake SnowPro Core practice exam questions? Find them at FullCertified.com!