Road to Snowflake SnowPro Core Certification: Micro-partitions
Fourth Chapter: Micro-partitions
In this chapter, we will study how Snowflake stores data internally using micro-partitions. These are the key concepts that we are going to review:
- Micro-partitions in Snowflake
- Snowflake Pruning Process
- Typical SnowPro exam questions regarding micro-partitions
Remember that all the chapters from the course can be found in the following link.
SNOWFLAKE MICRO-PARTITIONS
All data in Snowflake tables is automatically divided into micro-partitions: contiguous units of storage holding between 50 and 500 MB of uncompressed data, organized in a columnar way. They are the physical structure of tables. This is important, as this question frequently appears in the Snowflake SnowPro Core exam.
Micro-partitions are immutable: they cannot be changed once created. If a row is updated, the micro-partition holding the row is copied into a new micro-partition, where the updated row is inserted. The older micro-partition is then marked for deletion. This is important to understand.
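The copy-on-write behavior can be illustrated with a small Python sketch. This is an illustration only; the class and function names are invented for the example and are not Snowflake internals:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a micro-partition is immutable once created
class MicroPartition:
    rows: tuple

def update_row(partitions, part_idx, row_idx, new_row):
    """Simulate an UPDATE: the affected micro-partition is never modified.
    Its rows are copied into a new micro-partition containing the updated
    row, and the old micro-partition is marked for deletion."""
    old = partitions[part_idx]
    new_rows = list(old.rows)
    new_rows[row_idx] = new_row
    new_partition = MicroPartition(rows=tuple(new_rows))
    # Replace the old partition with the new one; the old object survives
    # untouched until it is eventually purged.
    partitions = partitions[:part_idx] + [new_partition] + partitions[part_idx + 1:]
    return partitions, old

parts = [MicroPartition(rows=(("Ann", 10), ("Bob", 20)))]
parts, old = update_row(parts, 0, 1, ("Bob", 25))
print(parts[0].rows)  # (('Ann', 10), ('Bob', 25))
print(old.rows)       # original partition unchanged: (('Ann', 10), ('Bob', 20))
```

Note that the old, unchanged copy is exactly what features like Time Travel rely on.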
SNOWFLAKE PRUNING PROCESS
Snowflake uses micro-partitions for query pruning, which consists of analyzing the smallest possible number of micro-partitions to answer a query. This technique retrieves all the data needed for the result without scanning every micro-partition, saving a lot of time. For example, if we have one micro-partition per day of the year and filter on a single day, the most efficient plan scans just 1 of the 365 micro-partitions.
Let’s try to understand this topic with an example (via Learning Journal):
1. The data that we logically see as a table is physically organized in micro-partitions:
2. Snowflake automatically sorts the data in a columnar way, adds a header to each micro-partition (with offsets), and compresses the data by column:
3. When we query the table, Snowflake will know which micro-partition to access just by looking at the metadata. This is the pruning process mentioned above. In the following picture, we can see how filtering by “Amazon.co.uk” would only access the second one:
4. Snowflake also has column pruning, only reading the columns that we need:
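The four steps above can be sketched in a few lines of Python. The metadata layout here is hypothetical and simplified (only a min/max range per column), purely to show how partition pruning and column pruning interact:

```python
# Each micro-partition keeps per-column min/max metadata in its header.
# Pruning reads only that metadata to decide which partitions to scan.
partitions = [
    {"meta": {"store": ("Amazon.com", "Amazon.de")},
     "cols": {"store": ["Amazon.com", "Amazon.de"], "sales": [100, 200]}},
    {"meta": {"store": ("Amazon.co.uk", "Amazon.co.uk")},
     "cols": {"store": ["Amazon.co.uk", "Amazon.co.uk"], "sales": [300, 400]}},
]

def query(partitions, select_cols, filter_col, value):
    results = []
    for p in partitions:
        lo, hi = p["meta"][filter_col]
        if not (lo <= value <= hi):
            continue  # partition pruning: skip without reading any data
        # Column pruning: read only the filter column and requested columns.
        for i, v in enumerate(p["cols"][filter_col]):
            if v == value:
                results.append({c: p["cols"][c][i] for c in select_cols})
    return results

print(query(partitions, ["sales"], "store", "Amazon.co.uk"))
# [{'sales': 300}, {'sales': 400}]
```

Filtering by "Amazon.co.uk" never touches the first partition's data, because its metadata range already excludes that value, which mirrors the picture above.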
Thanks to this process, we save a lot of time returning the result. Snowflake stores metadata about all rows stored in a micro-partition, including:
- The range of values for each of the columns in the micro-partition.
- The number of distinct values.
- Additional properties used for both optimization and efficient query processing.
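As a rough sketch of how the first two statistics could be derived for one column of a micro-partition (an illustration, not Snowflake's internal format):

```python
def column_metadata(values):
    """Compute the kind of per-column statistics Snowflake keeps for each
    micro-partition: the range of values and the number of distinct values."""
    return {
        "min": min(values),
        "max": max(values),
        "distinct": len(set(values)),
    }

print(column_metadata([300, 100, 300, 200]))
# {'min': 100, 'max': 300, 'distinct': 3}
```

Because these statistics are maintained automatically as micro-partitions are written, the pruning process described above can consult them without scanning the data itself.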
TYPICAL SNOWPRO EXAM QUESTIONS REGARDING MICRO-PARTITIONS
What technique does Snowflake use to limit the number of micro-partitions retrieved as part of a query?
- Pruning
- Clustering
- Indexing
- Computing
Solution: 1.
Which statements are correct about micro-partitions in Snowflake?
- Contiguous units of storage
- Non-contiguous units of storage
- 50 and 500MB of compressed data
- 50 and 500MB of uncompressed data
- Organized in a columnar way
Solution: 1, 4, 5. This definition is a must, and we need to know it perfectly: "All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage between 50 and 500 MB of uncompressed data, organized in a columnar way".
Which options are correct regarding the data that is stored in micro-partition metadata?
- The range of values for each of the columns in the micro-partition.
- The number of distinct values.
- Additional properties used for both optimization and efficient query processing.
Solution: 1, 2, 3. All of them are true.
This chapter doesn't contain many questions, as it is necessary to understand clustering keys before continuing with more of them. We will cover them in the next chapter. See you soon!
Thanks for Reading!
If you like my work and want to support me…
- The BEST way is to follow me on Medium here.
- Feel free to clap if this post is helpful for you! :)
- More Snowflake SnowPro Core practice exam questions? Find them at FullCertified.com!