Road to Snowflake SnowPro Core Certification: Micro-partitions

Fourth Chapter: Micro-partitions

Gonzalo Fernandez Plaza
4 min read · Sep 7, 2021
Fourth Chapter of the Snowflake SnowPro Core Certification Complete Course.

This chapter will study how Snowflake stores data internally using micro-partitions. These are the key concepts that we are going to review:

  1. Micro-partitions in Snowflake
  2. Snowflake Pruning Process
  3. Typical SnowPro exam questions regarding micro-partitions

Remember that all the chapters of the course can be found at the following link.

SNOWFLAKE MICRO-PARTITIONS

All data in Snowflake tables is automatically divided into micro-partitions: contiguous units of storage between 50 and 500MB of uncompressed data, organized in a columnar way. They are the physical structure of the tables. This is important, as this question usually comes up in the Snowflake SnowPro Core exam.

A table is organized into micro-partitions in Snowflake (via docs.snowflake.com).

Micro-partitions are immutable, meaning they cannot be changed once created. If a row is updated, the micro-partition holding that row is copied into a new micro-partition that contains the updated row, and the older micro-partition is then marked for deletion. This is important to understand.
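To make this copy-on-write behaviour concrete, here is a minimal Python sketch. The class names and the whole model are illustrative, not Snowflake internals: the point is only that an update never modifies an existing micro-partition; a new one is written with the changed row, and the old one is merely marked for deletion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)  # frozen: a micro-partition cannot be modified once created
class MicroPartition:
    rows: tuple


@dataclass
class Table:
    partitions: List[MicroPartition] = field(default_factory=list)
    marked_for_deletion: List[MicroPartition] = field(default_factory=list)

    def update_row(self, old_row, new_row):
        """Copy-on-write: the partition holding old_row is rewritten into a new
        partition containing new_row; the original is only marked for deletion."""
        for i, mp in enumerate(self.partitions):
            if old_row in mp.rows:
                rewritten = MicroPartition(
                    tuple(new_row if r == old_row else r for r in mp.rows)
                )
                self.marked_for_deletion.append(mp)  # old partition is not touched
                self.partitions[i] = rewritten       # new partition takes its place
                return
        raise KeyError("row not found")


t = Table([MicroPartition((("2021-09-07", "Amazon.co.uk", 100),))])
t.update_row(("2021-09-07", "Amazon.co.uk", 100),
             ("2021-09-07", "Amazon.co.uk", 120))
print(len(t.partitions), "active,", len(t.marked_for_deletion), "marked for deletion")
```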

SNOWFLAKE PRUNING PROCESS

Snowflake uses micro-partitions for the query pruning process: instead of scanning every micro-partition, it analyzes only the smallest set of micro-partitions that can contain the data a query needs, which saves a lot of time when returning the result. For example, if we have a micro-partition for each day of the year and a query filters on a single day, Snowflake only needs to scan 1 of the 365 micro-partitions.

Let’s try to understand this topic with an example (via Learning Journal):

1. The data that we logically see as a table is physically organized in micro-partitions:

How a Snowflake table (left) is divided into micro-partitions (right).

2. Snowflake automatically reorganizes the data in a columnar way, adds a header to the micro-partition (with the offset of each column), and compresses the data column by column:

Headers in Snowflake micro-partitions.
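To get an intuition for this step, here is a rough Python sketch of a columnar micro-partition: rows are pivoted into columns, each column is compressed on its own, and a small header records the offset and length of every compressed column. The layout and field names are invented for illustration; Snowflake's actual file format is internal.

```python
import json
import zlib

rows = [
    ("2021-09-01", "Amazon.com",   100),
    ("2021-09-01", "Amazon.co.uk",  80),
    ("2021-09-02", "Amazon.de",     95),
]

# Pivot row storage into column storage.
columns = {"date": [], "merchant": [], "amount": []}
for date, merchant, amount in rows:
    columns["date"].append(date)
    columns["merchant"].append(merchant)
    columns["amount"].append(amount)

# Compress each column separately and record its offset in a header.
body, header = b"", {}
for name, values in columns.items():
    blob = zlib.compress(json.dumps(values).encode())
    header[name] = {"offset": len(body), "length": len(blob)}
    body += blob

# The header travels with the micro-partition, so a reader can jump straight
# to a single column without decompressing the others.
print(header)
```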

3. When we query the table, Snowflake knows which micro-partitions to access just by looking at the metadata. This is the pruning process mentioned above. In the following picture, we can see how filtering by “Amazon.co.uk” only needs to access the second micro-partition:

Pruning process in Snowflake.
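This step can be sketched the same way (again, a toy model, not Snowflake's real implementation): during pruning only the per-partition metadata is inspected, and a micro-partition is skipped whenever its value range makes it impossible for it to contain “Amazon.co.uk”.

```python
# Toy micro-partitions: during pruning, only this metadata is read;
# the row data itself stays untouched in remote storage.
partitions = [
    {"id": 1, "merchant_min": "Ebay.com",    "merchant_max": "Walmart.com"},
    {"id": 2, "merchant_min": "Alibaba.com", "merchant_max": "Amazon.com"},
    {"id": 3, "merchant_min": "Target.com",  "merchant_max": "Zalando.de"},
]


def prune(parts, value):
    """Keep only the partitions whose min/max range could contain `value`."""
    return [p["id"] for p in parts
            if p["merchant_min"] <= value <= p["merchant_max"]]


print(prune(partitions, "Amazon.co.uk"))  # -> [2]: only one partition is scanned
```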

4. Snowflake also performs column pruning, reading only the columns that we need:

Column pruning in Snowflake.
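Column pruning can be sketched in the same spirit: because every column is stored and compressed separately inside the micro-partition, a query that only needs one column never has to read or decompress the others (a toy model once more, not the real file format).

```python
import json
import zlib

# A toy micro-partition: each column is compressed separately, as in step 2.
micro_partition = {
    "date":     zlib.compress(json.dumps(["2021-09-01", "2021-09-02"]).encode()),
    "merchant": zlib.compress(json.dumps(["Amazon.co.uk", "Amazon.de"]).encode()),
    "amount":   zlib.compress(json.dumps([80, 95]).encode()),
}


def read_columns(mp, needed):
    """Decompress only the columns the query actually asks for."""
    return {col: json.loads(zlib.decompress(mp[col])) for col in needed}


# SELECT amount FROM sales  ->  only one of the three column blobs is read.
print(read_columns(micro_partition, ["amount"]))
```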

Thanks to this process, we save a lot of time returning the result. Snowflake stores metadata about all the rows stored in a micro-partition, including the following (see the small sketch after the list):

  • The range of values for each of the columns in the micro-partition.
  • The number of distinct values.
  • Additional properties used for both optimization and efficient query processing.
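As a rough illustration of the first two items, this is how such statistics could be derived for one column of one micro-partition (a simplified sketch; Snowflake computes and stores this metadata automatically when the micro-partition is written):

```python
def column_stats(values):
    """Per-column metadata kept for each micro-partition: the range of values
    and the number of distinct values."""
    return {"min": min(values), "max": max(values), "distinct": len(set(values))}


merchants = ["Amazon.co.uk", "Amazon.de", "Amazon.co.uk", "Ebay.com"]
print(column_stats(merchants))
# {'min': 'Amazon.co.uk', 'max': 'Ebay.com', 'distinct': 3}
```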

TYPICAL SNOWPRO EXAM QUESTIONS REGARDING MICRO-PARTITIONS

What technique does Snowflake use to limit the number of micro-partitions retrieved as part of a query?

  1. Pruning
  2. Clustering
  3. Indexing
  4. Computing

Solution: 1.

Which statements are correct about micro-partitions in Snowflake?

  1. Contiguous units of storage
  2. Non-contiguous units of storage
  3. 50 and 500MB of compressed data
  4. 50 and 500MB of uncompressed data
  5. Organized in a columnar way

Solution: 1, 4, 5. This definition is a must, and we need to know it perfectly: “All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage between 50 and 500MB of uncompressed data, organized in a columnar way.”

Which options are correct regarding the data that is stored in micro-partition metadata?

  1. The range of values for each of the columns in the micro-partition.
  2. The number of distinct values.
  3. Additional properties used for both optimization and efficient query processing.

Solution: 1, 2, 3. All of them are true.

This chapter doesn’t contain many questions, as we first need to understand how cluster keys work before continuing with more of them. We will cover cluster keys in the next chapter. See you soon!

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way is to follow me on Medium here.
  2. Feel free to clap if this post is helpful for you! :)
  3. More Snowflake SnowPro Core practice exam questions? Find them at FullCertified.com!

Written by Gonzalo Fernandez Plaza

Computer Science Engineer & Tech Lead 🖥️. Publishing AWS & Snowflake ❄️ courses & exams. https://www.fullcertified.com
