Road to Snowflake SnowPro Core Certification: Micro-partitions

Fourth Chapter: Micro-partitions

Gonzalo Fernandez Plaza

Fourth Chapter of the Snowflake SnowPro Core Certification Complete Course.

This chapter covers how Snowflake stores data internally using micro-partitions. These are the key concepts that we are going to review:

  1. Micro-partitions in Snowflake
  2. Snowflake Pruning Process
  3. Typical SnowPro exam questions regarding micro-partitions

Remember that all the chapters from the course can be found at the following link.

SNOWFLAKE MICRO-PARTITIONS

All data in Snowflake tables is automatically divided into micro-partitions: contiguous units of storage that each hold between 50 MB and 500 MB of uncompressed data, organized in a columnar way. The size actually stored is smaller, because Snowflake compresses the data. Micro-partitions are the physical structure of the tables. This is important to remember, as it is a common question in the Snowflake SnowPro Core exam.

A table is organized into Micro Partitions in Snowflake (via docs.snowflake.com).
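To make the idea of contiguous, column-oriented storage more concrete, here is a minimal Python sketch. It is only an illustration and not how Snowflake is implemented: the `MicroPartition` class, the `build_micro_partitions` function, and the sample rows are hypothetical, and a fixed row count stands in for the real 50 MB to 500 MB size range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    # Column name -> tuple of values: data is kept per column, not per row.
    columns: dict


def build_micro_partitions(rows, rows_per_partition):
    """Split rows (in arrival order) into contiguous, column-oriented chunks.

    Assumption: a row count replaces the real size limit, which is based
    on the amount of uncompressed data.
    """
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        # Pivot the chunk from a list of rows into one tuple per column.
        columns = {name: tuple(row[name] for row in chunk) for name in chunk[0]}
        partitions.append(MicroPartition(columns))
    return partitions


# Hypothetical sample table with four rows.
rows = [
    {"id": 1, "country": "ES", "sale_date": "2023-01-01"},
    {"id": 2, "country": "FR", "sale_date": "2023-01-01"},
    {"id": 3, "country": "ES", "sale_date": "2023-01-02"},
    {"id": 4, "country": "DE", "sale_date": "2023-01-02"},
]

partitions = build_micro_partitions(rows, rows_per_partition=2)
for number, partition in enumerate(partitions, start=1):
    print(number, partition.columns)
```

Each printed partition holds a contiguous group of rows, but the values are grouped by column, which is what lets a query read only the columns it needs.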

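Micro-partitions are immutable, meaning they cannot be changed once created. If a row is updated, Snowflake does not modify the micro-partition that holds it. Instead, it writes a new micro-partition that contains the data from the original one with the updated row in place.

The older micro-partition is then marked for deletion. This is important to understand, because it means that an update rewrites a whole micro-partition rather than a single row.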
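The update behavior described above can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not Snowflake's code: the `update_row` function is hypothetical, and each micro-partition is represented as an immutable tuple of rows.

```python
def update_row(partitions, key, new_values):
    """Apply an update in a copy-on-write style.

    partitions: list of tuples of row dicts (each tuple is treated as immutable).
    Returns (active_partitions, marked_for_deletion).
    """
    active, retired = [], []
    for part in partitions:
        if any(row["id"] == key for row in part):
            # Write a brand-new partition that contains the changed row...
            rewritten = tuple(
                {**row, **new_values} if row["id"] == key else row
                for row in part
            )
            active.append(rewritten)
            # ...and retire the old one instead of editing it in place.
            retired.append(part)
        else:
            # Partitions that do not hold the row are reused untouched.
            active.append(part)
    return active, retired


# Hypothetical table stored in two micro-partitions.
p1 = ({"id": 1, "amount": 10}, {"id": 2, "amount": 20})
p2 = ({"id": 3, "amount": 30},)

active, retired = update_row([p1, p2], key=2, new_values={"amount": 25})
print("active:", active)
print("marked for deletion:", retired)
```

Note that the original `p1` still holds the old value after the update. Only the list of active micro-partitions changes, which is the essence of immutability.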

SNOWFLAKE PRUNING PROCESS

Snowflake uses micro-partitions for the query pruning process, which consists of scanning only the micro-partitions that can contain data relevant to a query. Snowflake keeps metadata about each micro-partition, such as the range of values of each column, so it can skip the micro-partitions that cannot match the query filters. This technique retrieves all the necessary data without reading every micro-partition, which saves a lot of time when returning the result. For example, if we have one micro-partition for each day of the year and a query filters on a single day, the most efficient plan scans just 1 of the 365 micro-partitions.
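The one-partition-per-day example can be simulated with a short Python sketch. It assumes that each micro-partition carries the minimum and maximum value of a date column as metadata. The `build_daily_partitions` and `prune` functions and the `sale_date` column are hypothetical names chosen for this illustration.

```python
from datetime import date, timedelta


def build_daily_partitions(year):
    """One hypothetical partition per day, with min/max metadata on sale_date."""
    partitions = []
    day = date(year, 1, 1)
    while day.year == year:
        partitions.append(
            {"min_date": day, "max_date": day, "rows": [{"sale_date": day}]}
        )
        day += timedelta(days=1)
    return partitions


def prune(partitions, wanted):
    """Keep only partitions whose [min, max] range can contain `wanted`.

    Only the metadata is inspected here; the rows themselves are never read.
    """
    return [p for p in partitions if p["min_date"] <= wanted <= p["max_date"]]


partitions = build_daily_partitions(2023)
to_scan = prune(partitions, date(2023, 3, 15))
print(f"scanning {len(to_scan)} of {len(partitions)} micro-partitions")
# prints "scanning 1 of 365 micro-partitions"
```

The filter is resolved against the metadata alone, so the other 364 partitions are discarded before any of their rows are touched.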

Let’s try to understand this topic with an example (via Learning Journal):

1. The data that we logically see as a table is physically organized in Micro-partitions:

Gonzalo Fernandez Plaza

Computer Science Engineer & Tech Lead 🖥️. Publishing AWS & Snowflake ❄️ courses & exams. https://www.fullcertified.com