Road to Snowflake SnowPro Core Certification: Snowflake Architecture

Second Chapter: Snowflake Architecture

Gonzalo Fernandez Plaza
6 min readAug 30, 2021
Second Chapter of the Snowflake SnowPro Core Certification Complete Course.
Second Chapter of the Snowflake SnowPro Core Certification Complete Course.

In this chapter, we will explore the architecture of Snowflake and cover key concepts essential for the Snowflake SnowPro Core Certification Exam. The main topics include:

  1. Shared-Disk & Shared-Nothing
  2. Snowflake Layers
  3. Snowflake Objects
  4. Typical Exam Questions about Snowflake Architecture

Don’t forget — Every chapter from the course and 300 practice questions are available at FullCertified. 🚀📚

SHARED-DISK & SHARED-NOTHING

Snowflake’s architecture combines elements of both shared-disk and shared-nothing architectures, which is crucial to understand for the exam.

  • Shared-Disk Architecture: Similar to this approach, Snowflake uses a central data repository where all data is stored and can be accessed by all compute nodes (warehouses) in the platform. This central repository ensures that data is consistently available to all the nodes.
  • Shared-Nothing Architecture: Like shared-nothing systems, Snowflake processes queries using virtual warehouses. Each node in a virtual warehouse holds a portion of the data locally, allowing for distributed query processing and improved performance.
Shared Disk & Shared Nothing Architectures.
Shared Disk & Shared Nothing Architectures.

This hybrid approach provides the data management ease of a shared-disk system with the performance and scalability benefits of a shared-nothing system.

SNOWFLAKE LAYERS

Snowflake’s unique is composed of three primary layers, each with high availability. Each layer is priced separately:

1) Centralized Database Storage (Storage): Data loaded into Snowflake is reorganized into an optimized, compressed, columnar format. Snowflake handles all aspects of data storage, ensuring efficiency and reliability.

2) Query Processing (Compute): Query execution occurs in the compute layer, where virtual warehouses process queries. Each virtual warehouse consists of multiple compute nodes provided by a cloud provider.

3) Cloud Services: This layer includes various services that manage operations across Snowflake, such as:

  • Authentication: Ensures that users are who they claim to be when accessing Snowflake. Snowflake supports various authentication methods, including single sign-on (SSO) and multi-factor authentication (MFA).
  • Infrastructure management: Manages the underlying cloud infrastructure that Snowflake operates on.
  • Metadata management: Metadata includes information about the structure of data, such as tables, columns, and data types, as well as details about how data is accessed and used. Snowflake’s metadata management ensures that this information is up-to-date and available for querying and analysis.
  • Query parsing and optimization: Processes and optimizes SQL queries to ensure they execute efficiently
  • Access control: Manages user permissions and access to data within Snowflake.

You can see the different layers in the following diagram (via snowflake.com).

Snowflake Layers.
Snowflake Layers.

SNOWFLAKE OBJECTS

In this section, we will introduce some key Snowflake objects. These will be discussed in more detail later in the course:

  • Account → A unique identifier within an organization, e.g., “tn10000.eu-west-1.snowflakecomputing.com”.
  • Warehouse → Warehouses are the compute part of the Snowflake engine. They are a set of virtual machines provided at the runtime to help execute a given query.
  • Database → A logical collection of schemas. It serves as a container for organizing objects such as tables and views.
  • Schema → A logical grouping of objects within a single database. When a database is created, it automatically includes two schemas, the Public Schema, and the Information_Schema.

Schemas can contain various types of objects, the most important ones are detailed below:

  • Tables → These are database objects that store all the data within a database. Tables are organized into rows and columns, holding the actual data for queries and analysis.
  • Views → A view is a virtual table created by querying one or more tables. It displays the result of the query as if it were a table, allowing for simplified data retrieval and reporting.
  • Stages → Stages are locations in cloud storage where data files are stored before they are loaded into Snowflake tables.
  • File Formats → These are predefined structures that describe how data files are formatted. Snowflake supports various formats including CSV, JSON, AVRO, ORC, PARQUET, and XML. File formats help Snowflake understand how to parse and load the data.
  • Sequences → Sequences are database objects used to generate unique numeric values. They function similarly to counters and are often used for generating primary keys or unique identifiers.
  • Pipes → Pipes are a specialized object in Snowflake that automate the process of loading data from files into Snowflake tables as soon as the files are available in a stage. They facilitate continuous data ingestion with minimal manual intervention.
  • Stored Procedures & User-Defined Functions (UDF) → These objects allow you to extend Snowflake’s capabilities.

ACCOUNT VS SCHEMA OBJECTS

Account-level objects, such as roles or users, are created for the entire account, while schema-level objects, like tables or views, are specific to individual schemas.

In the image below, you can see some of the most common objects at both the schema and account levels.

Snowflake objects are divided by Account or Schema Level.
Snowflake objects are divided by Account or Schema Level.

TYPICAL EXAM QUESTIONS ABOUT SNOWFLAKE ARCHITECTURE

The three main layers of Snowflake are…

  1. Extraction, Ingestion, Load
  2. Database Storage, Compute & Cloud Services
  3. Database, Virtual Warehouse, Data Experience

Solution: 2.

Select the term that is associated with the compute layer:

  1. Query optimization
  2. Query planning
  3. Query processing

Solution: 3. You can find the name of the layers in different ways, like Query Processing for the Compute Layer.

Which of the following services are provided by the Cloud Services Layer?

  1. Metadata Management
  2. Authentication
  3. Storage
  4. Infrastructure Management
  5. Query Execution

Solution: 1, 2, 4.

What statement is true about Snowflake's unique architecture?

  1. One Node Shared Data
  2. Multi-Cluster Shared Data
  3. One Node Private Data

Solution: 2.

What is the storage hierarchy in Snowflake?

  1. Account → DB → Schemas → Objects
  2. Account → Schemas → DB → Objects
  3. Account → DB → Objects → Schemas

Solution: 1.

Can two Virtual Warehouses access the same data simultaneously without any contention issues?

  1. True
  2. False

Solution: 1. We haven't seen this topic in-depth, but this is the main reason for the "Shared Data" concept. All the data is in the same place, and the Virtual Warehouse has access.

Are the interactions with data initialized through the services layer?

  1. True
  2. False

Solution: 1. Cloud services tie together all of the different components of Snowflake to process user requests, from login to query dispatch.

In which layer of Snowflake architecture is stored all security-related information?

  1. Storage
  2. Compute
  3. Cloud Services
  4. All of the above.

Solution: 3.

Can the table functions in INFORMATION_SCHEMA be used to return account-level usage and historical information for storage, warehouses, user logins, and queries?

  1. True
  2. False

Solution: 1.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way is to follow me on Medium here.
  2. Feel free to clap if this post is helpful for you! :)
  3. More Snowflake SnowPro Core content? Find it at FullCertified.com!

--

--

Gonzalo Fernandez Plaza

Computer Science Engineer & Tech Lead 🖥️. Publishing AWS & Snowflake ❄️ courses & exams. https://www.fullcertified.com