Snowflake
Flatten JSON Data

Snowflake FLATTEN: How to Flatten JSON Data

The FLATTEN function in Snowflake expands nested data, such as semi-structured data, into a tabular format that can be more easily manipulated with SQL. It 'explodes' complex data types like VARIANT, OBJECT, or ARRAY, turning one row into multiple rows by expanding nested arrays or objects. This transformation makes semi-structured data more accessible for querying and analysis.

Key Concepts

  • Data Transformation: FLATTEN converts complex data structures into individual rows, simplifying data analysis.
  • Flexible Usage: It allows specific targeting of objects, arrays, or both and can be recursive to handle deeply nested structures.
  • Query Enhancement: By producing a lateral view, FLATTEN enables detailed queries on elements within complex data types.

Common Use Cases

  • Log Analysis: Extract specific elements from JSON-formatted log files for error tracking or performance metrics.
  • API Data Management: Transform nested API response data into a tabular format for easier querying and visualization.
  • Preprocessing for Analytics: Flatten data structures before analysis or machine learning model ingestion.

Syntax


_10
FLATTEN(
_10
INPUT => <expr>,
_10
[PATH => <constant_expr>],
_10
[OUTER => TRUE | FALSE],
_10
[RECURSIVE => TRUE | FALSE],
_10
[MODE => 'OBJECT' | 'ARRAY' | 'BOTH']
_10
)

  • INPUT: Specifies the expression to be flattened, which must be a VARIANT, OBJECT, or ARRAY.
  • PATH: Optional path to the element within the VARIANT data structure to be flattened.
  • OUTER: If set to TRUE, generates a row even if there are no entries to expand.
  • RECURSIVE: If TRUE, expands all sub-elements recursively.
  • MODE: Specifies whether to flatten objects, arrays, or both.

Examples

Flatten JSON data

Imagine a scenario from “The Hitchhiker’s Guide to the Galaxy” where you store information about interstellar journeys in a table, with details about each journey stored in a JSON column.

Our goal is to transform this semi-structured data into a tabular format for easy querying:

Input


_18
CREATE
_18
OR REPLACE TABLE interstellar_journeys AS
_18
SELECT
_18
column1 AS journey_id,
_18
parse_json (column2) AS journey_data
_18
FROM VALUES
_18
( 1,
_18
'{
_18
"voyage": "Heart of Gold",
_18
"stops": [{"planet": "Earth"}, {"planet": "Vogon Constructor Fleet"}, {"planet": "Magrathea"}]
_18
}'
_18
),
_18
( 2,
_18
'{
_18
"voyage": "Starship Titanic",
_18
"stops": [{"planet": "Barnards Star"}, {"planet": "Betelgeuse"}]
_18
}'
_18
);

Table: interstellar_journeys

journey_idjourney_data
1{ "stops": [ { "planet": "Earth" }, { "planet": "Vogon Constructor Fleet" }, { "planet": "Magrathea" } ], "voyage": "Heart of Gold" }
2{ "stops": [ { "planet": "Barnards Star" }, { "planet": "Betelgeuse" } ], "voyage": "Starship Titanic" }

Snowflake SQL Query


_10
SELECT
_10
journey_id,
_10
f.value:planet::varchar AS planet_visited
_10
FROM interstellar_journeys,
_10
LATERAL FLATTEN(input => journey_data:stops) f;

Output

journey_idplanet_visited
1Earth
1Vogon Constructor Fleet
1Magrathea
2Barnards Star
2Betelgeuse

Explanation

This query retrieves the journey_id and the planets visited from the interstellar_journeys table.

The FLATTEN function is used to expand the stops array from the journey_data JSON column into individual rows, making each planet visit accessible.

For each row in interstellar_journeys, the FLATTEN function creates a new row for every planet in the stops array, extracting the planet’s name with f.value:planet as planet_visited. This process allows detailed analysis of each stop on the interstellar journeys.

Y42 SQL completion user interface

Your AI copilot for Snowflake SQL

Write production-ready SQL with AI-based suggestions.

Get Started