Snowflake
Calculate Percentiles

Snowflake PERCENTILE: How to Calculate Percentiles

Percentile functions in Snowflake, specifically PERCENTILE_CONT and PERCENTILE_DISC, are used to find the value at a specified percentile in your data. They help in understanding the distribution of data points by providing a value below which a given percentage of observations fall.

What They Do

  • PERCENTILE_CONT: This function calculates the percentile based on a continuous distribution of your data. If the exact percentile falls between two data points, it interpolates (or estimates) the value, providing a smooth and precise measurement.
  • PERCENTILE_DISC: In contrast, this function calculates the percentile using a discrete distribution. It picks the closest actual data point without estimating between values, giving you the exact data point that corresponds to or surpasses the percentile.

Syntax

Both functions are used similarly but with a key difference in how they calculate the percentile:


_10
-- For continuous interpolation
_10
PERCENTILE_CONT(<percentile>) WITHIN GROUP (ORDER BY <order_by_expr>) [OVER ([PARTITION BY <expr3>])]
_10
_10
-- For discrete values
_10
PERCENTILE_DISC(<percentile>) WITHIN GROUP (ORDER BY <order_by_expr>) [OVER ([PARTITION BY <expr3>])]

where:

  • <percentile>: The percentile to find (between 0.0 and 1.0). For example, 0.9 for the 90th percentile.
  • <order_by_expr>: The column by which to order the data. This column is also where the percentile value is picked from.
  • <expr3>: (Optional) Used to divide data into partitions, treating each as a separate group for percentile calculations.

Key concepts

  • 📈 Interpolation vs. Exact Match: Choose PERCENTILE_CONT for a smoothed estimate or PERCENTILE_DISC for the exact data point matching the percentile.
  • 🔢 Data Ordering: Both functions require an order specification because percentiles are inherently dependent on the ranking of data.
  • No DISTINCT Support: These functions do not support the DISTINCT keyword.
  • 🔄 Grouping Optional: Can be used with or without grouping. Without PARTITION BY, the entire data set is considered a single group.

Examples

Finding the 90th percentile with PERCENTILE_CONT and PERCENTILE_DISC

Imagine you have a table student_scores that lists students’ scores on a math test. You want to find the 90th percentile score to understand how the top 10% of students performed.

Input

Table: student_scores

student_idnamemath_score
1John Doe88
2Jane Smith92
3Alice Johnson85
4Dave Wilson90

Snowflake SQL Query


_10
-- Continuous percentile calculation
_10
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY math_score)
_10
FROM student_scores;
_10
_10
-- Discrete percentile calculation
_10
SELECT PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY math_score)
_10
FROM student_scores;

Output

PERCENTILE_CONT: 91.4

PERCENTILE_DISC: 92

Explanation

  • PERCENTILE_CONT(0.9): This function calculates the 90th percentile using continuous distribution, which means it can interpolate between existing data points if the percentile doesn’t exactly match an existing value. The 90th percentile falls between the scores of 90 and 92, which represent the 75th and 100th percentiles, respectively. Since the 90th percentile is 15% of the way between these scores, the interpolation calculates: 90 + (0.9-0.75) * [(92-90)/1.0-0.75] = 91.4
  • PERCENTILE_DISC(0.9): This function chooses the smallest value that is at least the 90th percentile. Since 90 (3rd rank) is the 75th percentile and 92 is the next score and also the maximum score here, 92 is chosen directly by PERCENTILE_DISC for the 90th percentile.
Y42 SQL completion user interface

Your AI copilot for Snowflake SQL

Write production-ready SQL with AI-based suggestions.

Get Started