Anomaly detection

Anomaly detection tests - Public Beta

Y42 offers tests for detecting data quality issues. Y42 data tests are set up and run as native tests within your dbt assets.

Y42 tests complement dbt built-in tests, package tests (like dbt-expectations), and custom tests.

Y42 integrates Elementary, an open-source anomaly detection package, to monitor data quality. It tracks specific metrics such as row count, null rate, and average value, and compares recent measurements with historical data. This comparison surfaces significant changes and deviations, which likely indicate data reliability issues.

When a test is executed, your data is divided into time buckets according to the time_bucket field, and constrained by the training_period variable. The test then compares a specific metric (e.g., row count) from buckets within the detection_period against the metric from all prior buckets during the training_period. If anomalies are detected within the detection_period timeframe, the test fails.
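The bucketing step can be illustrated with a minimal Python sketch. It is not Elementary's implementation: the function name split_buckets, the daily bucket size, and the default window lengths are assumptions made for illustration. It counts rows per daily bucket inside the training window and treats the most recent buckets as the detection period.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def split_buckets(timestamps, today, training_days=14, detection_days=2):
    """Count rows per daily bucket and split buckets into training and detection.

    Hypothetical helper: the window spans `training_days` before `today`;
    its last `detection_days` buckets form the detection period, and all
    earlier buckets in the window form the training baseline.
    """
    counts = Counter(ts.date() for ts in timestamps)
    window_start = today - timedelta(days=training_days)
    detection_start = today - timedelta(days=detection_days)

    training, detection = {}, {}
    day = window_start
    while day < today:
        target = detection if day >= detection_start else training
        # Buckets with no rows still count, with a row count of zero.
        target[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return training, detection


# Example: three rows per day for the first 14 days of January 2024.
rows = [datetime(2024, 1, d, hour) for d in range(1, 15) for hour in (1, 9, 17)]
training, detection = split_buckets(rows, today=date(2024, 1, 15))
```

With these inputs, the 12 oldest buckets form the baseline and the 2 most recent buckets are the ones checked for anomalies.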

Image source: Elementary data.

Detection method

The method uses the Z-score to identify anomalies. The Z-score measures how far a data point is from the mean, expressed in standard deviations. For normally distributed data:

  • ~68% of data points have an absolute Z-score of 1 or less.
  • ~95% of data points have an absolute Z-score of 2 or less.
  • ~99.7% of data points have an absolute Z-score of 3 or less.

By default, a data point with an absolute Z-score above 3 is treated as an outlier. This threshold is adjustable using the anomaly_sensitivity variable.
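As a rough illustration of the check described above, the following Python sketch computes a Z-score for a new measurement against historical bucket values and compares it with a sensitivity threshold. The function names z_score and is_anomaly are hypothetical, and the use of the sample standard deviation is an assumption; Elementary's actual computation may differ.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Return how many standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # A constant history: any different value is infinitely far away.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomaly(value, history, anomaly_sensitivity=3.0):
    """Flag `value` when its absolute Z-score exceeds the threshold."""
    return abs(z_score(value, history)) > anomaly_sensitivity


# Daily row counts from the training buckets (mean 100, standard deviation 2).
history = [100, 102, 98, 101, 99, 100, 103, 97]
normal_day = is_anomaly(104, history)   # Z-score of 2
spike_day = is_anomaly(110, history)    # Z-score of 5
```

Lowering anomaly_sensitivity makes the check stricter: with a threshold of 1.5, the value 104 in this example would also be flagged.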

Supported tests

Anomaly detection test | Supported?