Product updates

Explore the latest features and improvements in Y42.

Data diff (Public Beta)
data diff
data quality
pull requests

With data diff, you get a row-by-row and column-by-column comparison between data builds. Look back in time to see which records have changed between scheduled runs, or review how a pipeline update affects the data in your data warehouse. Just like you can review code changes between commits, you can now see how data is updated, deleted, or added.

ℹ️ Please note that the data diff feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

Read on →

You can find data diff in three key areas on the platform:

  • In the Asset Editor (preview mode)
  • In the Build History
  • In Pull Requests
View update
Pull Requests (Public Beta)
pull requests
git
CI

You can now create, manage, and review Pull Requests (PRs) right from Y42. Depending on the type of repository your space uses, Y42 will create a Pull Request (GitHub) or Merge Request (GitLab). PRs in Y42 have all the features you expect in your development flow: you can assign reviewers, analyze code changes, approve changes, and run automated CI checks.

PRs are a fundamental tool for engineers, enabling contributors to review code changes before integrating them. A unique feature is that Y42 structures PRs around assets. Rather than going through individual .sql and .yml files, Y42 groups them together so that you immediately see how an asset is affected.

ℹ️ Please note that the Pull Requests feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

Read on →

View update
Introducing CData connectors for data ingestion (Public Beta)

The CData connectors enable ingestion from databases & data warehouses, cloud & SaaS applications, flat files, and unstructured data sources, among others.

ℹ️ Please note that the CData connectors feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

We are rapidly expanding our data source coverage with the CData Connectors. The following are currently supported and ready to use:

Read on →

View update
Anomaly detection tests (Public Beta)
data quality
tests
anomaly detection

You can now perform anomaly detection tests directly within your dbt assets through Y42's integration with Elementary, an open-source anomaly detection package designed to improve data quality. This integration enables the discovery of data issues, such as data freshness, volume, or column-level anomalies.

Setting up anomaly detection tests is similar to configuring dbt built-in tests, package tests (like dbt-expectations), or custom tests.

Supported anomaly tests: Column-level anomalies
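As a sketch of what such a configuration could look like, an Elementary column-level anomaly test follows the same YAML pattern as any other dbt test (the model and column names here are hypothetical):

```yaml
models:
  - name: orders            # hypothetical model name
    columns:
      - name: amount        # hypothetical column name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - zero_count
```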

ℹ️ Please note that the Anomaly detection tests feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

Read on →

View update

Protect your main branch with Y42 out-of-the-box CI checks. The Y42 CI Check, triggered by a pull request (PR) or merge request (MR), generates a status report on affected assets, helping to prevent the merging of problematic code changes.

Use the Y42 CI check to prevent faulty code merges from going live.

Read on →

View update
Python actions
python
automation

Python Actions allow you to sync your data pipelines with workflows in Asana, JIRA, HubSpot, or any other tool you prefer. This feature supports a wide range of use cases, such as reverse-ETL into Salesforce, task creation in Asana, notifications in Slack, data generation in Google Sheets, or data entry into operational databases.

Python Actions provide a Zapier or Make-like functionality, with the added benefits of having all your processing in one place, within one data lineage, and having integrated documentation within your team's existing workflow.

🔄 Learn how you can trigger Census or Hightouch syncs with Python actions.
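As an illustration, the body of a Slack-notification action could assemble a webhook payload like this. Slack incoming webhooks accept a JSON body with a `text` field; the channel name and webhook usage shown here are hypothetical examples, not Y42-specific API:

```python
import json

def build_slack_payload(message: str, channel: str = "#data-alerts") -> str:
    """Assemble the JSON body for a Slack incoming webhook.

    The default channel here is a hypothetical example.
    """
    return json.dumps({"channel": channel, "text": message})

# Sending the payload would then be a single HTTP POST, e.g. with requests:
# requests.post(webhook_url, data=build_slack_payload("Pipeline finished"),
#               headers={"Content-Type": "application/json"})
```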

Read on →

View update
Fivetran packages
sources
fivetran

Use dbt packages for Fivetran to streamline source configuration for your models:

packages:
  - package: fivetran/package_name
    version: x.xx.x

Find out how →

View update
Managing git identities
git

You can control which identity performs Git actions in each space.

Learn how →

View update
sql_header and set_sql_header
dbt

SQL headers allow you to manipulate the current session settings, such as roles and timezones, or create functions, directly within your asset's run session.

{{ config(
sql_header="alter session set timezone = 'Europe/London';"
) }}
{% call set_sql_header(config) %}
create or replace function ssn_mask(ssn STRING)
returns STRING
language SQL
AS '
    REGEXP_REPLACE(ssn, ''[0-9]'', ''X'') /* 123-45-6789 -> XXX-XX-XXXX */
';
{%- endcall %}

select ssn_mask(ssn) from {{ ref('model_name') }}

Dive deeper →

View update
Python assets preview
python
data preview

You can now preview a Python script's output and logs before committing.

Read more →

View update

This field indicates when a value was extracted at the source or, if not available, when it was written on Y42's side. There is no required action. Any full import after the apiVersion upgrade will automatically add the _y42_extracted_at column to your tables.
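For example, you could use the new column to check how fresh a source table is (the source and table names here are hypothetical):

```sql
-- latest extraction timestamp for a source table
select max(_y42_extracted_at) as last_extracted_at
from {{ source('shop', 'orders') }}
```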

Read more about the impact on the Behavior changes page.

View update
Asset Health integration in Lineage and List modes
devex
observability

Asset Health is now embedded within both the Lineage and List views, giving you a better understanding of the health of your space.

Toggle the Asset Health widget in the List view.
View update

When setting up a new space in Y42, you have the option to link your own repository hosted on GitHub Cloud or GitLab Cloud, or opt for the Y42-managed repository hosted on Y42's GitLab instance.

Connect your own repository to Y42.

Learn more →

View update
Swap git repository
git

You can now swap a space Git repository from the Space settings > Integrations page.

Swap git repository.

Learn more →

View update

Python Ingest assets now support logging and incremental loads.

You can record messages using the logging module for your Python ingest assets. Access these logs by navigating to the Logs tab in the asset's Build History.

Visualize Python asset logs in the Build History tab.

Incremental loads

Y42 has introduced support for incremental data loading through the context.state variable in Python-based data ingestion assets. context.state is a dictionary that gets updated with each asset refresh and can store any key-value pair for your data process.

from y42.v1.decorators import data_loader

from datetime import datetime

import pandas as pd

@data_loader
def todos(context) -> pd.DataFrame:

    prev_state = context.state.get()

    if prev_state:
        last_update = prev_state['last_update']
        # incremental run: e.g. filter the dataset by the last_update variable
        df = pd.DataFrame()  # placeholder for the incremental load
    else:
        # full refresh: load the complete dataset
        df = pd.DataFrame()  # placeholder for the full load

    context.state.set({"last_update": datetime.utcnow()})

    return df

Learn more →

View update
CTE autocompletion
developer experience

Y42 now supports autocompletion for Common Table Expression (CTE) code blocks. You can:

  • Retrieve the names of CTEs referenced within the SQL model.
  • Obtain a complete list of columns for any referenced CTE.
  • Access specific columns from a CTE.
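For instance, in a model like the following (names hypothetical), the editor can suggest the CTE names `customers` and `orders`, as well as their columns, while you type the final select:

```sql
with customers as (
    select customer_id, first_name from {{ ref('stg_customers') }}
),
orders as (
    select order_id, customer_id, amount from {{ ref('stg_orders') }}
)
select
    customers.first_name,   -- autocompleted from the customers CTE
    orders.amount           -- autocompleted from the orders CTE
from customers
join orders on customers.customer_id = orders.customer_id
```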

Learn more →

View update
Custom schemas and databases
dbt
custom schemas

By default, all assets are built in the schema/dataset specified in the Branch environment. Each branch can be configured to write to its own schema and dataset.

You can now build assets in schemas and databases other than the default schema and database set in the Branch environments settings page.

{{ config(schema='staging_schema') }}
select ... 
models:
  y42_project:
    staging:
      +materialized: view
      +schema: staging # Assets in `models/staging/` will be built in the "<branch_name>_staging" schema.
    mart:
      +schema: mart # Assets in `models/mart/` will be built in the "<branch_name>_mart" schema.
    # All other assets will be built in the "<branch_name>" schema.

Learn more →

View update

In the data preview section, you can now perform the following operations as well:

  • Group by
  • Pivot
  • Warehouse selection (Snowflake only)

Group by

You can aggregate results and group them by specific columns in the preview. Click on the three dots of any column for more actions, navigate to the second tab with three horizontal lines (aggregations), and select Group By <column_name>.

Group by the resultset.

Pivot

You can pivot the preview result set. Click on the <n>/<m> columns label and toggle Pivot on. You can now select the Row Groups, Values, and Column Labels by dragging columns from the list.

Pivot the resultset.

Warehouse selection for previewing data (Snowflake only)

For Snowflake-connected spaces, you can select from the available warehouses to preview data.

Select warehouse to use when previewing data.

Learn more →

View update

We've capitalized Snowflake table and column names to be compatible with the default Snowflake behavior on unquoted columns. This leads to better autocomplete in BI tools and within Y42’s own SQL editor.

Read more about the impact on the Behavior changes page.

View update
Warehouse selection for previewing data
query preview

For Snowflake spaces, you can select any available warehouse to preview data.


Learn more →

View update
Asset catalog view
ux

A comprehensive read-only view of all your space assets. At a glance, see how many assets you have by type and quickly identify how many are unhealthy. The Asset Catalog is synchronized with the development modes, and offers detailed asset-level information like documented columns, lineage, asset queries, and more.

Learn more about the Asset catalog →

View update
Python Ingest (Public Beta)
sources
python

Y42 simplifies data integration from external APIs using Python. Our platform handles the infrastructure and eliminates the need for boilerplate code for loading data into the Data Warehouse (DWH).

Your main task is writing Python logic that fetches data into a DataFrame, with each DataFrame representing a unique source table. These can then be modeled and processed downstream like any other source. Python Ingest assets are also subject to version control and comply with the Virtual Data Builds mechanism, ensuring consistency and reliability in pipelines that use them as sources.

from y42.v1.decorators import data_loader

import os
import requests
import json
import pandas as pd

@data_loader
def pizza_status(context) -> pd.DataFrame:
    url = "https://database.supabase.co/rest/v1/orders?select=status"
    api_key = context.secrets.get("SUPABASE_PIZZA_POSTGRES")
    headers = {
        'apikey': api_key,
        'Authorization': 'Bearer ' + api_key
    }
    response = requests.get(url, headers=headers)
    data = response.json()
    df = pd.DataFrame(data)

    return df

Read more on how you can run your Python scripts in Y42 →

View update
Orchestrated Fivetran syncs
sources
fivetran
View update
Public API
api

You can use the public API for more advanced integrations and customizations. Capabilities include retrieving the manifest content, triggering runs by command or asset, and retrieving run information by id or conditions.

curl --request POST \
    --url https://api.y42.dev/api/4/orchestrations/org_slug/space_slug \
    --header 'accept: application/json' \
    --header 'content-type: application/json'

API Documentation ↗

View update
UI for configuring snapshots
dbt
snapshots
ux

You can now configure your snapshots via the UI. For those who prefer a more hands-on approach, direct manipulation of the underlying code is still available via our VS Code IDE integration.

Learn how to set up snapshots →

View update
Automatically generate staging models
automation

Generate staging models from seeds and sources with a simple right-click.

View update
Variables in the Y42 build command
orchestration
View update

Introducing a new notification system for branches. When your branch is behind the main branch, a yellow notification icon alerts you that it's time to sync changes from main into your branch. If the merge results in any conflicts, the icon turns red, signaling the need for conflict resolution. The one-click merge feature simplifies updating your branch with the latest changes from main.


Learn more about resolving conflicts and using one-click merge here →

View update
Y42-hosted sandbox
sandbox
View update
Incremental predicates support for incremental models
dbt
incremental models

incremental_predicates offer an advanced approach for managing large data volumes in incremental models, where the amount of data justifies additional performance optimization. This configuration accepts a list of valid SQL expressions. Note that Y42 does not verify the syntax of these SQL statements.

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    cluster_by = ['session_start'],  
    incremental_strategy = 'merge',
    incremental_predicates = [
      "target_alias.session_start > dateadd(day, -7, current_date)"
    ]
  )
}}
..

The above configuration will generate the following MERGE command:

merge into <existing_table> target_alias
    using <temp_table_with_new_records> source_alias
    on
        -- unique key
        target_alias.id = source_alias.id
        and
        -- custom predicate: limits data scan in the "old" data / existing table
        target_alias.session_start > dateadd(day, -7, current_date)
    when matched then update ...
    when not matched then insert ...

Learn more about incremental models →

View update
Column-level lineage
dbt
observability
View update
Monitoring new features
monitoring
View update
Snapshots
dbt
snapshots

Capture and analyze historical data changes with snapshots.

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols=['column1', 'name', 'birthdate'],
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

Learn how to set up snapshots →

View update
Partitioning & clustering for BigQuery assets
dbt
partitioning

Optimize your BigQuery storage and query performance through partitioning and clustering.

{{ config(
    materialized='table',
    partition_by={
      "field": "orderdate",
      "data_type": "date",
      "granularity": "month"
    },
    
    cluster_by = ['customerid', 'orderid'],
  )
}}

with orders AS (
  select 
    orderid, 
    customerid, 
    employeeid, 
    orderdate, 
    price
  from {{ source('mdm-prod', 'orders') }}
)
select * from orders

Learn about partitioning and clustering configurations →

View update
Seeds
sources
seeds

You can upload and reference CSV files in your Y42 pipelines.

Add a CSV seed file.

Read more →

View update
Source data freshness checks
sources
freshness

Configure source data freshness tests to halt the execution of downstream assets if the source asset is stale.

version: 2

sources:
  - name: pizza_shop
    database: raw

    freshness: # default freshness
      error_after: {count: 24, period: hour}

    loaded_at_field: _etl_loaded_at

    tables:
      - name: customers # this will use the freshness defined above

      - name: orders # this will use the more specific freshness below
        freshness: # make this a little more strict
          error_after: {count: 12, period: hour}

Set up source data freshness tests →

View update
Data preview for each job run
query preview

You can preview data from the current or any previous materialization of an asset.

View update
Partial SQL query preview and compile, and new keyboard shortcuts

You can now preview and compile either the entire SQL query or parts of it.

Alternatively, you can use CMD/CTRL + ENTER to preview data, or CMD/CTRL + SHIFT + ENTER to compile queries.

Find out how →

View update

With published assets, you can turn your branches into data warehouse environments. This feature is especially useful for teams who need to rapidly test changes in isolated environments before merging into main.

For assets materialized as tables, views are now replaced by a more efficient zero-copy clone (in Snowflake) or a table clone (in BigQuery). If the asset is materialized as a view, the published asset remains a view.

With clones, you can leverage BigQuery's wildcard table feature to query multiple tables simultaneously using a single SQL statement. This enables a more efficient way to handle datasets that span across multiple tables.
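A wildcard query of that kind could look like this (the project, dataset, and table prefix are hypothetical):

```sql
-- queries every table whose name starts with events_ in a single statement
select event_name, count(*) as n
from `my_project.analytics.events_*`
where _TABLE_SUFFIX between '20240101' and '20240131'
group by event_name
```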

Discover more about publishing assets →

View update
Customizable SQLFluff configurations
dbt
linting
sqlfluff

You can customize SQLFluff rules by adding a .sqlfluff file at the root level of your project. Here's an example:

[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

Explore the SQLFluff integration →

View update
Exposures
dbt
exposures

You can now define exposures to group relevant upstream assets together, outlining the data required for external use, including dashboards, notebooks, data apps, or ML use cases.
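A minimal exposure definition could look like this (the exposure name, owner, and referenced models are hypothetical):

```yaml
exposures:
  - name: weekly_kpi_dashboard
    type: dashboard
    owner:
      name: Data Team
      email: data-team@example.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
```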

Read more →

View update
Asset health status indicator
observability

Gain a deeper understanding of your assets with the new health status indicators. Learn how the health status is derived to streamline your pipeline development process.

The Stale status is triggered when the asset configurations are changed, impacting downstream assets linked to it. The stale status serves as a notification that these linked models may now contain outdated information due to recent changes.

You can use the following command to build all stale assets in a space:

y42 build --stale

Read more →

View update
Asset Tiers
observability

Assign different tier levels to your assets to prioritize and manage them more effectively. These tiers assist in distinguishing the criticality and importance of each asset, thereby aiding in optimal resource allocation and focus.

❗ It's important to note that assigning an asset to a particular tier does not influence its health status. The tier levels are primarily for organizational and prioritization purposes, allowing for a more structured approach to asset management.

Read more →

View update
Full data query preview
ux
query preview

In addition to previewing the top 100 rows, you can now materialize the preview query as a table inside the Data Warehouse with a 24-hour time-to-live period.

With a full data preview you can filter across all your asset rows, not just the first 100. This is handy for verifying specific row data before committing changes and building your asset.

Read more →

View update
Refresh package dependencies
dbt
packages

From the command menu, you can now download or refresh package content to display it in the Y42 UI. The content is stored in the dbt_packages folder within the file system and can be parsed and displayed across various UI modes, including catalog, lineage, and code.

Read more →

View update
dbt Analyses
dbt
analyses

dbt analyses serve as a tool to manage SQL statements that aren't intended to be materialized in your data warehouse, promoting version control of analytical SQL files within your dbt project. They can be accessed in code editor mode to preview data.
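For example, an analysis is just a .sql file under the analyses/ folder that you can preview without materializing it (the file and model names are hypothetical):

```sql
-- analyses/top_customers.sql
select customer_id, sum(amount) as total_spent
from {{ ref('fct_orders') }}
group by customer_id
order by total_spent desc
limit 10
```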

Read more →

View update

Y42 now clearly separates the bottom drawer into two tabs, local and remote, making clear that previews are based on local changes, whereas each build executes against the state of the current commit.

View update
Keyboard shortcuts in the help drawer
shortcuts

You can find all keyboard shortcuts within the help drawer now.

View update
dbt hooks
dbt
hooks

Y42 now supports dbt pre- and post-hooks, primarily used for data warehouse administration tasks like masking sensitive columns. These hooks run immediately before and after the main query and are treated as a single operation.

If a hook fails, Y42 will not update your latest valid job, preventing downstream models from referencing tables created by the failed job. This design choice adds a layer of security and control, especially for tasks like masking sensitive customer data.

{{ config(
    pre_hook="SQL-statement" | ["SQL-statement"],
    post_hook="SQL-statement" | ["SQL-statement"],
) }}

select ...
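A concrete example, assuming a `reporter` role exists in your warehouse, is granting read access immediately after a model builds:

```sql
{{ config(
    post_hook="grant select on {{ this }} to role reporter"
) }}

select ...
```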

Read more →

View update
Linting with SQLFluff
dbt
linting
sqlfluff

SQLFluff is a SQL linter that improves your SQL code quality and development workflow. It helps you find issues in your SQL code, enforcing a consistent style and catching errors early.

You can activate an auto-fix for most issues detected by the linter:

Read more →

View update
Unpublishing assets
publishing

Users can now unpublish specific assets from virtual branch environments within Y42, giving precise control over the assets' visibility for downstream consumption.

Read more →

View update
Enhanced published assets
publishing

The latest update to Y42 lets you easily convert specific branches, not just the main branch, into datasets (BigQuery) or schemas (Snowflake) for virtual environments. Naming conventions have also been standardized, offering more control and consistency.

Read more →

View update
Multi-branch orchestration
orchestration

Orchestration allows you to build assets and pipelines automatically on a schedule. By default, scheduled builds and alerts only run on the main branch. If you need orchestrations on multiple branches, which is especially useful when you maintain multiple virtual environments, you can enable this feature and set which branches run build schedules automatically.

View update
Auto-delete unused tables
governance

Y42 keeps the materialization of each physical UUID table in the DWH, allowing you to easily roll back using the Virtual Data Builds mechanism. By default, these tables are kept for 30 days in your data warehouse. You can customize when the physical UUID tables should be deleted.

The expiration logic for deleting tables is based on two criteria:

1. The job is older than 30 days, or older than the user-configured retention period.

2. It is not the latest valid job run on any existing branch's head.

Read more →

View update
New Airbyte sources
sources
View update
Incremental models
dbt
incremental

Incremental models allow you to load only the latest rows during each run, resulting in faster runtimes for your data pipelines. They also help save costs, for example on Snowflake processing, since less data means less time running the Virtual Warehouse.

{{ config(materialized='incremental') }}

select
    *,
    my_slow_function(my_column)
from raw_app_data.events

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  where event_time > (select max(event_time) from {{ this }})

{% endif %}

Read more →

View update
New Airbyte sources
sources

We've added several new Airbyte sources:

- Google Analytics 4
- Google Ads
- Salesforce
- HubSpot
- Amazon Ads
- Shopify

Read more →

View update