Overview

Ingesting data

In Y42, sources have a broader application compared to dbt. Apart from declaring raw data in the data warehouse that you do not control, sources in Y42 can also be used to perform ingestion tasks using Airbyte. This adds another layer of functionality and control for managing ELT data pipelines.

Y42 supports a variety of sources to integrate data from.

There are five types of source assets available:

  1. Reference sources, which are used to reference data in your data warehouse.
  2. Airbyte connectors, which allow you to ingest data from third-party services using Y42-managed Airbyte connectors.
  3. CData connectors, which allow you to ingest data from third-party services using Y42-managed CData connectors.
  4. Fivetran sources, which allow you to trigger and manage Fivetran connector syncs.
  5. Python sources, which allow you to ingest data from APIs.

Reference sources

Reference sources create views of other datasets in your Y42 space's dataset (BigQuery) or schema (Snowflake). Referencing data means pointing to the data's location in its origin source rather than replicating it. This method is particularly useful when regulations or data volume make it impractical to copy the data into your Y42 space's dataset/schema.
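To make the "view instead of copy" idea concrete, here is a minimal sketch of the kind of statement a reference boils down to. It is not the SQL that Y42 actually generates: the helper `build_reference_view_sql` and the names `y42_space` and `raw_project.sales.orders` are hypothetical, and identifier quoting (which differs between BigQuery and Snowflake) is left out.

```python
def build_reference_view_sql(view_name: str, target_schema: str, origin_table: str) -> str:
    """Build a CREATE VIEW statement that points at origin_table without copying its rows.

    All names are illustrative; this is not Y42's generated SQL.
    """
    return (
        f"CREATE OR REPLACE VIEW {target_schema}.{view_name} AS\n"
        f"SELECT * FROM {origin_table}"
    )


# Hypothetical example: expose an existing table inside the space's dataset/schema.
sql = build_reference_view_sql("orders", "y42_space", "raw_project.sales.orders")
print(sql)
```

Because the view only stores a query, the underlying rows stay where they are, which is what makes this approach suitable for regulated or very large datasets.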

CData & Airbyte connectors

Y42 provides a managed service for running data ingestion jobs. It uses CData and Airbyte connectors to securely extract data from external sources and load it into Y42.

Fivetran sources

If you use Fivetran for data ingestion, you can integrate it with Y42 to trigger and manage connector syncs.

Python sources

The Python source asset allows you to define custom logic to extract data, with no infrastructure setup or boilerplate code needed to load the data into your data warehouse.
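As an illustration of what such custom extraction logic might look like, the sketch below pages through an API and collects the records as rows. It deliberately does not show Y42's actual Python source interface (any required decorator, function signature, or return type is not assumed here); `extract_rows` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for a real HTTP call.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def extract_rows(fetch_page: Callable[[int], List[Record]], max_pages: int = 100) -> List[Record]:
    """Collect records page by page until an empty page is returned or max_pages is reached."""
    rows: List[Record] = []
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            break  # an empty page signals the end of the data
        rows.extend(records)
    return rows


def fake_fetch_page(page: int) -> List[Record]:
    """Stand-in for a real API request; returns canned data for two pages."""
    pages = {
        1: [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
        2: [{"id": 3, "name": "c"}],
    }
    return pages.get(page, [])


rows = extract_rows(fake_fetch_page)
print(len(rows))
```

Passing the fetcher in as a parameter keeps the paging logic independent of any particular API client, so it can be exercised without network access.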