CData connectors

Ingest data using CData - Public Beta

Overview

Y42 offers managed data ingestion powered by CData. For each ingestion job, Y42 creates an instance of the CData Python connector.

To ingest your data using CData, you need to create the following two resources in your Y42 space:

  • Source: You need to create a source object for each CData connector. Sources provide metadata about external data sources, such as table- or column-level information.
  • Connection: Connections hold authentication information. You don't need to create any additional accounts when you set up a source. You can use your existing account with the data provider (e.g., a username and password for database sources, or an API token).

For a deeper dive into features of each connector, please refer to the individual documentation page for each source.

CData-specific features

CData exposes a SQL-like interface for interacting with third-party data sources. The List view of the Y42 Asset Editor offers an intuitive way to create and configure a source before you select the tables and columns you want to ingest. Some third-party data sources need more complex queries to return data in a useful format. Analytical sources like Facebook Ads or Google Analytics are examples. For these cases, Y42 exposes Custom Filters and Custom Queries in Code mode.

Custom Filters

Custom filters are necessary when a third-party data source does not support naively retrieving all data in a table. If an import with default settings fails, the error usually indicates which column needs a custom filter.

To add custom filters, switch to Code mode and open the source and table you want to configure. Custom filters go under sources[*].tables[*].config.y42_table.filters. An example for Google Ads looks like this:


```yaml
version: 2
sources:
  - name: googleads_example
    config:
      y42_source:
        type: cdata-googleads
        connection: my-secret
    tables:
      - name: AdGroupAd
        config:
          y42_table:
            filters:
              - column: 'Date'
                operator: '>='
                value: '2024-01-01'
            import: AdGroupAd
```
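
The path sources[*].tables[*].config.y42_table.filters means "the filters list of every table of every source." The minimal Python sketch below shows how that path resolves for the Google Ads example. It assumes the YAML has already been parsed into plain dicts and lists (for example, by a YAML library). collect_filters is a hypothetical helper, not part of Y42.

```python
from typing import Any

# The Google Ads example, written as the structure a YAML parser would produce.
SOURCE_CONFIG: dict[str, Any] = {
    "version": 2,
    "sources": [
        {
            "name": "googleads_example",
            "config": {"y42_source": {"type": "cdata-googleads", "connection": "my-secret"}},
            "tables": [
                {
                    "name": "AdGroupAd",
                    "config": {
                        "y42_table": {
                            "filters": [
                                {"column": "Date", "operator": ">=", "value": "2024-01-01"}
                            ],
                            "import": "AdGroupAd",
                        }
                    },
                }
            ],
        }
    ],
}


def collect_filters(config: dict[str, Any]) -> dict[tuple[str, str], list[dict[str, Any]]]:
    """Hypothetical helper: gather sources[*].tables[*].config.y42_table.filters,
    keyed by (source name, table name). Tables without filters are skipped."""
    result: dict[tuple[str, str], list[dict[str, Any]]] = {}
    for source in config.get("sources", []):
        for table in source.get("tables", []):
            y42_table = table.get("config", {}).get("y42_table", {})
            filters = y42_table.get("filters")
            if filters:
                result[(source["name"], table["name"])] = filters
    return result


print(collect_filters(SOURCE_CONFIG))
```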

The supported operators are =, >, <, >=, <=, and IN. All of them expect a single value except IN, which expects a list of values.
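
The rule that IN takes a list while every other operator takes a single value is easy to get wrong in YAML. The minimal Python sketch below validates a filter list and renders it as a SQL-style WHERE condition. It assumes filters are parsed into dicts with column, operator, and value keys, as in the Google Ads example. These docs don't say how Y42 combines multiple filters, so joining them with AND is an assumption. The helper names are hypothetical.

```python
from typing import Any

# Operators from the list above: IN takes a list, the rest take a single value.
SCALAR_OPERATORS = {"=", ">", "<", ">=", "<="}


def sql_literal(value: Any) -> str:
    """Render numbers bare and everything else as a single-quoted string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def render_filter(f: dict[str, Any]) -> str:
    """Validate one filter and render it as a SQL condition."""
    column, operator, value = f["column"], f["operator"], f["value"]
    if operator == "IN":
        if not isinstance(value, list) or not value:
            raise ValueError(f"IN on {column!r} expects a non-empty list of values")
        return f"{column} IN ({', '.join(sql_literal(v) for v in value)})"
    if operator in SCALAR_OPERATORS:
        if isinstance(value, list):
            raise ValueError(f"{operator} on {column!r} expects a single value")
        return f"{column} {operator} {sql_literal(value)}"
    raise ValueError(f"unsupported operator: {operator!r}")


def render_where(filters: list[dict[str, Any]]) -> str:
    """Combine filters into one condition (AND-ed together, by assumption)."""
    return " AND ".join(render_filter(f) for f in filters)
```

For instance, the Date filter from the example combined with a hypothetical Status filter using IN and the value ['ENABLED', 'PAUSED'] renders as Date >= '2024-01-01' AND Status IN ('ENABLED', 'PAUSED'). Passing a plain string to IN raises a ValueError instead of silently producing a malformed condition.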

Custom Queries

Custom queries are useful when filter conditions are not enough to retrieve the data you need. They let you reference other tables of the same source (for example, to filter on them), express complex filter conditions, or write specific SELECT statements.

To create a custom query, switch to Code mode and open the source you want to configure. You can define a custom query like any other table, using the query key under y42_table, e.g.:


```yaml
version: 2
sources:
  - name: shipstation_example
    config:
      y42_source:
        type: cdata-shipstation
        connection: my-secret
    tables:
      - name: CustomCarrierPackages
        config:
          y42_table:
            query: "SELECT * FROM carrierpackages WHERE carriercode IN (SELECT carriercode FROM carriers)"
```
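
The ShipStation query uses a subquery to keep only the carrier packages whose carriercode also appears in the carriers table. The sketch below illustrates that behavior with Python's built-in sqlite3 module and made-up rows. It runs against a local in-memory SQLite database, not CData. The packagecode column and all data values are hypothetical; only the table names and carriercode come from the example. Which SQL constructs are supported depends on the CData connector.

```python
import sqlite3

# Local stand-in for the ShipStation tables; rows and packagecode are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE carriers (carriercode TEXT);
    CREATE TABLE carrierpackages (carriercode TEXT, packagecode TEXT);
    INSERT INTO carriers VALUES ('ups'), ('fedex');
    INSERT INTO carrierpackages VALUES
        ('ups', 'package'),
        ('fedex', 'envelope'),
        ('dhl', 'box');
    """
)

# Same query text as the custom query in the YAML example.
query = "SELECT * FROM carrierpackages WHERE carriercode IN (SELECT carriercode FROM carriers)"
rows = conn.execute(query).fetchall()

# The 'dhl' row is excluded because 'dhl' is not in carriers.
print(sorted(rows))
conn.close()
```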

Please refer to the CData documentation for the respective connector for details on which tables and columns are available and their limitations.

FAQ

Do I need to whitelist any IP addresses?

Yes. Whitelist the IP address 35.198.72.34. All data extraction requests come from Y42 and originate from this address, so it must be whitelisted for Y42 to connect to your database.
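
If your database firewall already has a list of allowed addresses or ranges, you can check whether Y42's address is covered with Python's standard ipaddress module. The allowlist entries are placeholders for your own rules and is_allowlisted is a hypothetical helper. In CIDR notation, a single address corresponds to a /32 range.

```python
import ipaddress

# Y42's egress address, as given above.
Y42_EGRESS_IP = ipaddress.ip_address("35.198.72.34")


def is_allowlisted(allowlist: list[str]) -> bool:
    """True if the Y42 address falls inside any listed address or CIDR range.
    strict=False accepts entries with host bits set, e.g. '35.198.72.1/24'."""
    return any(
        Y42_EGRESS_IP in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )


print(is_allowlisted(["10.0.0.0/8", "35.198.72.34/32"]))
```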

Do I need to set up additional accounts to make it work?

No. Y42 takes care of all the computation, so you don't need to set up any additional accounts to get things up and running.

Can I verify the row count of a table after data ingestion?

Yes. To inspect the row count and other ingestion statistics, go to the bottom section of the app and hover over (or click) an ingestion job that finished successfully.

Which sync mode is used when a scheduled build updates a source table?

If a source supports incremental imports, a scheduled build triggers an incremental import by default. To perform a full refresh instead, override the default configuration by appending the --full-refresh flag to the end of the y42 build command.
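
Switching a scheduled build to a full refresh only means appending one flag to its y42 build command. The small Python sketch below shows that edit using the standard shlex module. with_full_refresh is a hypothetical helper, and it doesn't know about any y42 options other than --full-refresh. It only checks that the command starts with y42 build and adds the flag if it is missing.

```python
import shlex


def with_full_refresh(build_command: str) -> str:
    """Append --full-refresh to a 'y42 build' command unless it is already present."""
    args = shlex.split(build_command)
    if args[:2] != ["y42", "build"]:
        raise ValueError("expected a 'y42 build' command")
    if "--full-refresh" not in args:
        args.append("--full-refresh")  # the flag goes at the end of the command
    return shlex.join(args)


print(with_full_refresh("y42 build"))
```

Making the helper idempotent means running it twice on the same schedule does not duplicate the flag.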