Selecting assets to run

Overview

The y42 build command materializes and tests assets in DAG order, either for the selected assets or for the entire space. It supports asset selection syntax and a range of flags for customizing the run.

Run the `y42 build` command

Selecting assets to build

By default, y42 build executes all committed assets in the space. You can instead restrict the build DAG to a subset of assets with the --select and --exclude flags. Combined with graph and set operators, these flags let you include or exclude specific assets and run only the Y42 build tasks you need.

Anatomy of a build command

Build command

Command flags

  • --select / -s
  • --exclude
  • --vars
  • --stale / --no-stale
  • --selector
  • --resource-type
  • --full-refresh / -f
  • --fail-fast / -x
  • --no-fail-fast
  • --max-runtime-source / --max-runtime-model / --max-runtime-seed / --max-runtime-all
  • --retry-attempts / --retry-attempts-model / --retry-attempts-source / --retry-attempts-seed
  • --retry-interval / --retry-interval-model / --retry-interval-source / --retry-interval-seed

Graph operators

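  • + selects parents (when placed before the asset name, e.g. +model_1) or children (when placed after it, e.g. model_1+)
  • @ selects an asset's children and all of those children's upstream assets
  • N+ / +N selects parents or children up to N edges away (e.g. 2+model_1 or model_1+2)
  • * selects all matching assets within a directory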
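
The following Bash sketch (requires Bash 4+) models how the graph operators expand on a small, hypothetical DAG. It follows dbt-style selection semantics, which these operators appear to mirror; the asset names and the walk and select_assets helpers are illustrative only and are not part of Y42.

```bash
#!/usr/bin/env bash
# Toy model of graph-operator selection on a HYPOTHETICAL DAG (Bash 4+).
# Mimics dbt-style semantics; this is NOT Y42's implementation.
#
#   raw_orders    -> stg_orders    -> fct_orders -> dashboard_orders
#   raw_customers -> stg_customers -> fct_orders

declare -A CHILDREN=(
  [raw_orders]="stg_orders"
  [raw_customers]="stg_customers"
  [stg_orders]="fct_orders"
  [stg_customers]="fct_orders"
  [fct_orders]="dashboard_orders"
)

# Derive the reverse edges (child -> parents) from CHILDREN.
declare -A PARENTS=()
for parent in "${!CHILDREN[@]}"; do
  for child in ${CHILDREN[$parent]}; do
    PARENTS[$child]="${PARENTS[$child]-} $parent"
  done
done

# walk DIRECTION ASSET MAX_DEPTH: print assets reachable from ASSET along
# DIRECTION (children|parents), at most MAX_DEPTH edges away (-1 = unlimited).
walk() {
  local direction=$1 start=$2 max_depth=$3 depth=0 node nbr nbrs
  local -A seen=()
  local frontier=("$start")
  local next=()
  while ((${#frontier[@]} > 0)) && { ((max_depth < 0)) || ((depth < max_depth)); }; do
    next=()
    for node in "${frontier[@]}"; do
      if [[ $direction == children ]]; then nbrs=${CHILDREN[$node]-}; else nbrs=${PARENTS[$node]-}; fi
      for nbr in $nbrs; do
        if [[ -z ${seen[$nbr]-} ]]; then
          seen[$nbr]=1
          next+=("$nbr")
        fi
      done
    done
    frontier=("${next[@]}")
    depth=$((depth + 1))
  done
  printf '%s\n' "${!seen[@]}"
}

# select_assets EXPR: evaluate "+name", "name+", "N+name", "name+N" or "@name"
# and print the selected assets, sorted, one per line.
select_assets() {
  local expr=$1 name a p up=0 down=0 up_depth=-1 down_depth=-1
  local re_up='^([0-9]*)\+(.+)$' re_down='^(.+)\+([0-9]*)$'
  local -A out=()
  if [[ $expr == @* ]]; then
    name=${expr#@}
    out[$name]=1
    for a in $(walk children "$name" -1); do out[$a]=1; done
    # Add everything upstream of the asset and of each of its children.
    for a in "${!out[@]}"; do
      for p in $(walk parents "$a" -1); do out[$p]=1; done
    done
  else
    name=$expr
    if [[ $name =~ $re_up ]]; then up=1; up_depth=${BASH_REMATCH[1]:--1}; name=${BASH_REMATCH[2]}; fi
    if [[ $name =~ $re_down ]]; then down=1; down_depth=${BASH_REMATCH[2]:--1}; name=${BASH_REMATCH[1]}; fi
    out[$name]=1
    if ((up)); then for a in $(walk parents "$name" "$up_depth"); do out[$a]=1; done; fi
    if ((down)); then for a in $(walk children "$name" "$down_depth"); do out[$a]=1; done; fi
  fi
  printf '%s\n' "${!out[@]}" | LC_ALL=C sort
}

# Example: prints dashboard_orders, fct_orders and stg_orders (one per line).
select_assets 'stg_orders+'
```

In this model, @stg_orders also returns stg_customers and raw_customers, because they are upstream of fct_orders, a child of stg_orders; stg_orders+ on its own does not include them.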

Methods

  • source: selects assets that depend on a specified source
  • exposure: selects parents of a specified exposure
  • tag: selects assets that have a specific tag

Set operators

  • space-delimited: selects the union of the options
  • comma-delimited: selects the intersection of the options
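
The two set operators behave like ordinary set union and intersection. The sketch below shows this with standard sort and uniq, using two hypothetical result lists that stand in for criteria such as tag:finance and source:stripe. It does not call Y42, and the asset, tag, and source names are invented for illustration.

```bash
#!/usr/bin/env bash
# Set-operator semantics only. Each string is the HYPOTHETICAL list of assets
# matched by one selection criterion, space-separated.
tag_finance="fct_orders fct_revenue stg_payments"
source_stripe="stg_payments fct_revenue stg_refunds"

# to_lines LIST: print one asset per line (relies on intentional word splitting).
to_lines() { printf '%s\n' $1; }

# union A B: assets matched by either criterion (space-delimited criteria).
union() { { to_lines "$1"; to_lines "$2"; } | LC_ALL=C sort -u; }

# intersect A B: assets matched by both criteria (comma-delimited criteria).
# After de-duplicating each list, an asset present in both appears exactly twice.
intersect() {
  { to_lines "$1" | LC_ALL=C sort -u; to_lines "$2" | LC_ALL=C sort -u; } | LC_ALL=C sort | uniq -d
}

union "$tag_finance" "$source_stripe"
intersect "$tag_finance" "$source_stripe"
```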
CLI

y42 build --select model_1+ \
  --vars '{key: value, date: 20180101}'

Command flags

  • --select / -s: Selects all matched assets.

  • --exclude: Excludes all matched assets.

  • --stale: Filters for stale-only jobs, excluding assets that have not yet been materialized.

  • --no-stale: Excludes stale jobs, focusing on active ones only.

  • --vars: Overrides variables defined in your dbt_project.yml file. It accepts a YAML dictionary as a string, for example '{my_variable: my_value}'.

  • --selector: Uses a selector name defined in selectors.yml.

  • --resource-type: Limits the command to specific resource types: [source, model, seed, all].

  • --fail-fast / -x: Stops execution at the first failure and cancels all remaining assets, including those that do not depend on the failed asset.

  • --no-fail-fast: Skips assets downstream of any failed asset while continuing to run assets that are not affected by the failure. This is also the default behavior when no flag is set.

  • --full-refresh / -f: Performs a full import of every selected asset. Use it for incremental sources or models that need a complete update.

  • --max-runtime-source / --max-runtime-model / --max-runtime-seed / --max-runtime-all: Set the maximum runtime for individual asset types or for all of them, so that long-running tasks are not stopped prematurely. See the example below for how to apply these settings.

  • --retry-attempts / --retry-attempts-model / --retry-attempts-source / --retry-attempts-seed: Specify the number of retry attempts after a failure. See the example below for how to apply these settings.

  • --retry-interval / --retry-interval-model / --retry-interval-source / --retry-interval-seed: Define the delay (in minutes) between retries. See the example below for how to apply these settings.

Setting Maximum Runtime for Asset Execution

Example:

CLI

# Sets maximum runtime for sources to 6 hours and for all other asset types to 1 hour
y42 build --max-runtime-source 6h --max-runtime-all 1h

Possible Values:

  • Minimum value: 1 hour
  • Maximum value: 7 days
  • Examples: 1h, 2h20m, 48h, 4h30m20s, 2d6h30m, 2d6h, 2d

Default Values:

  • --max-runtime-source, --max-runtime-model, --max-runtime-seed: 48 hours
  • --max-runtime-all: 4 hours
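
Before launching a long build, a value can be checked against the documented format and limits. This Bash sketch is an assumption-based helper, not part of the Y42 CLI: the grammar (optional day, hour, minute, and second parts, in that order) is inferred from the examples above, and the 1-hour and 7-day bounds are treated as inclusive.

```bash
#!/usr/bin/env bash
# Sketch: parse and range-check --max-runtime-* values such as 2d6h30m.
# ASSUMPTIONS: grammar inferred from the documented examples; bounds inclusive.

# to_seconds DURATION: print DURATION in seconds, or fail on malformed input.
to_seconds() {
  local re='^([0-9]+d)?([0-9]+h)?([0-9]+m)?([0-9]+s)?$'
  [[ -n $1 && $1 =~ $re ]] || return 1
  local d=${BASH_REMATCH[1]%d} h=${BASH_REMATCH[2]%h}
  local m=${BASH_REMATCH[3]%m} s=${BASH_REMATCH[4]%s}
  # 10# forces base 10 so values such as "08" are not read as octal.
  echo $(( 10#${d:-0} * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

# is_valid_max_runtime DURATION: succeed only if DURATION lies within [1h, 7d].
is_valid_max_runtime() {
  local secs
  secs=$(to_seconds "$1") || return 1
  (( secs >= 3600 && secs <= 7 * 86400 ))
}

# The documented examples pass; 30m, 8d and 1h30 are rejected.
for value in 1h 2h20m 48h 4h30m20s 2d6h30m 2d6h 2d 30m 8d 1h30; do
  if is_valid_max_runtime "$value"; then echo "$value: ok"; else echo "$value: rejected"; fi
done
```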

Retry Configuration Flags

Examples:

CLI

# Apply global retry attempts and specific retries for models with a longer interval
y42 build --retry-attempts 2 --retry-attempts-model 5 --retry-interval-model 5

# Set retry attempts globally and override specifically for sources
y42 build --retry-attempts 2 --retry-attempts-source 5

# Configure retries and intervals for sources
y42 build --retry-attempts 2 --retry-attempts-source 5 --retry-interval-source 10

# Set global retry attempts without affecting sources
y42 build --retry-attempts 5 --retry-attempts-source 1

# Define a longer retry interval for seeds while maintaining general interval settings for other assets
y42 build --retry-attempts 5 --retry-interval-seed 10

# Enable retries only for models
y42 build --retry-attempts-model 5

Default Values:

  • --retry-attempts: Minimum 1, maximum 10, default 1.
  • --retry-interval: Minimum 1 minute, maximum 60 minutes, default 1 minute.
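
The retry examples suggest that a type-specific flag overrides the global one. The following Bash sketch resolves the effective attempts and interval per asset type under that assumption, falling back to the documented default of 1, and checks the documented ranges. The helper names are hypothetical, and the precedence rule is inferred from the examples rather than confirmed by the CLI.

```bash
#!/usr/bin/env bash
# Sketch of per-type retry resolution. ASSUMPTION: a type-specific flag
# (e.g. --retry-attempts-model) overrides the global flag (--retry-attempts),
# and the documented default of 1 applies when neither is set.

# effective_setting SPECIFIC GLOBAL DEFAULT: the first non-empty value wins.
effective_setting() { echo "${1:-${2:-$3}}"; }

# in_range VALUE MIN MAX: succeed if VALUE is a whole number within [MIN, MAX].
in_range() { [[ $1 =~ ^[0-9]+$ ]] && (( 10#$1 >= $2 && 10#$1 <= $3 )); }

# Flags from one documented example:
#   y42 build --retry-attempts 2 --retry-attempts-model 5 --retry-interval-model 5
global_attempts=2
global_interval=""
declare -A attempts=([model]=5)
declare -A interval=([model]=5)

for type in model source seed; do
  a=$(effective_setting "${attempts[$type]-}" "$global_attempts" 1)
  i=$(effective_setting "${interval[$type]-}" "$global_interval" 1)
  # Documented ranges: attempts 1-10, interval 1-60 minutes.
  in_range "$a" 1 10 && in_range "$i" 1 60 || { echo "invalid setting for $type" >&2; continue; }
  printf '%s: attempts=%s, interval=%s min\n' "$type" "$a" "$i"
done
```

Under this assumption, models resolve to 5 attempts at a 5-minute interval, while sources and seeds resolve to 2 attempts at the default 1-minute interval.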

FAQ

What mode does the y42 build command trigger?

By default, y42 build runs incrementally unless otherwise specified. An incremental run is triggered if:

  • The model has a materialization configuration that supports incremental runs. If no materialization type is specified, y42 build performs a full refresh of the model.
  • The source table supports incremental updates.

To override this default and force a complete refresh, add the --full-refresh (-f) flag to the y42 build command.

What happens when I rename an asset?

Renaming an asset changes its lineage hash, which triggers a full refresh on the next run. To avoid this, revert the asset to its original name.

The DAG is running long or timing out. How can I improve the performance?

  • Review the model logic. Simplify by breaking down complex transformations or excessive joins into multiple models.
  • Consider using a different materialization type for your model and/or upstream models.
  • Optimize SQL queries. While specific optimizations depend on the warehouse, general improvements include using GROUP BY instead of DISTINCT, or UNION ALL instead of UNION when duplicate rows do not need to be removed.
  • Utilize warehouse-specific features, such as partitioning and clustering in BigQuery.

Note: The above solutions are particularly helpful in addressing common issues like the compilation memory exhausted error in Snowflake. However, different warehouses may have unique errors and solutions.