
Lineage

The Lineage page is the main interface to Recce. It lets you quickly determine the zone of impact of any modeling changes.

Lineage Diff

Use Lineage Diff to determine which models to investigate further to validate your changes.

Figure: Lineage Diff

Node Summary

  • Models are color-coded to indicate whether they were added, removed, or modified.
  • Icons at the bottom of a node indicate whether a row count change or a schema change was detected. The row count change icon appears only after a Row Count Diff has been run on that node.
  • Click a model to view its Node Detail and run other checks.
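To illustrate how nodes could be classified as added, removed, or modified, here is a minimal Python sketch. It assumes each environment is represented as a hypothetical mapping from node id to a checksum of that node's definition. This is not Recce's implementation; a real comparison may look at more than a single checksum.

```python
def classify_nodes(base: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Label node ids as 'added', 'removed', or 'modified'.

    base and current map a node id to a checksum of its definition
    (hypothetical input shape, used only for illustration).
    Unchanged nodes are left out of the result.
    """
    status: dict[str, str] = {}
    for node_id in current.keys() - base.keys():
        status[node_id] = "added"      # only exists in current
    for node_id in base.keys() - current.keys():
        status[node_id] = "removed"    # only exists in base
    for node_id in base.keys() & current.keys():
        if base[node_id] != current[node_id]:
            status[node_id] = "modified"  # exists in both, definition changed
    return status


base = {"model.orders": "a1", "model.customers": "b2", "model.legacy": "c3"}
current = {"model.orders": "a1", "model.customers": "b9", "model.payments": "d4"}
print(classify_nodes(base, current))
```

The UI then only needs the resulting status to pick a color for each node.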

Select Models

Click the Select models button to select multiple nodes for further operations. For details, see the Multi Nodes Selection section.

Filter Nodes

Click the Filter nodes button to control which nodes are displayed:

  1. View Mode:
    • Changed Models: modified nodes, their downstream nodes, and their first-degree parents.
    • All: all nodes.
  2. Package: filter nodes by dbt package name.
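The Changed Models view can be thought of as a graph traversal. The minimal Python sketch below assumes the lineage is given as a hypothetical mapping from each node to its direct parents, and reads "first-degree parents" as the direct parents of the modified nodes. It illustrates the selection rule only and is not Recce's code.

```python
from collections import deque


def changed_models_view(parents: dict[str, list[str]], modified: set[str]) -> set[str]:
    """Modified nodes + everything downstream + direct parents of modified nodes.

    parents maps each node to its direct upstream nodes (hypothetical shape).
    """
    # Invert the parent edges so we can walk downstream.
    children: dict[str, list[str]] = {}
    for node, ups in parents.items():
        for up in ups:
            children.setdefault(up, []).append(node)

    # Breadth-first search downstream from every modified node.
    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)

    # Add only the first-degree parents of the modified nodes.
    for node in modified:
        selected.update(parents.get(node, []))
    return selected


lineage = {
    "stg_orders": ["raw_orders"],
    "stg_payments": ["raw_payments"],
    "orders": ["stg_orders", "stg_payments"],
    "revenue": ["orders"],
    "customers": ["stg_customers"],
}
print(sorted(changed_models_view(lineage, {"orders"})))
# ['orders', 'revenue', 'stg_orders', 'stg_payments']
```

Grandparents such as raw_orders and unrelated nodes such as customers stay hidden, which keeps the view focused on the zone of impact.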

Node Detail

Schema Diff

Note

Schema Diff requires catalog.json in both environments.

Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage DAG Diff to view the Schema Diff.
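As an illustration of what a schema comparison involves, here is a minimal Python sketch over two ordered lists of (column name, type) pairs, such as could be read from catalog.json. The rename rule is a hypothetical heuristic chosen for this example (a column that disappears and a new column of the same type appearing at the same position are treated as a rename); Recce's actual rename detection may differ.

```python
def schema_diff(base: list[tuple[str, str]], current: list[tuple[str, str]]) -> dict[str, list]:
    """Report added, removed, and (heuristically) renamed columns."""
    base_names = {name for name, _ in base}
    current_names = {name for name, _ in current}
    # Keep position and type so the rename heuristic can use them.
    removed = [(i, n, t) for i, (n, t) in enumerate(base) if n not in current_names]
    added = [(i, n, t) for i, (n, t) in enumerate(current) if n not in base_names]

    renamed = []
    for item in list(removed):  # iterate over a copy while mutating removed
        pos, old_name, col_type = item
        match = next((a for a in added if a[0] == pos and a[2] == col_type), None)
        if match is not None:
            # Hypothetical heuristic: same position + same type => rename.
            renamed.append((old_name, match[1]))
            removed.remove(item)
            added.remove(match)

    return {
        "added": [n for _, n, _ in added],
        "removed": [n for _, n, _ in removed],
        "renamed": renamed,
    }


base = [("id", "int"), ("amount", "numeric"), ("status", "text")]
current = [("id", "int"), ("amount_usd", "numeric"), ("status", "text"), ("created_at", "timestamp")]
print(schema_diff(base, current))
```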

Figure: Schema Diff

Figure: Schema Diff showing a renamed column

Row Count Diff

Row Count Diff shows the difference in row count between the base and current environments.
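The arithmetic behind a row count comparison is simple; the minimal Python sketch below shows one way to summarize it. The helper names and output format are hypothetical, and None is assumed to mean the model does not exist in that environment (for example, a newly added model has no base row count).

```python
from typing import Optional


def _fmt(count: Optional[int]) -> str:
    # None stands for "model not present in this environment".
    return "N/A" if count is None else str(count)


def row_count_diff(base: Optional[int], current: Optional[int]) -> str:
    """Describe the change in row count from base to current."""
    if base is None or current is None:
        return f"{_fmt(base)} -> {_fmt(current)}"
    if base == current:
        return f"{base} (no change)"
    delta = current - base
    # Relative change is undefined when the base table is empty.
    pct = f"{delta / base:+.1%}" if base else "n/a"
    return f"{base} -> {current} ({delta:+d}, {pct})"


print(row_count_diff(1000, 1050))  # 1000 -> 1050 (+50, +5.0%)
print(row_count_diff(None, 42))    # N/A -> 42
```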

Figure: Row Count Diff, single model

Code Diff

Figure: Code Diff

To view the Code Diff of a model:

  1. Select the model from the Lineage DAG.
  2. Click the Diff button in the upper-right corner.

Value Diff

Note

Value Diff uses the compare_column_values macro from audit_helper. To use Value Diff, make sure audit_helper is installed in your project, for example by adding it to packages.yml:

packages:
  - package: dbt-labs/audit_helper
    version: <version>

Value Diff shows the matched count and percentage for each column in the table. It uses the primary key(s) to match records of the model across both environments.

The primary key is inferred automatically as the first column that has a unique test. If no primary key is detected, you must specify at least one column as the primary key.
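The inference rule can be expressed in a few lines. This minimal Python sketch assumes a hypothetical input shape (a list of column names in model order and a mapping from column name to the names of its tests) and assumes "first" means first in column order; it is not Recce's code.

```python
from typing import Optional


def infer_primary_key(columns: list[str], tests: dict[str, list[str]]) -> Optional[str]:
    """Return the first column (in column order) that has a `unique` test.

    Returns None when no column qualifies; in that case the user must
    pick the primary key column(s) manually.
    """
    for column in columns:
        if "unique" in tests.get(column, []):
            return column
    return None


print(infer_primary_key(["order_id", "customer_id"], {"order_id": ["unique", "not_null"]}))
# order_id
```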

Figure: Value Diff

The Value Diff result reports:

  • Added: newly added PKs.
  • Removed: removed PKs.
  • Matched: for a column, the number of common PKs whose values match.
  • Matched %: for a column, the ratio of matched values to common PKs.
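The four metrics above can be computed from two keyed row sets. The minimal Python sketch below assumes a hypothetical shape where each environment maps a primary key value to a row dict. It is not how audit_helper works internally (that runs in SQL); note also that two None values compare as equal here, which is an assumption and may not match SQL NULL handling.

```python
from typing import Any, Optional


def value_diff(base: dict[Any, dict], current: dict[Any, dict], columns: list[str]) -> dict:
    """Added/removed PK counts plus per-column matched count and ratio."""
    common = base.keys() & current.keys()
    result: dict = {
        "added": len(current.keys() - base.keys()),    # PKs only in current
        "removed": len(base.keys() - current.keys()),  # PKs only in base
        "columns": {},
    }
    for col in columns:
        matched = sum(1 for pk in common if base[pk].get(col) == current[pk].get(col))
        # Matched % is undefined when there are no common PKs.
        pct: Optional[float] = matched / len(common) if common else None
        result["columns"][col] = {"matched": matched, "matched_pct": pct}
    return result


base = {1: {"amount": 10, "status": "paid"}, 2: {"amount": 20, "status": "open"}, 3: {"amount": 5, "status": "paid"}}
current = {1: {"amount": 10, "status": "paid"}, 2: {"amount": 25, "status": "open"}, 4: {"amount": 7, "status": "open"}}
print(value_diff(base, current, ["amount", "status"]))
```

Here PK 4 counts as added, PK 3 as removed, and only PKs 1 and 2 are compared column by column.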

You can also query all the differing records from the Value Diff result.

Profile Diff

Note

Profile Diff uses the get_profile macro from dbt-profiler. To use Profile Diff, make sure dbt-profiler is installed in your project, for example by adding it to packages.yml:

packages:
  - package: data-mie/dbt_profiler
    version: <version>

Profile Diff compares basic statistics (e.g. count, distinct count, min, max, average) for each column between the two environments.

To run a Profile Diff:

  1. Select the model from the Lineage DAG.
  2. Click the Advanced Diffs button.

Figure: Profile Diff

See the dbt-profiler documentation for the definitions of the profiling statistics.
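To make the idea concrete, the minimal Python sketch below computes a simplified subset of profile statistics for one numeric column in each environment and pairs them up. The statistic names are hypothetical and do not match dbt-profiler's exact output columns; None is treated like SQL NULL and excluded from the distinct count, min, max, and average.

```python
from statistics import mean


def profile(values: list) -> dict:
    """Simplified column profile for a numeric column."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "not_null_count": len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": mean(non_null) if non_null else None,
    }


def profile_diff(base: list, current: list) -> dict:
    """Pair each statistic as (base value, current value)."""
    b, c = profile(base), profile(current)
    return {stat: (b[stat], c[stat]) for stat in b}


print(profile_diff([1, 2, 2, None], [1, 3, 5]))
```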

Histogram Diff

Histogram Diff compares the distribution of a numeric column in an overlay histogram chart.

Figure: Histogram Diff

  1. Select the model from the Lineage DAG.
  2. Click the Advanced Diffs button and select Histogram Diff.
  3. Select the column to diff and click Execute.

Figure: Generating a Histogram Diff
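An overlay histogram is only meaningful if both environments are binned with the same edges. The minimal Python sketch below shows that idea with equal-width bins over the combined range; the bin count and the equal-width strategy are assumptions for illustration, not necessarily what Recce uses. It assumes non-empty numeric inputs without NULLs.

```python
def histogram_diff(base: list[float], current: list[float], bins: int = 10):
    """Bin both environments with shared edges so their bars can be overlaid."""
    values = base + current
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0

    def counts(data: list[float]) -> list[int]:
        result = [0] * bins
        for v in data:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            result[idx] += 1
        return result

    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts(base), counts(current)


edges, base_counts, current_counts = histogram_diff([1, 2, 3, 4], [3, 4, 5, 6], bins=5)
print(base_counts, current_counts)  # [1, 1, 1, 1, 0] [0, 0, 1, 1, 2]
```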

Top-K Diff

Top-K Diff compares the distribution of a categorical column. The top 10 elements are shown by default, and the view can be expanded to the top 50 elements.

Figure: Top-K Diff

  1. Select the model from the Lineage DAG.
  2. Click the Advanced Diffs button and select Top-K Diff.
  3. Select the column to diff and click Execute.

Figure: Generating a Top-K Diff
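A Top-K comparison boils down to counting category frequencies in each environment. The minimal Python sketch below ranks categories by their combined count across both environments; that ranking rule is an assumption for illustration and may not match Recce's. The k parameter mirrors the default of 10 and the expanded view of 50 described above.

```python
from collections import Counter


def top_k_diff(base: list, current: list, k: int = 10) -> list[tuple]:
    """Return (value, base_count, current_count) rows for the top-k categories."""
    base_counts = Counter(base)
    current_counts = Counter(current)
    # Assumption: rank categories by their combined frequency in both environments.
    combined = base_counts + current_counts
    return [(value, base_counts[value], current_counts[value]) for value, _ in combined.most_common(k)]


base = ["paid"] * 5 + ["open"] * 3 + ["void"]
current = ["paid"] * 6 + ["open"] + ["refunded"] * 2
print(top_k_diff(base, current, k=2))  # [('paid', 5, 6), ('open', 3, 1)]
```

Categories missing from one environment simply report a count of 0 there, which makes new or vanished values easy to spot.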

Multi Nodes Selection

Figure: Select Models

  1. Click the Select models button.
  2. Select one or more nodes. Alternatively, right-click a node and choose Select parent nodes or Select child nodes.
  3. Click an action in the multi-select control bar.

Row Count Diff

Row Count Diff shows the difference in row count between the base and current environments.

Figure: Row Count Diff, multiple models

Value Diff

Figure: Value Diff, multiple models

Screenshot

Each diff result includes a Copy to Clipboard button. It copies the result as an image so you can paste it into your PR comment.

Note

Firefox does not support copying images to the clipboard, so Recce shows a modal instead. From the modal, you can download the image or right-click it to copy it.

Add to Checklist

On the Lineage page, you can run different types of checks. Adding a check to the checklist lets you:

  1. Keep the check so you can rerun it after further code changes.
  2. Record the result and your interpretation for reviewers.

To add a check to the checklist:

  1. Lineage:
    • All nodes: Click the Add lineage diff check button.
    • Partial nodes: Click the Select models button > select nodes > click Add lineage check.
  2. Schema:
    • Single node: Click a model > Add check > Schema check.
    • Multiple nodes: Click the Select models button > select nodes > click Add schema check.
  3. Row count diff:
    1. Click the Select models button.
    2. Select nodes.
    3. Click Row count diff.
    4. Select a model.
    5. Click Add to checklist.
  4. Other diffs:
    1. Execute the diff.
    2. Click Add to checklist.