Recce CI integration with GitHub Actions
Recce provides the recce run command for CI/CD pipelines. You can integrate Recce with GitHub Actions (or other CI tools) to compare the data models between two environments whenever a pull request is created.
The following guide demonstrates how to configure Recce in GitHub Actions.
Prerequisites
Before integrating Recce with GitHub Actions, you will need to configure the following items:
- Set up two environments in your data warehouse, for example one for production and another for development.
- Provide the credentials profile for both environments in your profiles.yml so that Recce can access your data warehouse. You can put the credentials in a profiles.yml file, or use environment variables.
- Set up the data warehouse credentials in your GitHub repository secrets.
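If you supply credentials through environment variables (for example, read in profiles.yml with dbt's env_var function), a workflow step can check that they are present before dbt runs. The following is a minimal sketch under stated assumptions: the helper name require_env and the variable names DBT_USER and DBT_PASSWORD are hypothetical and are not part of Recce or dbt.

```bash
# Hypothetical helper: report every missing credential variable, then fail.
# Usage: require_env VAR_NAME [VAR_NAME ...]
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when all variables are set, 1 otherwise.
  return "$missing"
}
```

Calling `require_env DBT_USER DBT_PASSWORD || exit 1` at the start of a step would stop the job with a readable message instead of a later connection error from dbt.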
Set up Recce with GitHub Actions
We suggest setting up two GitHub Actions workflows in your GitHub repository: one for the production environment and another for the development environment.
- Production environment workflow: Triggered on every merge to the main branch. This ensures that production artifacts are readily available for use when a PR is opened.
- Development environment workflow: Triggered on every push to the pull-request branch. This workflow will compare production models with the current development environment.
Production Workflow (Main Branch)
This workflow will perform the following actions:
- Run dbt on the production environment.
- Upload the generated artifacts to S3 for later use.
name: Recce CI Base Branch
on:
  push:
    branches:
      - main
concurrency:
  group: recce-ci-base
  cancel-in-progress: true
jobs:
  build:
    name: DBT Runner
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.x"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run DBT
        run: |
          dbt deps
          dbt seed --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
          dbt run --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
          dbt docs generate --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
        env:
          # Set the dbt target name of the base environment
          DBT_BASE_TARGET: prod
      - name: Package DBT artifacts
        run: |
          tar -czvf dbt-artifacts.tar.gz target-base
          mv dbt-artifacts.tar.gz $GITHUB_WORKSPACE/${{ github.sha }}.tar.gz
      - name: Upload to S3
        run: |
          aws s3 cp $GITHUB_WORKSPACE/${{ github.sha }}.tar.gz s3://${{ env.AWS_S3_BUCKET }}/${{ github.sha }}.tar.gz
        env:
          # Set these in your repository secrets
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          # Set these in your repository secrets
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          # Set these in your repository secrets
          AWS_REGION: ${{ secrets.AWS_REGION }}
          # Set these in your repository secrets
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
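The packaging step keeps target-base as the top-level directory inside the archive and names the archive after the commit SHA, so a later job can look it up by SHA and extract it in place. A local round trip of that idea might look like the sketch below. It assumes a placeholder SHA and a dummy manifest.json standing in for real dbt output, and it skips the S3 upload entirely.

```bash
set -eu

# Placeholder for the commit SHA that GitHub exposes as github.sha (hypothetical value).
sha="0123abc"

# Simulate the dbt output that the production workflow writes to target-base.
workdir="$(mktemp -d)"
mkdir -p "$workdir/target-base"
echo '{}' > "$workdir/target-base/manifest.json"

# Package: the archive is named after the SHA and contains target-base/ at its root.
tar -czf "$workdir/${sha}.tar.gz" -C "$workdir" target-base

# Restore: extract into a clean directory, as a later job would after downloading.
restore_dir="$(mktemp -d)"
tar -xzf "$workdir/${sha}.tar.gz" -C "$restore_dir"

ls "$restore_dir/target-base"   # prints: manifest.json
```

Because the directory name travels with the archive, the consumer does not need to know where the producer built it; it only needs the SHA to find the file.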
Development Workflow (Pull Request Branch)
This workflow will perform the following actions:
- Run dbt on the development environment.
- Download previously generated production artifacts from S3.
- Use Recce to compare the current environment with the downloaded production artifacts.
- Use Recce to generate the summary of the current changes and post it as a comment on the pull request. Please refer to the Recce Summary for more information.
name: Recce CI Current Branch
on:
  pull_request:
    branches: [main]
jobs:
  check-pull-request:
    name: Check pull request by Recce CI
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Install Recce
        run: |
          pip install recce
      - name: Prepare DBT Base environment
        run: |
          if aws s3 cp s3://$AWS_S3_BUCKET/${{ github.event.pull_request.base.sha }}.tar.gz .; then
            echo "Base environment found in S3"
            tar -xvf ${{ github.event.pull_request.base.sha }}.tar.gz
          else
            echo "Base environment not found in S3. Running dbt to create base environment"
            git checkout ${{ github.event.pull_request.base.sha }}
            dbt deps
            dbt seed --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
            dbt run --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
            dbt docs generate --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
          fi
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: ${{ secrets.AWS_REGION }}
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
          # Set the dbt target name of the base environment
          DBT_BASE_TARGET: prod
      - name: Prepare DBT Current environment
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          dbt deps
          dbt seed --target ${{ env.DBT_CURRENT_TARGET }}
          dbt run --target ${{ env.DBT_CURRENT_TARGET }}
          dbt docs generate --target ${{ env.DBT_CURRENT_TARGET }}
        env:
          # Set the dbt target name of the current environment
          DBT_CURRENT_TARGET: dev
      - name: Run Recce CI
        run: |
          recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}
      - name: Archive Recce State File
        uses: actions/upload-artifact@v4
        id: recce-artifact-uploader
        with:
          name: recce-state-file
          path: recce_state.json
      - name: Prepare Recce Summary
        id: recce-summary
        run: |
          recce summary recce_state.json > recce_summary.md
          cat recce_summary.md >> $GITHUB_STEP_SUMMARY
          echo '${{ env.NEXT_STEP_MESSAGE }}' >> recce_summary.md
          # Handle the case when the recce summary is too long to be displayed in the GitHub PR comment
          if [[ `wc -c recce_summary.md | awk '{print $1}'` -ge '65535' ]]; then
            echo '# Recce Summary
          The recce summary is too long to be displayed in the GitHub PR comment.
          Please check the summary detail in the [Job Summary](${{github.server_url}}/${{github.repository}}/actions/runs/${{github.run_id}}) page.
          ${{ env.NEXT_STEP_MESSAGE }}' > recce_summary.md
          fi
        env:
          ARTIFACT_URL: ${{ steps.recce-artifact-uploader.outputs.artifact-url }}
          NEXT_STEP_MESSAGE: |
            ## Next Steps
            If you want more detailed information about the Recce result, please download the [artifact](${{ steps.recce-artifact-uploader.outputs.artifact-url }}) file and open it with the [Recce](https://pypi.org/project/recce/) CLI.
            ### How to check the recce result
            ```bash
            # Unzip the downloaded artifact file (GitHub delivers artifacts as zip archives)
            unzip recce-state-file.zip
            # Launch the recce server based on the state file
            recce server --review recce_state.json
            # Open the recce server http://localhost:8000 in your browser
            ```
      - name: Comment on pull request
        uses: thollander/actions-comment-pull-request@v2
        with:
          message: |
            Recce `run` successfully completed.
            Please download the [artifact](${{ env.ARTIFACT_URL }}) for the state file.
        env:
          ARTIFACT_URL: ${{ steps.recce-artifact-uploader.outputs.artifact-url }}
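The "Prepare Recce Summary" step swaps an oversized summary for a short pointer to the job summary page. The size check can be isolated into a small function for local testing, as in the sketch below. The function name shrink_summary_if_needed and its parameters are hypothetical. The 65535-byte cutoff is the one the workflow uses, and this sketch leaves out the "Next Steps" message that the workflow appends.

```bash
# Cutoff taken from the workflow above.
MAX_COMMENT_BYTES=65535

# Hypothetical helper: overwrite the summary file with a short pointer when it is too large.
# $1: path to the summary file
# $2: URL of the job summary page
shrink_summary_if_needed() {
  summary_file="$1"
  job_url="$2"
  # Arithmetic expansion strips the padding that some wc implementations add.
  size=$(( $(wc -c < "$summary_file") ))
  if [ "$size" -ge "$MAX_COMMENT_BYTES" ]; then
    printf '# Recce Summary\nThe recce summary is too long to be displayed in the GitHub PR comment.\nPlease check the summary detail in the [Job Summary](%s) page.\n' \
      "$job_url" > "$summary_file"
  fi
}
```

A summary below the cutoff is left untouched, so the function is safe to call unconditionally after `recce summary` has written the file.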
Review the Recce State File
Review the downloaded Recce state file with the following command:
    recce server --review recce_state.json
In Recce server --review mode, you can review the comparison results of the data models between the base and current environments. The state file contains the row counts of modified data models and the results of any Recce Preset Checks.