Airflow
Develop, test, and deploy Airflow DAGs using the skale airflow commands.
Initialize a project
skale airflow init
Scaffolds a new Airflow 3 project in the current directory:
.
├── Dockerfile # FROM ghcr.io/skaledata/airflow:<version> + comments
├── README.md # project layout, CLI commands, deploy workflow
├── requirements.txt # pip deps — auto-installed via ONBUILD
├── packages.txt # apt deps — auto-installed via ONBUILD
├── dags/example_dag.py # example DAG to get you started
├── plugins/ # project-specific Airflow plugins
├── tests/ # DAG tests (pytest)
├── .gitignore
└── .dockerignore
The Dockerfile is a single FROM line. The SkaleData Airflow base image
pre-installs everything you need and automatically picks up
requirements.txt and packages.txt from the build-context root,
so no COPY / RUN boilerplate is required.
The layout matches Astronomer’s Astro Runtime,
so anyone migrating from astro finds a familiar structure.
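The tests/ directory is described as holding pytest-based DAG tests, but the scaffolded test contents are not shown here. As a minimal sketch of the kind of check such a test could run, the hypothetical helper below (`find_syntax_errors` is an illustrative name, not part of the scaffold) parses every Python file under a DAG folder with the standard library and reports files that fail to parse. It does not import Airflow, so it catches syntax errors only, not import or DAG-definition problems.

```python
import ast
from pathlib import Path


def find_syntax_errors(dags_dir):
    """Return {file path: error summary} for .py files that fail to parse.

    Hypothetical helper for illustration; it only checks Python syntax
    and never imports Airflow or executes the DAG files.
    """
    errors = {}
    for path in sorted(Path(dags_dir).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

A pytest test in tests/ could then assert that `find_syntax_errors("dags")` returns an empty dict, which fails fast before an image build or deploy.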
Local development
Start Airflow locally
skale airflow start
Builds the Docker image from your Dockerfile, starts all services (api-server, scheduler, dag-processor, triggerer, postgres), and waits for the api-server to be healthy.
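Waiting for the api-server to be healthy can be pictured as a polling loop. The sketch below is an illustration, not the CLI's actual implementation: `wait_until_healthy`, `http_probe`, and `HEALTH_URL` are hypothetical names, and the URL (port 8080, Airflow 3's `/api/v2/monitor/health` path) is an assumption about the local setup.

```python
import time
import urllib.request

# Assumption: the local api-server listens on port 8080 and exposes
# Airflow 3's health endpoint at this path. Adjust for your setup.
HEALTH_URL = "http://localhost:8080/api/v2/monitor/health"


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def wait_until_healthy(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    Sleeps `delay` seconds between attempts (not after the last one)
    and returns whether the service became healthy.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy(lambda: http_probe(HEALTH_URL))` gives up after 30 attempts with these defaults. Passing the probe and sleep functions as arguments keeps the loop easy to test without a running container.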
If your project is bound to a cluster (via --cluster or .skaledata.yaml), the local environment is configured with the same secrets backend and cloud credentials as your deployed instance.
# Start with cluster credentials
skale airflow start --cluster analytics-prod
Stop Airflow
skale airflow stop
Gracefully stops all containers. Volumes and data are preserved; use skale airflow start to resume.
Restart Airflow
skale airflow restart
Stops and restarts all containers without rebuilding. Useful after config changes.
Destroy local environment
skale airflow kill
Stops and removes all containers, networks, and volumes. This deletes your local Postgres data. Run skale airflow init and skale airflow start to start fresh.
Open a shell
# Default: scheduler container
skale airflow bash
# Specific container
skale airflow bash api-server
Valid containers: scheduler, api-server, dag-processor, triggerer, postgres.
Run Airflow CLI commands
skale airflow run dags list
skale airflow run tasks test my_dag my_task 2024-01-01
Executes an Airflow CLI command inside the scheduler container.
Deploying
Full deploy
skale airflow deploy --cluster analytics-prod
Builds the Docker image, pushes it to the cluster’s container registry, and triggers a rolling deploy. The first time you run this, pass --cluster; the binding is saved to .skaledata.yaml, so subsequent deploys need only:
skale airflow deploy
DAG-only deploy
skale airflow deploy --dag-only
Uploads your dags/ folder to cloud storage (GCS / S3 / Azure Blob). The Airflow scheduler picks up changes within 30 seconds via a sync sidecar. No image build, no downtime.
Deploy flags
| Flag | Description |
|---|---|
| `--cluster <id>` | Target cluster (saved to .skaledata.yaml after first use) |
| `--app <name>` | Airflow instance name (for clusters with multiple Airflows) |
| `--tag <tag>` | Image tag (defaults to git SHA) |
| `--force-image` | Force a full image build even if only DAGs changed |
| `--dag-only` | Upload DAGs only, skip image build |
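The --force-image and --dag-only flags imply a choice between a DAG upload and a full image build. How the CLI makes that choice is not documented here. The sketch below shows one plausible way a wrapper script could decide from a list of changed paths (for example, from `git diff --name-only`). The helper names `is_dag_only_change` and `deploy_args` are hypothetical; only the flags themselves come from the table above.

```python
from pathlib import PurePosixPath


def is_dag_only_change(changed_paths, dags_dir="dags"):
    """True if there is at least one change and all changes are under dags_dir."""
    paths = [PurePosixPath(p) for p in changed_paths]
    return bool(paths) and all(p.parts[:1] == (dags_dir,) for p in paths)


def deploy_args(changed_paths, force_image=False):
    """Build the deploy command line for a set of changed files.

    Illustrative only: a full deploy is the default, --dag-only is added
    when every change is under dags/, and --force-image takes precedence.
    """
    args = ["skale", "airflow", "deploy"]
    if force_image:
        args.append("--force-image")
    elif is_dag_only_change(changed_paths):
        args.append("--dag-only")
    return args
```

For example, `deploy_args(["dags/etl.py", "requirements.txt"])` falls back to a full deploy because requirements.txt affects the image.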
Refresh credentials
skale airflow refresh
Re-mints short-lived cloud credentials for the secrets backend. Whether containers restart depends on the cloud provider:
- GCP / Azure: running containers pick up the new credential file automatically, with no restart
- AWS: containers are restarted to pick up the new environment variables
Requires the project to be bound to a deployed instance (.skaledata.yaml).
CI/CD
Use --dag-only with an API key for automated DAG deployments:
# .github/workflows/deploy-dags.yml
name: Deploy DAGs
on:
  push:
    branches: [main]
    paths: ['dags/**']
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install CLI
        run: curl -fsSL https://get.skaledata.com | bash
      - name: Deploy DAGs
        env:
          SKALE_API_KEY: ${{ secrets.SKALE_API_KEY }}
        run: skale airflow deploy --dag-only