How to create a local dbt project with dummy data for testing purposes using Docker
dbt (data build tool) is one of the most popular technologies in the data engineering and analytics space. Recently, I've been working on a task that performs some post-processing over dbt artefacts and wanted to write up some tests. In order to do so, I had to create an example project that could run locally (or in a docker container), so that I wouldn't have to interact with the actual Data Warehouse.
In this article we'll go through a step-by-step process you can follow to create a dbt project and connect it to a containerized Postgres instance. You can use such projects for testing purposes, or even for experimenting with dbt itself in order to try out features or practise your skills.
Step 1: Create a dbt project
We will be populating some data in a Postgres database; therefore, we first need to install the dbt Postgres adapter from PyPI:
pip install dbt-postgres==1.3.1
Note that the command will also install the dbt-core package, as well as other dependencies that are required for running dbt.
Now let's go ahead and create a dbt project. To do so, we can initialise a new dbt project by running the dbt init command in the terminal:
dbt init test_dbt_project
You'll then be prompted to select which database you'd like to use (depending on the adapters you have installed locally, you may see different options):
16:43:08  Running with dbt=1.3.1
Which database would you like to use?
[1] postgres

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1
Make sure to enter the number that corresponds to the Postgres adapter, as shown in the output list. Now the init command should have created the following basic structure in the directory where you executed it:
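With dbt 1.3, the generated skeleton should look roughly like this (the files under models/example are the defaults dbt creates; the exact listing may differ slightly between versions):

test_dbt_project
├── analyses
├── dbt_project.yml
├── macros
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
├── seeds
├── snapshots
└── tests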
Step 2: Create a Docker Compose file
Now let's create a docker-compose.yml file (place the file at the same level as the test_dbt_project directory) in which we will specify two services: one corresponding to a ready-made Postgres image, and a second for a dbt image that we will define in a Dockerfile in the next step:
model: "3.9"companies:
postgres:
container_name: postgres
picture: frantiseks/postgres-sakila
ports:
- '5432:5432'
healthcheck:
check: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
dbt:
container_name: dbt
construct: .
picture: dbt-dummy
volumes:
- ./:/usr/src/dbt
depends_on:
postgres:
situation: service_healthy
As you can tell, for the Postgres container we will be using an image called frantiseks/postgres-sakila, which is publicly available on Docker Hub. This image populates the Sakila database on the Postgres instance. The database models a DVD rental store and consists of several normalised tables that correspond to entities such as films, actors, customers and payments. In the next few sections we'll make use of this data in order to build some example dbt data models. Note that the healthcheck, combined with the service_healthy condition in the depends_on section, ensures the dbt container only starts once Postgres is ready to accept connections.
The second service, called dbt, will be the one that creates an environment in which we'll build our data models. Note that we mount the current directory into the docker container. This lets the container pick up any changes we make to the data models without having to re-build the image. Additionally, any metadata generated by dbt commands (such as manifest.json) will appear directly on the host machine.
Step 3: Create a Dockerfile
Now let's specify a Dockerfile that will be used to build an image, on top of which the running container will then build the models specified in our example dbt project.
FROM python:3.10-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt/dbt_project

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.3.1

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir profiles && sleep infinity
Note that in the final CMD command, we deliberately added an extra && sleep infinity so that the container won't exit after running the steps specified in the Dockerfile, and we can then access the container and run additional dbt commands (if needed).
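The dbt deps step in the CMD above assumes a packages.yml file at the root of the dbt project. If you don't have one yet, a minimal sketch could look like the following (dbt_utils is just an assumed example dependency here; any package from the dbt Hub works, as long as its version is compatible with dbt 1.3):

packages:
  - package: dbt-labs/dbt_utils
    version: 0.9.2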
Step 4: Create a dbt profile for the Postgres database
Now that we have created the required infrastructure on our host machine to create a Postgres database, populate some dummy data, and build an image for our dbt environment, let's focus on the dbt side.
We first need to create a dbt profile that will be used when interacting with the target Postgres database. Within the test_dbt_project directory, create another directory called profiles, and then a file called profiles.yml with the following content:
test_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1

Note that host is set to postgres, the service name we gave the database in docker-compose.yml; containers on the same Compose network can reach each other by service name.
Step 5: Define some data models
The next step is to create some data models based on the Sakila data populated by the Postgres container. If you're planning to use this project for testing purposes, I would advise creating at least one seed, one model and one snapshot (with tests, if possible) so that you get full coverage of all dbt entities (macros excluded).
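As an illustration of what such a model can look like, here's a minimal sketch of a staging model built on top of the Sakila payment table. The source name sakila and the file paths are assumptions made for this sketch, not necessarily how the repository below implements them; the source declaration lives in a .yml file under models/:

models/staging/sources.yml:

version: 2

sources:
  - name: sakila
    schema: public
    tables:
      - name: payment

models/staging/stg_payment.sql:

-- Staging view over the raw Sakila payment table
SELECT
    payment_id,
    customer_id,
    amount,
    payment_date
FROM {{ source('sakila', 'payment') }}

By default, dbt materialises models like this one as views in the target schema (public, as configured in our profile).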
I've already created some data models, seeds and snapshots; you can access them in this repository.
Step 6: Run the Docker containers
We now have everything we need to spin up the two docker containers we specified in the docker-compose.yml file earlier, and build the data models defined in our example dbt project.
First, let's build the images:
docker-compose build
And now let's spin up the running containers:
docker-compose up
This command should have initialised a Postgres database populated with the Sakila data, and created the dbt models specified. For now, let's make sure you have two running containers:
docker ps
should give an output that includes one container named dbt and another named postgres.
Step 7: Query the models on the Postgres database
In order to access the Postgres container, you'll first need to infer the container id:
docker ps
And then run:
docker exec -it <container-id> /bin/bash
We will then need to use psql, a command-line interface that gives us access to the postgres instance:
psql -U postgres
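Before querying anything specific, you can list the relations dbt has created with psql's built-in \dt meta-command:

\dt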
If you have used the data models I've shared in the previous section, you can now query each of the models created on Postgres using the queries below.
-- Query seed tables
SELECT * FROM customer_base;

-- Query staging views
SELECT * FROM stg_payment;

-- Query intermediate views
SELECT * FROM int_customers_per_store;
SELECT * FROM int_revenue_by_date;

-- Query mart tables
SELECT * FROM cumulative_revenue;

-- Query snapshot tables
SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot;
Step 8: Creating additional or modifying existing models
As mentioned already, the Dockerfile and docker-compose.yml files were written in such a way that the dbt container stays up and running. Therefore, whenever you modify or create data models, you can still use that container to re-build seeds, models, snapshots and/or tests.
To do so, first infer the container id of the dbt container:
docker ps
Then enter the running container by running:
docker exec -it <container-id> /bin/bash
And finally, run any dbt command you wish, depending on the modifications you've made to the example dbt project. Here's a quick reference of the most commonly used commands for these purposes:
# Install dbt deps
dbt deps

# Build seeds
dbt seed --profiles-dir profiles

# Build data models
dbt run --profiles-dir profiles

# Build snapshots
dbt snapshot --profiles-dir profiles

# Run tests
dbt test --profiles-dir profiles
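If you've only touched a single model, you don't have to rebuild everything: dbt's node selection syntax lets you target specific models. A couple of sketches, using the example model names from Step 7:

# Rebuild a single model
dbt run --select stg_payment --profiles-dir profiles

# Rebuild a model along with all of its upstream dependencies
dbt build --select +cumulative_revenue --profiles-dir profiles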
How to get the full code of this tutorial
I've created a GitHub repository called dbt-dummy that contains all the pieces you need to quickly create a containerized dbt project that uses Postgres. You can access it in the link below.
This project is also available in the example projects section of the official dbt documentation!
Final Thoughts
In today's tutorial we went through a step-by-step process for creating a dbt project on a local machine using Docker. We built two images: one for the Postgres database, which also populates the Sakila database, and another for our dbt environment.
It's important to be able to quickly build example projects with data build tool that can then be used as a testing environment, or even a playground for experimenting and learning.