
dbt packages every project needs

Stop reinventing macros. Discover 9 essential dbt packages that solve real problems.
Kristy Broekmans, Data Engineer

Your dbt project is running. Models are being built. Tests are passing. But somewhere in that codebase, a team member is writing a custom macro that already exists - battle-tested and maintained - in a package they never installed.

That’s where dbt packages come in.

The dbt ecosystem has matured to the point where most common problems have been solved by open-source packages. Writing custom macros for surrogate keys, data profiling, or schema generation is not a sign of craftsmanship. It is a sign that the team has not looked at what is already available.

What are dbt packages?

A dbt package is a standalone dbt project - containing models, macros, tests, or seeds - that you import into your own project to extend its functionality. Packages are installed by adding them to a packages.yml file and running dbt deps. Most are distributed via the dbt Package Hub, and many are maintained by dbt Labs or well-known data teams. For more on configuration, see our guide to dbt YAML configuration.
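For illustration, a minimal packages.yml pulling in the first two packages covered below could look like this (the version ranges are indicative - check the Package Hub for current releases):

    packages:
      - package: dbt-labs/dbt_utils
        version: [">=1.1.0", "<2.0.0"]
      - package: dbt-labs/codegen
        version: [">=0.12.0", "<1.0.0"]

Running dbt deps then downloads both into the dbt_packages directory.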

9 Essential dbt packages your project needs

The foundation: stop rewriting what already exists

dbt_utils is the one package that belongs in every project, no exceptions. It provides surrogate key generation, date spine utilities, pivot macros, and safe type casting across warehouses. Many other packages depend on it. If a team is writing custom macros for things like deduplication or union operations, there is a good chance dbt_utils already handles it.
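As a sketch of what that looks like in practice - the model and column names here are invented for illustration:

    -- Deterministic surrogate key built from the natural key columns
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} as order_key,
        *
    from {{ ref('stg_orders') }}

    -- One row per day between two dates, e.g. as the backbone of a calendar model
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('2024-01-01' as date)",
        end_date="cast('2025-01-01' as date)"
    ) }}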

dbt_codegen solves one of the most tedious parts of dbt development: writing YAML. It connects to your database, reads metadata, and generates source definitions and base model SQL automatically. For a project with dozens of source tables, this cuts setup time from hours to minutes. It also reduces the risk of typos and mismatches between what exists in the warehouse and what is declared in dbt.
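For example, generating a source definition for a whole schema is a one-liner (the schema name is illustrative):

    dbt run-operation generate_source --args '{"schema_name": "raw_shop", "generate_columns": true}'

Paste the YAML it prints into a source file, then use generate_base_model to produce the matching staging SQL.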

Together, these two packages form the baseline. Everything else builds on top.

Quality and testing: trust the data or test it

dbt ships with four generic tests out of the box: unique, not_null, accepted_values, and relationships. That covers the basics. It does not cover regex validation, distribution checks, row count comparisons, or schema drift detection.

dbt_expectations fills that gap. Inspired by the Great Expectations Python library, it brings over 50 test types into dbt's native testing framework. Tests like expect_column_values_to_match_regex or expect_table_row_count_to_be_between slot directly into your YAML property files. No Python runtime needed, no separate orchestration layer.

    # Example: validate that order IDs follow a specific format
    models:
      - name: stg_orders
        columns:
          - name: order_id
            tests:
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: '^ORD-[0-9]{8}$'

elementary takes a different angle. Where dbt_expectations focuses on rule-based tests, elementary adds observability: anomaly detection, test result dashboards, and alerting. It turns your dbt project into a monitoring system by tracking historical test outcomes and flagging deviations from expected patterns. Elementary needs a dedicated schema in your warehouse to store these results - take that into account when choosing a package.
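A sketch of how elementary's anomaly tests are declared - like any other test in the YAML, with the model and column names here being illustrative:

    models:
      - name: stg_orders
        tests:
          - elementary.volume_anomalies
        columns:
          - name: amount
            tests:
              - elementary.column_anomalies:
                  column_anomalies:
                    - null_count
                    - average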

dbt_project_evaluator audits the project itself rather than the data. It checks your DAG against best practices from dbt Labs - flagging missing tests, undocumented models, direct references to sources from mart models, and naming convention violations. Think of it as a linter for your dbt project structure. The trade-off is that it adds models to your warehouse, so teams should decide early whether to run it in CI only or keep it in production.
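Because its checks are just models and tests, running it in CI is a single scoped command:

    dbt build --select package:dbt_project_evaluator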

Development flow: catch mistakes before they reach production

dbt_audit_helper is the package that pays for itself during refactoring. Rewriting a model is easy. Proving the rewrite produces identical output is hard. This package compares the old and new versions row by row, column by column, and summarises the differences. Without it, teams resort to manual spot-checking - which is how subtle regressions slip through.

    -- Compare old and new model outputs after a refactor
    {%- set old_relation = ref('dim_customers_v1') -%}
    {%- set new_relation = ref('dim_customers_v2') -%}

    {{ audit_helper.compare_relations(
        a_relation=old_relation,
        b_relation=new_relation,
        primary_key='customer_id'
    ) }}

dbt_profiler generates column-level statistics - min, max, null counts, distinct values - directly inside dbt. It is useful during discovery phases when a team inherits a new data source and needs to understand what they are working with before writing any transformation logic.
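A quick way to use it during discovery is the print_profile run-operation - a sketch, with the relation name invented and the argument name worth verifying against your installed version:

    dbt run-operation print_profile --args '{"relation_name": "stg_orders"}'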

Both packages serve a narrow purpose. That narrowness is their strength. They do one thing well and stay out of the way.

Operations: control costs and manage external data

dbt_external_tables manages the lifecycle of external tables pointing to files in S3, GCS, or Azure Blob Storage. It handles the DDL so teams do not have to write and maintain custom SQL for staging external files. For organisations that land data as files before loading into the warehouse, this removes a common source of drift between what the warehouse expects and what actually exists in storage.
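A sketch of a source definition using the package, with Snowflake-style syntax and an invented stage path:

    sources:
      - name: raw_events
        tables:
          - name: page_views
            external:
              location: '@raw_stage/page_views/'
              file_format: '( type = parquet )'

Running dbt run-operation stage_external_sources then creates or refreshes the external table DDL.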

dbt_snowflake_monitoring is Snowflake-specific but valuable. It tracks query costs at the warehouse, user, and model level. When the finance team asks why the Snowflake bill doubled last month, this package provides the data to answer that question. Cost visibility is not a luxury - it is a prerequisite for making informed decisions about materialisation strategies and scheduling.
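Once installed, those cost questions become plain queries over the models the package builds. A hedged sketch against its dbt_queries model - the column names here are assumptions to verify against your version:

    -- Hypothetical: warehouse spend per dbt model over the last 30 days
    select
        dbt_node_id,
        sum(query_cost) as total_cost
    from {{ ref('dbt_queries') }}
    where start_time >= dateadd(day, -30, current_date)
    group by 1
    order by 2 desc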

dbt-colibri provides column-level lineage for dbt Core. The standard dbt lineage graph shows model-to-model relationships. Colibri goes deeper, tracing which columns flow through which transformations. For teams working with PII, regulatory requirements, or complex multi-hop transformations, column-level lineage is the difference between confidence and guesswork.

Choosing packages wisely

More packages do not make a better dbt project. Every dependency is a maintenance commitment. Packages can conflict with each other, lag behind dbt releases, or introduce unexpected behaviour during upgrades.

A practical approach: start with dbt_utils and dbt_codegen. Add dbt_expectations when the native four tests are no longer sufficient. Bring in dbt_project_evaluator when the team grows beyond three or four people, and informal standards start to erode. Adopt the rest based on specific, concrete needs - not because a list said to.

The question worth asking before adding any package is straightforward: Does this save more time than it costs in maintenance? If yes, add it. If the answer is unclear, write a focused custom macro and revisit the decision in three months.

The dbt package ecosystem keeps growing. Keeping up with every new release is neither practical nor necessary. What matters is knowing the packages that solve real problems in your stack - and installing them before someone on the team reinvents them from scratch.
