
dbt packages every project needs

Stop reinventing macros. Discover 9 essential dbt packages that solve real problems.
Kristy Broekmans, Data Engineer

Your dbt project is running. Models are being built. Tests are passing. But somewhere in that codebase, a team member is writing a custom macro that already exists - battle-tested and maintained - in a package they never installed.

That’s where dbt packages come in.

The dbt ecosystem has matured to the point where most common problems have been solved by open-source packages. Writing custom macros for surrogate keys, data profiling, or schema generation is not a sign of craftsmanship. It is a sign that the team has not looked at what is already available.

What are dbt packages?

A dbt package is a standalone dbt project - containing models, macros, tests, or seeds - that you import into your own project to extend its functionality. Packages are installed by adding them to a packages.yml file and running dbt deps. Most are distributed via the dbt Package Hub, and many are maintained by dbt Labs or well-known data teams. For more on configuration, see our guide to dbt YAML configuration.
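For illustration, a minimal packages.yml pulling in the first two packages covered below could look like this (the version ranges are indicative - check the Package Hub for current releases):

    packages:
      - package: dbt-labs/dbt_utils
        version: [">=1.1.0", "<2.0.0"]
      - package: dbt-labs/codegen
        version: [">=0.12.0", "<1.0.0"]

Running dbt deps then downloads both into the dbt_packages directory.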

9 Essential dbt packages your project needs

The foundation: stop rewriting what already exists

dbt_utils is the one package that belongs in every project, no exceptions. It provides surrogate key generation, date spine utilities, pivot macros, and safe type casting across warehouses. Many other packages depend on it. If a team is writing custom macros for things like deduplication or union operations, there is a good chance dbt_utils already handles it.
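As a sketch of what that looks like in practice - the model and column names here are invented for illustration:

    -- Deterministic surrogate key built from the natural key columns
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} as order_key,
        *
    from {{ ref('stg_orders') }}

    -- One row per day between two dates, e.g. as the backbone of a calendar model
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('2024-01-01' as date)",
        end_date="cast('2025-01-01' as date)"
    ) }}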

dbt_codegen solves one of the most tedious parts of dbt development: writing YAML. It connects to your database, reads metadata, and generates source definitions and base model SQL automatically. For a project with dozens of source tables, this cuts setup time from hours to minutes. It also reduces the risk of typos and mismatches between what exists in the warehouse and what is declared in dbt.
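For example, generating a source definition for a whole schema is a one-liner (the schema name is illustrative):

    dbt run-operation generate_source --args '{"schema_name": "raw_shop", "generate_columns": true}'

Paste the YAML it prints into a source file, then use generate_base_model to produce the matching staging SQL.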

Together, these two packages form the baseline. Everything else builds on top.

Quality and testing: trust the data or test it

dbt ships with four generic tests out of the box: unique, not_null, accepted_values, and relationships. That covers the basics. It does not cover regex validation, distribution checks, row count comparisons, or schema drift detection.

dbt_expectations fills that gap. Inspired by the Great Expectations Python library, it brings over 50 test types into dbt's native testing framework. Tests like expect_column_values_to_match_regex or expect_table_row_count_to_be_between slot directly into your YAML property files. No Python runtime needed, no separate orchestration layer.

    # Example: validate that order IDs follow a specific format
    models:
      - name: stg_orders
        columns:
          - name: order_id
            tests:
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: '^ORD-[0-9]{8}$'

elementary takes a different angle. Where dbt_expectations focuses on rule-based tests, elementary adds observability: anomaly detection, test result dashboards, and alerting. It turns your dbt project into a monitoring system by tracking historical test outcomes and flagging deviations from expected patterns. Elementary needs a dedicated schema in your warehouse to store these results - take that into account when choosing a package.
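A sketch of how elementary's anomaly tests are declared - like any other test in the YAML, with the model and column names here being illustrative:

    models:
      - name: stg_orders
        tests:
          - elementary.volume_anomalies
        columns:
          - name: amount
            tests:
              - elementary.column_anomalies:
                  column_anomalies:
                    - null_count
                    - average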

dbt_project_evaluator audits the project itself rather than the data. It checks your DAG against best practices from dbt Labs - flagging missing tests, undocumented models, direct references to sources from mart models, and naming convention violations. Think of it as a linter for your dbt project structure. The trade-off is that it adds models to your warehouse, so teams should decide early whether to run it in CI only or keep it in production.
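Because its checks are just models and tests, running it in CI is a single scoped command:

    dbt build --select package:dbt_project_evaluator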

Development flow: catch mistakes before they reach production

dbt_audit_helper is the package that pays for itself during refactoring. Rewriting a model is easy. Proving the rewrite produces identical output is hard. This package compares the old and new versions row by row, column by column, and summarises the differences. Without it, teams resort to manual spot-checking - which is how subtle regressions slip through.

    -- Compare old and new model outputs after a refactor
    {%- set old_relation = ref('dim_customers_v1') -%}
    {%- set new_relation = ref('dim_customers_v2') -%}

    {{ audit_helper.compare_relations(
        a_relation=old_relation,
        b_relation=new_relation,
        primary_key='customer_id'
    ) }}

dbt_profiler generates column-level statistics - min, max, null counts, distinct values - directly inside dbt. It is useful during discovery phases when a team inherits a new data source and needs to understand what they are working with before writing any transformation logic.
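A quick way to use it during discovery is the print_profile run-operation - a sketch, with the relation name invented and the argument name worth verifying against your installed version:

    dbt run-operation print_profile --args '{"relation_name": "stg_orders"}'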

Both packages serve a narrow purpose. That narrowness is their strength. They do one thing well and stay out of the way.

Operations: control costs and manage external data

dbt_external_tables manages the lifecycle of external tables pointing to files in S3, GCS, or Azure Blob Storage. It handles the DDL so teams do not have to write and maintain custom SQL for staging external files. For organisations that land data as files before loading into the warehouse, this removes a common source of drift between what the warehouse expects and what actually exists in storage.
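A sketch of a source definition using the package, with Snowflake-style syntax and an invented stage path:

    sources:
      - name: raw_events
        tables:
          - name: page_views
            external:
              location: '@raw_stage/page_views/'
              file_format: '( type = parquet )'

Running dbt run-operation stage_external_sources then creates or refreshes the external table DDL.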

dbt_snowflake_monitoring is Snowflake-specific but valuable. It tracks query costs at the warehouse, user, and model level. When the finance team asks why the Snowflake bill doubled last month, this package provides the data to answer that question. Cost visibility is not a luxury - it is a prerequisite for making informed decisions about materialisation strategies and scheduling.
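Once installed, those cost questions become plain queries over the models the package builds. A hedged sketch against its dbt_queries model - the column names here are assumptions to verify against your version:

    -- Hypothetical: warehouse spend per dbt model over the last 30 days
    select
        dbt_node_id,
        sum(query_cost) as total_cost
    from {{ ref('dbt_queries') }}
    where start_time >= dateadd(day, -30, current_date)
    group by 1
    order by 2 desc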

dbt-colibri provides column-level lineage for dbt Core. The standard dbt lineage graph shows model-to-model relationships. Colibri goes deeper, tracing which columns flow through which transformations. For teams working with PII, regulatory requirements, or complex multi-hop transformations, column-level lineage is the difference between confidence and guesswork.

Choosing packages wisely

More packages do not make a better dbt project. Every dependency is a maintenance commitment. Packages can conflict with each other, lag behind dbt releases, or introduce unexpected behaviour during upgrades.

A practical approach: start with dbt_utils and dbt_codegen. Add dbt_expectations when the native four tests are no longer sufficient. Bring in dbt_project_evaluator when the team grows beyond three or four people, and informal standards start to erode. Adopt the rest based on specific, concrete needs - not because a list said to.

The question worth asking before adding any package is straightforward: Does this save more time than it costs in maintenance? If yes, add it. If the answer is unclear, write a focused custom macro and revisit the decision in three months.

The dbt package ecosystem keeps growing. Keeping up with every new release is neither practical nor necessary. What matters is knowing the packages that solve real problems in your stack - and installing them before someone on the team reinvents them from scratch.
