dbt (data build tool) has quickly become a key piece in the dizzying puzzle of applications that make up the modern data stack. Serving as the answer to the T in ELT (Extract, Load, Transform), dbt is used by many data teams to apply software development practices to the transformations in their data pipelines.
dbt comes in two flavors. There’s dbt Core, a free, open source command line tool, and dbt Cloud, the subscription-based cloud offering from dbt Labs. Teams evaluating dbt therefore have to decide which of the two to use.
In this article, we will explore why a data team may want to consider dbt Cloud and what differentiates it from its open source counterpart. There are four main factors that we will examine: orchestration, Git integration, documentation, and configuration.
For teams that are not already using an orchestration tool such as Apache Airflow or Prefect, dbt Cloud provides an interface for scheduling and running dbt jobs that do things like capture snapshots and run transformations. dbt Core has no scheduler of its own, so it requires an external orchestrator.
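To make the dbt Core side of this concrete, here is a minimal sketch of the kind of wrapper an external orchestrator (or even a cron job) would call. The function names `build_dbt_commands` and `run_pipeline`, the project path, and the target name are our own illustrative assumptions, not part of dbt; the only dbt pieces used are the `dbt snapshot`, `dbt run`, and `dbt test` commands with the `--project-dir` and `--target` flags.

```python
import subprocess
from typing import Callable, List


def build_dbt_commands(project_dir: str, target: str) -> List[List[str]]:
    """Return the dbt CLI invocations for one scheduled run, in order."""
    common = ["--project-dir", project_dir, "--target", target]
    return [
        ["dbt", "snapshot", *common],  # capture snapshots first
        ["dbt", "run", *common],       # then build the models
        ["dbt", "test", *common],      # then run the tests
    ]


def run_pipeline(commands: List[List[str]],
                 runner: Callable = subprocess.run) -> int:
    """Run each command in order and stop at the first failure.

    Returns 0 if every step succeeded, otherwise the exit code of the
    step that failed. `runner` is injectable so the logic can be tried
    out without dbt installed.
    """
    for cmd in commands:
        result = runner(cmd, check=False)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    # Hypothetical project path and target name; adjust for your setup.
    commands = build_dbt_commands("/path/to/dbt_project", "prod")
    raise SystemExit(run_pipeline(commands))
```

Scheduling, retries, alerting, and log retention would still have to come from whatever calls this script, which is the work the dbt Cloud scheduler takes on for you.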
dbt Cloud’s orchestration also allows for the creation of webhooks that send notifications about jobs to downstream processes or on-call notification services. This improves the observability of jobs running within the dbt Cloud scheduler.
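On the receiving end, a downstream service will usually want to confirm that a notification really came from dbt Cloud before acting on it. The sketch below assumes the webhook is signed with an HMAC-SHA256 hex digest of the raw request body, sent in the `Authorization` header; check that assumption against the dbt Cloud webhook documentation before relying on it. The function names and the example secret are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def expected_signature(secret: str, body: bytes) -> str:
    """Compute the HMAC-SHA256 hex digest of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_authentic(secret: str, body: bytes, auth_header: Optional[str]) -> bool:
    """Return True only if the header matches the signature we compute.

    Assumption: the sender puts the hex digest in the Authorization
    header. compare_digest avoids leaking information through timing.
    """
    if not auth_header:
        return False
    return hmac.compare_digest(expected_signature(secret, body), auth_header)


if __name__ == "__main__":
    # Hypothetical secret and payload, for illustration only.
    secret = "example-webhook-secret"
    body = b'{"eventType": "job.run.completed"}'
    header = expected_signature(secret, body)
    print(is_authentic(secret, body, header))
```

In a real service, `body` would be the unparsed bytes of the incoming request, and a failed check would result in the request being rejected rather than forwarded to an on-call tool.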
Depending on a user’s level of comfort with Git, the built-in features in dbt Cloud can really help a team get used to Git workflows for proper version control. The dbt Cloud IDE has built-in functionality for pulling and pushing branches to and from Git repositories, making those workflows more straightforward.
Along with simplified workflows, the job scheduler also integrates with Git, so CI/CD testing jobs can run automatically when events such as a pull request being opened occur on the repository.
Though dbt Core can generate documentation, the resulting documentation site still needs to be hosted as a static website, which means using something like Amazon S3. In dbt Cloud, hosting of the static docs site is automatic, and the site is accessible via a link in any job that generates documentation. This allows docs to be updated more consistently without a large amount of additional maintenance.
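As a rough illustration of that extra maintenance with dbt Core, the sketch below checks that `dbt docs generate` has produced the files that make up the static site (`index.html`, `manifest.json`, and `catalog.json` in the project's `target/` directory) and returns the paths a publishing step would upload. The function name `docs_files_to_publish` is our own, and the upload itself (to Amazon S3 or any other static host) is deliberately left out.

```python
from pathlib import Path
from typing import List

# Files written to target/ by `dbt docs generate` that the static site needs.
DOC_ARTIFACTS = ("index.html", "manifest.json", "catalog.json")


def docs_files_to_publish(target_dir: str) -> List[Path]:
    """Return the docs files to upload, or raise if any are missing."""
    target = Path(target_dir)
    missing = [name for name in DOC_ARTIFACTS if not (target / name).is_file()]
    if missing:
        raise FileNotFoundError(
            "run `dbt docs generate` first; missing: " + ", ".join(missing)
        )
    return [target / name for name in DOC_ARTIFACTS]


if __name__ == "__main__":
    # Hypothetical location of the dbt project's target directory.
    for path in docs_files_to_publish("target"):
        print(f"would upload {path}")
```

A job like this has to be rerun and redeployed every time the models change, which is the step dbt Cloud folds into its own scheduled jobs.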
Configuring connections to both external databases and Git providers within dbt Cloud is straightforward, with a simple “fill in the blank” type interface. Once the configuration is complete, you can start building models in the built-in IDE almost immediately. In contrast, dbt Core requires more setup: it has to be installed, and a separate IDE, orchestrator, and Git tooling must be in place before everything is up and running.
When to Make the Move to dbt Cloud
With all of this in mind, it is important to note that dbt Cloud may not be the best choice for every team out there. Different team structures in varied organizations have different requirements that need to be met.
For example, a well-established data team with mature processes for Git integration, orchestration, and app configuration may opt to go down the dbt Core route, as it allows for a bit more customization. An organization may also need to integrate with technologies not currently supported by dbt Cloud, such as Amazon Athena or Firebolt, which are available as community-supported adapters for dbt Core.
That being said, if a data team is just being stood up, or the team is small and needs a fast, convenient way to put their SQL skills to use in making models, dbt Cloud is a great choice. Even larger, more established data teams should consider dbt Cloud if they don’t find themselves needing the level of customization offered with an open source tool.
Overall, dbt Cloud is a more professionally supported set of tools that provides simplicity and stability to an organization’s dbt environment without as much technical overhead. Though dbt Core is still a great tool, with a lot of community-created connectors that make it a bit more versatile than dbt Cloud, the ease of use of dbt Labs’ SaaS offering is definitely something to consider when bringing the tool into an organization’s data stack.