Creating Useful Documentation
The following blog by Jacqui Moore was originally published on Do Mo(o)re With Data on January 26, 2023 and is cross-posted here with permission. Jacqui is a Tableau Social Ambassador and a Senior Data Analytics and Viz Consultant for Cleartelligence.
I’ve written a lot of documentation. It’s a task that few people enjoy, and I am no exception. I’ve also read a lot of documentation and unwound a lot of undocumented reporting, and it’s a task that is often overlooked and underappreciated. Good documentation can be invaluable for maintainability, training, and knowledge transfer. I’ve certainly come back to a project six months after working on it and forgotten why something was done a certain way, or dug my way through Tableau workbooks and ETL code to find out where a certain piece of logic came from.
I find the most useful documentation in day-to-day work is the documentation that is right where you need it. Making documentation part of your workflow can save your own sanity, and pay dividends in time saved, either your own or whoever inherits your work.
This isn’t to say that a full technical document isn’t helpful or needed. These documents provide invaluable information on the business context, interactivity, use cases, logic, and more. However, documenting your process right in the tool where you are working will save immense amounts of time and confusion down the line. These notes are easier to keep up to date, and you can even use them at the end of a project to make creating technical documents and user documents easier.
So, where does this living documentation, well, live? Ideally, it lives everywhere the data is touched. Keep in mind, this type of documentation isn’t meant to be redundant, but to add context that isn’t immediately apparent.
Data Prep Stage
What to Include
Date created or modified
Designed purpose and limitations of the data source
Data lineage and dependencies. This matters less for a well-used database than for spreadsheets or processes that need to be updated or maintained.
Data freshness timestamp, if applicable
Call out any inclusion/exclusion criteria, transformations done, business decisions, or logic explanations
Ensure this makes it to the users of the data! If they won’t open the workflow or see the SQL, then passing this information downstream is key. Some tools, like dbt’s “Exposures”, include features to surface this type of information to others.
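As a minimal sketch of passing freshness information downstream, a query can carry its own run timestamp as a column, so anyone using the output sees it without opening the SQL. The column name [Query Run At] is an assumption for illustration, and the table is the Superstore Orders table used in the example later in this post. Note that this records when the query ran (for an extract, the refresh time), not when the source table was last loaded.

```sql
/*
Purpose: West Region orders with a run timestamp for downstream users
Note:    [Query Run At] is a hypothetical column name. It reflects when
         this query executed, not when the source table was last loaded.
*/
SELECT
     o.[Order ID]
    ,o.[Order Date]
    ,o.Sales        AS [Amount Sold]
    ,SYSDATETIME()  AS [Query Run At]  -- same value on every row of one run
FROM SuperstoreUS.dbo.Orders o
WHERE o.Region = 'West';
```

A dashboard can then show the maximum of this column as a "data as of" note.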
Ways to Document
Commenting and annotating code and workflows is helpful to quickly orient yourself or others on what is happening, where, and why. Naming conventions for fields, subqueries, views, etc. will also go a long way.
Using a very simple example based on the Superstore data set, I’ve shown some ways to document data preparation below. Most of the queries and workflows we create will be much more complex than this, so documentation becomes more important. For this example, I used Ken Flerlage’s SQL Server Superstore instance. If you need a server to connect to for learning how to use data prep tools, check out his post on FlerlageTwins.com!
How this might look in SQL:
Comments should clarify any changes, and any logic that may seem unneeded or unclear in purpose
Aliases should be easy to identify
Clean formatting to allow easy reading to locate key information
All fields are prefixed with the source table alias
/*
Author: Jacqui Moore
Date Created: 2023-01-19
Purpose: All Orders and returns for West Region
Modified: 2023-01-20 Ticket ABC-123
*/
SELECT
     o.[Order ID]
    ,o.[Order Date]
    ,o.[Ship Date]
    ,o.[Customer ID]
    ,o.[Customer Name]
    ,o.Segment
    ,o.[Product ID]
    ,o.[Product Name]
    ,o.[Category]
    ,o.[Sub-Category]
    ,o.Sales as [Amount Sold]
    ,o.Quantity as [Quantity Sold]
    -- ,o.Discount as [Discount as Sold] --Removed per ABC-123
    -- ,o.Profit as [Profit as Sold] -- Removed per ABC-123
    ,r.Returned
FROM SuperstoreUS.dbo.Orders o
LEFT JOIN SuperstoreUS.dbo.[Returns] r
    ON r.[ORDER ID]=o.[ORDER ID]
WHERE o.Region = 'West'
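Naming conventions for subqueries can be sketched with the same tables. The version below is one possible restructuring, not the original query: the CTE names west_orders and returned_orders are hypothetical, and the DISTINCT assumes the Returns table may contain repeated rows for an order.

```sql
WITH west_orders AS (
    -- Scope: West Region only, per the stated purpose of this data source
    SELECT
         o.[Order ID]
        ,o.[Product ID]
        ,o.Sales    AS [Amount Sold]
        ,o.Quantity AS [Quantity Sold]
    FROM SuperstoreUS.dbo.Orders o
    WHERE o.Region = 'West'
)
,returned_orders AS (
    -- DISTINCT removes repeated rows so the join below does not
    -- duplicate order lines (assumption: Returns may hold repeats)
    SELECT DISTINCT
         r.[Order ID]
        ,r.Returned
    FROM SuperstoreUS.dbo.[Returns] r
)
SELECT
     wo.[Order ID]
    ,wo.[Product ID]
    ,wo.[Amount Sold]
    ,wo.[Quantity Sold]
    ,ro.Returned
FROM west_orders wo
LEFT JOIN returned_orders ro
    ON ro.[Order ID] = wo.[Order ID];
```

Each named step states what it contains, so a reader can find the filter or the join logic without tracing the whole query.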
How this might look in Alteryx:
Comment header to indicate name, purpose, creator, and important information about a workflow
Containers can be used to create a “Read Me” for additional information
Tools are annotated descriptively
Calculated fields are commented with assumptions, or additional context the next person needs to know
Containers are used to segment the steps and provide additional context on the processing of the data
How this might look in Tableau Prep:
While there are fewer ways to add notes with Tableau Prep, you can add a description to each step