The following blog post by Jacqui Moore was originally published on Do Mo(o)re With Data on January 26, 2023 and is cross-posted here with permission. Jacqui is a Tableau Social Ambassador and a Senior Data Analytics and Viz Consultant for Cleartelligence.
I’ve written a lot of documentation. It’s a task that few people enjoy, and I am no exception. I’ve also read a lot of documentation and unwound a lot of undocumented reporting, and I know that writing it is often overlooked and underappreciated. Good documentation can be invaluable for maintainability, training, and knowledge transfer. I’ve certainly come back to a project six months later and forgotten why something was done a certain way, or dug my way through Tableau Workbooks and ETL code to find out where a certain piece of logic was coming from.
I find the most useful documentation in day-to-day work is the documentation that is right where you need it. Making documentation part of your workflow can save your own sanity, and pay dividends in time saved, either your own or whoever inherits your work.
This isn’t to say that a full technical document isn’t helpful or needed. These documents provide invaluable information on the business context, interactivity, use cases, logic, and more. However, documenting your process right in the tool where you are working will save immense amounts of time and confusion down the line. These notes are easier to keep up-to-date, and you can even use them at the end of a project to make the creation of technical documents and user documents easier.
So, where does this living documentation, well, live? Ideally, it lives everywhere the data is touched. Keep in mind, this type of documentation isn’t meant to be redundant, but to add context that isn’t immediately apparent.
Data Prep Stage
What to Include
Date created or modified
Designed purpose and limitations of the data source
Data lineage and dependencies. If you’re using a well-established database, this may not be as important as when you are connecting to spreadsheets or processes that need to be updated or maintained.
Data freshness timestamp, if applicable
Call out any inclusion/exclusion criteria, transformations done, business decisions, or logic explanations
Ensure this makes it to the users of the data! If they won’t open the workflow or see the SQL, then passing this information downstream is key. Some tools, like dbt’s “Exposures”, include features to surface this type of information to others.
Ways to Document
Commenting and annotating code and workflows is helpful to quickly orient yourself or others on what is happening, where, and why. Naming conventions for fields, subqueries, views, etc. will also go a long way.
Using a very simple example based on the Superstore data set, I’ve shown some ways to document data preparation below. Most of the queries and workflows we create will be much more complex than this, so documentation becomes more important. For this example, I used Ken Flerlage’s SQL Server Superstore instance. If you need a server to connect to for learning how to use data prep tools, check out his post on FlerlageTwins.com!
How this might look in SQL:
Comments should clarify any changes, and any logic that may look unneeded or whose purpose is unclear
Aliases should be easy to identify
Clean formatting to allow easy reading to locate key information
All fields are prefixed with the source table alias
/*
Author: Jacqui Moore
Date Created: 2023-01-19
Purpose: All Orders and returns for West Region
Modified: 2023-01-20 Ticket ABC-123
*/
SELECT
     o.[Order ID]
    ,o.[Order Date]
    ,o.[Ship Date]
    ,o.[Customer ID]
    ,o.[Customer Name]
    ,o.Segment
    ,o.[Product ID]
    ,o.[Product Name]
    ,o.[Category]
    ,o.[Sub-Category]
    ,o.Sales as [Amount Sold]
    ,o.Quantity as [Quantity Sold]
--  ,o.Discount as [Discount as Sold] --Removed per ABC-123
--  ,o.Profit as [Profit as Sold] -- Removed per ABC-123
    ,r.Returned
FROM SuperstoreUS.dbo.Orders o
LEFT JOIN SuperstoreUS.dbo.[Returns] r
    ON r.[ORDER ID]=o.[ORDER ID]
WHERE o.Region = 'West'
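As a rough sketch of how a data freshness timestamp and the inclusion criteria could travel downstream with the data, a trimmed-down version of the query above can carry them as columns and comments. The column names [Latest Order Date] and [Query Run At] are illustrative assumptions, not part of the original query, and treating the most recent order date as a freshness proxy is only one possible choice.

```sql
/*
Purpose: West Region orders with freshness columns for downstream users
Note:    [Latest Order Date] and [Query Run At] are hypothetical names
*/
SELECT
     o.[Order ID]
    ,o.[Order Date]
    ,o.Sales AS [Amount Sold]
    ,r.Returned
     -- Most recent order date among the returned rows; a proxy for freshness
    ,MAX(o.[Order Date]) OVER () AS [Latest Order Date]
     -- Server time when the query ran, e.g. at extract refresh
    ,GETDATE() AS [Query Run At]
FROM SuperstoreUS.dbo.Orders o
LEFT JOIN SuperstoreUS.dbo.[Returns] r
    ON r.[Order ID] = o.[Order ID]
WHERE o.Region = 'West' -- Inclusion criteria: West Region only
```

Either column could then feed a "data refreshed" note on a dashboard, so users who never open the SQL still see how current the data is.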
How this might look in Alteryx:
Comment header to indicate name, purpose, creator, and important information about a workflow
Containers can be used to create a “Read Me” for additional information
Tools are annotated descriptively
Calculated fields are commented with assumptions, or additional context the next person needs to know
Containers are used to segment the steps and provide additional context on the processing of the data
How this might look in Tableau Prep:
While there are fewer ways to add notes with Tableau Prep, you can add a description to each step
Groups can be used to create a cleaner flow, with the ability to drill in on steps, and act similarly to Alteryx containers in some ways
Calculations can be commented using // at the start of a comment line
Other helpful things to include
If you’ve used a macro, tool group, or snippet of code from somewhere else, include a link to the original source
If you’ve used a macro or tool group, include a brief description of the purpose and what operations are being performed
On the Data Source
Give your Tableau Data Source a descriptive name
Pre-filter any data in the data source, whenever possible
Rename the tables, if the names aren’t clear
If you are using Custom SQL, comment that code
If the data source is published, a description containing some of the high level information from the data source section is helpful context for users who might try to later connect to the data
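To make the pre-filtering and Custom SQL commenting points concrete, here is a minimal sketch against the same Superstore tables. The data source name, the two-year window, and the reason given for it are assumptions made up for illustration.

```sql
/*
Data source: Superstore West Orders (hypothetical name)
Purpose:     Recent orders for West Region sales reporting
Pre-filter:  Last two years only, to keep the data source small
*/
SELECT
     o.[Order ID]
    ,o.[Order Date]
    ,o.[Customer Name]
    ,o.Sales AS [Amount Sold]
FROM SuperstoreUS.dbo.Orders o
WHERE o.Region = 'West'
  /* Rolling window: exclude orders older than two years from today */
  AND o.[Order Date] >= DATEADD(year, -2, CAST(GETDATE() AS date))
```

Block comments (/* */) are used throughout this sketch because Tableau generally wraps Custom SQL in an outer query, and a trailing -- comment on the last line can end up commenting out part of that wrapper.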
On the Data Pane
Rename fields to use ‘friendly’ names, such as the common nomenclature for the field among the business users
Set the right data types
Add a comment to fields if your data source will be used for Ask Data or for business users who are less familiar with the data and/or Tableau
This will appear on hover in the data source pane and Ask Data on Server
If the Table names are enough context to group the fields, then that is fine, but if the data source has a lot of fields, using Folders can be useful
Having a naming convention that makes it clear when LODs or Parameters are being used can be very helpful, but it can also make field names less friendly when they are displayed on views
Did you know the field descriptions are searchable? Yep, you can come up with a tag system and include it in the description, and search right in the Tableau Desktop data pane. Field descriptions are also visible on Ask Data.
In addition to the items above, calculations can be commented in the calculation window, just like any other type of code
When you’re ready to publish, clean up…
Delete calculations you ended up not using, copies of fields, etc.
Hide all unused fields
Hide fields that aren’t meant to be used (such as ID fields that you need but that don’t mean anything to the user, or base fields that were replaced with LOD calculations). If they can’t be hidden, putting them in a folder labeled “Do Not Use” is also helpful. If it will mess up someone’s analysis to use that field, hide it.
Name the sheets descriptively, with leading names that help identify the section, dashboard, etc.
Color coding your tabs can be very helpful. People use the colors for different things, but I like to use them to show when certain filters will apply
The reason I like to use the colors to show filters is that when you change filter settings to apply to specific sheets, you can see these colors, making it much easier to select the right sheets for the right filters
If you aren’t using the captions for display on a dashboard, you can use those to add notes on how a more complicated sheet is working
You can include a sheet with a “Read Me” for developers, containing data source or workbook level information. This sheet doesn’t get published, but can contain a wealth of knowledge
Layout containers are awesome. Use layout containers! But, really, containers will help a lot with development and layout, save you from floating many items, and keep the dashboard organized
Name the containers so you can identify them in the layout pane. This has saved me on complicated dashboards, and is definitely worth the time it takes to do it.
Include clear chart headings, axis headings, and helper text so the user knows what they are looking at, and has answers to any logic questions
When you’re done, “Hide all sheets” will clean up your workbook. Delete any unused sheets that you don’t need to keep for a reason.
For The End User
So far, the types of documentation I’ve covered are for developers (or yourself). But, whether you create functional user documentation or not, having documentation baked into the dashboard will be appreciated by the end user. For some users, it’s the only type of documentation they ever even see.
Tooltips can contain descriptions of what the metrics mean, text indicating what actions are available, and more. Don’t neglect tooltips!
Titles, labels and helper text are types of text that are displayed directly on the dashboard, and are important. These are things like clear axis labels, text describing interactivity, color legends, descriptive titles, and so on.
Overlays can be helpful for complicated dashboards with a lot of interactivity, where the visible helper text would be redundant, or just too much.
Include in the header or footer of the dashboard things like:
Data refresh date
Date range included, if different from the refresh date
Business points of contact
Developer point of contact
And now that I have thoroughly talked about one of the most tedious parts of development, go forth and do good data!