Updated: May 20, 2022
This is part 2 of Stacking Up: Word Nerd Does Data. Wherein the marketing guy tries to understand and explain the modern data stack. You can read part 1 here.
I had an aha moment the other day. "Light dawns on Marblehead," my high school English teacher would say. Tough but fair, I suppose.
A coworker was explaining the difference between ETL and ELT. These are terms I'd heard before, and my rudimentary understanding is that they're ways of moving data. This isn't exactly wrong, but it's also not exactly helpful.
Extract the source data, Transform it in some way, and Load it into its new location: ETL. Or, you can Extract the source data, Load it into the new system, and then Transform it: ELT.
On the face of it this is logical enough to me. But, as the humble marketing guy, I'm not seeing a lot more than face, either. To my great relief, they gave an ELI5 version, and therein lied my breakthrough.
Imagine you're moving into a new house. You've got to pack up all your stuff, load it into the truck, then unload it at the new house. Your friends pretend beer and pizza is fair compensation for their herniated disks. A good time is had by all.
Understanding ETL: Extract, Transform, Load
In the ETL methodology, imagine having to pack the moving truck in exactly the way your stuff will appear in the new house. Couches and dressers and dozens of kitchen gadgets you had to get (but have used twice) all need to be precisely curated before you pull away from the curb.
Understanding ELT: Extract, Load, Transform
In ELT, you're packing every inch of the moving truck as efficiently as you can. If you're anything like me, you're going 50mph down the highway, checking the mirrors every nine seconds, horrified that they let just about anybody drive these trucks. And, when you finally arrive at the new house, you unload and organize as you go.
Why Move to the Modern Data Stack?
To my surprise, the former methodology has been the prevailing one until fairly recently. However, the emergence of cloud warehouses like Snowflake, Amazon Redshift, and Google BigQuery have provided highly scalable computing power. This scalability has allowed SaaS tools like Fivetran and dbt Labs to reimagine how organizations move, curate, and optimize their data.
Like with moving to a new house, the reasons for moving your data are numerous, but typically hinge on economical factors. If your data lives in on-premise servers, moving to a scalable cloud environment can give greater financial and performance flexibility. Cloud warehouses can dial back storage needs in real time, whereas your server would simply sit unused. They can also temporarily scale up computing power, whereas an on-premise server has a concrete upper limit to what it can do. There are, of course, many unique variables to plan for, but generally speaking total cost of ownership is lower and the environment is more agile when embracing the modern data stack.
Here's what the traditional data stack looks like, and what the modern, cloud-based data stack can look like:
I assume I really glossed over total cost of ownership and cloud economics above, so perhaps more on that next time! I hope you find the moving truck analogy as helpful as I did. Going into that conversation I expected to have no clue what the technical staff was talking about, and I mean...we absolutely got there eventually. But at a high level I think the technical barrier to entry is pretty reasonable.
If I ever move again, I'm absolutely going the ELT route. I'll stick to what I'm good at - marketing and not lifting things. Not necessarily in that order.