Big Data and Self Service: Perpetual Failure

We used to call this ad hoc reporting or self-service reporting. As with most things data, the vernacular changes but the problems remain the same. Ten years later, we still don’t have a great solution. You want data? “Learn SQL,” they say. So now Jenny the accountant, who pulls data once a month, needs to learn and relearn SQL on top of her regular job of accounting. This is a massive failure of the big data industry. If you thought “learn to code” was disrespectful, I’m sorry, but “learn SQL” is much worse. It implies you’re not busy with a million other tasks related to your actual job.

Silicon Valley has the hungriest, most data-savvy companies just about anywhere. Here data is as good as oil. We scoop up everything we can and store it for as long as we can. Big data warehousing too expensive? No problem, here are 10 startups commoditizing storage and compute on a massively parallel scale. But for Jenny, here’s our best offer:

  1. If you have the budget, hire a bunch of underpaid H-1B “consultants.”
  2. Have your data team develop hundreds of pre-cached reports (cubes/datasources/workbooks). You can never tell which one has the data with the right set of filters. Everything is opaque here.
  3. SQL Templates you can hack. Better not fuck this up!
  4. Learn to SQL

This is, of course, ordered from least to most effort on Jenny’s part.

Once upon a time we thought we had a solution in Semantic Layer technology. Business Objects was the first to offer it, in the early ’90s, followed by the rest. However, because of poorly executed implementations, our industry gave up on it. BI companies with Semantic Layers promised “speed of thought” intelligence, and we got a confusing mess. Why this happened is a long trip through history. I’ll boil it down to the advent of big data, the trend away from dimensional modeling, and a need for speed.

I am currently looking at the fraying edges of our weak-ass strategy of cubes everywhere. Sure, we got high-performance cubes, but nobody knows which one to use and when. The Semantic Layer is once again looking promising. There is at least one middling startup attempting a version 2.0: Looker. I’m not sure about the strength of their tech, though. Can they do multi-pass SQL? What about metrics at mixed granularities? And what’s the user experience like?
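
For anyone who has never seen one, here is a minimal sketch of the basic idea behind a Semantic Layer: Jenny picks business terms, and the layer writes the SQL. This is not Looker’s or Business Objects’ API. The `invoices` table, the `SEMANTIC_MODEL` dictionary, and the `build_query` helper are all made up for illustration, and the sketch only handles a single table in a single pass. It does nothing about multi-pass SQL or mixed granularities, which is exactly where the hard part starts.

```python
import sqlite3

# Hypothetical semantic model: business names mapped to SQL expressions.
# The table and column names are invented for this example.
SEMANTIC_MODEL = {
    "table": "invoices",
    "dimensions": {
        "region": "region",
        "month": "strftime('%Y-%m', invoice_date)",
    },
    "metrics": {
        "revenue": "SUM(amount)",
        "invoice_count": "COUNT(*)",
    },
}


def build_query(metrics, dimensions, model=SEMANTIC_MODEL):
    """Translate business terms into one single-pass GROUP BY query."""
    dim_exprs = [model["dimensions"][d] for d in dimensions]
    select_cols = [f"{expr} AS {name}" for expr, name in zip(dim_exprs, dimensions)]
    select_cols += [f"{model['metrics'][m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select_cols)} FROM {model['table']}"
    if dim_exprs:
        # Group and sort by the dimension expressions themselves.
        sql += " GROUP BY " + ", ".join(dim_exprs)
        sql += " ORDER BY " + ", ".join(dim_exprs)
    return sql


def run_demo():
    """Load a few fake invoices into SQLite and ask for revenue by region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoice_date TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [
            ("2024-01-05", "West", 100.0),
            ("2024-01-20", "West", 50.0),
            ("2024-01-11", "East", 75.0),
            ("2024-02-02", "East", 25.0),
        ],
    )
    sql = build_query(["revenue", "invoice_count"], ["region"])
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_query(["revenue"], ["month"]))
    print(run_demo())
```

Jenny asks for “revenue by region” and never sees the `GROUP BY`. The catch is that somebody still has to build and maintain that model, and a real one has to cope with joins, fan-out, and metrics that live at different grains.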

This might be another shit solution, but I’ll be looking to see if we are getting closer to something better than “learn SQL.”

Disclosure: I have nothing to do with Looker and have never tried it in any serious way.
