Asking good questions to avoid the data breadline

Great questions are the crux of data analysis and building compelling insights. As I wrote in my post on data retrieval, bad questions tank even the best intentions. Most people aren’t good at asking good questions, and not for lack of motivation: to get really good at it, the asker needs to understand where the data comes from.

The best questions tend to have the following qualities:

Low dimensionality - this means they are narrow in scope.
- A poor example of dimensionality may be something like: What drives the highest revenue for X customer? … The range of signals, or groups of signals, is enormous, so the number of permutations to work through to reach an answer is very large.
Clear definition - the words used in the question are specific, not so abstract that they could mean multiple things.
- A poor example of definition may be something like: What’s our best performing content? … “best” is subjective, and needs to be defined by the person asking the question.
Time bound - the time range is clearly defined in the question.

All of these qualities follow from how data is stored (in databases), how it is queried (in languages like SQL), and what querying costs (ambiguity requires more data, which is more expensive).
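To make the cost point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The page_views table, its columns, and every row in it are hypothetical, invented only for illustration; a real warehouse will look different. It compares how many rows an unscoped question has to consider against one that names a customer and a time range.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (customer TEXT, page TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("Acme Corp", "/pricing", "2018-11-02"),
        ("Acme Corp", "/pricing", "2019-03-14"),
        ("Acme Corp", "/docs", "2019-07-01"),
        ("Globex", "/pricing", "2019-05-20"),
        ("Globex", "/blog", "2020-01-09"),
    ],
)

# An unscoped question has to consider every row in the table.
rows_unscoped = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]

# Naming one customer and one time range narrows the rows that matter.
rows_scoped = conn.execute(
    """
    SELECT COUNT(*) FROM page_views
    WHERE customer = ?
      AND viewed_at >= ? AND viewed_at < ?
    """,
    ("Acme Corp", "2019-01-01", "2020-01-01"),
).fetchone()[0]

print(rows_unscoped, rows_scoped)
```

On five toy rows the difference is trivial, but the same filters are what keep a query against a large table cheap.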

If we view data retrieval and analysis as a rate-limiting step, which it often is (hence the trope of “the data breadline”), then there are two ways to approach it: 1) make data retrieval much cheaper, so that bad questions aren’t penalized, which right now requires expensive tools like Looker or Tableau that let business users be their own analysts; or 2) raise the quality of questions, so that the queued work is much simpler and more people are able to find data themselves.

I tend to lean towards empowering people. That often requires education in both data infrastructure and data retrieval tools such as SQL. Once someone is familiar with both, they are much better equipped to wrap their head around what a good question is and is not, and to find the answer on their own.

Examples of good questions:

What are the most read pages for Acme Corp in 2019?
What features are used by customers who joined in the last 12 months and have the highest quantitative health scores?
What’s the average activity (calls, emails) of the sales reps with the most demos set in the last 3 months? With the fewest demos set?
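As a rough sketch of why these questions are easy to answer, the first one translates almost word for word into a query. The example below again uses Python's sqlite3 module with a hypothetical page_views table and made-up rows; the function name most_read_pages and its parameters are my own, not part of any real tool. Note how each quality shows up: one customer and one grouping column (low dimensionality), "most read" defined as a count of views (clear definition), and a calendar-year filter (time bound).

```python
import sqlite3


def most_read_pages(conn, customer, year, limit=3):
    """Return (page, views) pairs for one customer in one calendar year."""
    start = f"{year}-01-01"
    end = f"{year + 1}-01-01"
    return conn.execute(
        """
        SELECT page, COUNT(*) AS views      -- "most read" = number of views
        FROM page_views
        WHERE customer = ?                  -- one customer
          AND viewed_at >= ? AND viewed_at < ?  -- one year
        GROUP BY page
        ORDER BY views DESC, page ASC
        LIMIT ?
        """,
        (customer, start, end, limit),
    ).fetchall()


# Hypothetical data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (customer TEXT, page TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("Acme Corp", "/pricing", "2019-01-15"),
        ("Acme Corp", "/pricing", "2019-06-03"),
        ("Acme Corp", "/docs", "2019-02-11"),
        ("Acme Corp", "/docs", "2018-12-30"),  # outside the time range
        ("Globex", "/docs", "2019-04-22"),     # a different customer
    ],
)

top_pages = most_read_pages(conn, "Acme Corp", 2019)
print(top_pages)
```

A vaguer version of the question ("what content does Acme like?") would leave the metric, the grouping, and the date filter for the analyst to guess.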

A couple of other thoughts:

The first question is often a jumping-off point that leads to more questions, which creates an incentive to make that first question as good as possible.
If someone doesn’t ask a good question, help them get there by lowering dimensionality, clarifying definitions, and time bounding it.