Working in an ad hoc analytics environment is inherently unpredictable. You never know what question you'll tackle, what data you'll need, or what quantitative method you'll need to apply. To deliver an end-to-end analysis successfully, I like to think about the analysis process through the following lens.
- SME: Build solid subject matter expertise
- Data Wrangling: Gather, transform, and know the data
- Analysis: Study relationships with your data, build models
- Delivery: Present the analysis
SME: Building solid subject matter expertise is critical before you start analyzing the data. Learn the language and process of the client you're supporting. That will equip you to ask good probing questions, understand the nuance, and see how your analysis will really impact the business. There are no shortcuts here. It takes time, sitting with people, reading whatever you can get your hands on, and watching the process if possible. The holy grail, while not usually an option, is to actually do the process yourself. People might look at you funny, but go as far as you can to immerse yourself in their process. Visit the site where the widget is made, sit with the people who work the process, have lunch with them, or better yet, buy them a beer to learn what is really broken. I can't emphasize enough how important it is to take this step seriously. If you go down the wrong path, or make wildly wrong assumptions in the beginning, you can end up wasting your time and the time of those you're serving. The time you spend with the business you're supporting can be valuable for both sides. While you're learning the business, you can organically educate them on what types of analytics methods would apply in that situation, and what the interpretation might be. They will often over- or underestimate your capabilities, so this is a good time to calibrate those expectations.
Data Wrangling: The starting point for this competency is knowing enough SQL, coupled with SAS, R, or Python, to gather and transform data. Once you have the data, it is about shaping it and prepping it for analysis. It is also key to understand outliers and missing values. Say you have some binary indicator that takes on values 1, 0, or null. Do you know why you have null values? Are the 0 values different from null in this case? Research and ask questions to find that out. This step is about getting the data, wherever it lies. Typically that means knowing SQL well enough to pull the data you need. However, in a changing environment it also means knowing how to get data from a NoSQL environment. There is a heavy social component to this as well. In a large organization, you may need to track down the people who know where the data sits, and networking is the best way to do so. Those relationships are critical to finding the data, getting access, and interpreting it once you have your hands on it.
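To make the binary indicator question concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table name `customers`, the columns `opted_in` and `spend`, and every row are made up for illustration; your data and your SQL dialect will differ. The only point is that a quick grouped count keeps the nulls visible as their own group, so you can see whether they behave like the 0 values before you go ask why they exist.

```python
import sqlite3

# Hypothetical table, column names, and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, opted_in INTEGER, spend REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, 1, 120.0),
        (2, 0, 80.0),
        (3, None, 95.0),  # None is stored as SQL NULL
        (4, 1, 150.0),
        (5, None, 60.0),
        (6, 0, 70.0),
    ],
)

# GROUP BY treats NULL as its own group, so nulls are counted rather than hidden.
rows = conn.execute(
    """
    SELECT opted_in, COUNT(*) AS n, AVG(spend) AS avg_spend
    FROM customers
    GROUP BY opted_in
    """
).fetchall()
conn.close()

# Map each indicator value (1, 0, or None for NULL) to its row count and mean spend.
summary = {flag: (n, avg_spend) for flag, n, avg_spend in rows}

for flag, (n, avg_spend) in summary.items():
    label = "null" if flag is None else str(flag)
    print(f"opted_in={label}: n={n}, avg_spend={avg_spend:.1f}")
```

A summary like this does not tell you why the nulls are there. It only tells you how many there are and whether they look different from the zeros, which gives you a sharper question to take back to the people who own the data.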
Analysis: It may seem odd that we're halfway through the four stages and we're just now getting to analysis, but this step isn't something to rush into. The steps leading up to this were about learning the business and gathering and prepping your data for modeling or analysis. At this stage you are applying whatever quantitative techniques are appropriate to explore and understand the relationships in your data. I coach my team to go wide and deep to build out their analytic toolkit. Go wide by gaining a breadth of knowledge about different techniques, from domains such as statistics, econometrics, machine learning, text mining, network graphing, Six Sigma, simulation modeling, and optimization. It's unrealistic to ask someone to know all of these domains, but it is reasonable to build a basic understanding of each and to know which tool is appropriate for the problem. If you know which tool is right for the job, you can always dig deeper later, or lean on a colleague who has experience with it. As for going deep, I ask everyone on my team to pick one of these domains and become an expert in it. I let them pick the route they want to go because it'll take a lot of time and self-motivation to really get there.
Delivery: Everything you've done up until this point is in vain if it isn't communicated well. Understand how your audience likes to take in information. Some want a web-delivered dashboard so they can explore on their own. Others want a single-page PowerPoint with all of the relevant facts on the page. Others want to see what decisions you made along the way to come to your conclusion. I have worked for several leaders who just want to talk about it, with no slides or charts. If you work for me, I'll want a code walkthrough to understand what you did before any presentation is built. Be ready to adjust your style because it's not about you, it's about them understanding and buying into your analysis. I strongly encourage data analysts to join a Toastmasters club to practice public speaking. In addition to getting the practice and confidence of pitching to an audience, you'll learn how to deliver your ideas within the time constraints. I have seen many brilliant people fail to land their ideas because it took them too long to get to the point and they lost the attention of their audience.
Side note for the Analytics Manager:
When each data analyst on the team gets a different project, it becomes very difficult to compare talent or judge performance. However, I think these four steps are an underlying pattern and a consistent framework for most ad hoc analytics projects, and analyst performance can be judged through this lens. Being good at coding is not enough, on its own, to earn a top grade for data wrangling, so have analysts demonstrate the skill and teach it to their peers. Whether or not you choose to adopt a framework like this for evaluating performance, I recommend choosing something that is clear and is communicated upfront to the analysts so they understand what is expected of them. Finally, if you work for a company where you calibrate performance ratings across peer teams that perform a similar function, consider making this the standard template for communicating performance.