It is important to have a framework and to know the scope of work before starting an analysis. Below are the phases of the framework I follow when solving the exciting problems I get to work on.
Learn and fully understand the problem:
This is the initial phase of data analysis, where we ask questions from various angles to fully understand the problem statement. Often the problem is fuzzy and only its effect is visible, not its cause. It is essential to ask the right questions in the right way.
In this phase you may need to meet with stakeholders and your peers to arrive at solid questions to ask of the data.
From a development perspective, this phase is often called the requirement gathering phase.
Prepare:
Once you are clear on your problem statement, it is time to identify data sources.
This phase is often called the data gathering or data collection phase.
In the first phase, while making sense of the problem and its attributes, you will already have roughly thought about the data required to solve it. In this second phase you collect that data, or copy it from various databases and files if needed. Sometimes you also launch surveys to collect data.
This phase is extremely important, since your whole analysis, the insights drawn from it, and often future actions will be based on the data you acquire here.
Make sure the data is fair and not biased. If you are conducting a survey or analyzing a sample, make sure your data represents the whole population fairly.
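One rough way to check whether a sample represents the population is to compare each group's share in the sample with its share in the population. The sketch below assumes you know a grouping attribute (here a made-up "region") for both; the data and the function names are hypothetical, and a large gap is only a hint that a group is over- or under-represented, not a formal test.

```python
from collections import Counter


def group_shares(values):
    """Return each group's share of the total as a fraction between 0 and 1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def representation_gaps(population, sample):
    """Sample share minus population share, for every group in the population."""
    pop = group_shares(population)
    smp = group_shares(sample)
    return {group: smp.get(group, 0.0) - share for group, share in pop.items()}


# Hypothetical data: region of every customer vs. region of survey respondents.
population = ["north"] * 50 + ["south"] * 30 + ["east"] * 20
sample = ["north"] * 8 + ["south"] * 1 + ["east"] * 1

gaps = representation_gaps(population, sample)
for region, gap in sorted(gaps.items()):
    # A positive gap means the group is over-represented in the sample.
    print(f"{region}: {gap:+.2f}")
```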
Processing, data cleansing and transforming:
The data you acquired and considered for analysis could be raw and untidy. This is the phase where you take a close look at the data and standardize, clean and transform it according to your needs.
To give an easy example: imagine you have a table with the columns product, quantity_sold, price_per_unit and date_of_sale, and you are interested in the trend of total revenue generated per week. You then need a new column, revenue = quantity_sold * price_per_unit. Such calculations and derivations are done in this phase.
You get the point 🙂
In summary, this is when you clean your data, handle nulls, derive various metrics and apply various functions to get exactly what you need for analysis.
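The revenue example can be sketched in a few lines of plain Python. The rows below are made up for illustration, and dropping rows with a missing value is just one possible null-handling choice (you might prefer to impute instead). Weeks are grouped by ISO week number, which is also an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows; None marks a missing value.
sales = [
    {"product": "pen", "quantity_sold": 10, "price_per_unit": 1.5, "date_of_sale": "2024-01-01"},
    {"product": "book", "quantity_sold": 2, "price_per_unit": 12.0, "date_of_sale": "2024-01-03"},
    {"product": "pen", "quantity_sold": None, "price_per_unit": 1.5, "date_of_sale": "2024-01-04"},
    {"product": "book", "quantity_sold": 1, "price_per_unit": 12.0, "date_of_sale": "2024-01-09"},
]


def add_revenue(rows):
    """Drop rows with a missing quantity or price, then derive the revenue column."""
    cleaned = []
    for row in rows:
        if row["quantity_sold"] is None or row["price_per_unit"] is None:
            continue  # assumption: incomplete rows are dropped, not imputed
        revenue = row["quantity_sold"] * row["price_per_unit"]
        cleaned.append({**row, "revenue": revenue})
    return cleaned


def weekly_revenue(rows):
    """Sum revenue per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for row in rows:
        iso = date.fromisoformat(row["date_of_sale"]).isocalendar()
        totals[(iso[0], iso[1])] += row["revenue"]
    return dict(totals)


cleaned = add_revenue(sales)
weekly = weekly_revenue(cleaned)
for (year, week), total in sorted(weekly.items()):
    print(f"{year}-W{week:02d}: {total:.2f}")
```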
Analyzing:
This is the phase where you slice and dice the data sets: look at them from various angles, ask them several questions and extract helpful insights. This is the most exciting phase of the process, where you discover many unexpected trends and patterns.
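"Slicing and dicing" often just means aggregating the same measure along different dimensions. Here is a minimal sketch, assuming cleaned rows that already carry a revenue value; the rows, the "store" column and the total_by helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical cleaned rows with a derived revenue column.
rows = [
    {"product": "pen", "store": "A", "revenue": 15.0},
    {"product": "book", "store": "A", "revenue": 24.0},
    {"product": "pen", "store": "B", "revenue": 6.0},
    {"product": "book", "store": "B", "revenue": 36.0},
]


def total_by(rows, key):
    """Sum revenue for each distinct value of one column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["revenue"]
    return dict(totals)


# The same data viewed from two angles.
by_product = total_by(rows, "product")
by_store = total_by(rows, "store")
print(by_product)
print(by_store)
```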
Once you have made some sense of the data, you connect with peers and managers to get different perspectives.
Once everything is finalized, you package the product, visualize the data and build the data story with actionable conclusions.
Before sharing it with stakeholders or the business, you also get the story reviewed by peers.
Share / build application:
This is when you meet with stakeholders to communicate the insights you have discovered and help them with your recommendations, solutions and optimal course of action.
This meeting is generally highly interactive, and the business stakeholders will have lots of questions about your analysis methods. So it is better to have the code ready in your pocket, for example as an R Markdown file or a Jupyter notebook, to show to the stakeholders if they need additional explanation of the analysis process.
Stakeholders often need the same analysis performed again for a slightly different scenario, such as a different time frame, store or demographic, while the framework and base code remain the same. This is when we meet with developers to discuss whether it is possible to build a stable job that produces various analytical reports from 2-3 parameters.
If the team or organization is small, you will be the analyst-cum-developer.
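To give a feel for such a parameterized job, here is a minimal sketch in which the same base code produces a report for any store and time frame. The data, the revenue_report function and its three parameters (store, start, end) are hypothetical; a real job would read from your actual data source and be scheduled by whatever tooling your team uses.

```python
from datetime import date

# Hypothetical cleaned rows; in practice these would come from a database or file.
SALES = [
    {"store": "A", "date_of_sale": date(2024, 1, 1), "revenue": 15.0},
    {"store": "A", "date_of_sale": date(2024, 2, 3), "revenue": 24.0},
    {"store": "B", "date_of_sale": date(2024, 1, 9), "revenue": 12.0},
]


def revenue_report(rows, store, start, end):
    """Sale count and total revenue for one store between start and end (inclusive)."""
    selected = [
        row for row in rows
        if row["store"] == store and start <= row["date_of_sale"] <= end
    ]
    return {
        "store": store,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "sales": len(selected),
        "total_revenue": sum(row["revenue"] for row in selected),
    }


# Same code, different scenarios: only the parameters change.
january_a = revenue_report(SALES, "A", date(2024, 1, 1), date(2024, 1, 31))
q1_a = revenue_report(SALES, "A", date(2024, 1, 1), date(2024, 3, 31))
print(january_a)
print(q1_a)
```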
Action phase:
This is the phase where business stakeholders take action based on the insights from your analysis and share feedback in subsequent meetings.
Again, these steps are the ideal course of action in the game of data analysis, but we often jump back and forth between steps based on requirements.