Data analytics lifecycle

What is the data analytics lifecycle?

The scientific method provides a framework for the data analytics lifecycle (Dietrich, 2013). According to Dietrich (2013), the lifecycle is cyclical, with iteration possible within each of its six phases:

  • Discovery
  • Pre-processing data
  • Model planning
  • Model building
  • Communicate results
  • Operationalize

However, Erl, Buhler, and Khattak (2016) suggested that it is divided into nine steps:

  • Business case evaluation
  • Data identification
  • Data acquisition & filtering
  • Data extraction
  • Data validation & cleansing
  • Data aggregation & representation
  • Data analysis
  • Data visualization
  • Utilization of analysis results

Prajapati (2013) described five steps:

  • Identifying the problem
  • Designing data requirements
  • Pre-processing data
  • Data analysis
  • Data visualizing

A general pattern emerges across these three versions: each begins with understanding the business problem, moves through data preparation and analysis, and ends with presenting or using the results. Their differences, however, suggest that the field of data analytics is still too young to settle on a single, exact lifecycle. For the purpose of this discussion, the lifecycle used is the one from Services (2015), which adopts the Dietrich (2013) lifecycle. Note that both the Services (2015) and Dietrich (2013) models are iterative rather than a fixed sequence of steps. This lifecycle model lets all key team members plan work up front and toward the end of the data analytics project, which helps drive success (Dietrich, 2013).
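To make the iterative nature of the model concrete, the sketch below walks the six phases in order and only moves forward when a phase's exit question is answered "yes"; the exit questions are the ones Dietrich (2013) and Services (2015) attach to each phase, quoted later in this discussion. It is a minimal illustration, not a tool from either source: the `gate` callback is hypothetical (it stands in for the team's judgment), and the rule that a "no" sends the team back one phase is an assumption chosen to show iteration. The sketch stops after Operationalize rather than looping back to Discovery for a new project.

```python
from typing import Callable, List, Optional, Tuple

# The six phases (Dietrich, 2013; Services, 2015) with the exit question the
# text gives for each. The text gives no question for the last two phases,
# so they are None here and always pass.
PHASES: List[Tuple[str, Optional[str]]] = [
    ("Discovery", "Do I have enough information to draft an analytic plan and share for peer review?"),
    ("Pre-processing data", "Do I have enough good quality data to start building the model?"),
    ("Model planning", "Do I have a good idea about the type of model to try? Can I refine the analytic plan?"),
    ("Model building", "Is the model robust enough? Have we failed for sure?"),
    ("Communicate results", None),
    ("Operationalize", None),
]


def run_lifecycle(gate: Callable[[str, str], bool], max_steps: int = 50) -> List[str]:
    """Walk the phases and return the order in which they were visited.

    `gate` is a hypothetical callback that answers a phase's exit question.
    A "yes" moves the team forward; a "no" sends it back one phase (an
    assumed rule for illustrating iteration). `max_steps` stops the loop
    if the gates never pass.
    """
    visited: List[str] = []
    i = 0
    while i < len(PHASES) and len(visited) < max_steps:
        name, question = PHASES[i]
        visited.append(name)
        if question is None or gate(name, question):
            i += 1  # exit criterion met: move to the next phase
        else:
            i = max(i - 1, 0)  # not ready: revisit the previous phase (or repeat Discovery)
    return visited


if __name__ == "__main__":
    attempts = {}

    def gate(name: str, question: str) -> bool:
        # Example team: the first model built is not robust enough.
        attempts[name] = attempts.get(name, 0) + 1
        return not (name == "Model building" and attempts[name] == 1)

    print(run_lifecycle(gate))
```

In the example run, the failed robustness check sends the team back to Model planning once before the model is rebuilt and the project continues to Communicate results and Operationalize.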

When is it beneficial for stakeholders to be involved?

If the team follows an agile development process, the key stakeholders should be involved in every phase of the lifecycle. Services (2015) identifies these key stakeholders as the business user, project sponsor, project manager, business intelligence analyst, database administrator, data engineer, and data scientist. Applying an agile process to this lifecycle brings the benefits of iterative feedback: faster speed-to-market, improved first-time quality, visibility, risk management, flexibility to pivot when needed, cost control, and improved satisfaction through engagement (Waters, 2007). Allowing stakeholders to participate in most of these phases means the work in each phase can be done to their specifications.

In the first phase, discovery, the team learns the business domain and its relevant history, including lessons learned from previous projects (Services, 2015). Before proceeding, ask: "Do I have enough information to draft an analytic plan and share for peer review?" (Dietrich, 2013; Services, 2015).

Pre-processing data, also known as data preparation, is where a copy of the data (not the original) is placed in a sandbox, where the data scientists and the team can extract, load, and transform (ELT) the copied data (Services, 2015). In this phase, the data can also be cleaned, aggregated, augmented, and formatted (Prajapati, 2013). Before proceeding, ask: "Do I have enough good quality data to start building the model?" (Dietrich, 2013; Services, 2015).

Model planning is when the data scientist and team determine the appropriate models, algorithms, and data workflow, which helps identify hidden relationships between the variables (Services, 2015). Before proceeding, ask: "Do I have a good idea about the type of model to try? Can I refine the analytic plan?" (Dietrich, 2013; Services, 2015).

In model building, roughly 2/3 of the data is set aside for training the model and 1/3 for testing it, both to prepare the model for production and to discover hidden insights (Prajapati, 2013; Services, 2015). Before proceeding, ask: "Is the model robust enough? Have we failed for sure?" (Dietrich, 2013; Services, 2015).

Communicating results can be done by presenting visualizations of the data to the major stakeholders, who judge whether the results are a success or a failure (Services, 2015). The visualization in this phase is meant to be interactive for all parties involved in the project (Prajapati, 2013). Finally, in the operationalize phase, the work is ready to produce reports and documents at a predefined interval, so that key decision makers receive the vital data they need (Services, 2015).
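The data preparation and model-building phases can be sketched in a few lines of Python. This is a minimal sketch, assuming the data fits in memory as a list of records: `make_sandbox`, `clean`, and `train_test_split` are hypothetical helpers written for illustration (not a particular library's API), the single cleaning rule (drop records with a missing required field) is an assumption, and the 2/3 to 1/3 train/test ratio follows the text.

```python
import copy
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]


def make_sandbox(source: List[Record]) -> List[Record]:
    """Return a deep copy so that preparation never modifies the original data."""
    return copy.deepcopy(source)


def clean(rows: List[Record], required: List[str]) -> List[Record]:
    """Drop records missing any required field (one example cleaning rule)."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def train_test_split(
    rows: List[Record], train_fraction: float = 2 / 3, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Shuffle, then put about 2/3 of the records in training and the rest in testing."""
    shuffled = rows[:]  # shuffle a copy so the input order is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical source data: every tenth record is missing its "sales" value.
    original = [{"id": i, "sales": None if i % 10 == 0 else i * 1.5} for i in range(30)]
    sandbox = make_sandbox(original)
    prepared = clean(sandbox, required=["id", "sales"])
    train, test = train_test_split(prepared)
    print(len(prepared), len(train), len(test))
```

Shuffling with a fixed seed before splitting keeps the split random but reproducible, so the test set stays the same when the model is rebuilt during later iterations.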

References

What is Business Intelligence?

Business Intelligence (BI) is the gathering, managing, and analyzing of data that is vital to a business's survival in today's hyper-competitive environment (Thomas, 2001; Vuori, 2006). A BI practitioner's role is to keep decision makers from being overwhelmed by a huge wealth of data. BI practitioners therefore act as a filter, because decision makers will ignore any information that is not useful or meaningful (Vuori, 2006).

The BI cycle is continuous and can be reduced to four activities: planning the information you want to collect, ethically collecting reliable information, analyzing the data to form intelligence, and disseminating the intelligence in an understandable way (Thomas, 2001). Vuori (2006) expands it into six steps:

  1. Defining information needs
  2. Information gathering
  3. Information processing
  4. Analysis
  5. Dissemination
  6. Utilization and Feedback
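Because the cycle is continuous, the feedback gathered in step 6 becomes input to step 1 of the next pass. The sketch below shows that flow under toy assumptions: each step is a small function over a shared dictionary, the data sources and the decision maker's follow-up requests are hypothetical stand-ins, and "analysis" is just an average per topic. It illustrates the shape of the cycle, not a real BI system.

```python
from typing import Any, Dict

State = Dict[str, Any]


def define_needs(s: State) -> State:
    # Step 1: merge standing needs with any gaps reported in the last feedback.
    s["needs"] = sorted(set(s["needs"]) | set(s.pop("feedback", [])))
    return s


def gather(s: State) -> State:
    # Step 2: collect only what was asked for (s["sources"] is a hypothetical lookup).
    s["raw"] = {n: s["sources"].get(n, []) for n in s["needs"]}
    return s


def process(s: State) -> State:
    # Step 3: drop missing readings.
    s["clean"] = {n: [v for v in vals if v is not None] for n, vals in s["raw"].items()}
    return s


def analyze(s: State) -> State:
    # Step 4: turn data into intelligence; here just an average per topic.
    s["findings"] = {n: sum(v) / len(v) for n, v in s["clean"].items() if v}
    return s


def disseminate(s: State) -> State:
    # Step 5: one short line per finding for the decision maker.
    s["report"] = [f"{n}: {x:.1f}" for n, x in s["findings"].items()]
    return s


def utilize(s: State) -> State:
    # Step 6: the decision maker names topics still lacking; these feed step 1 next pass.
    s["feedback"] = s["requests"](s["report"])
    return s


BI_CYCLE = [define_needs, gather, process, analyze, disseminate, utilize]


def run(s: State, rounds: int) -> State:
    """Run the six steps `rounds` times, carrying feedback into the next pass."""
    for _ in range(rounds):
        for step in BI_CYCLE:
            s = step(s)
    return s


if __name__ == "__main__":
    state = {
        "needs": ["competitor_price"],
        "sources": {"competitor_price": [10, 12, None], "customer_sentiment": [3, 4]},
        "requests": lambda report: ["customer_sentiment"],
    }
    print(run(state, rounds=2)["report"])
```

In the example, the first pass reports only on competitor prices; the decision maker's feedback adds customer sentiment to the information needs, so the second pass reports on both.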

A good BI system makes use of a knowledge database and a communication system (Thomas, 2001). With this system, the cycle, and the correct data, we can have information on our competitors, new technology, public policy, customer sentiment, market forces, our supply chain, and more. Putting this information at the decision maker's disposal allows for data-driven decisions that increase the company's competitive advantage.

Three BI cycle characteristics that drive productivity

  1. Identifying Needs versus Wants in the first phase: Are we defining "Information that is wanted but that is not really needed", "Information that lacks and that is recognized to be needed" or "Information that is needed but not known to be needed, wanted nor asked for" (Vuori, 2006)? The last two are extremely important: the second satisfies the end user of the BI, while the third can lead to major revelations. As an example of the third, if a company looks only at its most active or biggest competitor, it may lose sight of a smaller competitor that is gaining traction. Getting the information that is actually needed is key to avoiding wasted time and increasing productivity.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Departments from which the data/information is collected
  2. Protecting Intellectual Capital: When a company has a high turnover rate, when sensitive or proprietary information leaves on drives or in employees' heads, or when senior personnel accidentally give out information at conferences or symposiums (Thomas, 2001), the company becomes vulnerable. The supply chain is another example: if a company and its competitor rely on the same key supplier to produce a similar product mix, what guarantees ensure that information is not passed between the two companies through that supplier? Information leaks can lead to the loss of a competitive advantage. Protecting intellectual capital means a company does not have to constantly create new products and can instead focus on improving its current product mix.
    • All employees
    • Supply chain (horizontally and vertically)
    • Production lines
    • Human Resources/Legal
    • Management
  3. Dissemination of the correct analysis: Disseminating the correct analysis allows managers to make data-driven decisions that help protect the business, enter a new market space, and so on. If BI practitioners give decision makers the information they need based on their analysis, and nothing more, we save time, reduce decision fatigue, and avoid wasting effort producing unneeded analytics. Thus, constant communication must occur between practitioners and decision makers to avoid non-value-added work. Feedback cycles help make future work more productive over time.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Communications departments
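The needs-versus-wants distinction in the first characteristic can be made explicit by answering three yes/no questions about each piece of information. The sketch below is one assumed encoding of Vuori's (2006) three categories, not a method from the source: the flags `needed`, `recognized`, and `wanted` are hypothetical, combinations the quoted categories do not cover return `None`, and `worth_collecting` keeps the two categories the text calls extremely important.

```python
from enum import Enum
from typing import Optional


class InfoNeed(Enum):
    """Vuori's (2006) three kinds of information need, as quoted in the text."""
    WANTED_NOT_NEEDED = "Information that is wanted but that is not really needed"
    RECOGNIZED_NEED = "Information that lacks and that is recognized to be needed"
    UNRECOGNIZED_NEED = (
        "Information that is needed but not known to be needed, wanted nor asked for"
    )


def classify(needed: bool, recognized: bool, wanted: bool) -> Optional[InfoNeed]:
    """Map three yes/no judgments onto Vuori's categories (an assumed encoding)."""
    if needed and recognized:
        return InfoNeed.RECOGNIZED_NEED
    if needed and not recognized and not wanted:
        return InfoNeed.UNRECOGNIZED_NEED
    if wanted and not needed:
        return InfoNeed.WANTED_NOT_NEEDED
    return None  # not covered by the three quoted categories


def worth_collecting(category: Optional[InfoNeed]) -> bool:
    # The text singles out the last two categories as the important ones.
    return category in (InfoNeed.RECOGNIZED_NEED, InfoNeed.UNRECOGNIZED_NEED)


if __name__ == "__main__":
    # Hypothetical item from the text's example: a smaller competitor gaining
    # traction that nobody has asked about.
    category = classify(needed=True, recognized=False, wanted=False)
    print(category, worth_collecting(category))
```

Writing the categories down this way forces the practitioner and decision maker to answer the "is it really needed?" question for each item before any time is spent collecting it.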

An example of an innovative use of BI is DeKalb County, GA. The county's CIO has leveraged BI and analytics to set up smart policing initiatives (using police more effectively and efficiently to prevent crimes and lower crime rates), enhance public safety (developing and maintaining green neighborhoods), and promote jobs and economic development (Matelski, 2015). The CIO has taken data from multiple systems and followed the cycle above: asking the right questions to identify the need for particular data, then collecting, processing, and analyzing it, and finally disseminating it to key decision makers through intuitive dashboards and key performance indicators.

References