## Business Intelligence: Targets, Probabilities, & Modeling

• Target Measures are used to improve marketing efforts by tracking measures such as ROI, NPV, revenue, lead generation, lag generation, growth rates, etc. (Liu, Laguna, Wright, & He, 2014). The goal is that after a marketing effort is conducted, there should be a change in the Target Measures. Efforts that produce positive changes in these measures should be repeated. Hoptroff and Kufyba (2001) stated that these measures could also be defect rates, default rates, survey ranking results, response rates, churn rate, the value of business lost, transaction amounts, products purchased, etc.
• Probability Mining is data mining using logit regression, neural networks, linear regression, etc. It helps determine the probability of an event, in our case meeting or failing to meet our Target Measures, based on information on past events (Hoptroff & Kufyba, 2001).
• Econometric Modeling is a way of understanding the economy through a blend of economic theory and statistical analysis. Essentially, it is a way of modeling how certain independent variables act on or influence the dependent variable, using tools from both economic and statistical theory to build the model. Econometric modeling looks into the market power a business holds, game theory models, information theory models, etc. The rationale is that neither economic theory nor statistical theory alone can provide enough knowledge to solve/describe a certain variable/state, so blending the two is assumed to be better at solving/describing it (Reiss & Wolak, 2007).
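
To make the idea of probability mining concrete, here is a minimal sketch of a logit regression fitted by plain gradient descent. The data, the variable names (`contacts`, `target_met`), and the learning-rate settings are all assumptions made for illustration; they do not come from any of the cited sources.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to b0 and b1.
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Hypothetical past campaigns: marketing contacts per customer (x) and whether
# the response-rate target was met (y = 1) or missed (y = 0).
contacts = [1, 2, 3, 4, 5, 6, 7, 8]
target_met = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logit(contacts, target_met)
p_low = sigmoid(b0 + b1 * 2)
p_high = sigmoid(b0 + b1 * 7)
print(f"P(target met | 2 contacts) = {p_low:.2f}")
print(f"P(target met | 7 contacts) = {p_high:.2f}")
```

The fitted model turns information on past events into a probability of meeting the target for a new case, which is the core of what a probability miner does.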

In the end, an econometric model can contain elements of probability mining, but a probability miner does not have to be an econometric model. Each of these models and miners can track and report on target measures.

Econometric modeling is a way to understand price and the pricing model, which is central to generating profits, by applying both economic and statistical/probability principles to achieve a targeted measure. Companies should use big data with a probability miner or econometric model to understand the meaning behind the data and extract actionable decisions that help them meet or exceed a current target measure, compare themselves against their current competition, and understand their current customers.

Two slightly different Applications

1. Probability mining has been used to see a customer’s affinity and responses towards a new product through profiling current and/or new customers (Hoptroff & Kufyba, 2001). Companies and marketing firms work on these models to assign a probability value of attracting new customers to a new or existing product or service. The results can indicate whether or not the company could meet the Target Measures.
2. We have Marketing Strategy Plans A, B, and C, and we want to use econometric modeling to understand how cost-effective each plan would be with respect to the same product/product mix at different price points. This would be cause-and-effect modeling (Hoptroff, 1992). Thus, the model should help predict which strategy would produce the most revenue, which is one of our main target measures.

An example of probability mining is Amazon’s online shopping experience. As the consumer adds items to the shopping cart, Amazon begins to apply probabilistic mining in real time to find out what other items this consumer would purchase (Pophal, 2014), based on what has happened before through the creation of profiles, and says “Others who purchased X also bought Y, Z, and A.” This quote almost implies that these items are a set that will enhance your overall experience, so buy some more. For instance, buyers of a \$600 Orion telescope also bought a \$45 hydrogen-alpha filter (used when pointing the telescope towards the sun to see planets move in front of it).
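
The "others also bought" idea can be sketched by counting how often items appear in the same order and turning the counts into conditional probabilities. This is only an illustrative sketch with made-up orders; it is not a description of how Amazon's system actually works.

```python
from collections import Counter
from itertools import permutations

# Hypothetical past orders; each order is the set of items bought together.
orders = [
    {"telescope", "solar filter", "star chart"},
    {"telescope", "solar filter"},
    {"telescope", "tripod"},
    {"star chart", "red flashlight"},
]

item_counts = Counter()
pair_counts = Counter()
for order in orders:
    item_counts.update(order)
    # Count every ordered pair (x, y) of distinct items in the same order.
    pair_counts.update(permutations(sorted(order), 2))

def also_bought(item, top=3):
    """Rank other items by the estimated P(other | item) from past orders."""
    scores = {
        y: pair_counts[(x, y)] / item_counts[x]
        for (x, y) in pair_counts
        if x == item
    }
    # Highest probability first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

print(also_bought("telescope"))
```

Here the solar filter ranks first for telescope buyers because it appears in two of the three telescope orders.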

The Federal Reserve Bank and its board members have used econometric modeling over the past 30 years for forecasting economic conditions and quantitative policy analysis (Brayton, Levin, Tryon, & Williams, 1997). Work on the model began in 1966 with help from the academic community and the Division of Research and Statistics, using the technology available at the time, and it became operational in 1970. It had approximately 60 behavioral equations, with a long-run neoclassical growth model, factor demands, and a life-cycle model of consumption. Brayton et al. (1997) go on to say that this model was used primarily for analyzing stabilization monetary and fiscal policies, as well as the effects of other governmental policies on the economy.


## Business Intelligence: Effectiveness

Non-profit hospitals are constantly trying to improve their services and drive down costs. One way they do this is by turning to Lean Six Sigma techniques and IT to identify opportunities to save money and improve the overall patient experience. Six Sigma relies on data and measurements to find opportunities for continuous improvement; thus, to aid the hospital’s goals, a Business Intelligence (BI) program was developed (Topaloglou & Barone, 2015).

Key Components of the structure

For an effective BI program, the responsible people/stakeholders (Actors) are identified, so that it is clear who is responsible for setting the business strategies (Goals). The strategy must be supported by the right business processes (Objects), and the right people must be made accountable for each process. Each of these processes has to be measured (Indicators) to inform the right people/stakeholders on how the business strategy is doing. All of this is documented in a key document (called AGIO), which is essentially a data definition dictionary that serves as a common core solution (Topaloglou & Barone, 2015). This means that there is one set of variable names and definitions.

Implementation of the above structure has to take into account the multiple levels of the business and their needs. Once the implementation is complete and all stakeholders have bought in, the business can experience its benefits. These benefits include: end users can make strategic, data-based decisions and act on them; attitudes shift toward the use and usefulness of information; data scientists come to be perceived as problem solvers rather than developers; data prompts immediate action; continuous improvement becomes a byproduct of the BI system; real-time views with drill-down into data details enable more data-driven decisions and actions; and meaningful dashboards are developed to support business queries (Topaloglou & Barone, 2015).

How knowledge management systems fit into the structure

“Healthcare delivery is a distributed process,” where patients can receive care from family doctors, clinicians, ER staff, specialists, acute care, etc. (Topaloglou & Barone, 2015). Each of the people involved in healthcare delivery has vital knowledge about the patient that needs to be captured and transferred correctly; hospital reports help capture that knowledge. Knowledge also lies in how the patient flows in and out of sections of the hospital, and executives need to see metrics on how all of these systems work together. A knowledge management distributed database system (KMDBS) helps tie together the data from all these different sources to provide the best care for patients and identify areas for continual improvement, and it presents this through a portal (and dashboards) for ease of use and ease of knowledge extraction (Topaloglou & Barone, 2015). The goal is to unify the knowledge from multiple sources into one system with a common core set of definitions, variables, and metrics. This common core exists so that everyone can understand the data in the KMDBS and look up information if there are any questions. The development team took this into account, and after meeting with the different business levels, it developed an in-house solution that gave all staff a system which used their collective knowledge to draw out key metrics that would aid them in making data-driven decisions for continuous improvement of the services they provide to their patients.

An example

Topaloglou and Barone (2015) present the following example:

• Actor: Emergency Department Manager
• Goal: Reduce the percentage of patients leaving without being seen
• Indicator: Percentage of patients left without being seen
• Object: Physician initial assessment process
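
One way to picture an AGIO entry is as a small record in a shared data dictionary, paired with the calculation behind its indicator. The sketch below is an assumption about how such an entry could be represented; the class name, field names, and patient counts are hypothetical and are not taken from Topaloglou and Barone (2015).

```python
from dataclasses import dataclass

@dataclass
class AgioEntry:
    """One row of an AGIO-style data dictionary (field names are assumed)."""
    actor: str
    goal: str
    indicator: str
    obj: str  # "Object" in AGIO; shortened to avoid confusion with Python's built-in

def pct_left_without_being_seen(arrivals: int, left_unseen: int) -> float:
    """Indicator value: share of arriving patients who left without being seen."""
    if arrivals == 0:
        return 0.0
    return 100.0 * left_unseen / arrivals

entry = AgioEntry(
    actor="Emergency Department Manager",
    goal="Reduce the percentage of patients leaving without being seen",
    indicator="Percentage of patients left without being seen",
    obj="Physician initial assessment process",
)

# Hypothetical monthly counts, used only to show the calculation.
value = pct_left_without_being_seen(arrivals=2000, left_unseen=90)
print(f"{entry.indicator}: {value:.1f}%")
```

Keeping the definition and the formula side by side is what lets everyone read the same number the same way.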


## Business Intelligence: Data Mining Success

For data mining success one must follow a data mining process. There are many processes out there, and here are two:

• From Fayyad, Piatetsky-Shapiro, and Smyth (1996)
1. Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Transformation -> Transformed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
• From Padhy, Mishra, and Panigrahi (2012)
1. Business understanding -> Data understanding -> Data Preparation -> Modeling -> Evaluation -> Deployment
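
The first process above can be read as a chain of functions, each consuming the output of the previous step. The sketch below uses made-up sales records and trivial stand-ins for each step (the "data mining" step is only an average), so it illustrates the flow of the process rather than any real algorithm.

```python
from statistics import mean

def selection(records):
    """Selection: keep only the fields relevant to the question."""
    return [(r["region"], r["sales"]) for r in records]

def preprocessing(rows):
    """Preprocessing: drop rows with missing values."""
    return [(region, sales) for region, sales in rows if sales is not None]

def transformation(rows):
    """Transformation: group sales by region."""
    grouped = {}
    for region, sales in rows:
        grouped.setdefault(region, []).append(sales)
    return grouped

def data_mining(grouped):
    """Data mining: a per-region average, standing in for a real algorithm."""
    return {region: mean(values) for region, values in grouped.items()}

def interpretation(patterns):
    """Interpretation/evaluation: turn the pattern into an actionable statement."""
    best = max(patterns, key=patterns.get)
    return f"Highest average sales: {best} ({patterns[best]:.1f})"

# Hypothetical raw data.
data = [
    {"region": "East", "sales": 120.0, "rep": "A"},
    {"region": "East", "sales": None, "rep": "B"},
    {"region": "West", "sales": 150.0, "rep": "C"},
    {"region": "West", "sales": 170.0, "rep": "D"},
]

knowledge = data
for step in (selection, preprocessing, transformation, data_mining, interpretation):
    knowledge = step(knowledge)
print(knowledge)
```

In practice the loop is rarely run once from start to finish; a poor result at the interpretation step usually sends you back to an earlier step.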

Success has as much to do with the events that lead up to the main event as it does with the main event itself. Thus, what is done to the data beforehand determines whether data mining can proceed successfully. Fayyad et al. (1996) note that data mining is just a subset of the knowledge discovery process, where data mining provides the algorithms/math that help reach the final goal. Looking at the two processes, we can see that they differ in their details yet follow the same overall path. Another key thing to note is that we can move back and forth (i.e., iterate) between the steps in these processes. Both processes assume that data is being pulled from a knowledge database or data warehouse, where the data should be cleaned (uniformly represented, with missing data, noise, and errors handled) and accessible (with access paths to the data provided).

Pros/Challenges

If we remove the pre-processing stage or data preparation phase, we will never be able to reduce the high dimensionality in the data sets (Fayyad et al., 1996). High dimensionality increases the size of the data and thus the processing time needed, which can be a disadvantage when a real-time data feed goes into the model derived from data mining. Also, with all this data, there is a greater chance that the model derived through the data mining process will pick up spurious patterns, which will not be easily generalizable or even understandable for descriptive purposes (Fayyad et al., 1996). Descriptive data mining is done for the sake of understanding the data, whereas predictive data mining is done for the sake of predicting the next result from a set of input variables from a data source (Fayyad et al., 1996; Padhy et al., 2012). Thus, to avoid this high-dimensionality problem, we must understand the problem, understand why we have the data we have, determine what data is needed, and reduce the dimensions to the bare essentials.
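
As one small illustration of trimming dimensions during data preparation, the sketch below drops columns that never vary, since they cannot help a model. A variance filter is a technique chosen here for illustration, not one named by the cited authors, and the column names and values are hypothetical.

```python
from statistics import pvariance

def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced

# Hypothetical data set: the "store_open" column never changes, so it carries
# no information for the model.
names = ["price", "store_open", "units_sold"]
rows = [
    [9.99, 1, 40],
    [10.49, 1, 35],
    [8.99, 1, 52],
]

kept_names, reduced = drop_low_variance(rows, names)
print(kept_names)
```

A filter like this is only a first pass; deciding which of the remaining dimensions are essential still depends on understanding the problem.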

Another challenge that arises if we skip selection, data understanding, or data mining algorithm selection is overfitting. Fayyad et al. (1996) define selection as selecting the key data you need to feed into the model and selecting the right data mining algorithm, both of which will influence the results. Understanding the problem will allow you to select the right data dimensions, as mentioned above, as well as the data mining algorithm (Padhy et al., 2012). Overfitting occurs when a data mining algorithm derives not only the general patterns in the data but also models the noise in it (Fayyad et al., 1996). Through the selection process, you can pick data with reduced noise to avoid an overfitting problem. Fayyad et al. (1996) also suggest that solutions should include cross-validation, regularization, and other statistical analysis. Overfitting can also be mitigated by understanding what you are looking for before using data mining, which will aid in the evaluation/interpretation process (Padhy et al., 2012).
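
Cross-validation exposes overfitting by scoring a model on data it did not see during fitting. The sketch below is a minimal k-fold example on a made-up data set: a model that memorizes the training points (1-nearest-neighbor) looks perfect on the training data but does worse than a simple average once it is evaluated on held-out points. The two toy models and the data are assumptions for illustration only.

```python
def mean_model(train):
    """Predict the training mean for every input (a deliberately simple model)."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def nearest_neighbor_model(train):
    """Predict the target of the closest training point (memorizes the noise)."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model over a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

def cross_val_mse(make_model, data, k=5):
    """Average held-out error over k folds; point i goes to fold i % k."""
    fold_errors = []
    for fold in range(k):
        test = [p for i, p in enumerate(data) if i % k == fold]
        train = [p for i, p in enumerate(data) if i % k != fold]
        fold_errors.append(mse(make_model(train), test))
    return sum(fold_errors) / k

# Hypothetical data: a flat signal of 6 with alternating noise of -1 / +1.
data = [(x, 5 if x % 2 == 0 else 7) for x in range(10)]

for name, make_model in [("mean", mean_model), ("1-NN", nearest_neighbor_model)]:
    train_err = mse(make_model(data), data)
    cv_err = cross_val_mse(make_model, data)
    print(f"{name}: training MSE = {train_err:.1f}, cross-validated MSE = {cv_err:.1f}")
```

The gap between training error and cross-validated error is the warning sign: the memorizing model has fitted the noise rather than the general pattern.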

Cons/Opportunities

When the variety in big data changes with time, a data-mined model that is applied unchanged will at some point become either outdated (no longer relevant) or invalid. This is the case in social media: if we try to read posts without focusing on one type of post, it is hard to say that any particular pattern model derived from data mining is valid. Previously defined patterns stop being valid as the data rapidly change with time (Fayyad et al., 1996). We would have to solve this by incrementally modifying, deleting, or augmenting the defined patterns in the data mining process, but because data can vary in real time, at the drop of a hat, this can be quite hard to do (Fayyad et al., 1996).

Missing and noisy data are very prevalent in meteorology; we cannot sample the entire atmosphere at every point at every time. We send up weather balloons 2-4 times a day from two points in a given US state. We then feed that into a model for predictive purposes. However, this leaves many gaps in the data. And if a weather balloon is a dud, we get no data at all from that launch; hence, we have missing data. How are we supposed to rely on the solution derived through data mining if the data is either missing or noisy? Fayyad et al. (1996) said that missing values can arise when the data source was “not designed with discovery in mind”, but we must include statistical strategies to define what these values should be. One strategy that meteorologists use is data interpolation. There are many types of interpolation, ranging from simple nearest-neighbor methods to complex Gaussian types.
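
Two of the simpler interpolation strategies can be sketched in a few lines. The temperature readings below are hypothetical, with `None` marking a launch that returned no data; the Gaussian-type methods mentioned above are more involved and are not shown.

```python
def nearest_neighbor_fill(values):
    """Replace each None with the closest observed value (ties go to the earlier one)."""
    known = [i for i, v in enumerate(values) if v is not None]
    return [
        v if v is not None else values[min(known, key=lambda k: abs(k - i))]
        for i, v in enumerate(values)
    ]

def linear_fill(values):
    """Fill interior None gaps with a straight line between neighboring observations."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical temperatures (deg C) from successive balloon launches.
temps = [12.0, None, 16.0, None, None, 10.0]

print(nearest_neighbor_fill(temps))  # [12.0, 12.0, 16.0, 16.0, 10.0, 10.0]
print(linear_fill(temps))            # [12.0, 14.0, 16.0, 14.0, 12.0, 10.0]
```

Nearest-neighbor filling copies an existing observation, while linear filling assumes a steady change between observations; which assumption is more reasonable depends on how quickly the measured quantity varies.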


## Business Intelligence: Multilevel BI

Annotated Bibliography

Citation:

Curry, E., Hasan, S., & O’Riain, S. (2012, October). Enterprise energy management using a linked dataspace for energy intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012 (pp. 1-6). IEEE.

Author’s Abstract:

“Energy Intelligence platforms can help organizations manage power consumption more efficiently by providing a functional view of the entire organization so that the energy consumption of business activities can be understood, changed, and reinvented to better support sustainable practices. Significant technical challenges exist in terms of information management, cross-domain data integration, leveraging real-time data, and assisting users to interpret the information to optimize energy usage. This paper presents an architectural approach to overcome these challenges using a Dataspace, Linked Data, and Complex Event Processing. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Observatory.”

My Personal Summary:

Using BI as a foundation, a linked (key data is connected to provide information and knowledge) dataspace (a huge data mart whose data is related to each other when needed) for energy intelligence was implemented for the Digital Enterprise Research Institute (DERI), which has ~130 staff located in one building. The program tried to measure the direct (electricity costs for data centers, lights, monitors, etc.) and indirect (cost of fuel burned, cost of gas used by commuting staff) energy usage of the enterprise so that it could become a more sustainable organization (as climate change is a big topic these days). The paper argued that a multi-level and holistic view of the business intelligence (on energy usage) was needed, and it discussed the types of information conveyed at each level.

My Personal Assessment:

However, this paper didn’t go into how effective the implementation of this system was. The paper would have been improved by saying something about the decrease in CO2 emissions DERI had over the past year. The authors could have included a time series chart showing power consumption before and after implementation of this multi-level BI system. This paper was objective, but it made no case for why we should implement a similar system. The authors state that their future work is to provide more granularity in their levels, but say nothing about what business value the system has had for the organization. Thus, with no figures stating the value of this system, this paper read more like a conceptual how-to manual.

My Personal Reflection:

This paper doesn’t fit well into my research topic, but it was helpful in defining a dataspace and a multi-level, holistic BI system. I may use the concept of a dataspace in my methodology, where I collect secondary data from the National Hurricane Center into a big data warehouse and link the data as it becomes relevant. This should save me time and reduce the labor-intensive costs of data integration by postponing integration until it is required. It has changed my appreciation of data science, as there is another philosophy besides bringing one data set at a time into a data warehouse and making all your connections before moving on to the next data set.

A multilevel business intelligence setup and how it affects the framework of an organization’s decision-making processes.

Curry et al. (2012) applied a linked dataspace BI system to a holistic and multi-level organization. The holistic aspects of their BI system included Enterprise Resource Planning, finance, facility management, human resources, asset management, and code compliance. From a holistic standpoint, most of these groups held siloed information that was difficult to leverage across domains. However, this is different from a multi-level BI system setup. As defined in Table II of Curry et al. (2012), in the multi-level setup the data is shown at the organizational level (stakeholders are executive members, shareholders, regulators, suppliers, consumers), the functional level (stakeholders are functional managers, organization manager), and the individual level (stakeholders are the employees). Each of these stakeholders has different information requirements and different levels of access to certain types of data, and the multi-level BI system must take this into account. Different information requirements and access mean different energy metrics: organizational-level metrics could be Total Energy Consumption and % Renewable Energy Sources, whereas individual-level metrics could be Business Travel, Individual IT Consumption, Laptop Electricity Consumption, etc. It wouldn’t make sense for an executive or a stakeholder to look at the laptop electricity consumption metric of each of the 130 staff members when they could get a company-wide figure. However, the authors did note that the organization-level data can be drilled down further to see what could have caused a particular event in question. Certain data that the executives can see will not be accessible to all individual employees, and a multi-level BI system addresses this as well. Also, employee A cannot view employee B’s energy consumption, because a lateral view of the BI system data may not be permissible.

The different levels of metrics reported by this multi-level BI system allow each level to make data-driven decisions to reduce its carbon footprint. An executive can look at the organizational-level metrics and institute a “power down your monitors at night” initiative to save power company-wide. At the individual level, employees could choose to leave for work earlier so as to spend less time in traffic and waste less gas, thus reducing their indirect carbon footprint for the company. Managers, based on the metrics they can view, can decide to request funding for energy-efficient monitors and laptops for all their teams, or even a single power strip per person, to reduce their teams’ energy consumption costs.
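
The level-dependent views described in this section can be sketched as a simple lookup that returns different metrics depending on who is asking. Everything below (the user names, the numbers, the team structure, and the choice to give managers a team total) is a hypothetical illustration, not the design used by Curry et al. (2012).

```python
# Metric names echo the examples in the text; the values are hypothetical.
ORG_METRICS = {"total_energy_kwh": 41000, "renewable_share_pct": 22}
INDIVIDUAL_METRICS = {
    "employee_a": {"laptop_kwh": 14, "business_travel_km": 300},
    "employee_b": {"laptop_kwh": 9, "business_travel_km": 0},
}
TEAMS = {"manager_1": ["employee_a", "employee_b"]}

def visible_metrics(user: str, level: str) -> dict:
    """Return only the metrics a user at the given level may see."""
    if level == "organizational":
        # Executives see company-wide figures and can drill down when needed.
        return {"organization": ORG_METRICS, "individuals": INDIVIDUAL_METRICS}
    if level == "functional":
        # Functional managers see a total for their own team (an assumption).
        members = TEAMS.get(user, [])
        team_total = sum(INDIVIDUAL_METRICS[m]["laptop_kwh"] for m in members)
        return {"team_laptop_kwh": team_total}
    if level == "individual":
        # Employees see only their own record, never a colleague's.
        return {user: INDIVIDUAL_METRICS[user]}
    raise ValueError(f"unknown level: {level}")

print(visible_metrics("employee_a", "individual"))
print(visible_metrics("manager_1", "functional"))
```

Restricting each view at the point of lookup is one way to keep lateral access (employee A seeing employee B) from happening by accident.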

## What is Business Intelligence?

Business Intelligence (BI) is the gathering, managing, and analyzing of data that is vital for the survival of a business in the current hyper-competitive environment (Thomas, 2001; Vuori, 2006). A BI practitioner helps keep decision makers from being overwhelmed by a huge wealth of data. Practitioners thus act as a filter, because decision makers will ignore any information that is not useful or meaningful (Vuori, 2006).

The BI cycle is a continuous cycle, which can be reduced to planning the information you want to collect, ethically collecting reliable information, analyzing the data to form intelligence, and disseminating the intelligence in an understandable way (Thomas, 2001). It can be expanded into six steps, per Vuori (2006):

1. Defining information needs
2. Information gathering
3. Information processing
4. Analysis
5. Dissemination
6. Utilization and Feedback

A good BI system would make use of a knowledge database and a communication system (Thomas, 2001). With this system and cycle and the correct data, we can have information on our competitors, new technology, public policy, customer sentiment, market forces, supply chain information, etc. Having this information at the disposal of the decision maker will allow for data-driven decisions, to increase their company’s competitive advantage.

Three BI cycle characteristics that drive productivity

1. Identifying Needs versus Wants in the first phase: Are we defining “Information that is wanted but that is not really needed”, “Information that lacks and that is recognized to be needed” or “Information that is needed but not known to be needed, wanted nor asked for” (Vuori, 2006)? The last two are extremely important. The second satisfies the end-user of the BI; the third can lead to huge revelations. For the third, if a company only looks at its most active or biggest competitor, it may lose sight of a smaller competitor gaining traction. Getting the information that is actually needed is key to not wasting time and to increasing productivity.
• Influences the BI practitioner organization
• Influences the Decision Makers (from first line managers to executive level)
• Departments from which the data/information is collected
2. Protecting their Intellectual Capital: When companies have high turnover rates, when sensitive/proprietary information is transported on drives or in the minds of employees, or when senior personnel accidentally give out information at conferences/symposiums (Thomas, 2001), the company runs the risk of becoming vulnerable. Another example is the supply chain: if one company uses a key supplier and its competitor uses that same supplier to produce a similar product mix, what guarantees ensure that information is not being passed between the two companies through the supplier? Information leaks can lead to a loss of competitive advantage. Protecting intellectual capital allows companies to focus on improving their current product mix rather than having to constantly create new products.
• All employees
• Supply chain (horizontally and vertically)
• Production lines
• Human Resources/Legal
• Management
3. Dissemination of the correct analysis: This allows managers to make data-driven decisions that should help protect the business, enter a new market space, etc. If BI practitioners give the decision maker the information they need based on their analysis and nothing more, we save time, reduce decision fatigue, and reduce the time wasted on producing unneeded analytics. Thus, constant communication must occur between the practitioner and decision makers to avoid non-value-added work. Feedback cycles help make future work/endeavors more productive over time.
• Influences the BI practitioner organization
• Influences the Decision Makers (from first line managers to executive level)
• Communications departments

An example of an innovative use of BI is DeKalb County, GA. The CIO has leveraged BI and analytics to set up smart policing initiatives (where police are used more effectively and efficiently to prevent crimes and lower crime rates), enhance public safety (develop and maintain green neighborhoods), and promote jobs and economic development (Matelski, 2015). The CIO has taken data from multiple systems and followed the cycle above to ask the right questions and identify the needs for particular data, then carried it through collection, processing, and analysis to its dissemination to the key decision makers (via intuitive dashboards and key performance indicators).
