Business Intelligence: data mining success

This post discusses the necessary steps for data mining success. There are many process options out there; this post describes the pros and cons of the steps.

For data mining success one must follow a data mining process. There are many processes out there, and here are two:

  • From Fayyad, Piatetsky-Shapiro, and Smyth (1996)
    1. Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Transformation -> Transformed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
  • From Padhy, Mishra, and Panigrahi (2012)
    1. Business understanding -> Data understanding -> Data Preparation -> Modeling -> Evaluation -> Deployment

Success has as much to do with the events that lead up to the main event as it does with the main event itself.  Thus, what is done to the data before mining determines whether data mining can proceed successfully.  Fayyad et al. (1996) note that data mining is just a subset of the knowledge discovery process, where data mining provides the algorithms/math that help reach the final goal.  Looking at each of the individual processes, we can see that they are slightly different, yet essentially the same.  Another key thing to note is that we can move back and forth (i.e., iterate) between the steps in these processes.  Both processes suppose that data is being pulled from a knowledge database or data warehouse, where the data should be cleaned (uniformly represented, with missing data, noise, and errors handled) and accessible (with access paths to the data provided).

Pros/Challenges

If we remove the pre-processing stage or data preparation phase, we will never be able to reduce the high dimensionality of the data sets (Fayyad et al., 1996).  High dimensionality increases the size of the data and thus the processing time required, which may not be advantageous when feeding a real-time data stream into the model derived through data mining.  Also, with all this data, you run the chance that the model derived through the data mining process will pick up spurious patterns, which will not be easily generalizable or even understandable for descriptive purposes (Fayyad et al., 1996).  Descriptive purposes are data mining for the sake of understanding the data, whereas predictive purposes are data mining for the sake of predicting the next result given a set of input variables from a data source (Fayyad et al., 1996; Padhy et al., 2012).  Thus, to avoid this high-dimensionality problem, we must understand the problem, understand why we have the data we have and what data is needed, and reduce the dimensions to the bare essentials.
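
To make that last step concrete, here is a minimal sketch of dimensionality reduction before mining, using scikit-learn's PCA; the random data and the 95% variance threshold are illustrative assumptions, not values prescribed by the cited papers.

```python
# A minimal sketch: reduce a high-dimensional data set before mining it.
# The data and the 95% variance threshold are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 100))               # 500 records, 100 raw dimensions

X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale
pca = PCA(n_components=0.95)                  # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X_scaled)

print(f"Reduced from {X.shape[1]} to {X_reduced.shape[1]} dimensions")
```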

Another challenge arises if we skip the selection, data understanding, or data mining algorithm selection steps: overfitting.  Fayyad et al. (1996) define selection as choosing the key data you need to feed into the model and selecting the right data mining algorithm, both of which influence the results.  Understanding the problem will allow you to select the right data dimensions, as mentioned above, as well as the right data mining algorithm (Padhy et al., 2012).  Overfitting is when a data mining algorithm does not just derive the general patterns in the data but also describes the noise in it (Fayyad et al., 1996).  Through the selection process, you can pick data with reduced noise to avoid an overfitting problem.  Fayyad et al. (1996) also suggest solutions that include cross-validation, regularization, and other statistical analyses.  Overfitting issues can further be mitigated by understanding what you are looking for before data mining, which will aid the evaluation/interpretation process (Padhy et al., 2012).
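
As a rough illustration of the cross-validation suggestion, here is a minimal sketch with scikit-learn; the synthetic data and the two regularization strengths are assumptions for demonstration, not the setup from Fayyad et al. (1996). A large gap between training accuracy and cross-validated accuracy is one sign of overfitting.

```python
# A minimal sketch: compare training accuracy to cross-validated accuracy
# to spot overfitting; regularization strength C is varied for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

for C in (0.01, 100.0):                       # heavy vs. light regularization
    model = LogisticRegression(C=C, max_iter=1000)
    train_acc = model.fit(X, y).score(X, y)   # accuracy on the data it was fit to
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # held-out accuracy
    print(f"C={C}: train={train_acc:.2f}, cross-validated={cv_acc:.2f}")
```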

Cons/Opportunities

Variety in big data that changes with time means that a data-mined model applied unchanged will at some point become either outdated (no longer relevant) or invalid.  This is the case in social media: if we try to read posts without focusing on one type of post, it would be hard to say that one particular pattern model derived from data mining is valid.  Thus, previously defined patterns are no longer valid as data changes rapidly with respect to time (Fayyad et al., 1996).  We would have to solve this by incrementally modifying, deleting, or augmenting the defined patterns in the data mining process, but since data can vary in real time, at the drop of a hat, this can be quite hard to do (Fayyad et al., 1996).
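
One hedged sketch of what "incrementally modifying the patterns" could look like in practice is an online learner that is updated batch by batch instead of being retrained from scratch; the simulated, drifting batches below are purely illustrative.

```python
# A minimal sketch: update a model incrementally as new, drifting data arrives,
# rather than keeping a stale model or retraining from scratch each time.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for batch in range(10):                                   # each batch = newly arrived data
    X_new = rng.normal(loc=batch * 0.1, size=(100, 5))    # the distribution drifts over time
    y_new = (X_new.sum(axis=1) > batch * 0.5).astype(int)
    model.partial_fit(X_new, y_new, classes=classes)      # augment the model in place
```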

Missing data and noisy data are very prevalent in meteorology; we cannot sample the entire atmosphere at every point at every time.  We send up weather balloons two to four times a day at only a couple of points per US state and then try to feed that into a model for predictive purposes.  As a result, we have a bunch of gaps in the data.  What happens if the weather balloon is a dud and we get no data?  Then we have missing data, which is a problem with the data itself.  How are we supposed to rely on the solution derived through data mining if the data is either missing or noisy?  Fayyad et al. (1996) said that missing values are “not designed with discovery in mind”, but we must include statistical strategies to define what these values should be.  One strategy meteorologists use is data interpolation.  There are many types of interpolation, ranging from simple nearest-neighbor methods to complex Gaussian ones.
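
As a toy illustration of that interpolation idea, here is a minimal sketch with SciPy; the hourly temperature series is made up, and a Gaussian-process regressor would sit at the more complex end of the spectrum mentioned above.

```python
# A minimal sketch: fill gaps between sparse observations by interpolation,
# from a simple nearest-neighbor fill to a smoother cubic fit.
import numpy as np
from scipy.interpolate import interp1d

hours = np.array([0, 6, 12, 18, 24])               # sparse observation times
temps = np.array([15.0, 18.5, 24.0, 20.0, 16.0])   # observed temperatures (made-up values)

nearest = interp1d(hours, temps, kind="nearest")   # simple nearest-neighbor fill
smooth = interp1d(hours, temps, kind="cubic")      # smoother interpolation

query = np.arange(0, 25)                           # estimate a value for every hour
print(nearest(query)[:5])
print(smooth(query)[:5])
```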

Business Intelligence: Multilevel BI

There are multiple styles for an annotated bibliography; this post shows one of them. It also explains what a multilevel business intelligence setup is and describes how this type of arrangement affects the framework of an organization’s decision-making processes.

Annotated Bibliography

Citation:

Curry, E., Hasan, S., & O’Riain, S. (2012, October). Enterprise energy management using a linked dataspace for energy intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012 (pp. 1-6). IEEE.

Author’s Abstract:

“Energy Intelligence platforms can help organizations manage power consumption more efficiently by providing a functional view of the entire organization so that the energy consumption of business activities can be understood, changed, and reinvented to better support sustainable practices. Significant technical challenges exist in terms of information management, cross-domain data integration, leveraging real-time data, and assisting users to interpret the information to optimize energy usage. This paper presents an architectural approach to overcome these challenges using a Dataspace, Linked Data, and Complex Event Processing. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Observatory.”

 

My Personal Summary:

Using BI as a foundation, a linked (key data is connected to provide information and knowledge) dataspace (a huge data mart with data that is related to each other when needed) for energy intelligence was implemented at the Digital Enterprise Research Institute (DERI), which has ~130 staff located in one building.  The program was trying to measure the direct (electricity costs for data centers, lights, monitors, etc.) and indirect (cost of fuel burned, cost of gas used by commuting staff) energy usage of the enterprise in order to become a more sustainable company (as climate change is a big topic these days).  The paper argued that a multi-level and holistic view of business intelligence (on energy usage) was needed, and it discussed the types of information conveyed at each level.

My Personal Assessment:

However, this paper didn’t go into how effective was the implementation of this system.  What would have improved this paper, is saying something about the decrease in the CO2 emission DERI had over the past year.  They could have graphed a time series chart showing power consumption before implementation of this multi-level BI system and after.  This paper was objective but didn’t have any slant as to why we should implement a similar system.  They state that their future work is to provide more granularity in their levels, but nothing on what business value it has had on the company.  Thus, with no figures stating the value of this system, this paper seemed more like a conceptual, how-to manual.

My Personal Reflection:

This paper doesn’t fit well into my research topic.  But, it was helpful in defining a data space and multi-level and holistic BI system.  I may use the conceptual methodology of a data space in my methodology, where I collect secondary data from the National Hurricane Center into a big data warehouse and link the data as it seems relevant.  This, should save me time, and reduce labor intensive costs to data integration due to postponing it when they are required.  It has changed my appreciation of data science, as there is another philosophy to just bringing in one data set at a time into a data warehouse and make all your connections, before moving on to the next data set.

A multilevel business intelligence setup and how it affects the framework of an organization’s decision-making processes. 

In Curry et al. (2012), a linked dataspace BI system was applied to a holistic, multi-level organization.  Holistic aspects of their BI system included Enterprise Resource Planning, finance, facility management, human resources, asset management, and code compliance.  From a holistic standpoint, most of these groups had siloed information that made it difficult to leverage data across their domains.  However, this is different from the multi-level BI setup.  As defined in Table II of Curry et al. (2012), in the multi-level setup the data gets shown at the organizational level (stakeholders are executive members, shareholders, regulators, suppliers, consumers), the functional level (stakeholders are functional managers and the organization manager), and the individual level (stakeholders are the employees).  Each of these stakeholders has different information requirements and different levels of access to certain types of data, and the multi-level BI system must take this into account.  Different information requirements and access mean different energy metrics: organizational-level metrics could be Total Energy Consumption or % Renewable Energy Sources, whereas individual-level metrics could be Business Travel, Individual IT Consumption, Laptop Electricity Consumption, etc.  It wouldn’t make sense for an executive or shareholder to look at every one of the 130 staff members’ laptop electricity consumption metrics when they could get a company-wide figure.  However, the authors did note that the organization-level data can be drilled down further to see the cause of a particular event in question.  Certain data that the executives can see will not be accessible to all individual employees; a multi-level BI system also addresses this.  Likewise, employee A cannot view employee B’s energy consumption, because a lateral view of the BI system data may not be permissible.
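
A minimal sketch of that level-based visibility is below; the organizational and individual metric names follow the examples in the text, while the functional-level metrics and the lookup function are hypothetical illustrations.

```python
# A minimal sketch: each stakeholder level sees only the energy metrics
# appropriate to it. Functional-level metric names are assumed for illustration.
LEVEL_METRICS = {
    "organizational": ["Total Energy Consumption", "% Renewable Energy Sources"],
    "functional":     ["Team IT Consumption", "Facility Energy Cost"],   # assumed examples
    "individual":     ["Business Travel", "Individual IT Consumption",
                       "Laptop Electricity Consumption"],
}

def metrics_for(level):
    """Return only the metrics a stakeholder at this level may view."""
    return LEVEL_METRICS.get(level, [])

print(metrics_for("organizational"))   # company-wide figures, not per-employee detail
```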

The metrics reported at each level of this multi-level BI system allow that particular level to make data-driven decisions to reduce its carbon footprint.  An executive can look at the organizational-level metrics and institute a power-down-your-monitors-at-night initiative to save power corporate-wide.  At the individual level, employees could choose to leave for work earlier to avoid sitting in traffic and waste less gas, thus reducing their indirect carbon footprint for the company.  Managers can make decisions, based on the level of metrics they can view, to request funding for energy-efficient monitors and laptops for their teams, or even a single power strip per person, to reduce their teams’ energy consumption cost.

 

What is Business Intelligence?

A short description of the Business Intelligence (BI) Cycle and discussion of three characteristics that can drive BI Cycle productivity.

Business Intelligence (BI) is the gathering, managing, and analyzing of data that is vital for the survival of a business in the current hyper-competitive environment (Thomas, 2001; Vuori, 2006). A BI practitioner’s role is to keep decision makers from being overwhelmed by a huge wealth of data. They thus act as a filter, because decision makers will ignore any information that is not useful or meaningful (Vuori, 2006).

The BI Cycle is a continuous cycle, which can be reduced to planning the information you want to collect, ethically collecting reliable information, analyzing the data to form intelligence, and disseminating the intelligence in an understandable way (Thomas, 2001). It can be expanded into six steps, per Vuori (2006):

  1. Defining information needs
  2. Information gathering
  3. Information processing
  4. Analysis
  5. Dissemination
  6. Utilization and Feedback

A good BI system would make use of a knowledge database and a communication system (Thomas, 2001). With this system and cycle, and the correct data, we can have information on our competitors, new technology, public policy, customer sentiment, market forces, supply chain information, etc. Having this information at the disposal of the decision maker will allow for data-driven decisions that increase the company’s competitive advantage.

Three BI cycle characteristics that drive productivity

  1. Identifying Needs versus Wants in the first phase: Are we defining “Information that is wanted but that is not really needed”, “Information that lacks and that is recognized to be needed”, or “Information that is needed but not known to be needed, wanted nor asked for” (Vuori, 2006)? The last two are extremely important. The second one satisfies the end user of the BI; the third can uncover huge revelations. In the last case, if a company only looks at its most active or biggest competitor, it may lose sight of a smaller competitor gaining traction. Getting the right information that is needed is key to not wasting time and to increasing productivity.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Departments in which the data/information is collected from
  2. Protecting their Intellectual Capital: When companies have high turnover rates, when sensitive/proprietary information is transported on drives or in the minds of employees, or when senior personnel accidentally give out information at conferences/symposiums (Thomas, 2001), we run the risk of becoming vulnerable as a company. Another example is the supply chain: if one company uses a key supplier and its competitor uses that same key supplier to produce a similar product mix, then what guarantees are in place to ensure that information is not being transported between the two companies through the supplier? Information leaks can lead to a loss of competitive advantage. Protecting intellectual capital allows companies to avoid constantly having to create new products and to focus on improving their current product mix.
    • All employees
    • Supply chain (horizontally and vertically)
    • Production lines
    • Human Resources/Legal
    • Management
  3. Dissemination of the correct analysis: This allows managers to make data-driven decisions that should help protect the business, enter a new market space, etc. If BI practitioners could give decision makers the information they need based on their analysis and nothing more, we would save time, reduce decision fatigue, and cut the time wasted on producing the analytics. Thus, constant communication must occur between the practitioner and the decision makers to avoid non-value-added work. Feedback cycles help future work/endeavors become more productive over time.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Communications departments

An example of an innovative use of BI is DeKalb County, GA. The CIO has leveraged BI and analytics to set up smart policing initiatives (where police are used more effectively and efficiently to prevent crimes and lower crime rates), enhance public safety (develop and maintain green neighborhoods), and promote jobs and economic development (Matelski, 2015). The CIO has taken data from multiple systems and followed the cycle above, from asking the right questions and identifying the needs for particular data, through its collection, processing, and analysis, to its dissemination to the key decision makers (via intuitive dashboards and key performance indicators).

Innovation: Decision making tools

This post discusses two decision-making methods: the Nominal Group Technique and the Delphi Method.

Decision making tools:

To provide opportunities for creative and innovative thinking, one must (Hashim et al., 2016):

  • Keep asking and looking for answers
  • Make associations and observe correlations
  • Anticipate future events and happenings
  • Speculate on possibilities
  • Explore ideas, actions, and results

Nominal Group Technique

A tool for decision making known as the Nominal Group Technique (NGT) can be used to identify elements of a problem, identify and rank goals by priority, identify experts, and involve people from all levels to promote buy-in of the results (Deip, Thesen, Motiwalla, & Seshardi, 1977; Hashim et al., 2016; Pulat, 2014).  Pulat (2014) describes the process as listing and prioritizing a list of options created through a normal brainstorming session, where the list of ideas is generated without criticism or evaluation.  Deip et al. (1977) describe the process as one that taps into the experiences of all people by asking each of them to state their ideas for a list, with no discussion permitted until all ideas are listed; after a discussion of each item on the list, ranking of the ideas can begin. Finally, Hashim et al. (2016) state that the method is best used to help a small team reach consensus by gathering ideas from everyone and exciting buy-in of those ideas.
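
To make the ranking step concrete, here is a minimal sketch of tallying silent rankings in the spirit of NGT; the participants, ideas, and scoring convention (rank position as score, lowest total wins) are illustrative assumptions.

```python
# A minimal sketch: tally each participant's silent ranking of the listed ideas.
# Rank position is used as the score, so the lowest total is the top priority.
from collections import defaultdict

rankings = {                        # participant -> ideas ordered best to worst
    "Participant 1": ["idea C", "idea A", "idea B"],
    "Participant 2": ["idea A", "idea C", "idea B"],
    "Participant 3": ["idea C", "idea B", "idea A"],
}

scores = defaultdict(int)
for ordered_ideas in rankings.values():
    for position, idea in enumerate(ordered_ideas, start=1):
        scores[idea] += position

for idea, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(idea, score)              # lowest score = group's highest priority
```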

Deip et al. (1977) and Hashim et al. (2016) list the following advantages and disadvantages of the process:

+     Dominance by high-status, aggressive, or verbal people is reduced; everyone can participate in an equal manner

+     Gains group consensus because everyone is involved

+     The focus remains on the problem and avoids premature evaluation of ideas

+     Minimal interruptions of creative ideas during the silence phase

+     Discussions only clarify items and eliminate misunderstanding

–      Cross fertilization of ideas is diminished

–      May reduce flexibility

–      Bringing everyone to the table may be costly

Delphi method

Dalkey and Helmer (1963) described the Delphi project as a way to use expert opinion, with the hope of obtaining the strongest consensus of a group of experts.  Pulat (2014) states that ideas are listed and prioritized by a weighted point system to help reduce the number of possible solutions, with no communication between the experts, or of the results, during the process until the very end.  Dalkey and Helmer (1963), however, described the process as repeated interviewing or questioning of individual experts while avoiding confrontation between experts.  Questions are centered on some central problem, and between rounds of questioning the experts are shown available data requested by one expert, or new information that another expert considers potentially relevant (Dalkey & Helmer, 1963; Pulat, 2014).  The solution from this technique improves when soliciting experts with a range of experiences (Okoli & Pawlowski, 2004; Pulat, 2014).
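
A minimal sketch of the round-by-round flavor of Delphi is below; the anonymous expert estimates and the convergence threshold are illustrative assumptions, not part of Dalkey and Helmer's procedure.

```python
# A minimal sketch: track whether anonymous expert estimates converge
# across Delphi rounds; stop once the spread falls below a chosen threshold.
import statistics

rounds = [
    [10, 25, 40, 60, 15],   # round 1: wide disagreement
    [18, 22, 30, 35, 20],   # round 2: revised after shared feedback
    [22, 24, 27, 29, 23],   # round 3: converging toward consensus
]

for i, estimates in enumerate(rounds, start=1):
    spread = statistics.stdev(estimates)
    print(f"Round {i}: median={statistics.median(estimates)}, spread={spread:.1f}")
    if spread < 5:                          # assumed consensus threshold
        print("Consensus reached; stop the rounds.")
        break
```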

Benefits and limitations (Dalkey & Helmer, 1963; Okoli & Pawlowski, 2004):

+     Encourage independent thought

+     Decreases group thought bias (predisposition to be swayed by another person or an entire group)

+     Minimize confrontation of opposing views

+     Easy to correct misconceptions that a person harbored over certain facts or theoretic assumptions

+     Ensures that relevant data gets fed to all the experts

+     Allows experts to change their mind to obtain results that are free from bias

+     More penetrative analysis on the problem, through each round

–      Very costly in time and resources due to the multiple rounds and seeing each expert one on one

–       Vague questions invite critical comments while providing little value to solving the problem

The main difference between the Delphi technique and the Nominal Group Technique is the avoidance of conflict by conducting the decision-making process in a one-on-one fashion rather than in a group setting.  Given that ideas can be triggered by words (or a particular word order), the nominal approach could, in theory, generate more solutions than the Delphi technique (Deip et al., 1977; Hashim et al., 2016).  Hashim et al. (2016) state that other triggers for imagination/creativity/ideas can be images, events, possible events, conflict events, conflict occurrences, emotions, environment, culture, games, music, etc. But with independent meetings rather than a group meeting, solutions are well thought out and avoid group thought bias (Dalkey & Helmer, 1963).  When selecting between these two techniques, the type of problem and the desired outcome of the process should drive the methodology.  There are also many other decision-making techniques, like multi-voting, basic brainstorming, etc. (Pulat, 2014).

Resources:

  • Dalkey, N., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458-467.
  • Deip, P., Thesen, A., Motiwalla, J., & Seshardi, N. (1977). Nominal group technique.
  • Hashim, A. T., Ariffin, A., Razalli, A. R., Shukor, A. A., NizamNasrifan, M., Ariffin, A. K., … & Yusof, N. A. A. (2016). Nominal group technique: A brainstorming tool for identifying learning activities using musical instruments to enhance creativity and imagination of young children. International Advisory Board, 23, 80.
  • Okoli, C., & Pawlowski, S. D. (2004). The Delphi method as a research tool: An example, design considerations and applications. Information & Management, 42(1), 15-29.
  • Pulat, B. (2014). Lean/six sigma black belt certification workshop: Body of knowledge. Creative Insights, LLC.

Innovation: Technology and Trends in museums

This post will discuss one technology and one key trend in museums.

Definition of Museum: term applied to zoos, historical sites, botanical gardens, aquariums, planetariums, children’s museums, and science and technology centers (US DoJ, 2009)

Museum Edition: Key trend: Short-term trend: Driving Ed Tech adoption in museums for the next one to two years

With the introduction of mobile technology, which increases in processing speed every year, and the introduction of Augmented Reality (AR) through Pokémon Go, there is a huge opportunity to create discoverable museum displays serviceable through mobile devices (CNET, 2016; New Horizons, 2016; Bonnington, 2015). The AR technology uses the mobile device’s camera and, through some creative coding, interlaces pocket monsters called Pokémon into the scene in real time; through a mobile device these Pokémon are made visible, even though they do not exist (CNET, 2016). Mobile devices are not just for gaming; they have become the primary computing device for most people across the globe, as well as a primary way to access information (Bonnington, 2015; New Horizons, 2016).  Pokémon Go also encourages end users to walk to key areas designated as either Pokestops (for getting key items for gameplay) or Pokémon Gyms (to either build up a team’s gym or take it down), thereby enhancing the experience (CNET, 2016).  It is projected that in the next five years mobile devices could have enough processing power to handle 4K streaming, immersive virtual reality gaming, and seamless multi-tasking (Bonnington, 2015).  Therefore, creating a new museum experience using an AR system similar to Pokémon Go, with interactive museum displays similar to Pokestops or Pokémon Gyms, could become a reality and enhance exploration, interpretation, and sharing.  This would essentially be a more interactive self-guided virtual tour, similar to what has been implemented at the Broad Museum in Los Angeles and what is a prioritized strategy for San Francisco’s Museum of Modern Art (New Horizons, 2016).  If we could centralize all of the museums into one interface, similar to what Israel is doing with its museums (so far 60 museums are represented), we could see bigger adoption rates (Museums in Israel, n.d.). According to New Horizons (2016), hyper-zoom features on particular displays, gamification, location-based services, AR, and social networking integration can increase patrons’ experiences.  These are all aspects that Pokémon Go is trying to promote through its mobile device game.

Forces that impact the trend

  • Technological: There is a need to update the WiFi infrastructure in museums to handle the increase in demand, which is a key force negatively impacting this trend (New Horizons, 2016; Government of Canada, n.d.). However, computer code and infrastructure designs are becoming more open source, which is a force of positive impact.
  • Safety: There is added need to improve design and flow of a museum to accommodate distracted patrons using this new AR system.
  • Cultural: Museums at one point used to ban cameras, but now, with so many mobile devices and the proposed AR system above, that would be hard to enforce (New Horizons, 2016), especially given that museums want to increase participation.

Museum Edition: Technology: Improving Accessibility for Disabled populations

One in 10 people lives with a disability, or approximately 0.65 billion people (Disabled World, n.d.).  It is imperative and ethical that museums create exhibits for all their patrons. Deviations from societal norms have in the past caused people with disabilities to be considered signs of divine disapproval, with the end thought and action being that they need to be fixed (Grandin, 2016), when there is nothing wrong with them to begin with.  A few of the many areas for improvement with technology are:

  • Websites and online programming: making them more accessible and eliminating barriers through the incorporation of universally good design (New Horizons, 2016; Grandin, 2016).
  • Addressing Article 30 of the UN Disability Convention: Implementing technology to allow enjoyed access to performances, exhibits, or services (UN, 2006). This would allow, encourage, and promote all people to participate to the fullest extent possible (New Horizons, 2016; UN, 2006).
  • Use of software to create alternative formats for printed brochures: Braille, CDs, large print (US DoJ, 2009). Also, using that same software to create Braille exhibit guides (New Horizons, 2016).
  • Using closed captions for video displays (New Horizons, 2016).

An excellent way to test universally good design is for museums to partner with disabled students to test their designs’ usability and provide meaningful feedback (New Horizons, 2016). Essentially, one way to approach universally good design is to ask three questions (Wyman, Timpson, Gillam, & Bahram, 2016):

  1. “Where am I?”
  2. “Where can I go from here?”
  3. “How can I get there?” or “How can I make that happen?”

 

Forces that impact the technology

  • Educational: There is a lack of disability-responsiveness training among museum staff, which leads to a lack of knowledge of best practices, how best to serve the disabled population, etc. (New Horizons, 2016).
  • Financial: A lack of resources to design or even implement new programs for people with disabilities is a key force negatively impacting this technology (New Horizons, 2016; Grandin, 2016). However, the best designs are simple, intuitive, flexible, and equitable, which makes accessible design a universally good design (Grandin, 2016; Wyman et al., 2016). How do museums learn about universally good design? They can do so by working with the disabled community and advocacy organizations (New Horizons, 2016). So, as museums make updates to exhibits or to their buildings, they should take accessible design into account. For people with disabilities, a universally good design is one where no additional modifications are needed for them (Grandin, 2016).

Big Data Analytics: Compelling Topics

This post reviews and reflects on the knowledge shared about big data analytics and my opinions on the current compelling topics in the field.

Big Data and Hadoop:

According to Gray et al. (2005), traditional data management relies on arrays and tables to analyze objects, which can range from financial data, galaxies, proteins, events, and spectra data to 2D weather data, but when it comes to N-dimensional arrays there is an “impedance mismatch” between the data and the database.  Big data can be N-dimensional and can also vary across time, e.g., text data (Gray et al., 2005). Big data, by its name, is voluminous. Thus, given the massive amounts of data in big data that need to be processed, manipulated, and calculated upon, parallel processing and programming use the benefits of distributed systems to get the job done (Minelli, Chambers, & Dhiraj, 2013).  Parallel processing makes quick work of a big data set because, rather than having one processor do all the work, you split the task among many processors.
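
As a toy illustration of that split-the-work idea, here is a minimal sketch using Python's multiprocessing pool; the data, chunk size, and worker count are assumptions for demonstration only.

```python
# A minimal sketch: split one large aggregation across several processes
# and combine the partial results, instead of using a single processor.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work each worker does independently on its slice of the data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))   # combine the partial results
    print(total)
```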

Hadoop’s Distributed File System (HDFS) breaks big data up into smaller blocks (IBM, n.d.), which can be aggregated like a set of Legos throughout a distributed database system. Data blocks are distributed across multiple servers. Hadoop is Java-based; it pulls the data stored on the distributed servers, maps key items/objects, and reduces the data to answer the query at hand (the MapReduce function). Hadoop is built to deal with big data stored in the cloud.
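
To show the map-then-reduce shape of the computation, here is a minimal single-process sketch in plain Python; a real Hadoop job would run the same two steps in parallel over HDFS blocks on many servers, and the sample records are made up.

```python
# A minimal sketch of the MapReduce idea: map records into (key, value) pairs,
# then reduce (aggregate) the values per key. Single-process, for illustration.
from collections import defaultdict

records = ["energy usage report", "usage of energy", "energy report"]

# Map: emit (word, 1) for every word in every record.
mapped = [(word, 1) for record in records for word in record.split()]

# Shuffle/Reduce: group the pairs by key and sum the counts.
counts = defaultdict(int)
for word, count in mapped:
    counts[word] += count

print(dict(counts))   # e.g. {'energy': 3, 'usage': 2, ...}
```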

Cloud Computing:

Clouds come in three privacy flavors: public (all customers and companies share the same resources), private (only one group of clients or one company can use particular cloud resources), and hybrid (some aspects of the cloud are public while others are private, depending on data sensitivity).  Cloud technology encompasses Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).  These types of cloud differ in what the company manages versus what is managed by the cloud provider (Lau, 2011).  Cloud differs from the conventional data center, where the company manages it all: application, data, O/S, virtualization, servers, storage, and networking.  Cloud is replacing the conventional data center because infrastructure costs are high.  For a company to spend that much money on a conventional data center that will be outdated in 18 months (Moore’s law of technology) is just a constant money sink.  Thus, outsourcing the data center infrastructure is the first step of a company’s movement into the cloud.

Key Components to Success:

You need buy-in from leaders and employees when it comes to using big data analytics for predictive, prescriptive, or descriptive purposes.  When it came to buy-in, Lt. Palmer had to nurture top-down support as well as buy-in from the bottom up (the ranks).  It was much harder to get buy-in from more experienced detectives, who felt that the introduction of tools like analytics was a way of telling them to give up their long-standing practices, or even a way to replace them.  So, Lt. Palmer sold Blue PALMS with: “What’s worked best for us is proving [the value of Blue PALMS] one case at a time, and stressing that it’s a tool, that it’s a complement to their skills and experience, not a substitute”.  Lt. Palmer got buy-in from a senior, well-respected officer by helping him solve a case.  The senior officer had a suspect in mind, and after the data was fed in, the tool predicted 20 people who could have done it, ordered from most likely.  The suspect was in the top five, and when apprehended, the suspect confessed.  Doing this case by case has built trust among veteran officers and thus eventually got their buy-in.

Applications of Big Data Analytics:

A result of big data analytics is online profiling.  Online profiling is using a person’s online identity to collect information about them, their behaviors, their interactions, their tastes, etc., to drive targeted advertising (McNurlin et al., 2008).  Profiling has its roots in third-party cookies and has now evolved to include 40 different variables collected from the consumer (Pophal, 2014).  Online profiling allows marketers to send personalized and “perfect” advertisements to the consumer, instantly.

Moving from online profiling to studying social media, He, Zha, and Li (2013) stated their theory that with higher positive customer engagement, customers can become brand advocates, which increases their brand loyalty and pushes referrals to their friends; approximately one in three people followed a friend’s referral if it was made through social media. This insight came from analyzing the social media data of Pizza Hut, Domino’s, and Papa John’s, as they aim to control more of the market share and increase their revenue.  But are we protecting people’s privacy when we analyze their social media content as they interact with a company?

HIPAA describes how we should conduct de-identification of 18 identifiers/variables that would help protect people from ethical issues that could arise from big data.  HIPAA legislation is not standardized for all big data applications/cases, but it is good practice. HIPAA legislation is mostly concerned with the health care industry, listing the 18 identifiers that have to be de-identified: names, geographic data, dates, telephone numbers, VINs, fax numbers, device IDs and serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric IDs (fingerprints, iris scans, voice prints, etc.), full-face photos, health plan beneficiary numbers, account numbers, any other unique ID numbers (characteristics, codes, etc.), and certification/license numbers (HHS, n.d.).  We must be aware that HIPAA compliance is more a responsibility of the data collector and data owner than of the cloud provider.
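
As a rough sketch of what stripping direct identifiers can look like, here is a minimal example; the column names cover only a few of the 18 identifiers listed above, and the record and helper function are hypothetical.

```python
# A minimal sketch: drop HIPAA-style direct identifiers from a record before
# analysis. Only a few of the 18 identifiers are listed here, for illustration.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "medical_record_number"}

def de_identify(record):
    """Return a copy of the record with direct identifiers removed."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}

patient = {"name": "Jane Doe", "ssn": "000-00-0000", "age_group": "40-49",
           "diagnosis_code": "E11.9", "email": "jane@example.com"}
print(de_identify(patient))   # only non-identifying fields remain
```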

Privacy concerns of this kind also arose with the Human Genome Project 25 years ago, which sought to sequence the first 3 billion base pairs of the human genome over a 13-year period (Green, Watson, & Collins, 2015).  Those 3 billion base pairs amount to about 100 GB uncompressed, and by 2011, 13 quadrillion bases had been sequenced (O’Driscoll et al., 2013). Studying genomic data comes with a whole host of ethical issues.  Some of those were addressed by the HIPAA legislation, while other issues are left unresolved today.

One of the ethical issues that arose was mentioned in McEwen et al. (2013): for people who submitted their genomic data 25 years ago, can that data be used today in other studies? What about using it to help those participants of 25 years ago take preventative measures against adverse health conditions?  Ethical issues extend beyond privacy and compliance.  McEwen et al. (2013) warn that data has been collected for 25 years; what if data from 20 years ago shows that a participant could suffer an adverse yet preventable health condition?  What is the duty of today’s researchers to that participant?

Futuring & Innovation: What is Innovation?

What innovation is and how it differs from invention.

One could define innovation as an idea, value, service, technology, method, or thing that is new to an individual, a family, a firm, a field, an industry, or a country (Jeyaraj & Sabherwal, 2014; Rogers, 1962; Rogers, 2010; Sáenz-Royo, Gracia-Lázaro, & Moreno, 2015). Based on this definition, an invention can be seen as an innovation, but not all innovations are inventions (Robertson, 1967).  Also, even though something may not be considered an innovation by one entity, it can still be considered innovative if adopted by a completely different entity (Newby, Nguyen, & Waring, 2014).

Innovation moving from one entity to another can be considered diffusion of innovation.  Diffusion of Innovation is a theory concerned with the why, what, how, and rate of innovation dissemination and adoption between entities, carried out through different communication channels over a period of time (Ahmed, Lakhani, Rafi, Rajkumar, & Ahmed, 2014; Bass, 1969; Robertson, 1967; Rohani & Hussin, 2015; Rogers, 1962; Rogers, 2010).  However, there are forces that can act on an innovation and influence its likelihood of success, for example financial, technological, cultural, economic, legal, ethical, temporal, social, global, national, local, etc.  Therefore, when viewing a new technology or innovation for the future, one must think critically about it and evaluate it from these different forces/lenses.

Resources:

  • Ahmed, S., Lakhani, N. A., Rafi, S. K., Rajkumar, & Ahmed, S. (2014). Diffusion of innovation model of new services offerings in universities of Karachi. International Journal of Technology and Research, 2(2), 75-80.
  • Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215-227.
  • Jeyaraj, A., & Sabherwal, R. (2014). The Bass model of diffusion: Recommendations for use in information systems research and practice. JITTA: Journal of Information Technology Theory and Application, 15(1), 5-30.
  • Newby, M., Nguyen, T. H., & Waring, T. S. (2014). Understanding customer relationship management technology adoption in small and medium-sized enterprises. Journal of Enterprise Information Management, 27(5), 541.
  • Robertson, T. S. (1967). The process of innovation and the diffusion of innovation. The Journal of Marketing, 14-19.
  • Rogers, E. M. (1962). Diffusion of innovations (1st ed.). New York: Simon and Schuster.
  • Rogers, E. M. (2010). Diffusion of innovations (4th ed.). New York: Simon and Schuster.
  • Rohani, M. B., & Hussin, A. R. C. (2015). An integrated theoretical framework for cloud computing adoption by universities technology transfer offices (TTOs). Journal of Theoretical and Applied Information Technology, 79(3), 415-430.
  • Sáenz-Royo, C., Gracia-Lázaro, C., & Moreno, Y. (2015). The role of the organization structure in the diffusion of innovations. PLoS One, 10(5). doi: http://dx.doi.org/10.1371/journal.pone.0126078