Attention Deficit Hyperactivity Disorder: A gift

In 2016, I was diagnosed with Attention Deficit Hyperactivity Disorder (ADHD). ADHD brains have abnormally low levels of dopamine activating the frontal cortex (Flippin, n.d.). According to the NIH (n.d.), this involves

  • inattention, where I find sustained tasks rather difficult
  • hyperactivity, where I find it rather difficult to stop and tend to wear people out
  • impulsivity, where I switch contexts multiple times per hour

However, ADHD shouldn’t be seen as a lack of attention, but as a dysregulation of the attention system (Flippin, n.d.). ADHD occurs across a spectrum, and I have a high-functioning form of it: as long as a task or job description is interesting enough for me, I will be a high performer. At this side of the spectrum, though, combined with my analytical mind, it also comes with the drawback of over-analysis.

So, sharing this in this post isn’t easy, but the key part of why I mention it is the line “… as long as a task or job description is interesting enough for me, I will be a high performer…” This is an ability under ADHD called hyperfocus. Hyperfocus is the other side of the coin of ADHD: when a person with ADHD finds something really interesting, that person will intently focus on that one item without interruption (Flippin, n.d.). Over the past year, I have been identifying job roles and statements of work where I can be in a state of hyperfocus. So, one method to encourage self-reflection with me is having someone ask me focused questions to engage my thoughts.

Hyperfocus is what I have been using to do my dissertation, and this is why Colorado Technical University (CTU) was the best choice of school for me. Hyperfocus can be channeled into productive work, like focusing on a dissertation whose topic is of utmost interest to me (Flippin, n.d.). This ability has allowed me to complete my dissertation at a record pace, because hours seemed like minutes when I was in this headspace. At other universities, you have to do your dissertation research under a supervisor who has grant funding in a particular area, so you don’t control all the variables in picking your topic. This is where I thrived at CTU; I was able to pick anything of interest that fell under computer science and data analytics, which were my other passions.

Meditation and mindfulness, two cognitive behavioral techniques, were the other methods my therapist advised me to use to lessen the symptoms of ADHD, and the NIH (n.d.) agrees. They allow me to make incremental improvements in focus, and they also give me a second method to encourage self-reflection. During meditation, I focus on breath control, let my thoughts come in, observe them pass by, and refocus on my breath. I have been doing this for 10 minutes every day since June 2015, with about an 80% consistency rate. As with all new habits, if you don’t do it 100% of the time, just dust yourself off and say, “OK, I missed 1-5 days, but I will start it up again.”

Meditation, mindfulness, and focused questions from a therapist have been effective methods for encouraging self-reflection for me in the present.


Different Types of Leadership Styles

Leadership Theories:

  • Chapman and Sisodia (2015) define leadership by the value leaders bring to people. The authors’ primary guiding value is that “We measure success by the way we touch the lives of people.” This type of leadership practice stems from treating followers similarly to how someone would want their own kids to be treated in the work environment. It relies on coaching followers to build on their greatness. Recognition is then done in ways that shake employees to the core by involving the employee’s family, so that the family can be proud of their spouse or parent. The goal of this type of leadership is to have employees feel seen, valued, and heard, such that they want to be their best and do their best, not just for the company but for their coworkers as well.
  • Cashman (2010) defines leadership from an inside-out approach of personal mastery. This leadership style focuses on the leader’s self-awareness of conscious beliefs and shadow beliefs in order to grow and deepen the leader’s authenticity. Cashman pushes leaders to identify, reflect on, and recognize their core talents, values, and purpose. The purpose of any leadership is understanding “How am I going to make a difference?” and “How am I going to enhance other people’s lives?” Working from the leader’s core purpose releases more of the leader’s untapped energy to do more meaningful work, which frees the leader and opens leaders up to different possibilities, more so than just working toward a leader’s goals.
  • Open leadership has five rules: respecting and empowering customers and employees, consistently building trust, nurturing curiosity and humility, holding openness accountable, and forgiving failures (Li, 2010). These leaders must let go of the old mentality of micromanaging; once they do, they are open to growing into new opportunities. This thought process shares commonalities with knowledge sharing: if people share the knowledge they have accumulated, they can let go of their current tasks and focus on new and better opportunities. Li stated that open leadership allows leaders to build, deepen, and nurture relationships with customers and employees. Open leadership is a theory of leadership that is customer and employee centered.
  • Values-based leadership requires four principles: self-reflection, balance, humility, and self-confidence (Kraemer, 2015). Through self-reflection, leaders identify the core beliefs and values that matter to them. Leaders who view situations from multiple perspectives to gain a deeper understanding are considered balanced. Humility refers to leaders not forgetting who they are and where they come from, so as to gain appreciation for each person. Finally, self-confidence is the leader accepting themselves as they are, warts and all.

Parts of these leadership theories that resonate

The leadership theories above have a few concepts in common. Most of them agree with each other because each has a focus on growing the leader’s followers (Cashman, 2010; Chapman & Sisodia, 2015; Li, 2010; Kraemer, 2015). Cashman and Kraemer focus on self-reflection, so that the leader can understand personal values, strengths, and weaknesses. For Cashman, self-reflection focuses on purpose, which is where an unbound level of energy lies. For Kraemer, self-reflection focuses on defining the leader’s values and on constantly assessing and realigning the leader’s roles toward those values.

Resources:

  • Cashman, K. (2010). Leadership from the inside out: Becoming a leader for life (2nd ed.). San Francisco: Berrett-Koehler Publishing, Inc.
  • Chapman, B., & Sisodia, R. (2015). Everybody matters: The extraordinary power of caring for your people like family. New York: Penguin.
  • Li, C. (2010). Open leadership: How social technology can transform the way you lead (1st ed.). VitalBook file.
  • Kraemer, H. M. J. (2015). Becoming the best (1st ed.). New Jersey: Wiley.

Compelling Topics in Advanced Topics

  • The evolution of data to wisdom is described by the DIKW pyramid, where Data is just facts without any context; when facts are used to understand relationships, they generate Information (Almeyer-Stubbe & Coleman, 2014). When information is used to understand patterns, it helps build Knowledge, and when that knowledge is used to understand principles, it builds Wisdom (Almeyer-Stubbe & Coleman, 2014; Bellinger, Castro, & Mills, n.d.). Building the understanding needed to jump from one level of the DIKW pyramid to the next is an appreciation of learning “why” (Bellinger et al., n.d.).
  • The internet has evolved into a socio-technical system. This evolution has come about in five distinct stages:
    • Web 1.0: Created by Tim Berners-Lee in 1989, originally defined as a way of connecting static, read-only information hosted across multiple computational components, primarily for companies (Patel, 2013).
    • Web 2.0: Changed the state of the internet from read-only to read/write and grew communities that hold a common interest (Patel, 2013). This version of the web led to more social interaction, giving people and content importance on the web, due to the introduction of social media tools through web applications (Li, 2010; Patel, 2013; Sakr, 2014). Web applications can include event-driven and object-oriented programming designed to handle concurrent activities for multiple users, and they have a graphical user interface (Connolly & Begg, 2014; Sandén, 2011).
    • Web 3.0: The state of the web as of 2017. It involves the semantic web, which is driven by data integration through the use of metadata (Patel, 2013). This version of the web supports a worldwide database with static HTML documents, dynamically rendered data, the next standard of HTML (HTML5), and links between documents, with hopes of creating an interconnected and interrelated, openly accessible world of data such that tagged micro-content can be easily discovered through search engines (Connolly & Begg, 2014; Patel, 2013). HTML5 can handle multimedia and graphical content, which is great for semantic content (Connolly & Begg, 2014). Also, end-users are beginning to build dynamic web applications for others to interact with (Patel, 2013).
    • Web 4.0: Considered the symbiotic web, where data interactions occur between humans and smart devices, the internet of things (Atzori, 2010; Patel, 2013). These smart devices can be wired to the internet or connected via wireless sensors through enhanced communication protocols (Atzori, 2010). These smart devices read and write concurrently with humans, and the largest potential of Web 4.0 is that these smart devices analyze data online and begin to migrate the online world into the real world (Patel, 2013).
    • Web 5.0: Previous iterations of the web do not perceive people’s emotions, but one day the web could understand a person’s emotional state (Patel, 2013). Kelly (2007) predicted that in 5,000 days the internet would become one machine and all other devices would be windows into this machine. In 2007, Kelly stated that this one machine, “the internet,” had the processing capability of one human brain, but in 5,000 days it would have the processing capability of all of humanity.
  • MapReduce is a framework that uses parallel sequential algorithms to capitalize on cloud architecture; it became popular through the open source Hadoop project, where it serves as the main executable analytic engine (Lublinsky et al., 2013; Sadalage & Fowler, 2012; Sakr, 2014). Another feature of MapReduce is that the output of one reduce step can become the input to another map function (Sadalage & Fowler, 2012).
  • A sequential algorithm is a computer program that runs on a sequence of commands, and a parallel algorithm runs a set of sequential commands over separate computational cores (Brookshear & Brylow, 2014; Sakr, 2014).
  • A parallel sequential algorithm runs a full sequential program over multiple but separate cores (Sakr, 2014).
    • Shared-memory distributed programming: Serialized programs run on multiple threads, where all the threads have access to the underlying data stored in shared memory (Sakr, 2014). Each thread must be synchronized to ensure that reads and writes are not performed on the same segment of the shared data at the same time. Sandén (2011) and Sakr (2014) stated that this can be achieved via semaphores (which signal other threads that data is being written, so other threads should wait to use the data until a condition is met), locks (data can be locked or unlocked for reading and writing), and barriers (threads cannot proceed to the next step until everything preceding it is completed).
    • Message-passing distributed programming: Data is stored in one location, and a master thread spreads chunks of the data onto sub-tasks and threads to process the overall data in parallel (Sakr, 2014). Explicit, direct send and receive messages provide synchronized communication (Lublinsky et al., 2013; Sakr, 2014). At the end of the runs, the data is merged together by the master thread (Sakr, 2014).
  • A scalable multi-level stochastic model-based performance analysis has been proposed by Ghosh, Longo, Naik, and Trivedi (2012) for Infrastructure as a Service (IaaS) cloud computing. Stochastic analysis and models are a way of predicting how probable an outcome is, which helps when analyzing one or more outcomes that are cloaked in uncertainty (Anadale, 2016; Investopedia, n.d.).
  • Three-pool cloud architecture: In an IaaS cloud scenario, a request for resources initiates a request for one or more virtual machines to be deployed on a physical machine (Ghosh et al., 2012). The architecture assumes that physical machines are grouped into three pools: hot, warm, and cold (Sakr, 2014). The hot pool consists of physical machines that are constantly on, and virtual machines are deployed upon request (Ghosh et al., 2012; Sakr, 2014). A warm pool has its physical machines in power-saving mode, and cold pools have physical machines that are turned off. For both warm and cold pools, setting up a virtual machine is delayed compared to a hot pool, since the physical machines need to be powered up or awakened before a virtual machine is deployed (Ghosh et al., 2012). The optimal number of physical machines in each pool is predetermined by the information technology architects.
  • Data-at-rest is probably considered easier to analyze; however, this type of data can also be problematic. If the data-at-rest is large in size, then even if the data does not change or evolve, its size requires iterative processes to analyze it.
  • Data-in-motion and streaming data have to be iteratively processed until a certain termination condition is reached, which can occur between iterations (Sakr, 2014). However, Sakr (2014) stated that MapReduce does not support iterative data processing and analysis directly.
    • To deal with datasets that require iterative processes, programmers need to create and arrange multiple MapReduce functions in a loop (Sakr, 2014). This workaround increases the processing time of the serialized program, because data has to be reloaded and reprocessed on each pass; there is no reading or writing of intermediate data, a design choice made to preserve the input data (Lublinsky et al., 2013; Sakr, 2014).
  • Data usually gets updated on a regular basis. Connolly and Begg (2014) noted that data can be updated incrementally (only small sections of the data) or completely. These data updates provide their own unique challenges for data processing.
    • For processing incremental changes on big data, one must split the main computation into sub-computations, log data updates in a memoization server, and check the inputs to each sub-computation (Bhatotia et al., 2011; Sakr, 2014). These sub-computations are usually mappers and reducers (Sakr, 2014). Incremental mappers check against the memoization server, and if the data has already been processed and is unchanged, they do not reprocess it; incremental reducers similarly check for changed mapper outputs (Bhatotia et al., 2011).
  • Brewer (2000) and Gilbert and Lynch (2012) concluded that a distributed shared-data system can have at most two of three properties: consistency, availability, and partition tolerance (the CAP theorem). Gilbert and Lynch (2012) describe these three as akin to the safety, liveness, and reliability of the data.
    • For NoSQL distributed database systems (DDBSs), this means that partition tolerance must exist, and administrators should therefore choose between consistency and availability (Gilbert & Lynch, 2012; Sakr, 2014). If the administrators focus on availability, they can try to achieve weak consistency; if they focus on consistency, they are planning a strongly consistent system. Strong consistency ensures that all copies of the data are updated in real time, whereas weak consistency means that all copies of the data will eventually be updated (Connolly & Begg, 2014; Sakr, 2014). An availability focus means having access to the data even during downtimes (Sakr, 2014).
  • Volume visualization is used to understand large amounts of data, in other words big data, where the data can be processed on a server or in the cloud and rendered onto a hand-held device to allow the end user to interact with it (Johnson, 2011). Tanahashi, Chen, Marchesin, and Ma (2010) define a framework for creating an entirely web-based visualization interface (or web application) that leverages the cloud computing environment. The benefit of this type of interface is that there is no need to download or install software; it can be accessed through any mobile device with an internet connection.
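The map, shuffle, and reduce phases described above can be sketched in plain Python, without Hadoop. This is a minimal single-process teaching sketch (the function names are mine, not Hadoop's): the classic word-count job, where the mapper emits (word, 1) pairs and the reducer sums them. A real framework would run the map tasks in parallel across nodes; only the data flow is shown here.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Word count: the mapper emits (word, 1) for every word in a line.
def word_mapper(line):
    return [(word, 1) for word in line.split()]

def sum_reducer(word, counts):
    return sum(counts)

lines = ["big data needs big tools", "big data in the cloud"]
counts = reduce_phase(shuffle_phase(map_phase(lines, word_mapper)), sum_reducer)
# counts maps each word to its total, e.g. "big" -> 3
```

Because the reduce output is itself a key-value collection, it can be fed into another map function, which is the chaining feature noted above.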

Resources:

  • Brookshear, G., & Brylow, D. (2014). Computer science: An overview (12th ed.). VitalBook file.
  • Connolly, T., & Begg, C. (2014). Database systems: A practical approach to design, implementation, and management (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Ghosh, R., Longo, F., Naik, V. K., & Trivedi, K. S. (2013). Modeling and performance analysis of large scale IaaS clouds. Future Generation Computer Systems, 29(5), 1216–1234.
  • Gilbert, S., & Lynch, N. A. (2012). Perspectives on the CAP theorem. Computer, 45(2), 30–36. doi: 10.1109/MC.2011.389
  • Investopedia (n.d.). Stochastic modeling. Retrieved from http://www.investopedia.com/terms/s/stochastic-modeling.asp
  • Johnson, C. (2011). Visualizing large data sets. TEDx Salt Lake City. Retrieved from https://www.youtube.com/watch?v=5UxC9Le1eOY
  • Kelly, K. (2007). The next 5,000 days of the web. TED Talk. Retrieved from https://www.ted.com/talks/kevin_kelly_on_the_next_5_000_days_of_the_web
  • Li, C. (2010). Open leadership: How social technology can transform the way you lead (1st ed.). VitalBook file.
  • Lublinsky, B., Smith, K. T., & Yakubovich, A. (2013). Professional Hadoop solutions. VitalBook file.
  • Patel, K. (2013). Incremental journey for World Wide Web: Introduced with Web 1.0 to recent Web 5.0 – A survey paper. International Journal of Advanced Research in Computer Science and Software Engineering, 3(10), 410–417.
  • Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: A brief guide to the emerging world of polyglot persistence (1st ed.). VitalBook file.
  • Sakr, S. (2014). Large scale and big data (1st ed.). VitalBook file.
  • Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Wiley-Blackwell. VitalBook file.
  • Tanahashi, Y., Chen, C., Marchesin, S., & Ma, K. (2010). An interface design for future cloud-based visualization services. Proceedings of 2010 IEEE Second International Conference on Cloud Computing Technology and Service, 609–613. doi: 10.1109/CloudCom.2010.46

 

Adv Topics: Possible future study

Application of Artificial Intelligence for real-time cybersecurity threat identification and resolution for network vulnerabilities in the cloud

Motivation: Artificial Intelligence (AI) is an embedded technology based on the current infrastructure (i.e., supercomputers), big data, and machine learning algorithms (Cyranoski, 2015; Power, 2015). AI can make use of data hidden in “dark wells” and silos, where end-users had no idea the data even existed to begin with (Power, 2015). The goal of AI is to use huge amounts of data to draw out a set of rules through machine learning that will effectively supplement cybersecurity experts in identifying and remediating cyberattacks (Cringely, 2013; Power, 2015).

Problem statement: Analysts must consider that an attacker’s choices are unknown, whether the attacker will be successful in reaching their targets and goals, and the physical paths of an attack in both explicit and abstract form, all of which are hard to do without big data analysis coupled with AI for remediation.

Hypothesis statement:

  • Null: Bayesian networks and AI cannot be used for both identification and remediation of cyber-attacks on the network infrastructure of a cloud environment.
  • Alternative: Bayesian networks and AI can be used for both identification and remediation of cyber-attacks on the network infrastructure of a cloud environment.

Proposed solution:

  • New contribution to the body of knowledge from the proposed solution: The merging of these two technologies can form a first line of defense that works 24×7 and learns new remediation and identification techniques as time moves forward.

Two research questions:

  • Can the merger of Bayesian networks and AI be used for both identification and remediation of cyber-attacks on the network infrastructure of a cloud environment? –> This is based directly on the hypothesis.
  • Can Bayesian networks and AI be used for identification and remediation of cyber-attacks involving multiple simultaneous network attacks from various white hat hackers? –> This is taken from real life. A Fortune 500 company is constantly bombarded with thousands if not millions of attempted cyberattacks on a given day. If a vulnerability is found, multiple people might enter through that vulnerability and do serious damage. Could the proposed system handle multiple attacks coming right at the cloud network infrastructure? Answering this would provide practitioners some tangible results.
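The inference step that a Bayesian network would perform for identification can be illustrated with the simplest possible case: a single "attack" node and a single "alert" node, with the posterior computed by Bayes' rule. All the probabilities below are invented for illustration; a real attack graph would have many conditionally dependent nodes.

```python
# Hypothetical numbers for illustration only.
p_attack = 0.01                  # prior: fraction of traffic that is an attack
p_alert_given_attack = 0.95      # sensor fires on a real attack
p_alert_given_benign = 0.05      # false-positive rate on benign traffic

# Total probability of seeing an alert at all.
p_alert = (p_alert_given_attack * p_attack
           + p_alert_given_benign * (1 - p_attack))

# Bayes' rule: P(attack | alert).
p_attack_given_alert = p_alert_given_attack * p_attack / p_alert
# With these numbers the posterior is about 0.16: a single alert raises the
# odds sharply, but most alerts are still false positives, which is why the
# full network conditions on many observations before flagging an attack.
```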


Adv Topics: Big data addressing Security Issues

Cybersecurity attacks are limited by their physical path, the network connectivity and reachability limits, and the attack structure, which exploits a vulnerability that enables an attack (Xie et al., 2010). Previously, automated systems and tools were implemented to deal with moderately skilled cyber-attackers, and white hat hackers were used to identify security vulnerabilities, but that is not enough to keep up with today’s threats (Peterson, 2012). Preventative measures only deal with newly discovered items, not the ones that have yet to be discovered (Fink, Sharifi, & Carbonell, 2011). These two methods are preventative measures whose goal is protecting big data, and the cyberinfrastructure used to store and process it, from malicious intent. Setting up these preventative measures is no longer good enough to protect big data and its infrastructure; thus, there has been a migration toward using real-time analysis on monitored data (Glick, 2013). Real-time analysis is concerned with “What is really happening?” (Xie et al., 2010).

If the algorithms used to process big data are pointed toward cybersecurity, as in Security Information and Event Management (SIEM), they add another solution for identifying cybersecurity threats (Peterson, 2012). What big data cybersecurity analysis will do is make security teams faster to react, if they have the right context for the analysis, but it won’t make security teams act more proactively (Glick, 2013). SIEM goes above and beyond current cybersecurity prevention measures, usually by collecting log data in real time as it is generated and processing it in real time using algorithms like correlation, pattern recognition, behavioral analysis, and anomaly analysis (Glick, 2013; Peterson, 2012). Glick (2013) reported that data from a variety of sources helps build a cybersecurity risk and threat profile in real time, which cybersecurity teams can use to react to each threat, but this works on small data sets.
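The anomaly-analysis step mentioned above can be sketched with a toy z-score detector over event counts. This is only an illustration of the idea, not what a SIEM tool actually runs (real tools correlate many event types against behavioral baselines); the counts, the threshold, and the function name are all invented.

```python
import statistics

def anomalous_windows(event_counts, threshold=2.0):
    """Flag time windows whose event count deviates from the mean
    by more than `threshold` population standard deviations."""
    mean = statistics.mean(event_counts)
    stdev = statistics.pstdev(event_counts)
    if stdev == 0:
        return []  # perfectly uniform traffic has no outliers
    return [i for i, n in enumerate(event_counts)
            if abs(n - mean) / stdev > threshold]

# Failed logins per minute; the spike in window 5 suggests a brute-force run.
failures = [2, 3, 1, 2, 4, 120, 3, 2]
flagged = anomalous_windows(failures)
```

Here `flagged` contains only the index of the spike window, which is the kind of signal a security team would then investigate with fuller context.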

SIEM couldn’t handle the vast amounts of big data, so analyzing the next cyber threats came from tools like Splunk, which identify anomalies in the data (Glick, 2013). SIEM was proposed for use at the Olympic Games, while Splunk was being used for investment banking purposes (Glick, 2013; Peterson, 2012). FireEye is another big data analytics security tool used to identify network threats (Glick, 2013).

  • Xie et al. (2010) proposed the use of Bayesian networks for cybersecurity analysis. Their solution recognizes that cybersecurity profiles are difficult and uncertain to model, and they built the tool for near real-time systems. That is because Bayesian models try to model cause-and-effect relationships. Using deterministic security models is unrealistic: such models do not capture the full breadth of a cyberattack and cannot capture all the scenarios needed for real-time analysis. If Bayesian models are built to reflect reality, they can be used for near real-time analysis. In real-time cybersecurity analysis, analysts must consider that an attacker’s choices are unknown and whether the attacker will be successful in their targets and goals. Building a modular graphical attack model can help calculate uncertainties; this is done by decomposing the problem into finite small parts, where realistic data can pre-populate all the parameters. These modular graphical attack models should consider the physical paths in explicit and abstract form. Thus, the near real-time Bayesian network considers the three important uncertainties introduced in a real-time attack. This method is robust, as determined by a holistic sensitivity analysis.
  • Fink et al. (2011) proposed a mashup of crowdsourcing, machine learning, and natural language processing to deal with both vulnerabilities and careless end-user actions, for automated threat detection. In their study, they focused on scam websites and cross-site request forgeries. For scam website identification, the concept of using crowdsourced end users to flag certain websites as scams is key to the process. The goal is that when a new end user approaches a scam website, a popup appears stating, “This website is a scam! Do not provide personal information.” The authors’ solution ties in data from heterogeneous common web-scam blacklist databases. The solution achieved high precision (98%) and high recall (98.1%) on a test of 837 manually labeled sites, cross-validated using a ten-fold cross-validation analysis against the blacklist databases. The current system’s limitation is that it does not address new threats and different sets of threats.

These studies and articles illustrate that using big data analytics for cybersecurity analysis provides the following benefits (Fink et al., 2011; Glick, 2013; IBM Software, 2013; Peterson, 2012; Xie et al., 2010):

(a) moving away from purely preventative cybersecurity toward real-time analysis, so teams can react faster to a current threat;

(b) creating security models that more accurately reflect the reality and uncertainty that exist among physical paths, successful attacks, and the unpredictability of humans, for near real-time analysis;

(c) providing robust identification techniques; and

(d) reducing false positives, which eat up the security team’s time.

This helps security teams solve difficult issues in real time. However, applying big data analytics here is a new and evolving field, so it is expected that many tools will be developed; the most successful would provide real-time cybersecurity data analysis with a huge set of algorithms, each aimed at studying a different type of attack. It is even possible that one day artificial intelligence will become the next phase of real-time cybersecurity analysis and resolution.


Adv Topics: Security Issues with Cloud Technology

Big data requires huge amounts of resources to analyze it for data-driven decisions; thus, there has been a gravitation toward cloud computing in this era of big data (Sakr, 2014). Cloud technology places different demands on cybersecurity than personal systems do: personal systems can have a single authority, whereas cloud computing systems have no individual owners but multiple users, group rights, and shared responsibility (Brookshear & Brylow, 2014; Prakash & Darbari, 2012). Cloud security can be just as good as or better than that of personal systems, because cloud providers can have the economies of scale to support a budget for an information security team that many organizations may not be able to afford (Connolly & Begg, 2014). Cloud security can also be designed to be independently modular, which is great for heterogeneous distributed systems (Prakash & Darbari, 2012).

For cloud computing, eavesdropping, masquerading, message tampering, message replay, and denial of service are security issues that should be addressed (Prakash & Darbari, 2012). Sakr (2014) stated that exploitation of co-tenancy, a secure architecture for the cloud, accountability for outsourced data, confidentiality of data and computation, privacy, verifying outsourced computation, verifying capability, cloud forensics, misuse detection, and resource accounting and economic attacks are big issues for cloud security. This post will discuss the exploitation of co-tenancy and the confidentiality of data and computation.

Exploitation of co-tenancy: One issue with cloud security lies in one of its properties, that it is a shared environment (Prakash & Darbari, 2012; Sakr, 2014). Given that it is a shared environment, people with malicious intent can pretend to be someone they are not in order to gain access, in other words masquerading (Prakash & Darbari, 2012). Once inside, they tend to gather information about the cloud system and the data contained within it (Sakr, 2014). Malicious people can also use the computational resources of the cloud to carry out denial-of-service attacks on others. Prakash and Darbari (2012) stated that two-factor authentication is used on personal devices, and for shared distributed systems a three-factor authentication has been proposed. The first two factors are passwords and smart cards; the third can be either biometrics or digital certificates. Digital certificates can be used automatically to reduce end-user fatigue from multiple authentications (Connolly & Begg, 2014). The third level of authentication helps create a trusted system; consequently, three-factor authentication could primarily mitigate masquerading. Sakr (2014) proposed using a tool that hides the IP addresses of the infrastructure components that make up the cloud, to prevent the cloud from being misused if entry is granted to a malicious person.
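The three-factor check described above can be sketched with Python's standard library. The record layout, factor names, and secrets below are entirely hypothetical; in practice the smart-card secret would come from a hardware token and the certificate would be validated against a PKI rather than compared as a fingerprint.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 with many iterations makes brute-forcing stored hashes expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def verify_three_factors(password, card_secret, cert_fingerprint, record):
    """All three must pass: something you know (password), something you
    have (smart card), and a trusted identity (certificate fingerprint)."""
    ok_password = hmac.compare_digest(
        hash_password(password, record["salt"]), record["password_hash"])
    ok_card = hmac.compare_digest(card_secret, record["card_secret"])
    ok_cert = hmac.compare_digest(cert_fingerprint, record["cert_fingerprint"])
    return ok_password and ok_card and ok_cert

# Enrollment: the server stores the salted hash and the expected secrets.
salt = secrets.token_bytes(16)
record = {
    "salt": salt,
    "password_hash": hash_password("correct horse", salt),
    "card_secret": b"card-embedded-secret",
    "cert_fingerprint": b"ab:cd:ef:12",
}
```

Note that `hmac.compare_digest` is used for every comparison so that a failing factor cannot be probed through timing differences.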

Confidentiality of data and computation: If data in the cloud is accessed by malicious people, they can gain information and change its content. Data stored on distributed systems can be sensitive to the owners of the data, like healthcare data, which is heavily regulated for privacy (Sakr, 2014). Prakash and Darbari (2012) suggested that public key cryptography, software agents, XML binding technology, public key infrastructure, and role-based access control be used to deal with eavesdropping and message tampering. These essentially hide the data in such a way that it is hard to read without key items stored elsewhere in the cloud system. Sakr (2014) suggested homomorphic encryption may be needed, but warned that encryption techniques increase the cost and time of processing. Finally, Lublinsky, Smith, and Yakubovich (2013) stated that encrypting the network to protect data-in-motion is needed.
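Message tampering in particular can be detected with a keyed hash (HMAC), sketched below with Python's standard library. This gives integrity only, not confidentiality; encryption would still be needed to hide the content, and the shared key and messages here are invented for illustration (in practice the key would be distributed out of band, e.g. via a public key infrastructure).

```python
import hashlib
import hmac

def sign(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC tag to send alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(message, key), tag)

key = b"shared-secret-key"   # hypothetical; exchanged out of band
msg = b"transfer 100 units to account 42"
tag = sign(msg, key)

untouched_ok = verify(msg, tag, key)                       # True
tampered_ok = verify(b"transfer 999 units to account 7", tag, key)  # False
```

An eavesdropper who alters the message in transit cannot produce a matching tag without the key, so the receiver rejects the tampered copy.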

Overall, a combination of data encryption, hiding the IP addresses of computational components, and three-factor authentication may mitigate some cloud computing security concerns, like eavesdropping, masquerading, message tampering, and denial of service. However, using these techniques will increase the time it takes to process big data. Thus, a cost-benefit analysis must be conducted to compare and contrast these methods while balancing data risk profiles against current risk models.

Resources:

  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview, (12th ed.). Pearson Learning Solutions. VitalBook file.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. Pearson Learning Solutions. VitalBook file.
  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox. VitalBook file.
  • Prakash, V., & Darbari, M. (2012). A review on security issues in distributed systems. International Journal of Scientific & Engineering Research, 3(9), 300–304.
  • Sakr, S. (2014). Large scale and big data: Processing and management. Boca Raton, FL: CRC Press.

Adv Topics: Security Issues Associated with Big Data

The scientific method helps provide a framework for the data analytics lifecycle (Dietrich, 2013). Per Khan et al. (2014), the entire data lifecycle consists of the following eight stages:

  • Raw big data
  • Collection, cleaning, and integration of big data
  • Filtering and classification of data usually by some filtering criteria
  • Data analysis which includes tool selection, techniques, technology, and visualization
  • Storing data with consideration of CAP theory
  • Sharing and publishing data, while understanding ethical and legal requirements
  • Security and governance
  • Retrieval, reuse, and discovery to help in making data-driven decisions

Prajapati (2013) stated that the entire data lifecycle consists of the following five steps:

  • Identifying the problem
  • Designing data requirements
  • Pre-processing data
  • Data analysis
  • Data visualizing

It should be noted that Prajapati's lifecycle begins by asking what, when, who, where, why, and how with regard to the problem being solved; it doesn't just dive into getting data. Combining the Prajapati (2013) and Khan et al. (2014) lifecycles provides a better data lifecycle. However, two items stand out from the combined lifecycle: (a) the security phase is an abstract phase, because security considerations are involved across multiple stages; and (b) those considerations fall most heavily on the storing data, sharing and publishing data, and retrieval, reuse, and discovery phases.
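One way to sketch the combined lifecycle is as an ordered list with a flag marking where security considerations concentrate. The ordering and the flags reflect my own reading of Prajapati (2013) and Khan et al. (2014), not a published merger of the two.

```python
# Hedged sketch: Prajapati's problem-framing steps merged with Khan et al.'s
# stages; the security flags mark the stages called out in the text above.
COMBINED_LIFECYCLE = [
    # (stage, security considerations concentrate here?)
    ("Identify the problem (what/when/who/where/why/how)", False),
    ("Design data requirements", False),
    ("Collect, clean, and integrate raw big data", False),
    ("Filter and classify data by some filtering criteria", False),
    ("Analyze data (tools, techniques, technology, visualization)", False),
    ("Store data with consideration of CAP theory", True),
    ("Share and publish data (ethical and legal requirements)", True),
    ("Retrieve, reuse, and discover data for data-driven decisions", True),
]

for stage, security_heavy in COMBINED_LIFECYCLE:
    flag = " [security]" if security_heavy else ""
    print(f"- {stage}{flag}")
```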

Over time the threat landscape has worsened, making big data security a major issue. Khan et al. (2014) described four aspects of data security: (a) privacy, (b) integrity, (c) availability, and (d) confidentiality. Minelli, Chambers, and Dhiraj (2013) stated that a key data security challenge is understanding who owns and has authority over the data and its attributes, whether it is the generator of the data or the organization collecting, processing, and analyzing it. Carter, Farmer, and Siegel (2014) stated that access to data is important, because if competitors and substitutes for a service or product have access to the same data, what advantage does that data provide the company? Richard and King (2014) described how a binary notion of data privacy does not exist. Data is never completely private/confidential nor completely divulged, but lies between these two extremes. Privacy laws should focus on the flow of personal information, with an emphasis on a type of privacy called confidentiality, where data is agreed to flow only to a certain individual or group of individuals (Richard & King, 2014).

Carter et al. (2014) focused on data access, where access management determines data availability to certain individuals, whereas Minelli et al. (2013) focused on data ownership. Richard and King (2014) tied those two concepts together under data privacy. Thus, these data security aspects are interrelated, and data ownership, availability, and privacy impact all stages of the lifecycle. The root causes of big data security issues are dated techniques that qualify as best practices but don't lead to zero-day vulnerability action plans: a focus on prevention, a focus on perimeter access, and a focus on signatures (RSA, 2013). In particular, certain attacks, like denial-of-service attacks, are both a threat to and a root cause of data availability issues (Khan et al., 2014). Also, RSA (2013) reported that, in a sample of 257 security officials, the major perceived challenges to security included a lack of staffing, large volumes of false positives that create too much noise, and a lack of security analysis skills. Subsequently, data privacy issues arise from balancing compensation risks, maintaining privacy, and maintaining ownership of the data, similar to a cost-benefit analysis problem (Khan et al., 2014).

One way to address security concerns around big data access, privacy, and ownership is to place a single-entry-point gateway between the data warehouse and the end users (The Carology, 2013). The single-entry-point gateway is essentially middleware, which helps ensure data privacy and confidentiality by acting on behalf of an individual (Minelli et al., 2013). This gateway should aid in threat detection, help recognize excessive requests to the data that could indicate a denial-of-service attack, provide an audit trail, and require no changes to the data warehouse itself (The Carology, 2013). Thus, the use of middleware can address data access, privacy, and ownership issues. RSA (2013) proposed using data analytics to solve security issues by automating detection and response, which will be covered in detail in another post.
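A minimal sketch of such a gateway follows. This is my own illustration, not The Carology's design: a per-user sliding-window rate limit stands in for denial-of-service detection, and a simple list stands in for a real audit log.

```python
import time
from collections import defaultdict, deque


class GatewaySketch:
    """Minimal single-entry-point gateway in front of a data warehouse.

    Illustrative only: real middleware would sit at the network layer and
    integrate with the warehouse's own authorization and logging systems.
    """

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.request_times = defaultdict(deque)  # per-user request timestamps
        self.audit_log = []                      # audit trail of all attempts

    def request(self, user, query, now=None):
        now = time.time() if now is None else now
        times = self.request_times[user]
        # Drop timestamps that have aged out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        # Too many recent requests looks like a denial-of-service attempt.
        allowed = len(times) < self.max_requests
        if allowed:
            times.append(now)
        # Every request, allowed or denied, lands in the audit trail.
        self.audit_log.append((now, user, query,
                               "allowed" if allowed else "denied"))
        return allowed


gw = GatewaySketch(max_requests=3, window_seconds=60)
results = [gw.request("alice", "SELECT ...", now=i) for i in range(5)]
print(results)             # burst of 5 requests: first 3 allowed, rest denied
print(len(gw.audit_log))   # all 5 attempts are recorded
```

Note that the warehouse itself is untouched; the gateway only mediates and records traffic, which matches the "no changes to the data warehouse" property claimed for this architecture.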

Resources:

  • Carter, K. B., Farmer, D., and Siegel, C. (2014). Actionable Intelligence: A Guide to Delivering Business Results with Big Data Fast! John Wiley & Sons P&T. VitalBook file.
  • Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Ali, W. K. M., Alam, M., Shiraz, M., & Gani, A. (2014). Big data: Survey, technologies, opportunities, and challenges. The Scientific World Journal, 2014. Retrieved from http://www.hindawi.com/journals/tswj/2014/712826/
  • Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.