Compelling Topics in Advance Topics

  • The evolution of data to wisdom is defined by the DIKW pyramid, where Data is just facts without any context, but when facts are used to understand relationships it generates Information (Almeyer-Stubbe & Coleman, 2014). That information can be used to understand patterns, it can then help build Knowledge, and when that knowledge is used to understand principles, it builds Wisdom (Almeyer-Stubbe & Coleman, 2014; Bellinger, Castro, Mills, n.d.). Building an understanding to jump from one level of the DIKW pyramid, is an appreciation of learning “why” (Bellinger et al., n.d.).
  • The internet has evolved into a socio-technical system. This evolution has come about in five distinct stages:
    • Web 1.0: Created by Tim Berners-Lee in the 1980s, where it was originally defined as a way of connecting static read-only information hosted across multiple computational components primarily for companies (Patel, 2013).
    • Web 2.0: Changed the state of the internet from a read-only state to a read/write state and had grown communities that hold a common interest (Patel, 2013). This version of the web led to more social interaction, giving people and content importance on the web, due to the introduction of social media tools through the introduction of web applications (Li, 2010; Patel, 2013; Sakr, 2014). Web applications can include event-driven and object-oriented programming that are designed to handle concurrent activities for multiple users and had a graphical user interface (Connolly & Begg, 2014; Sandén, 2011).
  • Web 3.0: This is the state the web at 2017. Involves the semantic web that is driven by data integration through the uses of metadata (Patel, 2013). This version of the web supports a worldwide database with static HTML documents, dynamically rendered data, next standard HTML (HTML5), and links between documents with hopes of creating an interconnected and interrelated openly accessible world data such that tagged micro-content can be easily discoverable through search engines (Connolly & Begg, 2014; Patel, 2013). This new version of HTML, HTML5 can handle multimedia and graphical content, which are great for semantic content (Connolly & Begg, 2014). Also, end-users are beginning to build dynamic web applications for others to interact with (Patel, 2013).
    • Web 4.0: It is considered the symbiotic web, where data interactions occur between humans and smart devices, the internet of things (Atzori, 2010; Patel, 2013). These smart devices can be wired to the internet or connected via wireless sensors through enhanced communication protocols (Atzori, 2010). Thus, these smart devices would have read and write concurrently with humans, where the largest potential of web 4.0 has these smart devices analyze data online and begin to migrate the online world into the real world (Patel, 2013).
    • Web 5.0: Previous iterations of the web do not perceive people’s emotion, but one day it could be able to understand a person’s emotional (Patel, 2013). Kelly (2007) predicted that in 5,000 days the internet would become one machine and all other devices would be a window into this machine. In 2007, Kelly stated that this one machine “the internet” has the processing capability of one human brain, but in 5,000 days it will have the processing capability of all the humanity.
  • MapReduce is a framework that uses parallel sequential algorithms that capitalize on cloud architecture, which became popular under the open source Hadoop project, as its main executable analytic engine (Lublinsky et al., 2014; Sadalage & Fowler, 2012; Sakr, 2014). Another feature of MapReduce is that a reduced output can become another’s map function (Sadalage & Fowler, 2012).
  • A sequential algorithm is a computer program that runs on a sequence of commands, and a parallel algorithm runs a set of sequential commands over separate computational cores (Brookshear & Brylow, 2014; Sakr, 2014).
  • A parallel sequential algorithm runs a full sequential program over multiple but separate cores (Sakr, 2014).
    • Shared memory distributed programming: Is where serialized programs run on multiple threads, where all the threads have access to the underlying data that is stored in shared memory (Sakr, 2014). Each thread should be synchronized as to ensure that read and write functions aren’t being done on the same segment of the shared data at the same time. Sandén (2011) and Sakr, (2014) stated that this could be achieved via semaphores (signals other threads that data is being written/posted and other threads should wait to use the data until a condition is met), locks (data can be locked or unlocked from reading and writing), and barriers (threads cannot run on this next step until everything preceding it is completed).
    • Message passing distributed programming: Is where data is stored in one location, and a master thread helps spread chunks of the data onto sub-tasks and threads to process the overall data in parallel (Sakr, 2014). There are explicitly direct send and receive messages that have synchronized communications (Lublinsky et al., 2013; Sakr, 2014).         At the end of the runs, data is the merged together by the master thread (Sakr, 2014).
  • A scalable multi-level stochastic model-based performance analysis has been proposed by Ghosh, Longo, Naik and Trivedi (2012) for Infrastructure as a Service (IaaS) cloud computing. Stochastic analysis and models are a form of predicting how probable an outcome will occur using a form of chaotic deterministic models that help in dealing with analyzing one or more outcomes that are cloaked in uncertainty (Anadale, 2016; Investopedia, n.d.).
  • Three-pool cloud architecture: In IaaS cloud scenario, a request for resources initiates a request for one or more virtual machines to be deployed to access a physical machine (Ghosh et al., 2012). The architecture assumes that physical machines are grouped in pools hot, warm, and cool (Sakr, 2014). The hot pool consists of physical machines that are constantly on, and virtual machines are deployed upon request (Ghosh et al., 2012; Sakr, 2014).   Whereas a warm pool has the physical machines in power saving mode, and cold pools have physical machines that are turned off. For both warm and cold pools setting up a virtual machine is delayed compared to a hot pool, since the physical machines need to be powered up or awaken before a virtual machine is deployed (Ghosh et al., 2012). The optimal number of physical machines in each pool is predetermined by the information technology architects.
  • Data-at-rest is probably considered easier to analyze; however, this type of data can also be problematic. If the data-at-rest is large in size and even if the data does not change or evolve, its large size requires iterative processes to analyze the data.
  • Data-in-motion and streaming data has to be iteratively processed until there is a certain termination condition is reached and it can be reached between iterations (Sakr, 2014). However, Sakr (2014) stated that MapReduce does not support iterative data processing and analysis directly.
    • To deal with datasets that require iterative processes to analyze the data, computer coders need to create and arrange multiple MapReduce functions in a loop (Sakr, 2014). This workaround would increase the processing time of the serialized program because data would have to be reloaded and reprocessed, because there is no read or write of intermediate data, which was there for preserving the input data (Lusblinksy et al., 2014; Sakr, 2014).
  • Data usually gets update on a regular basis. Connolly and Begg (2014) defined that data can be updated incrementally, only small sections of the data, or can be updated completely. This data update can provide its own unique challenges when it comes to data processing.
    • For processing incremental changes on big data, one must split the main computation to its sub-computation, logging in data updates in a memoization server, while checking the inputs of the input data to each sub-computation (Bhatotia et al., 2011; Sakr, 2014). These sub-computations are usually mappers and reducers (Sakr, 2014). Incremental mappers check against the memoization servers, and if the data has already been processed and unchanged it will not reprocess the data, and a similar process for incremental reducers that check for changed mapper outputs (Bhatotia et al., 2011).
  • Brewer (2000) and Gilbert and Lynch (2012) concluded that for a distributed shared-data system you could only have at most two of the three properties: consistency, availability, partition-tolerance (CAP theory). Gilbert and Lynch (2012) describes these three as akin to the safety of the data, live data, and reliability of the data.
    • In a NoSQL distributed database systems (DDBS), it means that partition-tolerance should exist, and therefore administrators should then select between consistency and availability (Gilbert & Lynch, 2012; Sakr, 2014). However, if the administrators focus on availability they can try to achieve weak consistency, or if the administrators focus on consistency, they are planning on having a strong consistency system. Strong consistency ensures that all copies of the data are updated in real-time, whereas weak consistency means that eventually all the copies of the data will be updated (Connolly and Begg, 2014; Sakr, 2014). An availability focus is having access to the data even during downtimes (Sakr, 2014).
  • Volume visualization is used to understand large amounts of data, in other words, big data, where it can be processed on a server or in the cloud and rendered onto a hand-held device to allow for the end user to interact with the data (Johnson, 2011). Tanahashi, Chen, Marchesin and Ma (2010), define a framework for creating an entirely web-based visualization interface (or web application), which leverages the cloud computing environment. The benefit of using this type of interface is that there is no need to download or install the software, but that the software can be accessed through any mobile device with a connection to the internet.

Resources:

  • Brookshear, G., Brylow, D. (2014). Computer Science: An Overview, (12th ed.). Vitalbook file.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Ghosh, R., Longo, F., Naik, V. K., & Trivedi, K. S. (2013). Modeling and performance analysis of large scale IaaS clouds. Future Generation Computer Systems, 29 (5), 1216-1234.
  • Gilbert, S., and Lynch N. A. (2012). Perspectives on the CAP Theorem. Computer 45(2), 30–36. doi: 10.1109/MC.2011.389
  • Investopedia (n.d.). Stochastic modeling. Retrieved from http://www.investopedia.com/terms/s/stochastic-modeling.asp
  • Johnson, C. (2011) Visualizing large data sets. TEDx Salt Lake City. Retrieved from https://www.youtube.com/watch?v=5UxC9Le1eOY
  • Kelly, K. (2007). The next 5,000 days of the web. TED Talk. Retrieved from https://www.ted.com/talks/kevin_kelly_on_the_next_5_000_days_of_the_web
  • Li, C. (2010). Open Leadership: How Social Technology Can Transform the Way You Lead, (1st ed.). VitalBook file.
  • Lublinsky, B., Smith, K. T., & Yakubovich, A. (2013). Professional Hadoop Solutions. Vitalbook file.
  • Patel, K. (2013). Incremental journey for World Wide Web: Introduced with Web 1.0 to recent Web 5.0 – A survey paper. International Journal of Advanced Research in Computer Science and Software Engineering, 3(10), 410–417.
  • Sadalage, P. J., Fowler, M. (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, (1st ed.). Vitalbook file.
  • Sakr, S. (2014). Large Scale and Big Data, (1st ed.). Vitalbook file.
  • Sandén, B. I. (2011). Design of Multithreaded Software: The Entity-Life Modeling Approach. Wiley-Blackwell. VitalBook file.
  • Tanahashi, Y., Chen, C., Marchesin, S., & Ma, K. (2010). An interface design for future cloud-based visualization services. Proceedings of 2010 IEEE Second International Conference on Cloud Computing Technology and Service, 609–613. doi: 10.1109/CloudCom.2010.46

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s