Adv Topics: CAP Theory and NoSQL Databases

Brewer (2000) and Gilbert and Lynch (2012) concluded that a distributed shared-data system can provide at most two of three properties: consistency, availability, and partition tolerance (the CAP theorem). Gilbert and Lynch (2012) describe these three as akin to the safety of the data (consistency), the liveness of the data (availability), and the reliability of the data (partition tolerance). Thus, a system that gives up

  • consistency needs expirations, conflict resolution, and optimistic locking (Brewer, 2000). A lack of consistency means that the data or a process may not return the correct response to a request (Gilbert & Lynch, 2012); a minimal sketch of one such conflict-resolution approach follows this list.
  • availability needs pessimistic locking and may have to make some partitions unavailable (Brewer, 2000). A lack of availability means that a request may not receive a response (Gilbert & Lynch, 2012).
  • partition tolerance needs a two-phase commit and cache validation profiles (Brewer, 2000). A lack of partition tolerance means that messages between servers, tasks, or threads can be lost forever and never committed (Gilbert & Lynch, 2012).
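To make the conflict-resolution idea above concrete, the following is a minimal sketch in plain Python (no real database or network; the class and function names are my own, illustrative choices) of last-write-wins reconciliation between two replicas that accepted writes independently during a partition:

```python
# A minimal sketch of last-write-wins conflict resolution, one technique an AP
# system that relaxes consistency might rely on. Names are illustrative only.
import time

class Replica:
    """One copy of a key-value store that may diverge from its peers."""
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value):
        self.store[key] = (time.time(), value)

    def read(self, key):
        return self.store.get(key, (None, None))[1]

def reconcile(a, b):
    """Merge two diverged replicas: the newest timestamp wins per key."""
    for key in set(a.store) | set(b.store):
        ta, va = a.store.get(key, (0, None))
        tb, vb = b.store.get(key, (0, None))
        winner = (ta, va) if ta >= tb else (tb, vb)
        a.store[key] = b.store[key] = winner

# Two replicas accept writes during a partition, then reconcile afterwards.
r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"])
r2.write("cart", ["book", "pen"])        # later write on the other replica
reconcile(r1, r2)
print(r1.read("cart"), r2.read("cart"))  # both now hold the newer value
```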

Therefore, in a NoSQL distributed database system (DDBS), partition tolerance must exist, and administrators must then select between consistency and availability (Gilbert & Lynch, 2012; Sakr, 2014). If administrators focus on availability, they can aim for weak consistency; if they focus on consistency, they are planning for a strongly consistent system. An availability focus means having access to the data even during downtimes (Sakr, 2014). However, providing high levels of availability can cost money. Per the web application Uptime.is:

Availability Level Monthly downtime Yearly downtime
99.9% 43m 49.7s 8h 45m 57.0s
99.99% 4m 23.0s 52m 35.7s
99.999% 26.3s 5m 15.6s
99.9999% 2.6s 31.6s
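
The downtime figures follow directly from the availability percentage; the short sketch below is a quick arithmetic check (my own, and it assumes a 30.44-day month and a 365.25-day year, which appears to be roughly what Uptime.is uses):

```python
# Quick arithmetic check of the allowed-downtime figures above.
def downtime_seconds(availability_pct, period_days):
    """Allowed downtime, in seconds, for a given availability percentage."""
    return (1 - availability_pct / 100) * period_days * 24 * 3600

for level in (99.9, 99.99, 99.999, 99.9999):
    monthly = downtime_seconds(level, 30.44)
    yearly = downtime_seconds(level, 365.25)
    print(f"{level}%: {monthly:,.1f} s/month, {yearly:,.1f} s/year")
# 99.9% -> about 2,630 s/month (43m 50s) and 31,558 s/year (8h 46m)
```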

Achieving high levels of availability means building a set of fail-safe systems for fault tolerance.

As noted above, consistency comes in strong and weak forms. Strong consistency ensures that all copies of the data are updated in real time, whereas weak consistency means that all copies of the data will eventually be updated (Connolly & Begg, 2014; Sakr, 2014). Thus, there is a resource cost to stronger consistency over weaker consistency, driven by how quickly the data must be updated (Gilbert & Lynch, 2012). Consequently, this is where a NoSQL DDBS saves on overhead.
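
As a rough illustration of the difference, the sketch below (plain Python, illustrative names only, no real replication protocol) contrasts a strongly consistent write, which updates every copy before returning, with a weakly consistent write, which updates the local copy and lets the others converge later:

```python
# A minimal sketch contrasting strong and weak (eventual) consistency.
class ReplicatedValue:
    def __init__(self, n_replicas=3):
        self.replicas = [None] * n_replicas
        self.pending = []            # replication work deferred by weak writes

    def write_strong(self, value):
        # Caller waits until all copies are updated: slower, but reads are current.
        for i in range(len(self.replicas)):
            self.replicas[i] = value

    def write_weak(self, value):
        # Only the local copy is updated now; the rest converge "eventually".
        self.replicas[0] = value
        self.pending.append(value)

    def propagate(self):
        # Background step that brings lagging copies up to date.
        while self.pending:
            value = self.pending.pop(0)
            for i in range(1, len(self.replicas)):
                self.replicas[i] = value

v = ReplicatedValue()
v.write_weak("x1")
print(v.replicas)   # ['x1', None, None] -> stale reads are possible
v.propagate()
print(v.replicas)   # ['x1', 'x1', 'x1'] -> copies have converged
```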

Finally, the listing below shows some of the NoSQL databases classified as either AP or CP systems (Hurst, 2010).

Availability & Partition Tolerance (AP) NoSQL systems: Dynamo, Voldemort, Tokyo Cabinet, KAI, Riak, CouchDB, SimpleDB, Cassandra

Consistency & Partition Tolerance (CP) NoSQL systems: BigTable, MongoDB, Terrastore, Hypertable, HBase, Scalaris, Berkeley DB, MemcacheDB, Redis

 Resources

  • Brewer, E. (2000). Towards robust distributed systems. Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC '00), 7–10.
  • Connolly, T., & Begg, C. (2014). Database systems: A practical approach to design, implementation, and management (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Gilbert, S., & Lynch, N. A. (2012). Perspectives on the CAP theorem. Computer, 45(2), 30–36. doi:10.1109/MC.2011.389

 


Quant: Validity and Reliability

The construction process of a survey that would ensure a valid and reliable assessment instrument

Most flaws in research methodology exist because validity and reliability were not established (Gall, Gall, & Borg, 2006). Thus, it is important to ensure a valid and reliable assessment instrument. In using any existing survey as an assessment instrument, one should report the instrument's development, items, scales, and reports on reliability and validity from past uses (Creswell, 2014; Joyner, 2012). Permission must be secured for using any instrument and placed in the appendix (Joyner, 2012). The validity of the assessment instrument is key to drawing meaningful and useful statistical inferences (Creswell, 2014). Creswell (2014) stated that multiple types of validity can exist in an instrument: content validity (measuring what we want to measure), predictive or concurrent validity (measurements that align with other results), and construct validity (measuring constructs or concepts). Establishing validity in the assessment instrument helps ensure that it is the best instrument to use for the situation at hand. Reliability in an assessment instrument is established when authors report that the instrument has internal consistency and has been tested multiple times with stable results each time (Creswell, 2014).
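
As an illustration of what an internal-consistency check can look like in practice, the sketch below computes Cronbach's alpha, one common measure of internal consistency; the item scores are made up purely for illustration and are not from any real instrument:

```python
# A hedged sketch of Cronbach's alpha using only the standard library.
# Rows = respondents, columns = survey items; the data are hypothetical.
from statistics import variance

def cronbach_alpha(scores):
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    total_var = variance([sum(row) for row in scores])   # variance of summed scale
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(responses), 2))  # ~0.92 here; values near 0.9 suggest strong internal consistency
```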

Unfortunately, picking an assessment instrument that does not match the content exactly will not benefit anyone, nor will the results be accepted by the greater community. Modifying an assessment instrument that does not quite match can damage the reliability of the new version of the instrument, and it can take a huge amount of time to re-establish validity and reliability for that new version (Creswell, 2014). Also, creating a brand-new assessment instrument would mean extensive pilot studies and tests, along with an explanation of how it was developed, to help establish the instrument's validity and reliability (Joyner, 2012).

Selecting a target group for the administration of the survey

Through sampling of a population and using a valid and reliable survey instrument for assessment, attitudes and opinions about a population can be correctly inferred from the sample (Creswell, 2014). Thus, not only are validity and reliability important, but selecting the right target group for the survey is key. Targeting a group for this survey means that the population from which information will be inferred must be stratified, which means that the characteristics of the population are known ahead of time (Creswell, 2014; Gall et al., 2006). From this stratified population, a random sample of participants should be selected to ensure that statistical inferences can be made about that population (Gall et al., 2006); a minimal sketch of such a draw appears below. Sometimes a survey instrument does not fit those in the target group, and thus it would not produce valid or reliable inferences for the targeted population. One must select a targeted population and determine the size of that stratified population (Creswell, 2014). Finally, one must consider the sample size of the targeted group.
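
As a small illustration, the sketch below draws a proportional random sample from a stratified population using only the Python standard library; the strata names, sizes, and sample size are hypothetical:

```python
# A minimal sketch of proportional stratified random sampling.
# Strata are known ahead of time; participants are drawn at random within each.
import random

population = {
    "undergraduate": [f"ug_{i}" for i in range(600)],
    "graduate":      [f"gr_{i}" for i in range(300)],
    "faculty":       [f"fa_{i}" for i in range(100)],
}

def stratified_sample(strata, total_n, seed=42):
    random.seed(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n = round(total_n * len(members) / pop_size)  # proportional allocation
        sample[name] = random.sample(members, n)
    return sample

sample = stratified_sample(population, total_n=100)
print({name: len(people) for name, people in sample.items()})
# {'undergraduate': 60, 'graduate': 30, 'faculty': 10}
```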

Administrative procedure to maximize the consistency of the survey

Once a stratified population and a random sample from that population have been carefully selected, there is a need to maximize the consistency of the survey. Thus, researchers must take into account how the sample can be reached: mail, email, websites, and other survey tools like SurveyMonkey.com are all ways to gather data (Creswell, 2014). However, mail has a low rate of return (Miller, n.d.), so face-to-face methods or the use of online providers may be the best bet to maximize the consistency of the survey.

References

Creswell, J. W. (2014). Research design: Qualitative, quantitative and mixed method approaches (4th ed.). California: SAGE Publications, Inc. VitalBook file.

Gall, M. D., Gall, J. P., & Borg, W. R. (2006). Educational research: An introduction (8th ed.). Pearson Learning Solutions. VitalBook file.

Joyner, R. L. (2012). Writing the winning thesis or dissertation: A step-by-step guide (3rd ed.). Corwin. VitalBook file.

Miller, R. (n.d.). Week 5: Research study construction. [Video file]. Retrieved from http://breeze.careeredonline.com/p8v1ruos1j1/?launcher=false&fcsContent=true&pbMode=normal