Adv DB: Transaction management and concurrency control

Transaction management, in a nutshell, is keeping track of (serializing or scheduling) the changes made to a database.  An overly simplistic example is debiting $100 and crediting $110 to an account.  If the account balance is currently $90, the order of these transactions is vital to avoiding overdraft fees.  Concurrency control, meanwhile, is used to ensure data integrity when transactions occur, which makes the two concepts interconnected.  Thus, in our example, serializing the transactions (doing all actions consecutively) is key: you want to add the $110 first so that you have $200 in the account, from which you can then debit $100.  To do this you need timestamp ordering/serialization.  This became a terrible issue back in 2010 and was still an issue in 2014, when a survey of 44 major banks found that half of them still re-order transactions, which can drain account balances and cause overdraft fees (Kristof, 2014).  Banks usually get around all of this by having processing times for deposits that are typically longer than the processing times for charges.  Thus, even if transactions are submitted in the correct serial order, per-transaction processing times can vary so significantly that these issues still happen.  According to Kristof (2014), banks say they do this to process payments in order of priority.
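To make the ordering concrete, here is a minimal sketch in Python (the Txn record, timestamps, and amounts are all invented for illustration) of applying transactions in timestamp order:

```python
from dataclasses import dataclass

@dataclass
class Txn:
    timestamp: int   # logical clock value assigned when the transaction arrives
    amount: float    # positive = credit (deposit), negative = debit (charge)

def apply_serially(balance: float, txns: list[Txn]) -> float:
    """Apply transactions strictly in timestamp order, flagging any overdraft."""
    for txn in sorted(txns, key=lambda t: t.timestamp):
        balance += txn.amount
        if balance < 0:
            print(f"Overdraft after txn at t={txn.timestamp}: balance={balance:.2f}")
    return balance

# Starting at $90: the $110 credit (t=1) is applied before the $100 debit (t=2),
# ending at $100 with no overdraft. Re-ordering the same two transactions would
# dip the balance to -$10 and trigger a fee.
print(apply_serially(90.0, [Txn(2, -100.0), Txn(1, +110.0)]))  # -> 100.0
```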

The case above illustrates why an optimistic concurrency control method is not helpful here.  It is not helpful because optimistic methods do not check for serializability when initially executing transactions; instead, transactions are done locally and validated against a serializable schedule before being finalized, and a failed validation forces costly re-work.  If we started at the first of the month, paid a bunch of bills, then realized we were close to $0, deposited $110, and continued paying bills to the sum of $100, the validation and re-processing could eat up a lot of processing time.  It can get quite complicated quite quickly.  Conservative concurrency control has the fewest aborts and eliminates wasted processing by doing things in a serial fashion, but it cannot run transactions in parallel.
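For contrast, here is a minimal sketch of the optimistic approach, assuming a simple version-check validation (the Account class and its methods are hypothetical, not any particular database's API): work happens locally and is only validated at commit time, and a detected conflict forces a retry.

```python
class Account:
    """Toy account supporting optimistic (validate-before-commit) updates."""
    def __init__(self, balance: float):
        self.balance = balance
        self.version = 0  # bumped on every committed write

    def read(self):
        return self.balance, self.version

    def commit(self, new_balance: float, read_version: int) -> bool:
        # Validation step: finalize only if no other transaction committed
        # since we read; otherwise abort and let the caller redo the work.
        if self.version != read_version:
            return False
        self.balance = new_balance
        self.version += 1
        return True

def transfer(account: Account, amount: float, retries: int = 3) -> bool:
    for _ in range(retries):
        balance, version = account.read()              # work happens locally...
        if account.commit(balance + amount, version):  # ...validated at the end
            return True
    return False  # repeated conflicts: the cost of optimism under contention

acct = Account(90.0)
transfer(acct, +110.0)  # deposit
transfer(acct, -100.0)  # debit
print(acct.balance)     # 100.0
```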

Huge streams of incoming data, like those from the Internet of Things (where databases need to be flexible and extensible because a projected trillion different devices will be producing data), would benefit greatly from optimistic concurrency control.  Take the example of a Fitbit, Apple Watch, or Microsoft Band, which records data about you throughout the day.  Because this massive data is time-stamped and heterogeneous, it doesn’t matter whether the sleep data and the walking data are processed in parallel, as long as everything is validated in the end.  This allows for faster upload times over Bluetooth and/or Wi-Fi.  Data can be actively extracted and explained in real time, but when a device carries many sensors, the data and sensors all have different reasoning rules and semantic links between data, where existing or deductive links between sources exist (Sun & Jara, 2014), and that is where the true meaning of the generated data lies.  Sun and Jara (2014) suggest that a solid mathematical basis will help ensure a correct and efficient data storage system and model.
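As a rough sketch of that idea (the sensor names, readings, and processing step below are invented), heterogeneous time-stamped streams can be processed in parallel and validated against timestamp order only at the end:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical time-stamped readings from two independent wearable sensors.
sleep_data = [(1, "light"), (3, "deep")]   # (timestamp, sleep stage)
step_data  = [(2, 120), (4, 340)]          # (timestamp, step count)

def process(stream):
    # Each heterogeneous stream can be cleaned/aggregated on its own,
    # independently of the others.
    return [(ts, value) for ts, value in stream if ts >= 0]

with ThreadPoolExecutor() as pool:
    processed = pool.map(process, [sleep_data, step_data])

# Final validation: merge on the shared timestamps so a single
# time-ordered (serializable) history is restored at the end.
merged = sorted((rec for stream in processed for rec in stream),
                key=lambda r: r[0])
assert all(a[0] <= b[0] for a, b in zip(merged, merged[1:]))
print(merged)
```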



Words matter: Customize to configure

Let’s look at some definitions when it comes to software development and the nuances each one plays:

Customize: to modify or supplement with internal code development to match end-user requests; these changes may not be preserved during an upgrade. This could be analogous to hacking into a game like Pokemon Go and giving end-users the ability to spoof their locations to obtain regional-exclusive pocket monsters.

Tailoring: modifying or supplementing a system without code to fit it into an environment.  Analogous to downloading Pokemon Go from the Google Play store or the Apple App Store, where the right version of the app is downloaded for the right environment.

Personalization: meeting the customers’ needs effectively and efficiently.  This is achieved by analyzing customer data and using predictive analytics.  A great example is using the Adventure Sync tool to encourage Pokemon Go players to be more active, recognizing that there are three tiers of active players and personalizing the rewards based on the levels that are achieved.  Personalization can also be seen in character customizations, clothing, poses, and buddy pokemon.

Configure: the process of setting up options and features to meet the implementation of business requirements.  In Pokemon Go, some people want to achieve a full pokedex, some like the gym system, some like the 1:1 battles, 1:1 trades, side quests, beating the villains, etc. You can configure your goals in the game by pursuing one or all of these, to whatever extent you want, meeting your requirements for satisfaction in playing the game.
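To ground these terms in code, here is a hedged, minimal sketch (the GameClient class, its option names, and the activity tiers are all invented for illustration): configuring selects among options the vendor already exposes, personalizing derives settings from the user’s own data, and customizing would mean editing the source itself.

```python
# Hypothetical game-client settings, for illustration only.
class GameClient:
    def __init__(self):
        self.options = {"show_gyms": True, "battle_mode": "1:1",
                        "goal": "full_pokedex"}

    # Configure: choose among options the vendor already exposes.
    def configure(self, **options):
        self.options.update(options)

    # Personalize: derive settings from the user's own data.
    def personalize(self, weekly_km_walked: float):
        tiers = [(50, "gold"), (25, "silver"), (5, "bronze")]
        self.options["reward_tier"] = next(
            (name for km, name in tiers if weekly_km_walked >= km), "none")

client = GameClient()
client.configure(goal="gym_battles")      # supported and upgrade-safe
client.personalize(weekly_km_walked=30)   # data-driven: lands in "silver"
# "Customizing" would mean editing this class's source code itself --
# exactly the kind of change an upgrade may not preserve. "Tailoring"
# would be delivering the right build of this client per environment.
```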

Now if we want to think of these concepts on a continuum:

Customize <——- Tailoring ——- Personalization ——-> Configuring

where the cost of complexity decreases from right to left, the constriction on growth decreases from right to left, and the profit margin decreases from right to left.

The question now becomes: is the additional complexity on this spectrum worth the extra cost incurred?

That is for you to decide.


Literature reviews

Side Note: This particular post was on my to-do list for a long time.

A literature review is a process involving a deep consideration of the current literature, to aid in identifying the gaps in existing knowledge and to build up the context for your research project (Gall, Gall, & Borg, 2006).  The literature review helps the researcher build upon the works of other researchers, for the purpose of contributing to the collective knowledge. Our goal in the literature review will be undermined if we commit any of the following common flaws (Gall et al., 2006):

  1. A literature review that becomes a standalone piece in the final document
  2. Analyzing results from studies that are not sound in their methodology
  3. Failing to include the search procedures used to create the literature review
  4. Having only one study on particular ideas in the review, which may suggest the idea is not mature enough

For a literature review, one should be learning their field by reviewing the collective knowledge in it, studying:

  • The beginning of {your topic}
  • The essence of {your topic}
  • Historical overview of {your topic}
  • Politics of {your topic}
  • The technology of {your topic}
  • Leaders in {your topic}
  • Current literature findings of {your topic}
  • Overview of research techniques in {your topic}
  • The 21st-century {your topic} strategy

Creswell (2014) proposed that a literature map (similar to a mind map) of the research is a useful way to organize the literature, identify ideas supported by only a small number of sources, determine the current issues in the existing knowledge, and determine the reviewer’s current gaps in their understanding of the existing knowledge.  Finally, Creswell (2014) listed what a good outline for a quantitative literature review should have:

  1. Introduction paragraph
  2. Review of topic one, which contains the independent variable(s).
  3. Review of topic two, which contains the dependent variable(s).
  4. Review of topic three, which provides how the independent variable(s) relate to the dependent variable(s).
  5. Summarize with highlights of key studies/major themes, to state why more research is needed.

Creswell’s is generally a good method, but not the only one.  You can use a chronological literature review, where you build your story from the beginning to the present. In my dissertation, my literature review had to tie multiple topics into one: big data, financial forecasting, and hurricane forecasting.  I had to use the diffusion of innovation theory to transition between financial and hurricane forecasting, to make the leap and justify the methodologies I would use later on.  In the end, you are the one who will be writing your literature review, and the more of them you read, the easier it will be to define how you should write yours.

Here is a little gem I found during the second year of my dissertation: in the following YouTube video, Dr. Guy White (2014) describes a way to effectively and practically build your literature review. I use this technique all the time.  All of my friends who have seen this video have loved this method of putting together their literature reviews.


Internal validity in qualitative studies

Internal validity means determining the accuracy of the findings in qualitative research from the viewpoint of the researcher, the participants, or the reader (Creswell, 2013). There are many validity strategies, such as: triangulation of different data sources, member checking, rich and thick description of the findings, clarifying any bias, presenting negative or discrepant information, prolonging the time in the field, peer debriefing, and having an external auditor review the project.

Triangulation of different data sources for observational work is the idea that I would examine evidence from multiple sources of data to justify the themes that I create through coding.  Converging themes from multiple sources of data and/or participant perspectives would add to the validity of the study.  Thus, one way to increase the validity of the thematic codes is to present the codes from an analysis of multiple sources, such as:

  • Interviews from N number of participants (until data saturation is reached)
  • Observations of the participants
    • Repeated observations will be taken during multiple types of shifts, with or without the same participants, on random days of the week over a one-month period.
    • Observational Goal 1: Tracking what information is used (type and time stamps, instrumentation, etc.)
    • Observational Goal 2: Through videotaping, I hope to track conversations between participants sharing the same shift. Field notes would contain: “Why was the conversation initiated?”, “What was discussed?”, “Were decisions made regarding the area of study?”, “What bodily-based behavior was portrayed by the specialists in the discussion?”, and “What was the outcome of that discussion?”
  • Document Analysis

The aforementioned strategies, in particular, will help ensure internal validity in quite a few studies.


Ethical issues involving human subjects

In Creswell (2013), it is stated that ethical issues can occur at all phases of a study (prior to the study, at the beginning, and during data collection, analysis, and reporting).  Since we deal with data from people about people, we as researchers need to protect our participants and promote the integrity of research by guarding against misconduct and against improperly representing the data.  Because we deal with people, it is our obligation to ensure that interviewees are not harmed as a result of our research (Rubin, 2012). The following anticipated risks are from Creswell (2013) and Rubin (2012):

  • Prior to conducting the study
    • We must seek Institutional Review Board (IRB) approval before we conduct a study.
    • I must gain local permission from the agency, organization, or corporation where the study will take place, and from the participants, to conduct this study.
  • Beginning the study
    • We will not pressure participants to sign consent forms. To achieve high participation rates, the purpose of the study must be compelling enough that participants see it as a value-added experience, both for themselves and for the field of study, so that they do not want to say no.
      • We should also conduct an informal needs assessment so that the participants’ needs are addressed in the study, helping secure a high participation rate.
      • However, we will tell the participants that they have the right not to sign the consent form.
  • Collecting data
    • Respect the site and keep disruption to a minimum, especially when conducting observations. The goal of the observation in this study is not to be an active participant, but to take field notes on key interactions that occur while the participants are doing what they need to do.
    • Make sure that all the participants in the study receive the same treatment, to avoid data quality issues during collection.
    • We should be respectful of and straightforward with the participants.
    • Discussing the purpose of the study and how the data will be used with the participants is key to establishing trust, and it allows them to start thinking about the topic of the study. This can be accomplished by emailing them prior to the interview with the purpose of the study and the time we are requesting of them.
    • As we ask our interview questions, we should avoid leading questions. That is why questions may be asked in a particular order; in some cases, questions can build on one another.
    • We should avoid sharing personal impressions. Since we know what the final questions of the interview are, we should ask them without giving any indication of what we are looking for, so that the participants do not end up contaminating our data.
    • Avoid disclosing sensitive or proprietary information.
  • Analyzing data
    • Avoid disclosing only one set of results; we must report on multiple perspectives and report contrary findings.
    • Keep the privacy of the participants, assuring that names as well as any other identifying indicators have been removed from the results.
    • Honor promises: if I offer participants a chance to read and correct their interviews, I should do so as soon as possible after the interview.
  • Reporting, sharing, and storing data
    • Avoid situations where there is a temptation to falsify evidence, data, findings, or conclusions. This can be accomplished by using unbiased language appropriate for the audience.
    • Avoid disclosing information harmful to the specialists.
    • Keep the data in a shareable format while making the privacy of the specialists the main priority, and retain the raw data and other materials in a secure location for five years. Part of this data should consist of complete proof of compliance (IRB approval, absence of conflicts of interest) for if and when it is requested.


Observational protocol and qualitative documentations

As a researcher, you could be anywhere from a non-participant to a full-on participant when observing your subjects in a study.  The observed/empathized behaviors and activities of the individuals in the study are jotted down in field notes (Creswell, 2013).  Most researchers use an observational protocol for jotting down these notes as they observe their subjects.  According to Creswell (2013), this protocol could consist of: “separate descriptive notes (portraits of the participants, a reconstruction of dialogue, a description of the physical setting, accounts of particular events, or activities) [to] reflective notes (the researcher’s personal thoughts, such as “speculation, feelings, problems, ideas, hunches, impressions, and prejudices), … this form might [have] demographic information about the time, place, and date of the field setting where the observation takes place.”

Observational work can be combined with in-depth interviewing, and sometimes the observational work (which can be an everyday activity) can help prepare the researcher for the interviews (Rubin, 2012).  Doing so can increase the quality of the interviews, because the interviewees know what the researcher has seen or read and can provide more information on those materials.  It can also allow the researcher to master the terminology before entering the interview. Finally, Rubin (2012) also states that cultural norms become more visible through observation than through a pure in-depth interview alone.

In Creswell (2013), qualitative documents are information contained within documents that can help a researcher in their study; these can be public (newspapers, meeting minutes, official reports) and/or private (personal journals/diaries, letters, emails, internal manuals, written procedures, etc.) documents.  They can also include pictures, videos, educational materials, books, and files. Artifact analysis, by contrast, is the analysis of written text, usually charts, flow sheets, intake forms, reports, etc.

The main approach to analyzing such a document would be to read it to gain a subject-matter understanding.  Document analysis would aid in quickly grouping, sorting, and re-sorting the data obtained for a study.  The manual itself would not be included in the coded dataset, but it would help provide appropriate codes/categories for the interview analysis; in other words, it gives suggestions about what might be related to what.  Finally, one way to interpret this document would be for triangulation of data (data from multiple, highly correlated sources) between the observations, the interviews, and this document.


Organizational research & Participant Observer

In organizational research, some of the major research goals are to examine organizations’ formation, recruitment of talent, adaptation to constraints, types and causes, and factors for growth, change, and demise, all of which fall under ethnographic studies (Lofland, 2005).  Ethnographic studies lend themselves much more readily to participant observers.

A participant observer is a researcher/observer who is not just watching their subjects but also actively participating (joining in) with them. The level of participation might impact what is observed (the more participation, the harder it is to observe and take notes), so a low-key participant role is preferred.  Participating before the interviews will allow the observer to become sensitive to important issues that would otherwise be missed.  It is a more in-depth version of interviewing, building on a regular conversation.  Participation may occur after watching for a while, focusing on a specific topic or question (Rubin, 2012).


Data Analysis of Qualitative data

Each of these methods has at its core a thematic analysis of data, which means methodically and categorically linking data, phrases, sentences, paragraphs, etc. into particular themes.  Grouping these themes by their thematic properties helps in understanding the data and in developing meaningful themes that aid in building a conclusion to the central question.
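As a loose illustration of that linking step (the excerpts, codes, and themes below are invented), thematic coding can be pictured as mapping coded segments into theme buckets:

```python
from collections import defaultdict

# Hypothetical coded segments: (excerpt, assigned code)
segments = [
    ("we always double-check the charts", "verification"),
    ("the night shift relies on handoff notes", "communication"),
    ("I re-read the intake form twice", "verification"),
]

# Hypothetical mapping of low-level codes to higher-level themes.
code_to_theme = {"verification": "data quality", "communication": "teamwork"}

# Group the coded excerpts under their themes.
themes = defaultdict(list)
for excerpt, code in segments:
    themes[code_to_theme[code]].append(excerpt)

print(dict(themes))  # {'data quality': [...], 'teamwork': [...]}
```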

Ethnographic Content Analysis (Herron, 2015):  Thick descriptions (collections of field notes that describe and record learning, along with the researcher’s perceptions) help in the creation of cultural themes (themes relating behaviors to an underlying action), from which the information is interpreted.

Phenomenological data analysis (Kerns, 2014): Connections among different classes of data, made through a thematic analysis, were used to derive the results.

Case study analysis (Hartsock, 2014): By organizing data within a specific case design and treating each distinct data set as a case study, one can derive some general themes within each individual case.  Once all these general themes are identified, we should look for cross-case themes.

Grounded Theory Data Analysis (Falciani-White, 2013): Code data by comparing incidents/data to a category (breaking down, analyzing, comparing, labeling, and categorizing data into meaningful units) and integrating categories by their properties, to help identify a few themes that drive a theory in a systematic manner.


Interviewing strategy and qualitative sampling

As an interviewing strategy, open-ended questions leave the responses open to participant experiences and categories, and they neither close down the discussion nor let the participant answer the question in one word (Snow et al., 2005).  Though this approach was rejected in the past because it did not involve precise measurement, data that may not be easily measured or counted has value because of its intrinsic complexity and its showcase of the “conditional nature of reality” (Rubin, 2012).  A whole field, text analytics, aims to prove that this data, considered unstructured data, is an important part of knowledge discovery and knowledge sharing. Thus, Rubin (2012) says that open-ended questions grant participants the chance to respond to the question in any way they choose, to elaborate on a response, to raise issues that are important to them, or even to raise new issues not thought of by the interviewer.  Creswell (2013) further states that the more open the questions, the better, because this allows the interviewer to listen to what people say and how they say it, which lets the participants share their own views.  Usually, there are only a few open-ended questions.  Finally, open-ended questions are used primarily in qualitative studies, but a mixture of both closed-ended and open-ended questions can be asked in mixed methods studies.

It is one thing to have the right questions as part of your interviewing strategy; it is another to have the right qualitative sampling plan.

Sampling plans include {purposeful/judgmental sampling, maximum variation sampling, sampling extreme or deviant cases, theoretical sampling, snowball/chain-referral sampling, cluster sampling, single-stage sampling, random sampling} (Creswell, 2013; Rubin, 2012; Lofland et al., 2005). Here are just three of the many sampling plans in this space:

  • Purposeful/judgmental sampling: To learn about a selected character, group, or category and their variations, you divide the population into different characters, groups, or categories and collect data from participants representing those divisions (Creswell, 2013 & Lofland et al., 2005).
  • Maximum variation sampling: Allows for an analysis of error and bias in a phenomenon by sampling and discovering the widest range of diversity in the phenomenon of interest (Lofland et al., 2005).
  • Snowball/chain-referral sampling: Asking your initial set of contacts with characteristic X whether they can refer you to people in their network who share the same characteristic X you are studying. This is a means of enlarging your sample size and breaking down barriers to entry for future participants (Lofland et al., 2005). Depending on the characteristic X (e.g., domestic violence, sexual assault), this technique may run into IRB issues (Rubin, 2012).  Rubin (2012) stated that the way to avoid IRB issues is to have current participants contact future participants on your behalf to invite them into the interview process, but this can drastically reduce the maximum number of participants you could have recruited.


Some Qualitative Methodologies

This blog post will differentiate among the following qualitative designs:

    • Phenomenology (e.g. Georgi, Moustakas, etc.)
    • Grounded theory (e.g. Glaser, Strauss, etc.)
    • Ethnography (e.g. White, Benedict, Mead, etc.)
    • Case Studies (e.g. Yin, etc.)

The implicit goal of qualitative data analysis is truth, objectivity, trustworthiness, and accuracy of the data (Glaser, 2004). All of these methods have the observer exercising as little bias as possible in their thoughts, to further their analysis or the development of their core theory.  Researchers here are observers taking notes to support their study.

Phenomenology (Giorgi, 2008): The study of experiential phenomena through encountering an instance of a phenomenon, describing it, and using free imaginative variation to determine its essence, thus making the phenomenon more generalizable.  It should be noted that the experience should be approached without preconceived biases (as a neutral party), and one way of doing so is to list out all of your biases related to the phenomenon.  This removal of biases helps limit the claims to the way we experienced the phenomenon.

Grounded Theory (Glaser, 2004): The study of a set of grounded concepts that form a core theory/category, which in turn forms a hypothesis.  Data is collected, but as it is analyzed “line by line”, the researcher asks: “What is this data a study of?”, “What category does this incident indicate?”, “What is actually happening in the data?”, “What is the main concern being faced by the participants?”, and “What accounts for the continual resolving of this concern?”  These questions are asked with a minimum of preconceptions.  The literature is treated as just another source of data to be integrated into the analysis and the core theory/category; however, it is not used before a core theory/category emerges from the data.

Ethnography (Atkinson & Hammersley, 1994; Mead, 1933): The study of the customs of people and cultures, usually based on a small number of cases (perhaps a single case), through analyzing unstructured data (not previously coded) with no aim of testing a hypothesis.  Analysis of the data may involve quantification and statistics built on the explicit interpretation of the data.

Thus, grounded theory seeks to find meaning in data and to find a core concept/category/theory/variable.  Ethnography tends to seek meaning in the customs of people, which can exist in a single case study.  Phenomenology seeks to study phenomena that have occurred while keeping in mind all the possible variables that can influence them.  A given topic can be explored using each of these methods; they look at the same problem, just with different preconceptions (or lack thereof), thus adding to the further understanding of that topic.  These are all data collection methods, whereas case studies are a research strategy.

A problem needs to arise in order for research to occur, and a gap in knowledge can be seen as a problem.  Thus, case studies are a strategy that can be used to help shine some light on that gap, and using any of the aforementioned techniques, the researcher can try to fill in that gap in knowledge.  If you are aiming for grounded theory, you may have a ton of case studies to look through to seek common themes, whereas ethnography may be concerned with one or two cases and what happened in those cases.  Phenomenology can use as many case studies as necessary to explore the particular phenomenon in question.

Case Study Research (Yin, 1981): Can contain both qualitative and quantitative data (e.g., fieldwork, records, reports, verbal reports, observations, memos, etc.) and is independent of any particular data collection method.  Case studies concern themselves with a real-life phenomenon where the boundaries between phenomenon and context are not clear, and they aim to be exploratory, descriptive, and/or explanatory.  As a strategy, case studies are similar to experiments, simulations, and histories.

Since case studies can be “an accurate rendition of the facts of the case” (Yin, 1981), most of that data cannot be described quantitatively in a quick manner. Sometimes, descriptions and qualitative data paint the picture of what is being studied much more clearly than numbers alone.  Which paints a better picture: that over a million people saw the ball drop in Times Square in 2015, or that 14 blocks of thousands of people, adorned in foam Planet Fitness hats and waving purple noodle balloons, eagerly cheered as the ball dropped in Times Square in 2015? This is why most case study research involves the collection of qualitative data.

References:

  • Atkinson, P., & Hammersley, M. (1994). Ethnography and participant observation. Handbook of Qualitative Research, 1(23), 248-261.
  • Glaser, B. G., & Holton, J. (2004). Remodeling grounded theory. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 5(2).
  • Giorgi, A. (2008). Difficulties encountered in the application of the phenomenological method in the social sciences. Indo-Pacific Journal of Phenomenology, 8(1).
  • Mead, M. (1933). More comprehensive field methods. American Anthropologist, 35(1), 1-15.
  • Yin, R. K. (1981). The case study crisis: Some answers. Administrative Science Quarterly, 58-65.