Adv DB: Transaction management and concurrency control

Transaction management, in a nutshell, is keeping track of the serialized or scheduled changes made to a database.  An overly simplistic example is debiting $100 from and crediting $110 to the same account.  If the balance is currently $90, the order of these transactions is vital to avoiding overdraft fees.  Concurrency control, in turn, is used to ensure data integrity while transactions occur, which makes the two ideas interconnected.  Thus, in our example, serializing the transactions (performing all actions consecutively) is key: you want to credit the $110 first so the account holds $200 before the $100 debit.  To do this you need timestamp ordering of the serialized schedule.  This became a serious issue back in 2010 and was still an issue in 2014, when a survey of 44 major banks found that about half still re-order transactions, which can drain account balances and cause overdraft fees (Kristof, 2014).  Banks typically get around this by assigning processing times to deposits that are longer than the processing times for charges.  Thus, even when transactions are handled correctly and serially, per-transaction processing times can vary so significantly that these issues still happen.  According to Kristof (2014), banks say they do this to process payments in order of priority.
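The overdraft example above can be sketched in a few lines of Python. This is a minimal illustration, not a real ledger: the $35 fee, the `Txn` class, and both function names are made up for the demonstration. It shows that sorting by timestamp before applying changes avoids the fee that arrival-order processing would trigger.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    timestamp: int
    amount: int  # positive = credit (deposit), negative = debit (charge)

def apply_serial(balance, txns, overdraft_fee=35):
    # Timestamp ordering: sort first, then apply strictly in that order.
    fees = 0
    for t in sorted(txns, key=lambda t: t.timestamp):
        balance += t.amount
        if balance < 0:
            fees += overdraft_fee
    return balance, fees

def apply_in_arrival_order(balance, txns, overdraft_fee=35):
    # No reordering: apply transactions exactly as they arrive.
    fees = 0
    for t in txns:
        balance += t.amount
        if balance < 0:
            fees += overdraft_fee
    return balance, fees

txns = [Txn(1, 110), Txn(2, -100)]  # deposit happened first, debit second
print(apply_serial(90, list(reversed(txns))))       # (100, 0)  -> no fee
print(apply_in_arrival_order(90, list(reversed(txns))))  # (100, 35) -> overdraft fee
```

Note that both schedules end at the same $100 balance; only the intermediate state differs, and that intermediate dip below zero is exactly what generates the fee.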

The case above illustrates why an optimistic concurrency control method is not helpful here.  It is not helpful because optimistic methods do not check for serializability when a transaction initially executes (avoiding that high up-front cost on resources); instead, each transaction runs locally and is validated against a serializable order only before it is finalized, and a failed validation means an abort and a re-run.  If we started the month paying a batch of bills, realized we were close to $0, deposited $110, and then continued paying bills to the sum of $100, the repeated validations and possible aborts could eat up a lot of processing time.  It gets quite complicated quite quickly.  Conservative (pessimistic) concurrency control has the fewest aborts and eliminates that wasted processing by doing everything serially, but it gives up the ability to run transactions in parallel.
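A toy sketch of the optimistic (validation-based) scheme described above may help. This assumes a simple versioned key-value store of my own invention (`OptimisticStore` and its methods are illustrative names, not a real API): each transaction records the version of every key it reads, and at commit time it is validated by checking that none of those versions changed underneath it. If validation fails, the transaction aborts and the caller must retry, which is exactly the wasted work the paragraph describes.

```python
class OptimisticStore:
    def __init__(self):
        self.data = {}     # key -> current value
        self.version = {}  # key -> version number, bumped on every committed write

    def begin(self):
        # A transaction is just a private read-set and write-set.
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:          # read-your-own-writes
            return txn["writes"][key]
        txn["reads"][key] = self.version.get(key, 0)  # remember version seen
        return self.data.get(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value        # buffered locally until commit

    def commit(self, txn):
        # Validation phase: abort if any key we read changed underneath us.
        for key, seen in txn["reads"].items():
            if self.version.get(key, 0) != seen:
                return False              # abort; caller must retry
        # Write phase: make the buffered writes visible, bumping versions.
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True

store = OptimisticStore()
store.data["balance"] = 90
t1, t2 = store.begin(), store.begin()            # two concurrent transactions
store.write(t1, "balance", store.read(t1, "balance") + 110)  # deposit $110
store.write(t2, "balance", store.read(t2, "balance") - 100)  # pay $100 bill
print(store.commit(t1))  # True: first to validate wins
print(store.commit(t2))  # False: its read is stale, so it aborts and must retry
```

The aborted second transaction has to re-read the new $200 balance and redo its work, which is why a workload with many conflicting bill payments burns processing time under this scheme.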

Huge volumes of incoming data, like those from the Internet of Things (where databases need to be flexible and extensible because a projected trillion different devices would be producing data), would benefit greatly from optimistic concurrency control.  Take the example of a Fitbit, Apple Watch, or Microsoft Band: it records data on you throughout the day.  Because this massive data is time-stamped and heterogeneous, it does not matter whether the sleep data and the walking data are processed in parallel; in the end, the result is still validated.  This allows for faster upload times over Bluetooth and/or Wi-Fi.  Data can be actively extracted and explained in real time, but when there are many sensors on the device, the data and sensors all have different reasoning rules and semantic links between data, where existing or deductive links between sources exist (Sun & Jara, 2014), and that is where the true meaning of the generated data lies.  Sun and Jara (2014) suggest that a solid mathematical basis will help ensure a correct and efficient data storage system and model.
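The point about parallel processing with end-of-pipeline validation can be sketched as follows. The stream contents, sensor names, and the `clean` step are all made up for illustration: each stream carries its own timestamps, so per-stream work can run on separate threads, and a single validation pass at the end confirms the merged result is still in time order.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical wearable readings: (timestamp, sensor, value) tuples,
# each stream already sorted by its own timestamps.
sleep_stream = [(2, "sleep", "light"), (5, "sleep", "deep")]
steps_stream = [(1, "steps", 120), (4, "steps", 340)]

def clean(stream):
    # Stand-in for per-stream work (parsing, unit conversion, filtering).
    # Safe to run in parallel: streams are independent and time-stamped.
    return [(ts, sensor, value) for ts, sensor, value in stream]

with ThreadPoolExecutor() as pool:
    cleaned = list(pool.map(clean, [sleep_stream, steps_stream]))

# Validation happens once, at the end: merge by timestamp and check order.
merged = list(heapq.merge(*cleaned))
assert all(a[0] <= b[0] for a, b in zip(merged, merged[1:]))
print([sensor for _, sensor, _ in merged])
```

`heapq.merge` interleaves the already-sorted streams by timestamp, so the sleep and steps data come back together in a single validated timeline regardless of which stream finished processing first.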


Words matter: Customize to configure

Let’s look at some definitions when it comes to software development and the nuances each one plays:

Customize: to modify or supplement with internally developed code to match end-user requests; the changes may not be preserved during an upgrade. This is analogous to hacking a game like Pokemon Go to let end users spoof their locations and obtain region-exclusive pocket monsters.

Tailoring: modifying or supplementing without code to fit a system into an environment.  Analogous to downloading Pokemon Go from the Google Play Store or Apple App Store, where the right version of the app is delivered to the right environment.

Personalization: meeting customers’ needs effectively and efficiently, achieved by analyzing customer data and using predictive analytics.  A great example is the Adventure Sync tool, which encourages Pokemon Go players to be more active by recognizing three tiers of active players and personalizing the rewards based on the level achieved.  Personalization can also be seen in character customization, clothing, poses, and buddy Pokemon.

Configure: the process of setting up options and features, tailored to meet the implementation of business requirements.  In Pokemon Go, some players want to complete the Pokedex, some like the gym system, and some like 1:1 battles, 1:1 trades, side quests, beating the villains, etc. You can configure your goals in the game by pursuing one or all of these, to whatever extent you want, meeting your own requirements for satisfaction in playing the game.

Now if we want to think of these concepts on a continuum:

Customize <——- Tailoring ——- Personalization ——-> Configuring

where cost and complexity increase from right to left, constriction on growth increases from right to left, and profit margin decreases from right to left.

The question now becomes: is the additional complexity on this spectrum worth the extra cost incurred?

That is for you to decide.



3 conferences in Computer Science and 3 conferences in Big Data

3 scholarly conferences that focus on algorithms, programming languages, managing telecommunications software engineering, managing corporate information resources, and managing partnership-based IT operations:

  1. Advanced International Conference on Telecommunications (AICT)
  2. IEEE International Conference on Software, Telecommunications and Computer Networks (SoftCOM)
  3. IEEE Global Communication Conference Exhibition & Industry Forum (GLOBECOM)

3 conferences that cover Big Data:

1. IEEE International Conference on Big Data:

A conference that provides student travel awards to help subsidize the cost, thanks to the National Science Foundation, and that also hosts a doctoral symposium. It was held in Washington, DC in 2014. Keynote speeches included “Never-Ending Language Learning”; “Smart Data – How you and I will exploit Big Data for personalized digital health and many other activities”; and “Addressing Human Bottlenecks in Big Data.” Reading the keynote abstracts, I found this quote to be true at my job over the past year: “… the key bottlenecks lie with data analysis and data engineers, who are routinely asked to work with data that cannot possibly be loaded into tools for statistical analytics or visualization” (IEEE, 2014). Another keynote discussed NELL (Never-Ending Language Learner), an artificial intelligence system that runs 24 hours a day learning to read the web, extracting knowledge, and forming beliefs. It is starting to reason over its extracted knowledge; it recently learned that “inaccuracy is an event outcome” (NELL, 2015).

2. Data Lead:

Held in Paris, France and Berkeley, California in 2015, around October and November. This is the second year of the annual conference, run in partnership with the University of California, Berkeley’s Haas School of Business. Its goal is to spark an international conversation on the application of big data to business processes and issues. There is a particular focus on issues revolving around finance and marketing, though it also caters to the sciences, education, government, etc. The organizers see big data as an economic commodity.

3. IARIA International Conference on Data Analytics:

The fourth conference held by IARIA, in Nice, France, in mid-2015. Topics at the conference include: fundamentals, mechanisms, and features; sentiment/opinion analytics; target analytics; big data; knowledge discovery; visualization; filtering data; relevant/redundant/obsolete analytics; predictive analytics; trust in data; legal issues; cyber threats; etc. Two biannual peer-reviewed journals, published since 2008, are associated with this group: the International Journal on Advances in Software and the International Journal on Advances in Intelligent Systems. Other IARIA conferences, all in Europe, include SoftNet, InfoWare, NetWare, NexTech, DataSys, BioSciencesWorld, InfoSys, and NexComm.

3 Journals in Computer Science and 3 Journals in Big Data

3 journals that focus on algorithms, programming languages, managing telecommunications software engineering, managing corporate information resources, and managing partnership-based information technology (IT) operations:

International Journal of e-Collaboration:

A peer-reviewed journal covering both theoretical and practical findings related to the design and implementation of collaboration tools: email, listservs, teleconferences, automated workflow, and demand management. This is extremely vital for managers who believe the application of collaboration tools can be a differentiator between groups and between companies. It has been published quarterly since 2005.

Journal of the ACM:

An editor- and peer-reviewed journal that covers articles about the design, semantics, implementation, and application of programming languages. Topics discussed include parsing, compiling, optimization (as in high-performance computing), run-time organization, data abstraction, modularity, parallelism, concurrency, domain and category theory, database systems and theory, algorithms and data structures, artificial intelligence, etc. Published bimonthly since 1954.

European Association for Programming Languages and Systems (EAPLS E-journal):

A peer-reviewed publication covering functional and logic programming, with a focus on the integration of paradigms. It has been in publication since 1995 and publishes annually.

3 journals that cover big data:

Big Data:

A quarterly peer-reviewed journal, published since 2013. It reports on the current state of storing, organizing, protecting, and manipulating large data sets, and also explores challenges and opportunities in data discovery.

Intelligent Data Analysis:

Focused on artificial intelligence techniques in data analytics across all disciplines: visualization of data, data pre-processing, mining techniques, tools and applications, machine learning, neural networks, fuzzy logic, statistical pattern recognition, filtering, etc. About 70% of the papers are applications-oriented and 30% are theoretical work. Published bimonthly since 1996.

CODATA Data Science Journal:

A biannual peer-reviewed e-journal, published since 2002, covering data, databases, processing, complexity, scalability, distribution, interaction, application, interfaces with experiments, models, and information complexes, etc.