Big Data Analytics: Installing R

I didn’t have any problems with the installation, thanks to a video produced by Dr. Webb (2014).  The package is bigger than I expected, so it can take a few minutes to download, depending on your download speed and internet connection.  A few notes:

(1)    To install R properly, you need administrative access on your computer.

(2)    Watch this video for step-by-step instructions and an online tutorial on installing R and a graphical Integrated Development Environment (IDE) for it.

  1. Note: The R installers (32-bit and 64-bit) can be found at http://cran.r-project.org/
  2. Note: The free RStudio “Desktop” graphical IDE can be found at http://www.rstudio.com/

(3)    Once R is installed, use the introductory manual for the application at this site: http://cran.r-project.org/doc/manuals/R-intro.html

Once I installed the software and the graphical IDE, I continued to follow along with the video, used the preloaded cars data under the “datasets” package, and got the same result as shown in the video.  I would also like to note that Dr. Webb (2014) had checked the packages “datasets,” “graphics,” “grDevices,” “methods,” and “stats” in the video, which can be hard to see depending on your video streaming resolution.
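For anyone who wants to reproduce that quick check, here is a minimal R sketch of the kind of commands involved (the exact steps shown in the video may differ slightly):

```r
# Quick look at the built-in cars dataset (from the "datasets" package,
# which is loaded by default in R)
data(cars)
summary(cars)   # summary statistics for speed and stopping distance
plot(cars,      # scatter plot of speed vs. stopping distance
     main = "Speed vs. Stopping Distance (cars dataset)")
```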

Resources:

Webb, J. (2014). Installing and Using the “R” Programming Language and RStudio. Retrieved from https://www.youtube.com/watch?v=77PgrZSHvws&feature=youtu.be

Big Data Analytics: Hadoop®

Hadoop® Distributed File System (HDFS):

In HDFS, big data is broken up into smaller blocks (IBM, n.d.), which can be assembled like a set of Legos across a distributed file system.  Data blocks are distributed across multiple servers.  This block system provides an easy way to scale the company’s data needs up or down, and it allows MapReduce to do its tasks on the smaller sets of data for faster processing (IBM, n.d.).  Blocks are small enough that they can be easily duplicated (for disaster recovery purposes) on two different servers (or more, depending on your data needs).

Example 1:

To picture how HDFS stores data, think of a deck of cards, in which each card holds information about what it is: value, color, symbol, etc.  HDFS can divide the data into blocks by rank (A, 2, 3 … J, Q, & K), so each block will hold about four cards.  Thus, there are 13 distinct data blocks, which have been parsed by their value and placed on 13 different servers.  Let’s also assume I need higher than average availability, so rather than two copies, I need four copies of the J, Q, & K values and two copies of A, 2 … 10.  This is possible.  The copies could be clustered on similar servers, or each could have a server of its own.  This kind of redundancy within HDFS has the benefit of higher availability of my data.  Thus, when I need to analyze my deck of cards, the important values J, Q, & K have a higher chance of being available than my perceived lower-value cards A, 2 … 10.
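To make the example concrete, here is a toy sketch in plain R (not actual HDFS commands; the server names and placement logic are assumptions made only for illustration) of splitting the deck into blocks and replicating them:

```r
# Toy illustration of the card example: split the deck into 13 "blocks"
# by rank and give the face cards a higher replication factor.
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Hearts", "Diamonds", "Clubs", "Spades")
deck  <- expand.grid(rank = ranks, suit = suits)

blocks      <- split(deck, deck$rank)                        # 13 blocks, ~4 cards each
replication <- ifelse(names(blocks) %in% c("J", "Q", "K"), 4, 2)

# Place each block's replicas on randomly chosen (hypothetical) servers
set.seed(42)
placement <- lapply(seq_along(blocks), function(i) {
  sample(paste0("server", 1:13), replication[i])
})
names(placement) <- names(blocks)
placement[c("A", "K")]   # e.g., "A" lives on 2 servers, "K" on 4
```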

MapReduce:

MapReduce contains two job types that work in parallel on distributed systems: (1) Mappers, which create and process transactions on the system by mapping/aggregating data by key values, and (2) Reducers, which know what that key value is and take all the values stored in a map and reduce the data down to what is relevant (Hortonworks, 2013; Sathupadi, 2010).  Different reducers can work on different keys.  Huge amounts of data are entered into MapReduce; the Mapper maps the data, then the data is shuffled and sorted before it is reduced.  Once the data is reduced, we get the output that we sought.

According to IBM (n.d.), MapReduce jobs that use HDFS run their procedures on the server on which the data is stored (also known as data locality).  Keeping in mind that HDFS has at least two backup copies, if one server goes down, which can happen, the job can continue its tasks on the same data on a different, working server.  This backup system for disaster recovery allows for high data availability.

Example 2:

An example adjusted from Sathupadi (2010) is to look at how MapReduce could calculate the sum of all Harvard law students’ and medical students’ current outstanding school loans per degree type.  The final output of the example would be: Juris Doctor (JD) students’ current outstanding school loan amount, Master of Laws (LLM) students’ current outstanding school loan amount, Doctor of Medicine (MD) students’ school loan amount, and Doctor of Osteopathic Medicine (DO) students’ school loan amount.

If I ran this in Hadoop, a single copy of the data could be stored across 50 servers, and thus 50 nodes could be used to process this request in parallel, speeding up the job significantly, but not by a full 50-fold.  The reason it is not 50-fold is that the reduce step after mapping takes time, and the nodes need to talk to each other, which slows the transaction down.  So, running on X nodes in parallel is never really the same as being X times faster; in reality, we are roughly X − e times faster (where e represents that coordination cost).
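As a rough back-of-the-envelope illustration of that overhead effect, here is a small R sketch; the 500 seconds of work, 50 nodes, and 4 seconds of coordination cost are purely hypothetical numbers:

```r
# Why 50 nodes is not 50x faster: coordination overhead eats into the gain.
serial_time   <- 500                  # hypothetical seconds on a single node
per_node_time <- serial_time / 50     # ideal share of work per node: 10 s
overhead      <- 4                    # hypothetical shuffle/communication cost (s)
speedup       <- serial_time / (per_node_time + overhead)
speedup                               # ~35.7x, noticeably less than 50x
```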

The bad data that gets thrown out in the mapper phase would be the undergraduate students, Doctor of Philosophy students, master’s degree students, etc.  Only JD, LLM, MD, and DO students get a key assigned to them, keys that are consistent across all nodes, so that the sums of the current outstanding school loan amounts are processed under the correct group.  Because the data is duplicated at least twice on different servers, if a server were to go down, the MapReduce job would move on to a copy of that data, which can still be mapped and reduced.
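Here is a toy sketch of the same map-filter-reduce logic in plain R (not Hadoop); the student records and loan amounts below are invented purely for illustration:

```r
# Hypothetical input records: degree type and outstanding loan amount
students <- data.frame(
  degree = c("JD", "MD", "PhD", "LLM", "DO", "BA", "JD", "MD"),
  loan   = c(150000, 220000, 80000, 90000, 210000, 40000, 160000, 230000)
)

# "Map" phase: emit (degree, loan) pairs only for the keys we care about;
# everything else (PhD, BA, ...) is the "bad data" that gets thrown out
keys   <- c("JD", "LLM", "MD", "DO")
mapped <- subset(students, degree %in% keys)

# "Shuffle/sort" and "reduce" phase: group by key and sum the loan amounts
reduced <- tapply(mapped$loan, factor(mapped$degree, levels = keys), sum)
reduced
#     JD    LLM     MD     DO
# 310000  90000 450000 210000
```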

 Resources:

 

Big Data Analytics: Cloud Computing

Clouds come in three different privacy flavors: public (all customers and companies share the same resources), private (only one group of clients or one company can use a particular cloud’s resources), and hybrid (some aspects of the cloud are public while others are private, depending on data sensitivity).

Cloud technology encompasses Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).  These service models differ in what the company manages versus what the cloud provider manages.  For IaaS, the company manages the applications, data, runtime, and middleware, whereas the provider administers the O/S, virtualization, servers, storage, and networking.  For PaaS, the company manages the applications and data, whereas the vendor administers the runtime, middleware, O/S, virtualization, servers, storage, and networking.  Finally, for SaaS the provider manages it all: applications, data, runtime, middleware, O/S, virtualization, servers, storage, and networking (Lau, 2011).  This differs from the conventional data center, where the company manages the entire stack itself.

Examples of IaaS are Amazon Web Services, Rackspace, and VMware vCloud.  Examples of PaaS are Google App Engine, Windows Azure Platform, and Force.com.  Examples of SaaS are Gmail, Office 365, and Google Docs (Lau, 2011).

One of the benefits of the cloud is its pay-as-you-go business model.  First, the company can pay for as much (SaaS) or as little (IaaS) of the service as it needs, and for only as much space as it requires.  Second, the company can use an on-demand model, in which the business scales up and down as needed (Dikaiakos, Katsaros, Mehra, Pallis, & Vakali, 2009).  For example, if a company would like a development environment for three weeks, it can build that environment in the cloud and pay for three weeks of service rather than buying a new set of infrastructure and setting up all the libraries.  Choosing the cloud over buying new infrastructure can speed up development for a ton of applications moving forward.  These models are like renting a car: you rent what you need, and you pay only for what you use (Lau, 2011).
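As a back-of-the-envelope sketch of that three-week example, every figure below is hypothetical and only meant to make the pay-as-you-go argument concrete:

```r
# Rent in the cloud for three weeks vs. buy and set up new hardware
hours         <- 3 * 7 * 24          # three weeks of round-the-clock uptime
cloud_rate    <- 0.50                # assumed $/hour for the needed instances
cloud_cost    <- hours * cloud_rate  # about $252 for the whole project
hardware_cost <- 5000 + 500          # assumed server purchase + setup labor
c(cloud = cloud_cost, on_premise = hardware_cost)
```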

Replacing Conventional Data Center?

Infrastructure costs are really high.  For a company to spend that much money on something that will be outdated in roughly 18 months (Moore’s law), it is a constant money sink.  Outsourcing infrastructure is the first step of a company’s movement into the cloud.  However, companies need to understand the different privacy flavors well, because if data is stored in a public cloud, it is hard to physically destroy the hardware: you would destroy not only your data but other people’s and companies’ data as well.  Private clouds are best for government agencies, which may need or require physical destruction of the hardware.  Government agencies may even use hybrid structures, keeping private data in private clouds and public material in a public cloud.  Companies that contract with the government could migrate to hybrid clouds in the future, and businesses without government contracts could go onto a public cloud.  There may always be a need to store some data on a private server, like patents or KFC’s 11 herbs and spices recipe, but for the majority of data, personally, the cloud may be a grand place to store and work from.

Note: Companies that do venture into a cloud platform for storing data should focus on migrating their data and data dictionaries slowly and with uniformity.  Data variables should have the same naming convention, one definition, a list of who is responsible for the data, metadata, etc.  Migration to a new infrastructure would be a great chance for companies to clean up their data.

Resources:

 

Big Data Analytics: Advertising

Advertising went from focusing on sales, to a consumer focus, to social media advertising, to now trying to establish a relationship with consumers.  In the late 1990s and early 2000s, third-party cookies were placed on consumers’ machines to help deliver information back to the company, and based on the priority level of those cookies, banner ads would appear on other websites selling targeted products (sometimes unrelated to the current search).  Sometimes you did not even have to click on the banner for the cookies to be stored (McNurlin, Sprague, & Bui, 2008).  McNurlin et al. (2008) then discuss how, at the time, consumer shopping data was collected through loyalty cards at companies such as Blockbuster, Publix, and Winn-Dixie.

Before all of this, from the 1980s to today, company credit cards like a Sears MasterCard could have captured all this data, even though they also collected a load of other data that may not have helped with selling or advertising the particular product mix the store carried.  These cards helped influence buyers by giving them store discounts when the card was used in the store, driving more consumption.  The companies could then target ads, flyers, and sales based on the data gathered through each swipe of the card.

Now, in today’s world, we can see online profiling coming into existence.  Online profiling uses a person’s online identity to collect information about them, their behaviors, their interactions, their tastes, etc., to drive targeted advertising (McNurlin et al., 2008).  Online profiling straddles the line between useful, annoying, and “Big Brother is watching” (Pophal, 2014).  Profiling began with third-party cookies and has evolved with the times to include some 40 different variables that can be sent from a consumer’s mobile device while they shop (Pophal, 2014).  Online profiling now allows marketers to send personalized, “perfect” advertisements to the consumer instantly.  However, as consumers switch from device to device, marketers must find the best way to continue the buying experience without becoming too annoying, which can turn the consumer away from using the app and even from buying the product (Pophal, 2014).  The best way to describe this is through a quote from a modern marketer in Pophal (2014): “So if I’m in L.A., and it’s a pretty warm day here-85 degrees-you shouldn’t be showing me an ad for hot coffee; you should be showing me a cool drink.”  Marketers are now aiming to build a relationship with consumers by using these techniques to provide perceived value to the customer.

Amazon tries a different approach: as items get added to the shopping cart and before purchase, it uses aggregated big data to find out what other items the consumer might purchase (Pophal, 2014) and says, “Others who purchased X also bought Y, Z, and A.”  This message almost implies that these items are a set and will enhance your overall experience, so buy some more.

Resources:

 

Big Data Analytics: Privacy & HIPAA

Since its inception 25 years ago, the Human Genome Project has sequenced the roughly 3 billion base pairs of the human genome (Green, Watson, & Collins, 2015).  This project gave rise to a new program, the Ethical, Legal and Social Implications (ELSI) project.  ELSI received 5% of the National Institutes of Health budget to study the ethical implications of this data, opening up a new field of study (Green et al., 2015; O’Driscoll, Daugelaite, & Sleator, 2013).  Data sharing must occur to leverage the benefits of the genome project and others like it.  Poldrack and Gorgolewski (2014) state that sharing data helps advance the field in several ways: maximizing the contribution of research subjects, enabling responses to new questions, enabling the generation of new questions, enhancing the reproducibility of research results (especially when the data and software used are shared together), providing a test bed for new big data analysis methods, improving research practices (development of a standard of ethics), reducing the cost of doing the science (beyond what is feasible for one scientist), and protecting valuable scientific resources (by indirectly creating a redundant backup for disaster recovery).  Allowing genomic data to be shared can present ethical challenges, yet it lets multiple countries and disciplines come together to analyze data sets and arrive at new insights (Green et al., 2015).

Richards and King (2014) state that privacy must be thought of in terms of the flow of personal information.  Privacy cannot be thought of as a binary, with data either private or public, but as a spectrum.  Richards and King (2014) argue that data exchanged between two people carries a certain expectation of privacy and can remain confidential, but there is never a case where data is absolutely private or public.  Not everyone in the world will know or care about every single data point, nor will any data point be kept permanently secret once it is uttered out loud by the source.  Thus, Richards and King (2014) state that transparency can help prevent abuse of the data flow.  That is why McEwen, Boyer, and Sun (2013) discuss the options of open consent (your data can be used for any future research project), broad consent (various possible uses of the data are described, but consent is not universal), and opt-out consent (where participants can say what their data should not be used for).

Attempts are being made through the enactment of the Genetic Information Nondiscrimination Act (GINA) to protect identifying data, out of fear that it could be used to discriminate against a person with a certain genomic indicator (McEwen et al., 2013).  Institutional Review Boards and the Common Rule, along with the Office for Human Research Protections (OHRP), provide guidance on the flow of de-identified information.  De-identified information can be shared and is valid under current Health Insurance Portability and Accountability Act of 1996 (HIPAA) rules (McEwen et al., 2013).  However, fear of losing control over data flow comes from continuing advances in decryption and de-anonymisation techniques (O’Driscoll et al., 2013; McEwen et al., 2013).

Data must be seen and recognized as part of a person’s identity, which can be defined as the “ability of individuals to define who they are” (Richards & King, 2014).  Thus, the assertion made in O’Driscoll et al. (2013) about the difficulty of protecting medical data, given big data and the changing conceptual, definitional, and legal landscape of privacy, is valid.  Thanks to HIPAA, cloud computing is currently on a watch list.  Cloud computing can provide a lot of opportunity for cost savings; however, Amazon’s cloud is not HIPAA compliant, hybrid clouds could become HIPAA compliant, and commercial cloud options like GenomeQuest and DNAnexus are HIPAA compliant (O’Driscoll et al., 2013).

However, ethical issues extend beyond privacy and compliance.  McEwen et al. (2013) warn that data has been collected for 25 years; what if data collected 20 years ago shows that a participant could suffer a preventable adverse health condition?  What is the duty of today’s researchers to that participant?  How far back in years should that duty extend?

Other ethical issues to consider: when it comes to data sharing, how should researchers who collected the data but did not analyze it be positively incentivized?  One way is to make them co-authors of any publication based on their data, but that is incompatible with standards of authorship (Poldrack & Gorgolewski, 2014).

 

Resources:

  • Green, E. D., Watson, J. D., & Collins, F. S. (2015). Twenty-five years of big biology. Nature, 526.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • Poldrack, R. A., & Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nature Neuroscience, 17(11), 1510-1517.
  • O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data,’ Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781.
  • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest L. Rev., 49, 393.

 

Big Data Analytics: Health Care Industry

Since its inception 25 years ago, the Human Genome Project spent its first 13 years sequencing the roughly 3 billion base pairs of the human genome (Green, Watson, & Collins, 2015).  Those 3 billion base pairs amount to about 100 GB uncompressed, and by 2011, 13 quadrillion bases had been sequenced (O’Driscoll, Daugelaite, & Sleator, 2013).  With advances in technology and software as a service, the cost of sequencing a human genome was drastically cut, from about $1M to $1K by 2012 (Green et al., 2015; O’Driscoll et al., 2013).  It is so cheap now that 23andMe and others have formed a consumer-driven genetic testing industry (McEwen, Boyer, & Sun, 2013).  At the beginning of the project, researchers wondered what insights sequencing could bring to understanding disease; now there is an explosion of research studying millions of other genomes from biological pathways, cancerous tumors, microbiomes, etc. (Green et al., 2015; O’Driscoll et al., 2013).  Storing 1M genomes will exceed 1 exabyte (O’Driscoll et al., 2013).  Based on the definitions of volume (sizes like 1 EB), variety (different types of genomes), and velocity (processing huge amounts of genomic data), we can classify the whole genome project in the health care industry as big data.

This project has paved the way for other projects, such as the sharing of MRI data from 511 participants (exceeding 18 TB) for analysis (Poldrack & Gorgolewski, 2014).  Green et al. (2015) state that the genome project has led to huge innovation in adjacent fields not directly related to biology, like chemistry, physics, robotics, and computer science.  It was due to this type of research that capillary-based DNA sequencing instruments were invented for sequencing genomes (Green et al., 2015).  The Ethical, Legal and Social Implications (ELSI) project received 5% of the National Institutes of Health budget to study the ethical implications of this data, opening up a new field of study (Green et al., 2015; O’Driscoll et al., 2013).  O’Driscoll et al. (2013) suggest that solutions like Hadoop’s MapReduce would greatly advance this field.  However, they argue that Java-intensive knowledge is currently needed, which can be a bottleneck for biologists.  Luckily, this field is helping create the need for graphical user interfaces, which will allow scientists to conduct research without having to learn to program.  O’Driscoll et al. (2013) also state that the biggest drawback of Hadoop’s MapReduce function is that it reduces data line by line, whereas genomic data needs to be reduced in groups.  This project should, with time, improve the service offering of Hadoop to fields outside of biomedical research.

In the medical field, improved cancer diagnosis and treatments will now be possible because of this project (Green et al., 2015).  Green et al. (2015) also predict that a maturation of microbiome science and routine use of stem-cell therapies could result from it.  These predictions are not far from becoming reality and are the foundation of predictive and preventative medicine.  This is close enough to the present that McEwen et al. (2013) already ask what the ethical issues are for people who submitted their genomic data 25 years ago if data is found that could help those participants take preventative measures against adverse health conditions.  Clinical versions of this data are starting to become available from companies like 23andMe; so far this information has yielded genealogy data and a few predictive medical measures (to a certain confidence interval).  Predictive and preventative medical advances are still preliminary and currently in the research phase (McEwen et al., 2013).  Finally, genomics research will pave the way for metagenomics, the study of microbiome data from as many as possible of the ~4-6 × 10^30 bacterial cells (O’Driscoll et al., 2013).

From this discussion, there is no doubt that genomic data falls under the classification of big data.  The analysis of this data has yielded advances in the medical field and in other tangential fields.  Future work to expand predictive and preventative medicine is still needed; for now, it is only in research studies that participants can learn about genomic indicators that may lead to certain types of adverse health conditions.

Resources:

  • Green, E. D., Watson, J. D., & Collins, F. S. (2015). Twenty-five years of big biology. Nature, 526.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data,’ Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781.
  • Poldrack, R. A., & Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nature Neuroscience, 17(11), 1510-1517.

 

Big Data Analytics: Pizza Industry

Pizza, pizza! A competitive analysis was completed on Domino’s, Pizza Hut, and Papa John’s.  Competitive analysis gathers external data that is freely available, e.g., social media such as Twitter tweets and Facebook posts.  That is what He, Zha, and Li (2013) studied: approximately 307 total tweets (266 from Domino’s, 24 from Papa John’s, 17 from Pizza Hut) and 135 wall posts (63 from Domino’s, 37 from Papa John’s, 35 from Pizza Hut) for the month of October 2011 (He et al., 2013).  It should be noted that these are the big three pizza chains, controlling about 23% of the total market share (7.6% Domino’s, 4.23% Papa John’s, 11.65% Pizza Hut) (He et al., 2013).  Posts and tweets contain text data, videos, and pictures.  All the data collected was text-based and collected manually, and the SPSS Clementine tool was used to discover themes in the text (He et al., 2013).

He et al. (2013) found that Domino’s Pizza used social media to engage its customers the most; Domino’s did the most to reply to tweets and posts.  The types of posts across all three companies varied from promotions to marketing to polling (e.g., “What is your favorite topping?”), facts about pizza, Halloween-themed posts, baseball-themed posts, etc. (He et al., 2013).  Results from the text mining of all three companies showed that ordering and delivery were key (customers shared their experiences and feelings about them), along with pizza quality (taste and quality), feedback on customers’ purchase decisions, casual socialization posts (e.g., Happy Halloween, Happy Friday), and marketing tweets (posts on current deals, promotions, and advertisements) (He et al., 2013).  Besides text mining, there was also content analysis of each of their sites (367 pictures & 67 videos from Domino’s, 196 pictures & 40 videos from Papa John’s, and 106 pictures & 42 videos from Pizza Hut), which showed that the big three were trying to drive customer engagement (He et al., 2013).

He et al. (2013) cite the theory that with higher positive customer engagement, customers can become brand advocates, which increases brand loyalty and drives referrals to their friends; approximately one in three people followed a friend’s referral made through social media.  Thus, by evaluating the structured and unstructured data available about its own products and those of its competitors, an organization can improve its customer service, drive improvements in its own products, and draw more customers to them (He et al., 2013).  The key lessons from this study, which would help any organization gain an advantage in the market, are to (1) constantly monitor your social media and those of your competitors, (2) establish a benchmark of posts, likes, shares, etc. between you and your competitors, (3) mine the conversational data for content and context, and (4) analyze the impact of your social media footprint on your own business (when prices rise or fall, what is the response, etc.) (He et al., 2013).

Resources:

  • He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464-472.

 

What is Big Data Analytics?

 

What makes big data different from conventional data that you use every day?
The differentiation between big data and conventional data lies in how the data is stored and analyzed. Big data is complex, challenging, and significant (Ward & Barker, 2013). Ward and Barker (2013) traced the Volume, Velocity, and Variety definition back to Gartner. They then compared it to Oracle’s definition, which takes big data to mean the value derived from merging relational databases with unstructured data that can vary in size, structure, format, etc. Finally, the authors state that Intel’s definition of big data is a company generating about 300 TB of data weekly, typically from transactions, documents, emails, sensor data, social media, etc. They use all of this to argue that the true definition should rest on the size of the data, the complexity of the data, and the technologies used to analyze it. This is how you can differentiate it from conventional data.

Davenport, Barth, and Bean (2012) state that IT companies define big data as “more insightful data analysis,” but that, used properly, it can give companies a competitive edge. Companies that use big data are aware of data flows (customer-facing data, continuous process data, and network relationships, which are dynamic and always changing in a continuous flow), rely on data scientists (with upgraded data management skills, programming, math, statistics, business acumen, and effective communication), and move big data away from IT functions (concerned with automation) into operations or product functions (since the goal is to present information to the business first). Data in a continuous flow needs business processes set up for obtaining, gathering, and capturing it, and then storing, extracting, filtering, manipulating, structuring, monitoring, analyzing, and interpreting it, to help facilitate data-driven decisions.

Finally, Lazer, Kennedy, King, and Vespignani (2014) talk about big data hubris: the assumption that big data can do it all and is a substitute for conventional data analysis. They state that errors in measurement, validity, reliability, and dependencies in the data cannot be ignored. Big data analysis can overfit to a small number of cases. Greater value comes from marrying a big dataset with other near-real-time data from different sources, but continuous evaluation and improvement should always be incorporated. Sources of error in analysis can arise from measurement (is it stable and comparable across cases and over time, are there systematic errors), algorithm dynamics, search algorithms, and changes in the data-generating process. The authors finally state that transparency and replicability of data analysis (especially of secondary or aggregate data, since there are fewer privacy concerns there) could help improve the results of big data analysis. Without transparency and replicability, how will other scientists learn and build on the knowledge (thus destroying the accumulation of knowledge)?

There is a difference between big data and conventional data. But no matter how big, fast, and different the data sets are, one cannot deny that big data has influenced conventional data gathering, analysis, and techniques as well. Improvements have been made that allow doctoral students to conduct surveys at a much faster rate and gather more unstructured data through interview processes, and the transcription software used for audio files in big data can also be used on smaller conventional data. Though the two are vastly different and each comes with its own errors, as we improve one, we inadvertently improve the other.

Public Sites that provide free access to big data sets:

References:

  • Davenport, T. H., Barth, P., & Bean, R. (2012). How big data is different. MIT Sloan Management Review, 54(1), 43.
  • Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(14 March).
  • Ward, J. S., & Barker, A. (2013). Undefined by data: a survey of big data definitions. arXiv preprint arXiv:1309.5821.

Zeno’s Paradox

Some infinities are bigger than others.

A paradox of motion:

Zeno described a paradox of motion, which helps illustrate one of the many types of infinity. Zeno’s paradox is described below (Stanford Encyclopedia of Philosophy, 2010):

“Imagine Achilles chasing a tortoise, and suppose that Achilles is running at 1 m/s, that the tortoise is crawling at 0.1 m/s and that the tortoise starts out 0.9 m ahead of Achilles. On the face of it Achilles should catch the tortoise after 1s, at a distance of 1m from where he starts (and so 0.1m from where the Tortoise starts). We could break Achilles’ motion up … as follows: before Achilles can catch the tortoise he must reach the point where the tortoise started. But in the time he takes to do this the tortoise crawls a little further forward. So next Achilles must reach this new point. But in the time it takes Achilles to achieve this the tortoise crawls forward a tiny bit further. And so on to infinity: every time that Achilles reaches the place where the tortoise was, the tortoise has had enough time to get a little bit further, and so Achilles has another run to make, and so Achilles has an infinite number of finite catch-ups to do before he can catch the tortoise, and so, Zeno concludes, he never catches the tortoise.”
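As a quick numeric check of the quoted setup (purely illustrative), the infinitely many “catch-up” intervals form a geometric series that sums to the finite one second at which Achilles draws level:

```r
# Catch-up times: 0.9 s, 0.09 s, 0.009 s, ... (Achilles at 1 m/s, tortoise
# at 0.1 m/s, 0.9 m head start); their geometric series sums to 1 second.
catch_up_times <- 0.9 * 0.1^(0:50)   # the first 51 terms are already enough
sum(catch_up_times)                  # ~1 second
```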

This paradox was used to illustrate that not all infinities are the same, and one infinity can indeed be bigger than another.  An interpretation of this idea was written poetically in a eulogy in the book The Fault in Our Stars (Green, 2012):

“There are infinite numbers between 0 and 1. There’s .1 and .12 and .112 and an infinite collection of others. Of course there is a bigger infinite set of numbers between 0 and 2, or between 0 and a million. Some infinities are bigger than other infinities. … There are days, many days of them, when I resent the size of my unbounded set. I want more numbers than I’m likely to get, and God, I want more numbers for Augustus Waters than he got. But, Gus, my love, I cannot tell you how thankful I am for our little infinity. I wouldn’t trade it for the world. You gave me a forever within the numbered days, and I’m grateful.” (pp. 259-260)

So to my readers out there, I want to thank you in advance for the little infinity(ies) I will get to share with each of you through this blog, and for that I am grateful.

Resources:

  • Green, J. (2012). The fault in our stars.  New York, New York: Penguin Group (USA) Inc.
  • Stanford Encyclopedia of Philosophy (2010). Zeno’s Paradoxes. Retrieved from http://plato.stanford.edu/entries/paradox-zeno/#AchTor