International health care data laws

The International Health Regulations (IHR) have governed how health is handled internationally since 1969, and they were updated in 2005 (Georgetown Law, n.d.; World Health Organization [WHO], 2005). Article 45 of the IHR deals with the treatment of personal data (WHO, 2005):

  • Personally identifiable data and information that has been collected or received shall be kept confidential and processed anonymously.
  • Data can be disclosed for purposes that are vital for public health. However, the data that is transferred must be adequate, accurate, relevant, up to date, and not excessive, and it must be processed fairly and lawfully.
  • Bad or incompatible data is either corrected or deleted.
  • Personal data is not kept any longer than what is necessary.
  • WHO will provide a patient's data to that patient upon request in a timely fashion and allow the patient to correct it.

The European Union has the Directive on Data Protection of 1998 (DDP), and Canada has the Personal Information Protection and Electronic Documents Act of 2000 (PIPEDA), which is similar to the U.S. HIPAA regulations set forth by the U.S. Department of Health and Human Services (Guiliano, 2014). In 2012, the EU proposed the Data Protection Regulation (DPR), which was adopted in 2016 (Hordern, 2015; Justice, n.d.).

The EU’s DDP provides the following (Guiliano, 2014):

  • Transferring data to any non-EU entity that doesn’t meet EU data protection standards is outlawed.
  • The government must give consent before sensitive data is gathered, and only in certain situations.
  • Only data that is needed at the time and that has an explicit and reasonable purpose may be collected.
  • Patients should be allowed to correct errors in personal data, and if the data is outdated or useless, it must be discarded.
  • People with access to this data must have been properly trained.

The EU’s DPR provides the following (Hordern, 2015; Justice, n.d.):

  • People can allow for data to be used for future scientific research where the purpose is still unknown as long as the research is conducted by “recognized ethical ”
  • Processing data for scientific studies based on data that has already been collected is legal without the need to get additional consent.
  • Health data may be used without the consent of the individual for public health purposes.
  • Health data cannot be used by employers, insurance companies, or banking companies.
  • If data is being or will be used for future research, it can be retained longer than current regulations allow.

Canada’s PIPEDA provides the following (Guiliano, 2014):

  • Patients should know the business justification for using their personal and medical data.
  • Patients can review their data and have errors corrected.
  • Organizations must request from their patients the right to use their data for each situation, except in criminal cases or emergencies.
  • Organizations cannot collect patient and medical data that is not needed for the current situation unless they ask their patients for permission and tell them how it will be used and who will use it.

Other international laws and regulations regarding big data from Australia, Brazil, China, France, Germany, India, Israel, Japan, South Africa, and the United Kingdom are summarized in the International and Comparative Study on Big Data (der Sloot & van Schendel, 2016). When it comes to transferring U.S. collected and processed data internationally, the U.S. holds all U.S. regulated entities liable under all U.S. data regulations (Jolly, 2016). Some states in the U.S. further restrict the export of personal data to international entities (Jolly, 2016). Thus, any data exported to or imported from other countries must comply with the regulations of the country (or state) of origin and those of the country (or state) to which it is exported.

In the United Kingdom, a legal case on health care data was presented and ruled upon. The case dealt with whether the use of de-identified data on primary care physicians’ prescription habits breached confidentiality laws because of the lack of consent (Knoppers, 2000). The consent had to cover both commercial and public purposes, and the lack of both types of consent meant that the data had been misused. In a case before the Supreme Court of Canada, consent was not collected properly, which violated the expectation of privacy between the patients and a private healthcare provider (Knoppers, 2000). All of these laws and regulations, international and domestic, on data usage, consent, and the expectation of privacy for healthcare data are trying to protect people from the misuse of data.

References

Big Data Analytics: Compelling Topics

This post reviews and reflects on the knowledge shared for big data analytics and my opinions on the current compelling topics in the field.

Big Data and Hadoop:

According to Gray et al. (2005), traditional data management relies on arrays and tables to analyze objects, which can range from financial data, galaxies, proteins, events, and spectra data to 2D weather, but when it comes to N-dimensional arrays there is an “impedance mismatch” between the data and the database. Big data can be N-dimensional and can also vary across time, e.g., text data (Gray et al., 2005). Big data, as its name suggests, is voluminous. Given the massive amounts of data that need to be processed, manipulated, and calculated upon, parallel processing and programming use the benefits of distributed systems to get the job done (Minelli, Chambers, & Dhiraj, 2013). Parallel processing makes quick work of a big data set because, rather than having one processor do all the work, the task is split up among many processors.

Hadoop’s Distributed File System (HDFS) breaks up big data into smaller blocks (IBM, n.d.), which can be aggregated like a set of Legos throughout a distributed database system. Data blocks are distributed across multiple servers. Hadoop is Java-based: it pulls the data stored on its distributed servers, maps key items/objects, and reduces the data to answer the query at hand (the MapReduce function). Hadoop is built to deal with big data stored in the cloud.
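To make the block-splitting and map/reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API: the function names (split_into_blocks, map_block, reduce_counts, word_count) are made up for illustration, and a thread pool stands in for the separate servers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_blocks(words, block_size):
    """Cut one large list into fixed-size blocks (a stand-in for file blocks)."""
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def map_block(block):
    """Map step: count the words seen in a single block."""
    return Counter(block)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(words, block_size=3, workers=4):
    blocks = split_into_blocks(words, block_size)
    # Each block is handed to a worker, so no single worker
    # has to scan the whole data set.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_block, blocks))
    # Combine the per-block results into one answer for the query.
    return reduce(reduce_counts, partial_counts, Counter())


words = "big data needs big clusters and big data needs parallel work".split()
totals = word_count(words)
print(totals["big"], totals["data"])  # prints: 3 2
```

The point of the sketch is the shape of the computation: the data is split first, each piece is processed independently, and only the small partial results are combined at the end.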

Cloud Computing:

Clouds come in three different privacy flavors: Public (all customers and companies share the same resources), Private (only one group of clients or one company can use a particular cloud's resources), and Hybrid (some aspects of the cloud are public while others are private, depending on the data sensitivity). Cloud technology encompasses Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These types of cloud differ in what the company manages versus what is managed by the cloud provider (Lau, 2011). Cloud differs from the conventional data center, where the company managed it all: application, data, O/S, virtualization, servers, storage, and networking. Cloud is replacing the conventional data center because infrastructure costs are high. Spending that much money on a conventional data center that will be outdated in 18 months (Moore’s law of technology) is a constant money sink for a company. Thus, outsourcing the data center infrastructure is the first step of a company’s movement into the cloud.
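The difference between the service models can be written down as a small lookup table. The sketch below uses the seven layers named above; the split of layers per model is the commonly cited one and is an assumption here, not something taken from Lau (2011), and the names STACK, CUSTOMER_MANAGED, and responsibilities are illustrative only.

```python
# Layers of a conventional data center stack, as listed in the text.
STACK = ["application", "data", "O/S", "virtualization",
         "servers", "storage", "networking"]

# Assumed split: the layers the customer still manages under each model.
# Everything else in STACK is managed by the cloud provider.
CUSTOMER_MANAGED = {
    "conventional": set(STACK),
    "IaaS": {"application", "data", "O/S"},
    "PaaS": {"application", "data"},
    "SaaS": set(),
}


def responsibilities(model):
    """Return (customer_layers, provider_layers) for a service model."""
    mine = CUSTOMER_MANAGED[model]
    customer = [layer for layer in STACK if layer in mine]
    provider = [layer for layer in STACK if layer not in mine]
    return customer, provider


for model in ("conventional", "IaaS", "PaaS", "SaaS"):
    customer, provider = responsibilities(model)
    print(f"{model}: customer manages {customer}, provider manages {provider}")
```

Read from top to bottom, the table shows the company handing over more of the stack at each step, which is why outsourcing the infrastructure (IaaS) is the natural first move.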

Key Components to Success:

You need the buy-in of leaders and employees when it comes to using big data analytics for predictive, prescriptive, or descriptive purposes. To get that buy-in, Lt. Palmer had to nurture top-down support as well as buy-in from the bottom up (the ranks). It was much harder to get buy-in from more experienced detectives, who felt that the introduction of tools like analytics was a way to tell them to give up their long-standing practices, or even to replace them. So, Lt. Palmer sold Blue PALMS this way: “What’s worked best for us is proving [the value of Blue PALMS] one case at a time, and stressing that it’s a tool, that it’s a compliment to their skills and experience, not a substitute”. Lt. Palmer got buy-in from a senior and well-respected officer by helping him solve a case. The senior officer had a suspect in mind, and after the data was fed in, the tool predicted 20 people who could have done it, ranked from most to least likely. The suspect was in the top five and, when apprehended, confessed. Doing this case by case built trust among veteran officers and eventually got their buy-in.

Applications of Big Data Analytics:

A result of Big Data Analytics is online profiling. Online profiling uses a person’s online identity to collect information about them, their behaviors, their interactions, their tastes, etc. to drive targeted advertising (McNurlin et al., 2008). Profiling has its roots in third-party cookies and has now evolved to include 40 different variables that are collected from the consumer (Pophal, 2014). Online profiling allows marketers to send personalized and “perfect” advertisements to the consumer, instantly.

Moving from online profiling to studying social media, He, Zha, and Li (2013) stated their theory that, with higher positive customer engagement, customers can become brand advocates, which increases their brand loyalty and pushes referrals to their friends; approximately one in three people followed a friend’s referral if it was made through social media. This insight came from analyzing the social media data of Pizza Hut, Domino’s, and Papa John’s, as they aim to control more of the market share to increase their revenue. But does analyzing people’s social media content when they interact with a company help protect their privacy?

HIPAA describes how to de-identify 18 identifiers/variables, which helps protect people from ethical issues that could arise from big data. HIPAA legislation is not a standard for all big data applications/cases, but following it is good practice. The legislation is mostly concerned with the health care industry and lists the 18 identifiers that have to be de-identified: names, geographic data, dates, telephone numbers, VINs, fax numbers, device IDs and serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric IDs (fingerprints, iris scans, voice prints, etc.), full-face photos, health plan beneficiary numbers, account numbers, any other unique ID number (characteristics, codes, etc.), and certification/license numbers (HHS, n.d.). We must be aware that HIPAA compliance is more a responsibility of the data collector and data owner than of the cloud provider.
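As a rough illustration of what removing those identifiers looks like in practice, the Python sketch below drops identifier columns from a single record. The field names and the deidentify function are hypothetical, chosen to mirror the 18 categories listed above; a real data set would have its own schema, and dropping columns alone does not make a data set HIPAA compliant (identifiers hidden in free text, for example, are not handled).

```python
# Hypothetical field names, one per identifier category listed above.
IDENTIFIER_FIELDS = {
    "name", "address", "birth_date", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "face_photo", "other_unique_id",
}


def deidentify(record):
    """Return a copy of the record without any identifier fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "diagnosis": "asthma",
    "age_group": "30-39",
}
print(deidentify(patient))  # {'diagnosis': 'asthma', 'age_group': '30-39'}
```

Because the check runs on the data before it leaves the collector, it fits the point above that compliance rests with the data collector and owner rather than with the cloud provider.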

HIPAA arose from the human genome project 25 years ago, which set out to sequence the first 3B base pairs of the human genome over a 13-year period (Green, Watson, & Collins, 2015). These 3B base pairs take up about 100 GB uncompressed, and by 2011, 13 quadrillion bases had been sequenced (O’Driscoll et al., 2013). Studying genomic data comes with a whole host of ethical issues. Some of those were addressed by the HIPAA legislation, while other issues are left unresolved today.

One ethical issue, raised in McEwen et al. (2013), concerns people who submitted their genomic data 25 years ago: can that data be used today in other studies? What if it were used to help those participants take preventative measures against adverse health conditions? Ethical issues, however, extend beyond privacy and compliance. McEwen et al. (2013) warn that data has been collected for 25 years; what if data from 20 years ago shows that a participant could suffer an adverse health condition that is preventable? What is the duty of today's researchers to that participant?

Resources:

Big Data Analytics: Privacy & HIPAA

Although the use of big data offers many advantages in the health care field, it also poses many concerns with regard to privacy and compliance with the Health Insurance Portability and Accountability Act (HIPAA). This post discusses concerns about big data analytics with regard to privacy and HIPAA compliance.

Since its inception 25 years ago, the human genome project has sequenced many human genomes of 3B base pairs each (Green, Watson, & Collins, 2015). This project has given rise to a new program, the Ethical, Legal and Social Implications (ELSI) project. ELSI got 5% of the National Institutes of Health budget to study the ethical implications of this data, opening up a new field of study (Green et al., 2015; O’Driscoll, Daugelaite, & Sleator, 2013). Data sharing must occur to leverage the benefits of the genome project and others like it. Poldrack and Gorgolewski (2014) stated that sharing data helps advance the field in several ways: maximizing the contribution of research subjects, enabling responses to new questions, enabling the generation of new questions, enhancing the reproducibility of research results (especially when the data and the software used are combined), providing a test bed for new big data analysis methods, improving research practices (development of a standard of ethics), reducing the cost of doing the science (what is feasible for one scientist to do), and protecting valuable scientific resources (by indirectly creating a redundant backup for disaster recovery). Sharing genomic data can present ethical challenges, yet it allows multiple countries and disciplines to come together and analyze data sets to come up with new insights (Green et al., 2015).

Richards and King (2014) state that privacy must be thought of in terms of the flow of personal information. Privacy cannot be thought of as a binary, with data being either private or public, but as a spectrum. Richards and King (2014) argue that data exchanged between two people carries a certain level of expectation of privacy and that the data can remain confidential, but there is never a case where data is absolutely private or absolutely public. Not everyone in the world would know or care about every single data point, nor will any data point be kept permanently secret once it is uttered out loud by the source. Thus, Richards and King (2014) stated that transparency can help prevent abuse of the data flow. That is why McEwen, Boyer, and Sun (2013) discussed that there could exist options for open consent (your data can be used for any other future research project), broad consent (various ways the data could be used are described, but the consent is not universal), or opt-out consent (participants can say what their data shouldn’t be used for).
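The three consent options differ only in how a proposed use is checked against what the participant agreed to, which the short Python sketch below spells out. It is an illustration of the definitions above, not a scheme taken from McEwen et al. (2013); the names ConsentModel, Consent, and may_use are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentModel(Enum):
    OPEN = "open"        # any future research project is allowed
    BROAD = "broad"      # only the listed kinds of use are allowed
    OPT_OUT = "opt-out"  # everything except the listed uses is allowed


@dataclass(frozen=True)
class Consent:
    model: ConsentModel
    # For BROAD: the permitted purposes. For OPT_OUT: the excluded ones.
    purposes: frozenset = frozenset()


def may_use(consent, purpose):
    """Decide whether data may be used for the given purpose."""
    if consent.model is ConsentModel.OPEN:
        return True
    if consent.model is ConsentModel.BROAD:
        return purpose in consent.purposes
    return purpose not in consent.purposes


broad = Consent(ConsentModel.BROAD, frozenset({"cancer research"}))
opt_out = Consent(ConsentModel.OPT_OUT, frozenset({"commercial use"}))

print(may_use(broad, "cancer research"))   # True
print(may_use(broad, "ancestry studies"))  # False
print(may_use(opt_out, "commercial use"))  # False
```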

Attempts are being made through the enactment of the Genetic Information Nondiscrimination Act (GINA) to protect identifying data, for fear that it can be used to discriminate against a person with a certain type of genomic indicator (McEwen et al., 2013). Institutional Review Boards and the Common Rule, with the Office for Human Research Protections (OHRP), have guidance on the flow of information that is de-identified. De-identified information can be shared and is valid under current Health Insurance Portability and Accountability Act of 1996 (HIPAA) rules (McEwen et al., 2013). However, fear of losing control of the data flow comes from increasing advances in technological decryption and de-anonymisation techniques (O’Driscoll et al., 2013; McEwen et al., 2013).

Data must be seen and recognized as part of a person’s identity, which can be defined as the “ability of individuals to define who they are” (Richards & King, 2014). Thus, the assertion made in O’Driscoll et al. (2013) about the ability to protect medical data, with respect to big data and the changing conceptual, definitional, and legal landscape of privacy, is valid. Thanks to HIPAA, cloud computing is currently on a watch list. Cloud computing can provide a lot of opportunity for cost savings. However, Amazon cloud computing is not HIPAA compliant, hybrid clouds could become HIPAA compliant, and commercial cloud options like GenomeQuest and DNANexus are HIPAA compliant (O’Driscoll et al., 2013).

However, ethical issues extend beyond privacy and compliance. McEwen et al. (2013) warn that data has been collected for 25 years; what if data from 20 years ago shows that a participant could suffer an adverse health condition that is preventable? What is the duty of today's researchers to that participant? How many years back should that duty extend?

Other ethical issues to consider: When it comes to data sharing, how should the researchers who collected the data, but didn’t analyze it, be positively incentivized? One way is to make them co-authors of any publication involving their data, but that is incompatible with standards of authorship (Poldrack & Gorgolewski, 2014).

Resources:

  • Green, E. D., Watson, J. D., & Collins, F. S. (2015). Twenty-five years of big biology. Nature, 526.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • Poldrack, R. A., & Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nature Neuroscience, 17(11), 1510-1517.
  • O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data,’ Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781.
  • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest L. Rev., 49, 393.