Bioinformatics is essential for the management of data in modern biology and medicine. By finding more secure solutions we will be creating greater opportunity and more efficiency in medical research. ~Audrey Bentley

Consumer Genetic Testing and Privacy Concerns Part 2. Updated July 14th


“Science has always been my preoccupation and when you think a breakthrough is possible, it is terribly exciting.”
— James D. Watson

Bioinformatics is typically used in health care systems to manage large amounts of patient data. It is now also used to understand disease states and normal physiology, and for commercial purposes. The concern for privacy and security in this space isn’t just about the possibility of re-identification, but about what could happen AFTER your data is re-identified. So much basic information is readily available online that once an individual is identified, it’s not hard to find personal information about them. Between social media sites, genealogy forums, and publicly available pedigrees, most information that was once considered personal or private is just a few clicks away for anyone who wants it. Another potential privacy hazard is the combination of genetic data and EMRs (Electronic Medical Records) to advance medical research. Combining them has the potential to reveal groundbreaking links between diseases and genes. However, the combination and availability of such data raises serious privacy concerns. Our privacy laws are NOT where they need to be.


The federal Genetic Information Nondiscrimination Act (GINA) of 2008 prohibits genetic discrimination by health insurers and employers. The genetic information protected by this law includes family health history, the results of genetic tests, the use of genetic counseling, and participation in genetic research. Here’s the big problem: this law does NOT apply to life insurance, long-term care insurance, disability insurance, or mortgage lenders.

Other laws and regulations help keep your genetic information private and protect it when it is disclosed for research purposes: the Common Rule, the Health Insurance Portability and Accountability Act (HIPAA), and the EU Data Protection Directive.


The information covered by HIPAA is defined as Protected Health Information (PHI), and there are limits on who it may be shared with. In 2013, as required by the Genetic Information Nondiscrimination Act, individually identifiable genetic information came to be treated as PHI. However, there are NO restrictions on the use or disclosure of PHI that has been de-identified.

In 2013, in Maryland v. King, the U.S. Supreme Court ruled in a 5-4 decision that law enforcement may collect DNA samples from suspects who have been arrested for a crime. The ruling essentially treats the collection of DNA as no different from fingerprinting or photographing. The dissenting justices were outraged. Justice Scalia commented, “Make no mistake about it, your DNA can be taken and entered into a national database if you are ever arrested, rightly OR wrongly, and for whatever reason.”


June 2019: Senators Amy Klobuchar (D-MN) and Lisa Murkowski (R-AK) introduced the Protecting Personal Health Data Act. This legislation would establish new privacy and security rules from the Department of Health and Human Services. It doesn’t just cover direct-to-consumer genetic testing services, but also wearable fitness trackers, social media sites that focus on health data, and other health technologies. This is huge because the legislation would provide protection where HIPAA and GINA don’t apply.


Surreptitious DNA testing adds another element of concern when it comes to genetic data privacy. Surreptitious DNA testing is when the DNA of an individual is tested without their knowledge or consent. There are companies that allow consumers to submit biological samples (e.g. a licked envelope, hair from a brush, stains, etc.) WITHOUT requiring any sort of consent from the individual the sample came from. There are currently NO federal laws prohibiting surreptitious DNA testing.


There is no single, simple solution to the privacy concerns that come with genetic testing. Progress needs to be made in every area: our laws, our technology, and our security practices. A great first step would be for all the companies involved in at-home genetic testing to be more transparent with consumers about how hard it is to guarantee anonymity. This also goes for genetic testing done for research purposes. Consumers have every right to know the risks involved. Anyone who takes part in a genome research project should look into their state’s privacy laws, because they vary from state to state. Research studies can also be covered by a Certificate of Confidentiality (CoC) from the National Institutes of Health, so participants should ask whether one applies. CoCs protect the privacy of research subjects by prohibiting disclosure of identifiable, sensitive research information except when the subject gives consent. More information is available from the National Human Genome Research Institute.


As for the data itself, what can be done? We’ve already discussed de-identification, but I would also suggest approximate query answering and encryption. Encryption makes data indecipherable unless you have the secret decryption key. For data involved in genetic research, I would propose a Homomorphic Encryption (HE) scheme. HE allows other parties to compute on the encrypted data without possessing the decryption key. This keeps the data available for research yet secure.
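
To make the idea of computing on encrypted data concrete, here is a minimal sketch of the Paillier scheme, which is only additively homomorphic (a partial form of HE, not a fully homomorphic scheme). The function names, the tiny demo primes, and the "carrier counts from two sites" scenario are my own illustrative assumptions; real deployments use vetted libraries and primes of a thousand bits or more.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (assumed inputs)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # common simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)  # (public key, private key)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext using the private key."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Hypothetical scenario: two sites each encrypt a count of variant carriers.
pub, priv = keygen(1009, 1013)   # tiny demo primes, NOT secure
site_a = encrypt(pub, 17)
site_b = encrypt(pub, 25)

# An untrusted aggregator combines the counts without seeing either one.
total = add_encrypted(pub, site_a, site_b)
print(decrypt(pub, priv, total))  # only the key holder can read the sum
```

The aggregator only ever handles ciphertexts and the public key, so it can produce the combined count for researchers without learning what either site contributed.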


Online analytical processing (OLAP) is a core functionality in data management and analytics systems. OLAP performance is crucial for applications that rely on it to make online decisions (e.g. business intelligence). OLAP is, however, quite costly on very large datasets, especially Big Data. This is where Approximate Query Processing (AQP) comes into play. AQP computes approximate answers very efficiently to meet high performance requirements, and it is relevant to protecting genomic privacy because of the sheer size of the databases.
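
One common way to do AQP is to answer an aggregate query from a random sample instead of scanning every record. The sketch below is an illustration under my own assumptions: the function name, the synthetic "carrier" records, and the 95% normal-approximation error margin are not taken from any particular AQP system.

```python
import math
import random


def approximate_count(records, predicate, sample_size, seed=0):
    """Estimate how many records satisfy `predicate` from a uniform sample.

    Returns (estimate, margin), where margin is a rough 95% error bound
    based on the normal approximation, with no finite-population correction.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    hits = sum(1 for record in sample if predicate(record))
    p = hits / sample_size                      # sampled proportion
    total = len(records)
    estimate = p * total                        # scale up to the full dataset
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size) * total
    return estimate, margin


# Synthetic stand-in for a large genotype table: True marks a variant carrier.
records = [i % 10 < 3 for i in range(100_000)]

estimate, margin = approximate_count(records, lambda carrier: carrier, 2_000)
print(f"about {estimate:.0f} carriers, give or take {margin:.0f}")
```

The query touches 2,000 records instead of 100,000, trading a small, quantified error for speed. The answer is also an estimate rather than an exact count.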

For more on what can be done, see my blog posts on Secure Read Mapping and Privacy-Preserving String Searching.

