Bioinformatics is essential for the management of data in modern biology and medicine. By finding more secure solutions we will be creating greater opportunity and more efficiency in medical research. ~Audrey Bentley

GENOMICS: To Comply in The Cloud Pt. 1

The Genomics industry has grown astronomically in the past decade, and it will only keep growing as demand for personalized medicine increases and Direct-to-Consumer DNA testing continues to flourish. On top of that, the cost of having your entire genome sequenced has decreased significantly in recent years. Thanks to this huge increase in testing, we now have a colossal amount of genomic data. The public archives for raw sequencing data have been doubling in size every 18 months! Also, keep in mind: one whole genome sequence creates approximately 200 gigabytes of raw data, and analyzing that data creates additional gigabytes and requires even more computing power. That brings us to the point of this write-up: how and where do we securely store this highly sensitive data, which also happens to require a lot of computing power? This is where Cloud Computing comes into the picture…
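To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the 18-month doubling time and the roughly 200 GB per genome quoted above hold steady, which is a simplification. The function names and the 1,000-genome example are made up purely for illustration.

```python
def growth_factor(years, doubling_months=18):
    """How many times larger an archive gets after `years`,
    assuming it doubles in size every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


def raw_storage_tb(num_genomes, gb_per_genome=200):
    """Raw sequencing data in terabytes (1 TB = 1,000 GB).
    Analysis output is NOT included, so real needs will be higher."""
    return num_genomes * gb_per_genome / 1000


# Hypothetical example: 3 years of archive growth, and a 1,000-genome study.
print(f"Archive growth over 3 years: {growth_factor(3):.0f}x")
print(f"Raw data for 1,000 genomes: {raw_storage_tb(1000):.0f} TB")
```

Even a modest study lands in the hundreds of terabytes before any analysis begins, which is why storage and compute are discussed together here.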

Cloud Computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
— NIST

The Cloud Service Models:

Infrastructure as a Service (IaaS)

  • Primary business driver is large-scale raw compute & storage

  • Primary threats include lack of due diligence for tenant isolation

Platform as a Service (PaaS)

  • Primary business driver is developing and deploying applications

  • Primary threats include poor or absent DevSecOps

Software as a Service (SaaS)

  • Primary business driver is specific service consumption

  • Primary threats include lack of granularity of data access controls


The Cloud Deployment Models:

Private

  • Major business driver is low risk tolerance & heavy regulations

  • Top concern: highest operating costs

Public

  • Major business driver is lower cost of ownership

  • Top concern: security governance

Community

  • Major business driver is knowledge access & sharing

  • Top concern: granular community sharing

Hybrid

  • Major business driver is diverse service needs

  • Top concern: proper federation controls


Essential characteristics & benefits of The Cloud: On-demand self-service, Rapid Elasticity, Broad Network Access, Resource Pooling, and Measured Service.

There are many more benefits to Cloud Computing, including flexibility, capital cost control, access to skilled staff, and environmental sustainability. Let’s go into more detail about the ways in which Genomics specifically could benefit from Cloud Computing.

Elasticity, in particular, would be a great advantage for the world of Genomics. It means a researcher can use as many computers as needed to finish an analysis, which makes the research process much quicker. It also lets multiple researchers contribute to the same research project and share data easily. Another perk of ‘elasticity’ is that it allows the user to rent resources and pay only for what actually gets used. Cloud resources are rented in virtual slices called ‘instances.’ Providers advertise a menu of instance types with their capabilities listed: amount of disk space, processor speed, amount of memory, etc.
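Here is a small sketch of what ‘pay only for what you use’ can look like in practice. Everything in it is hypothetical: the instance names, sizes, and prices are invented for illustration and do not come from any real provider, and it assumes simple per-hour billing with a perfectly parallel workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: int
    cents_per_hour: int  # made-up price, in US cents


# Hypothetical provider menu (not real instance types or prices).
MENU = [
    InstanceType("small", vcpus=2, memory_gb=8, cents_per_hour=10),
    InstanceType("medium", vcpus=8, memory_gb=32, cents_per_hour=40),
    InstanceType("large", vcpus=32, memory_gb=128, cents_per_hour=160),
]


def cheapest_fit(menu, min_vcpus, min_memory_gb):
    """Return the cheapest instance type that meets both requirements."""
    candidates = [
        i for i in menu
        if i.vcpus >= min_vcpus and i.memory_gb >= min_memory_gb
    ]
    if not candidates:
        raise ValueError("no instance type is large enough")
    return min(candidates, key=lambda i: i.cents_per_hour)


def rental_cost_cents(instance, count, hours):
    """Total rental cost when billing is strictly per instance-hour."""
    return instance.cents_per_hour * count * hours


choice = cheapest_fit(MENU, min_vcpus=4, min_memory_gb=16)
one_machine = rental_cost_cents(choice, count=1, hours=100)
hundred_machines = rental_cost_cents(choice, count=100, hours=1)
print(choice.name, one_machine / 100, hundred_machines / 100)
```

Under these assumptions, one machine for 100 hours costs the same as 100 machines for one hour, so the researcher gets the answer far sooner at no extra rental cost. Real workloads rarely split that cleanly, so treat this as the ideal case.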

A container is a standard unit of software that packages up code and all of its dependencies so the application runs quickly and reliably from one computing environment to another. Because containers are often made freely available for download, they save a lot of time, especially for the researcher who could otherwise spend upwards of a year developing a genomic analysis.

Even with all the positives associated with moving to The Cloud, there are still security concerns holding organizations in the Genomics industry back from making the switch. In fact, a lot of them still rely on HPC (High Performance Computing) instead of the much more storage-friendly and convenient cloud computing. My main area of focus with regard to The Cloud is security and data privacy. I go into depth in these previous blogs on why protecting genetic data is so important: Consumer Genetic Testing & Privacy Concerns Part 1 and Part 2. Another important topic when discussing Genomics and data protection is de-identification and the reality of re-identification. I go into depth on this topic in this blog.


Some Top Threats to Cloud Computing, according to the CSA (Cloud Security Alliance)

  • Data breaches

  • Misconfiguration and inadequate change control

  • Lack of cloud security architecture & strategy

  • Account hijacking

  • Insufficient identity, credential, access & key management


One of the benefits of the cloud is also one of the things that increases its security concerns: broad network access. The same accessibility that makes the cloud convenient for legitimate users also makes it more reachable for malicious actors.

Something else to consider: Genomic data is mostly analyzed in a research or healthcare environment. On average, healthcare institutions spend only about 2-3% of their budget on information technology, and only 0.19% of their overall budget on security! For comparison, financial institutions spend over ten times that (33%).

Cloud Computing is only going to become more popular, and in the process it will continue to change how organizations manage computational resources, not to mention how conveniently Scientists involved in research can work and collaborate.

In the next part of this series we will go into more depth about security concerns and dive into the laws and regulations that apply to GENOMICS in The Cloud (e.g. dbGaP, HIPAA, GINA, etc.).
