Biodata: Its Collection, Its Usage, and the Questions We Should Ask


Take a look at that seemingly innocuous little Health app on your phone. The pink heart and clean graphs elicit a sense of safety and sterility, the kind of feeling you would expect from a good healthcare provider. And yet, the longer you spend clicking through the app, the more information you will see being collected about your basic, everyday activities.

Not only does your phone track your steps, it also tracks what time of day you took those steps, maybe where you took those steps, and, if you have a particular smartwatch, possibly even how rapidly your heart was beating when you took each of those steps. Seems like overkill, doesn’t it? Data about you is being collected by things around you and on you, all the time. Sometimes, we consciously choose to give away our data, consenting to the exchange of our information for a service in return. But other times, we are unaware of exactly how much data is being collected about us, and more importantly, how that data is being used, sold, shared, and exploited. Your phone’s activity app is not simply counting your steps. With its location tracker, clock, heart rate monitor, and more, your cell phone can provide a very comprehensive description of your activity and behavior throughout the course of a day. As participants in a modern, technology-based world, asking questions and probing for answers about what data is being collected, where it is going, and who is using it becomes increasingly important. This is particularly true when that data changes from just being about your interests to potentially including sensitive information about your health and well-being.

The “Digitization of Biology”

Double helix and gene segments
Gene sequencing has improved rapidly in both speed and accuracy over just the past decade. The ability to quickly produce massive amounts of data about individuals carries its own range of consequences. (source)

Wave after wave of new technology for biology research and medical care has led to what some are calling the “digitization of biology.” The ways in which research and care are conducted are changing dramatically as processes like gene sequencing, protein analysis, associated function analysis, and more become increasingly commonplace. Genetic information and related metadata are being collected in both hospitals and research labs, and this information has been crucial to the development of the large analytic platforms that work with this data. These platforms have been used in a variety of ways. Often, their correlation- and categorization-based algorithms provide additional support for causal relationships suggested by observational studies and biological models, which allows us to exploit biological mechanisms by understanding them at a macro scale. Within representative sample sets, these algorithms can also be highly predictive of disease states, genetic composition, and the potential success of specific treatments. These software tools have become a large part of modern research, as the increasing amounts of data allow for improved predictive capacity and provide a far more holistic understanding than could ever be produced manually by individual doctors.

But these massive data sets are also very concerning. Data can be hacked and traced back to individuals, meaning the privacy of their genetic information may be compromised. In addition, the fact that these algorithms can predict new health trends and patterns makes them liable to be misused very quickly, as such predictions are valuable to many groups; this includes doctors trying to make the best choice for your health, as well as insurance companies trying to make the lowest payment for your health. These concerns have been raised and partially addressed, most notably by the Health Insurance Portability and Accountability Act (HIPAA), which sets clear privacy standards for individuals’ medical information, and the Genetic Information Nondiscrimination Act (GINA), which protects individuals from genetic discrimination in health insurance and employment. Two policies alone will not be enough to protect everyone, especially given how rapidly technology is advancing, but they are crucial first steps in protecting patients from blatant misuse of their health records.

The Biodata of the Individual

Other forms of “biodata,” or biologically based data points, are collected from individuals constantly, even outside of healthcare settings. Even the smallest piece of information that you give technology access to becomes part of the data set that you produce, and it is collected by the companies that provide the mechanisms for data collection. Information about every aspect of your health can be collected: your heart rate and exercise habits from an activity tracker, environmental conditions and air quality from your weather app and location services, your menstrual cycle from a period tracker, your eating habits from a calorie counter, and even your friends’ eating habits, based on their connections to you and their locations and restaurant check-ins. This can be good! Data provides information, even if the analysis is simply done with our own minds, so we can make better-informed decisions based on it. In a simplified example, if your heart rate suddenly spikes after going up a flight of stairs, and you know that this is abnormal for you based on your previous heart rate information, you can choose to go to the hospital to determine if something is wrong, rather than miss a potentially serious condition. This collected data can also provide information on a larger scale, similar to the way that data in healthcare is useful. An example close to home is research being done here at UC San Diego, using an extensive data set from Fitbit. The idea is to create an extremely comprehensive model of disease pathologies, tying together individuals’ health with their eating habits, their virtual social activities, and their activity levels as determined by their Fitbit. Pulling together these millions of data points could lead to a better understanding of lifestyle patterns and pathologies, which can then begin to play a huge role in preventative, predictive, and personalized medicine.
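To make the heart rate example above concrete, here is a minimal Python sketch of comparing a new reading against your own history. Everything in it is an assumption made for illustration: the function name `is_abnormal`, the three-standard-deviation threshold, and the sample readings are invented, and it does not reflect how any particular tracker or app actually works.

```python
from statistics import mean, stdev


def is_abnormal(history_bpm, reading_bpm, threshold=3.0):
    """Return True if a reading sits far above a personal baseline.

    Hypothetical helper: the baseline is the mean of past readings, and
    "far above" means more than `threshold` sample standard deviations.
    """
    if len(history_bpm) < 2:
        raise ValueError("need at least two past readings to estimate a baseline")
    baseline = mean(history_bpm)
    spread = stdev(history_bpm)
    if spread == 0:
        # All past readings were identical, so any higher value stands out.
        return reading_bpm > baseline
    return (reading_bpm - baseline) / spread > threshold


# Invented sample data: heart rate (beats per minute) after climbing stairs.
past_after_stairs = [98, 102, 100, 105, 99, 101, 103]

print(is_abnormal(past_after_stairs, 104))  # within the usual range
print(is_abnormal(past_after_stairs, 150))  # well above the usual range
```

The point of the sketch is that the comparison is personal: the same reading could be ordinary for one person and unusual for another, which is exactly why a long record of your own data is so informative.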

Ghostery Report: Wall Street Journal
Attempts to anonymize data have become a focus recently (source).

However, this collection of personal data can be hugely problematic. The initial process of data collection is itself extremely opaque: very rarely do we truly know how much information our devices are gathering. Information that you may or may not have consciously consented to give away may tell a much bigger story about you than you might expect. Data can be, and often is, sold off to third-party companies, who are then able to do almost anything they want with that information. This can be something as relatively straightforward as advertising, or as nefarious as putting your information up for sale to more suspicious buyers.

This can seem extreme, but it does happen, and even the Fitbit study mentioned earlier is an example. The data Fitbit is providing to UC San Diego is private user data, probably information that Fitbit users expected to be for their personal use and benefit only. And sharing this data with UC San Diego, a third party, makes Fitbit more powerful: not only does the company have a monopoly over the data, it also has a say in producing, analyzing, and sharing the information that comes from the UC San Diego study, meaning crucial details might be hidden or misused, based on the company’s interests at the time.

The world is benefitting from the regulations put in place by the European Union in a reform known as the General Data Protection Regulation (GDPR), which establishes protective measures governing how your data is handled across multiple industries, ranging from healthcare to banking to social media.

Consumer Data Right
Establishment of GDPR in European countries led to safer privacy policies for thousands of companies, many of which may have flooded your inbox informing you of those changes. (source)


Users are now able to demand access to their data and can require companies to delete it or correct inaccuracies, which is far more than we could do before. The effect reaches well beyond Europe because the GDPR applies to any company that handles the data of people in the EU, wherever that company is based, so almost all major companies must comply with its terms. Like other regulatory state policies, the GDPR offers increased access to one’s own data, but it is by no means a complete regaining of control over our information. However, it is part of a crucial initial movement toward consumers better understanding the way we participate in the digital world, and how the material we share with our service providers may be used for or against us.


Looking to the Future

As our biological data continues to be digitized, we must raise questions about what our future looks like, and who will be able to use, and possibly abuse, our information. The legal push for individuals to reclaim their data has been immensely important, but there is always more that needs to be done. As long as our freely shared data can be monetized by companies, the threat of exploitation persists. Continuing to educate ourselves about the digital world will hopefully allow us to make more informed choices, and perhaps even re-engineer the way our information is collected and used.


