Skip to main content

1.2 Data Licensing & Ethics in Health Informatics

Protecting patient privacy is paramount in the realm of health informatics. While patients contribute a wealth of data to the healthcare ecosystem, it's important to recognize that actual ownership of their data is often limited, particularly when it comes to data generated within Electronic Health Records (EHRs). The complex interplay between patient rights, data ownership, and the healthcare providers' roles underscores the intricate landscape of data licensing.

Synthea EMR - Medication Example

Patient Privacy and HIPAA Compliance

Adhering to regulations like the Health Insurance Portability and Accountability Act (HIPAA) is essential. These standards establish guidelines for the confidentiality and security of healthcare information, promoting patient trust in the digital age.

In addition to HIPAA, several other regulations and standards dictate the proper handling of health data:

  • General Data Protection Regulation (GDPR): If you're dealing with health data from individuals in the European Union, GDPR sets stringent requirements for data protection, transparency, and consent.

  • Health Level Seven (HL7): HL7 standards facilitate the exchange and integration of health data between different systems. Adhering to HL7 standards ensures interoperability and consistency in healthcare data sharing.

  • Clinical Laboratory Improvement Amendments (CLIA): CLIA regulations govern clinical laboratory testing and require laboratories to meet quality standards to ensure accurate and reliable results.

Case Study: Protecting Patient Privacy in Health Research

One notable case study in health data ethics is the partnership between Apple and the Stanford University School of Medicine. The Apple Heart Study, conducted using the Apple Watch, aimed to identify irregular heart rhythms and potential signs of atrial fibrillation. The study involved collecting heart rate data from participants. To ensure ethics and privacy, Apple and Stanford University took several measures:

  • Informed Consent: Participants were fully informed about the study's purpose, methodology, and data collection process before enrolling.

  • Data Security: Data was encrypted during transmission and storage to protect participants' privacy.

  • Anonymization: The collected data was anonymized before analysis, making it impossible to identify individual participants.

  • Opt-in Participation: Participants had the choice to opt in, and they could withdraw from the study at any time without consequences.

Acquiring informed consent from patients before utilizing their data is a cornerstone of ethical data usage. Anonymizing data, removing personally identifiable information, further safeguards patient privacy and confidentiality.

  • Informed Consent Process: Prior to using any health data, ensure that individuals have been informed about the purpose of data usage, potential risks and benefits, and how their privacy will be protected. They should provide explicit consent before their data is used.

  • Sensitive Data: For sensitive health data, such as genetic information or mental health records, the consent process should be even more rigorous to address potential ethical concerns.

Data Anonymization Techniques

Anonymizing data involves removing any information that could identify individuals while preserving the data's utility. Techniques include:

  • De-identification: Removing all direct and indirect identifiers from the dataset, making it nearly impossible to link the data back to specific individuals.

  • Aggregation: Combining data from multiple sources to create larger datasets, which prevents the identification of individuals.

  • Pseudonymization: Replacing identifiers with pseudonyms or codes, making it difficult to identify individuals without access to a key.

Public clouds offer services that can automatically remove or redact PII/PHI, like Amazon Comprehend PII Redaction from AWS, the Azure Ai Langauge Service from Microsoft, or Googles Sensitive Data Protection redaction service.

Case Study: De-Identification in Health Research

The Minnesota Population Center (MPC) offers a case study in data anonymization and de-identification. They provide public access to large-scale survey data, such as the Integrated Public Use Microdata Series (IPUMS), while ensuring individual privacy. The MPC follows a rigorous process:

  • Data and Source Code: IPUMS

  • Data Transformation: Identifiers like names and addresses are removed or replaced with codes.

  • Statistical Disclosure Control: Applying statistical methods to ensure that even indirect identifiers are not disclosed inadvertently.

  • Data Use Agreements: Researchers accessing the data sign agreements that outline how the data will be used and the importance of preserving privacy. For more information on DUAs, please visit the resources section.

  • Monitoring: The MPC continuously monitors data use to prevent re-identification.

Open Health Data vs. Proprietary Data

The data landscape in healthcare offers a diverse spectrum of openness, ranging from open-source datasets to proprietary ones. Understanding the nuances of both open and proprietary data formats is crucial for researchers and data analysts, as each type comes with its own benefits, challenges, and legal considerations.

Open Health Data

Open-source health datasets are characterized by their accessibility to the public without restrictive barriers. These datasets are a valuable resource for researchers and institutions seeking to collaborate, innovate, and collectively address complex health challenges. The importance of open health data lies in its ability to foster transparency, reproducibility, and widespread knowledge dissemination. Researchers can scrutinize the data, replicate findings, and contribute to the advancement of medical science.

Benefits of Open Health Data:

  • Collaboration: Open datasets facilitate collaboration among researchers from different domains and institutions, enabling multidisciplinary approaches to healthcare research.
  • Innovation: Open data encourages innovation by providing a foundation for the development of new algorithms, models, and tools that benefit public health.
  • Transparency: Open datasets promote transparency in research methodologies and findings, contributing to the credibility and trustworthiness of research outcomes.
  • Data Validation: Public access to datasets allows for independent validation of results, reducing the risk of biased or erroneous conclusions.

Challenges of Open Health Data:

  • Privacy Concerns: Open datasets must be carefully de-identified to protect individual privacy. Balancing data utility with privacy preservation is a challenge.
  • Data Quality: Ensuring the quality and accuracy of open datasets can be challenging, as there may be limited quality control mechanisms in place.

Proprietary Data

On the other end of the spectrum are proprietary datasets, which are owned and controlled by individuals, organizations, or corporations. These datasets often come with considerations related to licensing, access restrictions, and commercial interests.

Benefits of Proprietary Data:

  • Control: Data providers have control over who can access and use the data, allowing them to protect sensitive information or maintain commercial interests.
  • Monetization: Proprietary data can be a source of revenue for data providers, particularly when licensing agreements are established.
  • Quality Control: Proprietary data providers have the ability to implement stringent quality control measures to ensure data accuracy and reliability.

Challenges of Proprietary Data:

  • Limited Collaboration: Proprietary datasets may limit collaboration and hinder the potential for cross-disciplinary research.
  • Access Barriers: Access to proprietary data can be restricted by licensing fees, usage agreements, or non-disclosure agreements, limiting the ability of researchers to analyze and share the data.
  • Data Compatibility: Combining proprietary datasets with other data sources can be challenging due to differences in data formats, structures, and privacy policies.

The legal complexities arise when merging open and proprietary data sets. Integrating these datasets requires careful consideration of licensing agreements, data ownership, and compatibility. Combining open and proprietary data can lead to questions about how intellectual property rights, usage rights, and data sharing agreements intersect.

Scenario: Suppose a researcher aims to analyze health outcomes by merging an open clinical trial dataset with proprietary electronic health records (EHR) data. The researcher must navigate the legal aspects of data ownership and usage rights of both datasets. Additionally, the compatibility of data formats and privacy policies becomes a critical concern.

Considerations:

  • Licensing: Researchers must ensure compliance with the terms of use and licensing agreements of both datasets.
  • Data Ownership: Clarifying who owns the data and any associated intellectual property rights is crucial.
  • Data Sharing Agreements: If the proprietary EHR data involves patient information, data sharing agreements and patient consents may apply.
  • Privacy: Merging data from different sources may require thorough de-identification and privacy risk assessment to protect individuals' privacy.

Researchers and data analysts contemplating the merging of open and proprietary health data should seek legal counsel, review licensing agreements, and engage in transparent communication with data providers to navigate the legal complexities responsibly.

By understanding the significance of both open and proprietary health data formats and being mindful of the legal implications when merging the two, researchers can harness the benefits of each format while upholding ethical and legal standards.

Ethical Considerations in Genetic and Biological Data

The field of bioinformatics presents a landscape rich with potential but also fraught with intricate ethical challenges. As advancements in genetic and biological data collection and analysis accelerate, the ethical implications of these technologies become increasingly significant. Addressing these considerations is paramount to ensure responsible and respectful data usage.

Direct-to-consumer genetic testing has democratized access to personal genetic information, empowering individuals to explore their ancestry, genetic traits, and potential health risks. However, the convenience of these tests is accompanied by ethical debates around consent and data usage. Can genetic data collected for one purpose be ethically utilized for other purposes, such as research or commercialization?

Ethical Dilemma: The question arises as to whether genetic data collected by consumer testing companies can be resold to pharmaceutical or research companies without explicit and informed consent from the individuals providing the genetic material.

The Henrietta Lacks Case: An Ethical Landmark

The story of Henrietta Lacks stands as a poignant reminder of the ethical complexities in genetic and biological data usage. Henrietta Lacks, an African American woman, had her cervical cancer cells collected without her knowledge or consent in 1951. These cells, known as the HeLa cell line, exhibited unprecedented growth and became invaluable to medical research, leading to numerous scientific breakthroughs, including the development of vaccines and cancer treatments.

The case of Henrietta Lacks underscores several ethical dimensions:

  1. Informed Consent: Henrietta Lacks' cells were taken without her informed consent, raising questions about autonomy and the right to control one's biological material.

  2. Commercialization and Ownership: The widespread distribution and commercialization of the HeLa cell line generated immense profits for various entities. The Lacks family received no compensation, highlighting issues of fairness and equity.

  3. Benefit and Harm: While the HeLa cells contributed to groundbreaking medical advancements, the case also highlights potential harms to individuals whose genetic material is used without consent.

  4. Cultural Sensitivity: Henrietta Lacks' cultural and racial background adds another layer of ethical complexity, prompting discussions about exploitation and historical injustices.

Reference: Johns Hopkins University - Henrietta Lacks

As genetic and biological data continue to transform healthcare and research, it is imperative to approach these advancements with ethical mindfulness. Clear guidelines on informed consent, data usage, privacy protection, and equitable distribution of benefits are essential to ensure that individuals' rights and interests are respected. Additionally, fostering open dialogue and collaboration between researchers, ethicists, policymakers, and the public can help navigate the intricate ethical landscape of genetic and biological data usage.

Addressing Bias and Fairness in Health Data

Health data serves as a critical foundation for medical decision-making, research, and public health initiatives. However, the presence of biases within these datasets can lead to detrimental consequences, perpetuating health disparities and hindering equitable care. Identifying and addressing bias in health data is crucial for promoting fair and effective patient care.

Biases in Health Data

Biases in health data can stem from various sources, including data collection methods, sample selection, and societal factors. Biases can manifest as underrepresentation or overrepresentation of certain demographic groups, leading to skewed perceptions and potentially harmful outcomes.

Example 1: Gender Bias in Heart Disease Research

Historically, heart disease research primarily focused on male subjects, leading to a gender bias in diagnostic and treatment protocols. This bias resulted in many women being misdiagnosed or receiving suboptimal care, as their symptoms were often different from those typically observed in men. This gender bias has had adverse consequences on women's cardiovascular health and overall well-being.

Example 2: Racial Disparities in Drug Trials

Racial and ethnic minorities are often underrepresented in clinical trials for new drugs and treatments. This underrepresentation can lead to a lack of data on how certain medications affect different populations. As a result, medications may not be equally effective or safe for all racial and ethnic groups, exacerbating health disparities and limiting treatment options for marginalized communities.

Example 3: Socioeconomic Status and Data Accessibility

Health data biases can also stem from socioeconomic factors. Patients from lower socioeconomic backgrounds may have limited access to healthcare services and therefore have less data available for analysis. This can lead to skewed perceptions of health outcomes and hinder efforts to address health disparities affecting disadvantaged communities.

Importance of Equitable Representation

Equitable representation in health data is fundamental for ensuring that diagnoses, treatments, and public health policies are effective for all individuals, regardless of their demographic characteristics. Collecting data from diverse populations allows for a more comprehensive understanding of health trends and tailoring interventions to specific needs.

Mitigating Bias and Promoting Fairness

Addressing bias and promoting fairness in health data requires a multi-faceted approach:

  1. Diverse Data Collection: Actively seek diverse representation in data collection to ensure that different demographic groups are adequately represented.

  2. Data Augmentation: Use data augmentation techniques to balance imbalanced datasets, ensuring that underrepresented groups have a stronger presence in the data.

  3. Algorithmic Audits: Regularly audit and assess algorithms used in medical decision-making to identify and correct biases.

  4. Transparency: Make data collection, processing, and analysis methods transparent to allow for external validation and accountability.

  5. Ethical Considerations: Ensure that ethical guidelines are followed in data collection and analysis, respecting individuals' rights and privacy.

By acknowledging biases and striving for equitable representation in health data, we can advance healthcare research and decision-making while promoting fairness and reducing health disparities that have far-reaching implications for patient populations.

Reidentification of De-identified Healthcare Data

As the healthcare landscape evolves, new points of concern emerge, highlighting the complex nature of data privacy and security. Even when data is de-identified to remove personally identifiable information, the risks of reidentification persist, driven by the convergence of diverse datasets.

  • Risks of Data Convergence: Combining seemingly unrelated datasets can lead to unexpected revelations. Open-source health data, along with proprietary or other open datasets, might collectively enable the reidentification of individuals. This phenomenon raises questions about the adequacy of de-identification techniques and the far-reaching consequences of data aggregation.

  • The Challenge of Protecting Privacy: Safeguarding patient privacy goes beyond removing names and addresses. Characteristics like age, gender, medical history, and even zip codes can serve as identifying factors when coupled with external data sources. The ease of access to publicly available information further complicates efforts to ensure anonymity.

  • Ethics and Informed Consent: Reidentification challenges underscore the need for ethical considerations and informed consent. Patients might not anticipate the potential risks when contributing data, making it crucial for organizations to educate individuals about the potential implications of data sharing.


Addressing the reidentification of de-identified data demands a proactive approach that considers not only technical safeguards but also ethical and legal aspects. The evolving landscape of data privacy necessitates continuous vigilance, adaptation, and collaboration to uphold patient rights and data integrity.

Overall, navigating the ethical landscape of health data involves understanding legal frameworks, patient rights, and the potential implications of data usage. As healthcare informatics progresses, it's crucial to approach data licensing and usage with a commitment to patient privacy, transparency, and responsible stewardship of this invaluable resource.