Open Source Healthcare Datasets

Finding healthcare data to practice with and build your skillset


Are you a health informatics enthusiast looking to enhance your skills and explore real-world healthcare data? In this blog post, we'll introduce you to a collection of open source healthcare datasets, assessment tools, and related resources that you can use to practice, analyze, and develop insights. Whether you're interested in social determinants of health (SDoH), mental health, substance use disorders, or other healthcare domains, these resources give you real data to work with.

GitHub Repository

All the links to these healthcare datasets and resources are compiled in a GitHub repository, where you can learn more about each one.

Feel free to explore these datasets, resources, and tools to enhance your understanding of healthcare data and develop innovative solutions in the field of health informatics.

Assessments

  • PhenX Toolkit: A comprehensive collection of standardized assessment tools for various health domains, including SDoH, mental health, substance use disorders, and more.
  • CMS Quality Measure Inventory: Explore a wide range of quality measures used by the Centers for Medicare & Medicaid Services (CMS) to assess healthcare performance.
  • AHRQ Health Tech Assessments: Access a compendium of surveys and assessments related to health information technology from the Agency for Healthcare Research and Quality (AHRQ).
  • AHRQ Time Motion: Dive into a database of time and motion studies focused on healthcare processes and workflows.

Unique Open Source Datasets

  • Gun Violence Archive: Analyze comprehensive data related to gun violence incidents, including location, date, victims, and more.
  • Social Capital: Explore datasets related to social determinants of health, offering valuable insights into the impact of social factors on population health outcomes.
  • NY SDoH Resources: A curated list of resources and data specific to social determinants of health in New York.
  • All of Us (NIH): Gain access to diverse datasets encompassing genetics, patient-reported outcomes, environmental factors, social determinants of health, and more.
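  • MIMIC: De-identified clinical data from intensive care unit (ICU) stays, suited to research and analysis in critical care settings. Access requires completing a credentialing process through PhysioNet.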
  • CMS Open Payments: Explore the financial relationships between healthcare providers and manufacturers in the United States.
  • CMS Medicare Claims PUF: Publicly available Medicare claims data that provides insights into healthcare utilization, costs, and more.
  • MTSamples: A collection of text samples for natural language processing (NLP) tasks in healthcare, including medical transcription examples.
  • Healthdata.gov DATAJAM Curated Datasets: A curated selection of datasets covering SDoH, care access, Lyme disease, COVID-19 equity, and more.
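
Several of the datasets above can be downloaded as CSV files, and a quick first step with any of them is to profile a single column: how many rows there are, how many values are missing, and which values are most common. The sketch below is one minimal way to do that with Python's standard library. The function name `profile_csv`, the sample rows, and the column names `state` and `amount` are made up for illustration and do not come from any of the listed datasets.

```python
import csv
import io
from collections import Counter


def profile_csv(handle, column, top_n=5):
    """Return (row_count, missing_count, most_common_values) for one column."""
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file; otherwise it is the header row.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    counts = Counter()
    rows = 0
    missing = 0
    for row in reader:
        rows += 1
        # Short rows yield None for absent fields, so normalize to "".
        value = (row[column] or "").strip()
        if value:
            counts[value] += 1
        else:
            missing += 1
    return rows, missing, counts.most_common(top_n)


# Tiny made-up sample standing in for a downloaded file; not real data.
SAMPLE = """record_id,state,amount
1,NY,120.50
2,CA,75.00
3,NY,
4,,30.25
"""

rows, missing, top = profile_csv(io.StringIO(SAMPLE), "state")
print(rows, missing, top)
```

For a real download, you would pass an open file instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` and the column name depend on the dataset you chose.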

Use these datasets for research, analysis, and building your skillset. For a quick overview, the table below summarizes them along with related tools, codexes, and open ML resources:

| Dataset | Data Type | Description |
| --- | --- | --- |
| PhenX Toolkit | Assessments | Standardized assessment tools for various health domains |
| CMS Quality Measure Inventory | Assessments | Measures used by CMS to assess healthcare performance |
| AHRQ Health Tech Assessments | Assessments | Surveys and assessments related to health information technology |
| AHRQ Time Motion | Assessments | Time and motion studies focused on healthcare processes and workflows |
| Gun Violence Archive | Unique | Comprehensive data related to gun violence incidents |
| Social Capital | Unique | Datasets related to social determinants of health |
| NY SDoH Resources | Unique | Resources and data specific to social determinants of health in New York |
| All of Us (NIH) | Unique | Diverse datasets encompassing genetics, outcomes, social determinants, and more |
| MIMIC | Unique | De-identified ICU data for research and analysis in critical care settings |
| CMS Open Payments | Unique | Financial relationships between healthcare providers and manufacturers in the US |
| CMS Medicare Claims PUF | Unique | Publicly available Medicare claims data |
| MTSamples | Unique | Text samples for NLP tasks in healthcare |
| Healthdata.gov Curated Datasets | Unique | Datasets covering SDoH, care access, Lyme disease, COVID-19 equity, and more |
| KEGG | Tools and Codexes | Tools and databases for drug, genomic, disease, and more |
| MONDO | Tools and Codexes | Disease mappings and ontologies |
| Biome | Tools and Codexes | Disease-related data and information |
| Government Data US | Tools and Codexes | Open source government datasets focused on healthcare and more |
| SNOMED Searchable | Tools and Codexes | Searchable database of SNOMED CT codes and concepts |
| Athena | Tools and Codexes | Relational search tool for healthcare terminology and concepts |
| US Government Agencies | Tools and Codexes | Tools and resources from US government agencies (HHS, CDC, etc.) |
| Global Health Data Exchange | Tools and Codexes | Aggregator of health data from various sources |
| ACRDSI | Open ML | Radiology-related datasets and resources |
| The Algorithms | Open ML | Collection of machine learning algorithms |
| Movement | Open ML | GymLytics for movement-related pose/body estimation |
| Bed Position | Open ML | Multimodal in-bed pose estimation for bed position tracking |
| Instructional | Open ML | Comprehensive guide on human pose estimation with examples and tutorials |
| Summary of Libraries | Open ML | Top and best computer vision human pose estimation projects |
| GPT for Pubmed | Open ML | NLP model for Pubmed data analysis and research |

These open source datasets will serve as valuable resources to enhance your understanding of healthcare data and develop innovative solutions in the field of health informatics. Happy exploring!

Note: It's always important to review the terms of use and data licensing agreements associated with each dataset before use.

Last updated: 2023-07-05