Open Source Healthcare Datasets
Finding healthcare data to practice with and build your skillset
Are you a health informatics enthusiast looking to enhance your skills and explore real-world healthcare data? In this blog post, we'll introduce you to a collection of open source healthcare datasets that can help you practice, analyze, and develop valuable insights. Whether you're interested in social determinants of health (SDoH), mental health, substance use disorders, or other healthcare domains, these resources will broaden your horizons.
GitHub Repository
To make these easy to find, we have compiled the links to all of the datasets, resources, and tools in this post in a GitHub repository, where you can learn more about each one.
Assessments
- PhenX Toolkit: A comprehensive collection of standardized assessment tools for various health domains, including SDoH, mental health, substance use disorders, and more.
- CMS Quality Measure Inventory: Explore a wide range of quality measures used by the Centers for Medicare & Medicaid Services (CMS) to assess healthcare performance.
- AHRQ Health Tech Assessments: Access a compendium of surveys and assessments related to health information technology from the Agency for Healthcare Research and Quality (AHRQ).
- AHRQ Time Motion: Dive into a database of time and motion studies focused on healthcare processes and workflows.
Unique Open Source Datasets
- Gun Violence Archive: Analyze comprehensive data related to gun violence incidents, including location, date, victims, and more.
- Social Capital: Explore datasets related to social determinants of health, offering valuable insights into the impact of social factors on population health outcomes.
- NY SDoH Resources: A curated list of resources and data specific to social determinants of health in New York.
- All of Us (NIH): Gain access to diverse datasets encompassing genetics, patient-reported outcomes, environmental factors, social determinants of health, and more.
- MIMIC: De-identified clinical data from intensive care unit (ICU) patients, ideal for research and analysis in critical care settings.
- CMS Open Payments: Explore the financial relationships between healthcare providers and manufacturers in the United States.
- CMS Medicare Claims PUF: Publicly available Medicare claims data that provides insights into healthcare utilization, costs, and more.
- MTSamples: A collection of text samples for natural language processing (NLP) tasks in healthcare, including medical transcription examples.
- Healthdata.gov DATAJAM Curated Datasets: A curated selection of datasets covering SDoH, care access, Lyme disease, COVID-19 equity, and more.
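Many of the datasets above can be downloaded as CSV files, so a good first exercise is to count the rows and tally the values in one column. The snippet below is a minimal sketch of that idea using only the Python standard library. The `summarize_csv` helper, the sample rows, and the column names (`record_id`, `specialty`, `amount_usd`) are made up for illustration and do not come from any of the datasets listed here.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, column):
    """Count the rows in a CSV file and tally the values found in one column."""
    reader = csv.DictReader(handle)
    counts = Counter()
    total_rows = 0
    for row in reader:
        total_rows += 1
        # Group blank or absent cells under an explicit "(missing)" label.
        value = (row.get(column) or "").strip() or "(missing)"
        counts[value] += 1
    return total_rows, counts


# Made-up rows standing in for a downloaded file; the column names are hypothetical.
sample = io.StringIO(
    "record_id,specialty,amount_usd\n"
    "1,Cardiology,120.50\n"
    "2,Oncology,75.00\n"
    "3,Cardiology,\n"
    "4,,30.25\n"
)

total_rows, counts = summarize_csv(sample, "specialty")
print(f"rows: {total_rows}")
for value, count in counts.most_common():
    print(f"{value}: {count}")
```

To try this on a real download, replace `sample` with a file handle, for example `open("your_file.csv", newline="", encoding="utf-8")` (the file name is a placeholder), and pass a column name that actually appears in that file's header row.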
Feel free to explore these datasets and leverage them for research, analysis, and building your skillset. For a quick overview, here's a table summarizing the assessments and datasets above, along with additional tools, codexes, and open ML resources:
Resource | Category | Description |
---|---|---|
PhenX Toolkit | Assessments | Standardized assessment tools for various health domains |
CMS Quality Measure Inventory | Assessments | Measures used by CMS to assess healthcare performance |
AHRQ Health Tech Assessments | Assessments | Surveys and assessments related to health information technology |
AHRQ Time Motion | Assessments | Time and motion studies focused on healthcare processes and workflows |
Gun Violence Archive | Unique | Comprehensive data related to gun violence incidents |
Social Capital | Unique | Datasets related to social determinants of health |
NY SDoH Resources | Unique | Resources and data specific to social determinants of health in New York |
All of Us (NIH) | Unique | Diverse datasets encompassing genetics, outcomes, social determinants, and more |
MIMIC | Unique | De-identified ICU patient data for research and analysis in critical care settings |
CMS Open Payments | Unique | Financial relationships between healthcare providers and manufacturers in the US |
CMS Medicare Claims PUF | Unique | Publicly available Medicare claims data |
MTSamples | Unique | Text samples for NLP tasks in healthcare |
Healthdata.gov Curated Datasets | Unique | Datasets covering SDoH, care access, Lyme disease, COVID-19 equity, and more |
KEGG | Tools and Codexes | Databases and tools covering drugs, genomes, diseases, and more |
MONDO | Tools and Codexes | Disease mappings and ontologies |
Biome | Tools and Codexes | Disease-related data and information |
Government Data US | Tools and Codexes | Open source government datasets focused on healthcare and more |
SNOMED Searchable | Tools and Codexes | Searchable database of SNOMED CT codes and concepts |
Athena | Tools and Codexes | Relational search tool for healthcare terminology and concepts |
US Government Agencies | Tools and Codexes | Tools and resources from US government agencies (HHS, CDC, etc.) |
Global Health Data Exchange | Tools and Codexes | Aggregator of health data from various sources |
ACRDSI | Open ML | Radiology-related datasets and resources |
The Algorithms | Open ML | Collection of machine learning algorithms |
Movement | Open ML | GymLytics for movement-related pose/body estimation |
Bed Position | Open ML | Multimodal in-bed pose estimation for bed position tracking |
Instructional | Open ML | Comprehensive guide on human pose estimation with examples and tutorials |
Summary of Libraries | Open ML | Roundup of top computer vision human pose estimation projects |
GPT for PubMed | Open ML | NLP model for PubMed data analysis and research |
These open source datasets and tools are valuable resources for deepening your understanding of healthcare data and developing innovative solutions in the field of health informatics. Happy exploring!
Note: It's always important to review the terms of use and data licensing agreements associated with each dataset before use.
Last updated: 2023-07-05