The Rensselaer Health Informatics Challenges in Technology Education (INCITE) Pipeline recruits and prepares students at Rensselaer and worldwide to be data scientists in healthcare through early data analytics courses and experiential research projects centered on real-world health challenges.
With the advent of electronic healthcare records (EHR) and precision medicine, healthcare increasingly relies on health informatics (HI): the application of the philosophy and tools of data science (DS) to healthcare. We propose innovative, replicable programs that directly expand the HI workforce pipeline at the early undergraduate level for students at RPI and worldwide. The proposal addresses key challenges in attracting and training top talent: a shortage of data scientists, a lack of awareness of HI careers among students, and the difficulty of incorporating real-world healthcare projects into curricula due to EHR privacy concerns.
Rensselaer Polytechnic Institute (RPI) has a novel pipeline for undergraduate DS education consisting of an early data analytics course followed by applied DS research experiences on real-world problems. This pipeline builds DS skills and prompts students to pursue further coursework and careers in DS (see attached report). We now propose to build a similar pipeline to recruit and train data scientists for HI careers.
The Health INCITE Pipeline will:
- Produce students skilled in HI.
- Create novel, low-barrier pathways into HI for students from a wide array of majors, including pre-med, biology, biomedical engineering, computer science, and mathematics.
- Enable health informatics education at many institutions by creating shared HI instructional project resources.
- Recruit students to pursue HI careers.
The Health INCITE Pipeline has four components:
- Health INCITE Lab
In the Lab, teams of students tackle open HI problems contributed by industry, research, and foundation partners. Past problems include infant stunting (Gates Foundation), 72-hour emergency department readmissions (Albany Medical Center), and acute kidney injury in children (HBI Solutions). Instructors coach the teams, and students also attend a one-hour class on health informatics.
Students can enter the Lab through biology or mathematics. Students are first introduced to HI in freshman courses (Intro Biology or Art and Science of Math). They then take an entry-level, project-based course on DS methods for transforming data to insight (Biostatistics or Intro to Data Math), which prepares them to succeed in the Lab.
- Privacy-Preserving Synthetic Data Generation
We will solicit HI projects for the Lab and courses from our partners. Massive EHR data collections are available publicly or through partners, but one barrier is meeting privacy requirements for data used in the courses and Lab. Thus, a major contribution of this project is the development of a privacy-preserving project data generator. Students work on task-specific simulated data produced by data generators trained on real data using recent machine learning techniques. The solutions they develop can then be tested on real data in secure environments. The educational benefits of synthetic data include confidentiality and control over problem difficulty and dataset size. The published data generators will enhance HI education and research at many institutions.
- Online Challenges to Scale HI Experiential Learning
Publicly available HI challenges motivate thousands of students to engage with open HI problems. RPI and Chalearn.org will develop two challenges per year, creating a library of didactic HI problems for educational use. We will also offer annual open challenges, with an associated workshop and prizes, to engage the community at large. Challenges combined with generated data will enable scalable HI experiential learning at RPI and elsewhere.
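To make the idea of a data generator trained on real data more concrete, here is a minimal Python sketch. It is not the project's generator: it swaps in a very simple per-class Gaussian model for the "recent techniques in machine learning" mentioned above, and it carries no formal privacy guarantee. The field names (`age`, `lab_value`, `readmitted`), the toy records, and the function names `fit_generator` and `sample_synthetic` are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_generator(records, label_key):
    """Estimate label frequencies and per-label Gaussian parameters
    (mean, standard deviation) for every numeric feature."""
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)

    model = {}
    for label, rows in by_label.items():
        features = {}
        for key in rows[0]:
            if key == label_key:
                continue
            values = [r[key] for r in rows]
            features[key] = (statistics.mean(values), statistics.pstdev(values))
        model[label] = {"weight": len(rows) / len(records), "features": features}
    return model


def sample_synthetic(model, n, label_key, seed=0):
    """Draw n synthetic records: pick a label by its observed frequency,
    then draw each feature independently from that label's Gaussian."""
    rng = random.Random(seed)
    labels = list(model)
    weights = [model[label]["weight"] for label in labels]
    synthetic_rows = []
    for _ in range(n):
        label = rng.choices(labels, weights=weights)[0]
        rec = {label_key: label}
        for key, (mu, sigma) in model[label]["features"].items():
            rec[key] = rng.gauss(mu, sigma)
        synthetic_rows.append(rec)
    return synthetic_rows


# Made-up stand-in for a protected dataset (hypothetical values, not real EHR data).
real_like = [
    {"age": 64, "lab_value": 1.1, "readmitted": 0},
    {"age": 58, "lab_value": 0.9, "readmitted": 0},
    {"age": 71, "lab_value": 1.3, "readmitted": 0},
    {"age": 49, "lab_value": 1.0, "readmitted": 0},
    {"age": 80, "lab_value": 2.1, "readmitted": 1},
    {"age": 77, "lab_value": 1.8, "readmitted": 1},
    {"age": 85, "lab_value": 2.4, "readmitted": 1},
    {"age": 69, "lab_value": 1.9, "readmitted": 1},
]

model = fit_generator(real_like, "readmitted")
synthetic = sample_synthetic(model, 100, "readmitted", seed=42)
print(synthetic[0])
```

In the workflow described above, students would develop their methods on data like `synthetic`, and the resulting solutions would then be evaluated on the real data inside a secure environment. A generator intended for publication would need a more expressive model and an explicit privacy analysis; this sketch only illustrates the fit-then-sample pattern, including how the number of generated records can be chosen freely.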
Links to Health INCITE-related Activities:
- Rensselaer Datathon (April 2018)
- Cognitive and Immersive Data Insights Application Challenge (June 2018)