An analysis of survey data to predict how likely a specific patient is to get the H1N1 vaccine, the seasonal flu vaccine, or both, based on several factors.
This project is part of a competition on DrivenData.
Vaccines provide immunization for individuals, and enough immunization in a community can further reduce the spread of diseases through "herd immunity."
Beginning in spring 2009, a pandemic caused by the H1N1 influenza virus, colloquially named "swine flu," swept across the world. Researchers estimate that in the first year, it was responsible for between 151,000 and 575,000 deaths globally.
A vaccine for the H1N1 flu virus became publicly available in October 2009. In late 2009 and early 2010, the United States conducted the National 2009 H1N1 Flu Survey. This phone survey asked respondents whether they had received the H1N1 and seasonal flu vaccines, along with questions about themselves. These additional questions covered their social, economic, and demographic background, opinions on risks of illness and vaccine effectiveness, and behaviors towards mitigating transmission. A better understanding of how these characteristics are associated with personal vaccination patterns can provide guidance for future public health efforts.
The goal is to analyse the available data and build a machine learning model that predicts whether a patient is likely to get the H1N1 vaccine, the seasonal flu vaccine, or both.
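As a rough illustration of this objective, the sketch below fits one classifier per vaccine and outputs a probability for each. It is not the model used in this project: the data is synthetic, the feature names (`concern_level`, `doctor_recommended`, `age_group`) and label names (`h1n1_vaccine`, `seasonal_vaccine`) are assumptions made for the example, and logistic regression with ROC AUC scoring is just one reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in for the survey data. Every column name here is
# hypothetical and chosen only for illustration.
rng = np.random.default_rng(0)
n = 1000
features = pd.DataFrame({
    "concern_level": rng.integers(0, 4, size=n),
    "doctor_recommended": rng.integers(0, 2, size=n),
    "age_group": rng.integers(0, 5, size=n),
})


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Make the synthetic labels depend on the features so there is signal to learn.
p_h1n1 = sigmoid(-2.0 + 0.5 * features["concern_level"]
                 + 2.0 * features["doctor_recommended"])
p_seasonal = sigmoid(-1.5 + 0.6 * features["age_group"]
                     + 1.0 * features["doctor_recommended"])
labels = pd.DataFrame({
    "h1n1_vaccine": rng.binomial(1, p_h1n1),
    "seasonal_vaccine": rng.binomial(1, p_seasonal),
})

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# One independent logistic regression per target column.
model = MultiOutputClassifier(LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# predict_proba returns one (n_samples, 2) array per target;
# column 1 holds the probability of the positive class.
probas = model.predict_proba(X_test)
predictions = pd.DataFrame(
    {name: proba[:, 1] for name, proba in zip(labels.columns, probas)},
    index=X_test.index,
)

# Score each target separately on the held-out rows.
auc = {name: roc_auc_score(y_test[name], predictions[name])
       for name in labels.columns}
print(predictions.head())
print(auc)
```

Each row of `predictions` holds two separate probabilities, one per vaccine. Treating the targets independently keeps the sketch simple, but it does not model the chance of a patient receiving both vaccines jointly.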
The data comes from the National 2009 H1N1 Flu Survey (NHFS).
The National 2009 H1N1 Flu Survey (NHFS) was sponsored by the National Center for Immunization and Respiratory Diseases (NCIRD) and conducted jointly by NCIRD and the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC). The NHFS was a list-assisted random-digit-dialing telephone survey of households, designed to monitor influenza immunization coverage in the 2009-10 season.
The target population for the NHFS was all persons 6 months or older living in the United States at the time of the interview. Data from the NHFS were used to produce timely estimates of vaccination coverage rates for both the monovalent pH1N1 and trivalent seasonal influenza vaccines.
National 2009 H1N1 Flu Survey Public-Use Data File Readme
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib
- Scikit-learn
Use the tutorial provided.
The datasets can be found here

