The Food Standards Agency (FSA) is a non-ministerial government body that is responsible for food safety and hygiene in England, Wales and Northern Ireland. The FSA gathers information on public food practices by conducting surveys. The Food and You Survey (F&Y) is a flagship biennial study that explores the public's attitudes, knowledge and behaviour relating to food safety and production. Data is analysed and used to compile publicly available reports.
Wales is demographically distinct from the rest of the UK: it is less populous and more deprived, and has a larger rural population. For this reason, the FSA was particularly concerned that its F&Y engagement activities might not reach some demographic groups in Wales that may face particular food risks. The following specific questions were posed:
- Does F&Y survey sampling reflect the true demographic profile of Wales?
- What food risks are associated with undersampled groups, and what is their understanding of food labelling?
- Are there any relationships between behaviours related to food safety, and can we predict food risk for specific groups or individuals?
James Doherty - Personal GitHub 
Lorena Garcia-Perez - Personal GitHub 
Charlie Jeynes - Personal GitHub 
Mishka Nemes - Personal GitHub
Science to Data Science Virtual - 23rd of March to 24th of April 2020 
Hosted and organised by Pivigo
Food and You (F&Y) survey 2010-2018: Food and You Waves 1-5 Data, CSV file dated 09 September 2019
Food and You (F&Y) survey guide: Select Wave 1 to 5 - Data user guide 
Census 2011 - microdata with individual entries: to download the data (isg_regionv2.csv), an account must be created here
- includes everything required to run the dashboard (the .ipynb file, together with a requirements.txt and the F&Y survey.csv data files). Instructions on which settings to use to run the app from the https://mybinder.org/ website are provided in a .png file. For this app to be available to anyone online, the relevant files must be located in a public repository.
- microdata_census2011_Wales_prepared.csv has all the relevant demographic data from the 2011 Census. It includes only the Wales entries, for 7 of the 120 original demographic variables.
- survey.csv includes F&Y survey data from waves 1-5, for Wales, England and Northern Ireland
- survey_guide_values.csv includes the data parsed into the dictionary that translates answer codes into answer labels
- survey_guide_variables.csv includes the data parsed into the dictionary that translates short question names into their full wording
- S2DS-2020_FSA_Wales_presentation.pdf - project presentation for the S2DS programme, presented on 23rd of April 2020
- S2DS-2020_FSA_Wales_case_study.pdf - executive summary of the challenge, approach, findings, impact and recommendations
- Graphs within the .pdf files whose names start with foodBehaviour_ can also be visualized in the dashboard provided here. The dashboard can also generate, on demand, a wider variety of graphs of food behaviours by demographic group.
- All other .pdf files contain graphs that can be plotted only by running the notebook. These are exhaustive for the current data.
- masterscript_with_markdown.ipynb includes all the code developed for the project. For details, see below.
Data loading, data wrangling and data analysis are carried out in the notebooks/masterscript_with_markdown notebook.
F&Y survey - values encoded as 'Not applicable' or 'Not known' were recoded as NaN, except for the principal component analysis, where the data were kept in their original state.
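A minimal sketch of this recoding step, assuming each respondent is held as a dict of `{question: numeric code}` and that a nested `{question: {code: label}}` dictionary is available to look up answer labels. The function name `recode_missing` and the sample codes are hypothetical; the project's notebook may do this differently (for example directly on a pandas DataFrame).

```python
import math

# Answer labels treated as missing, as described above.
MISSING_LABELS = {"Not applicable", "Not known"}


def recode_missing(rows, answer_labels, missing_labels=MISSING_LABELS):
    """Return copies of `rows` with 'missing' answer codes replaced by NaN.

    Hypothetical helper. rows: list of {question: numeric code} dicts, one
    per respondent. answer_labels: nested {question: {code: label}} dict.
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for question, code in row.items():
            # Look up the label for this code; unknown codes are left as-is.
            label = answer_labels.get(question, {}).get(code)
            new_row[question] = math.nan if label in missing_labels else code
        cleaned.append(new_row)
    return cleaned
```

For the PCA, the original (un-recoded) rows would simply be passed on instead of the cleaned copies.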
Census - given the higher granularity of the census data, its categories were aggregated to match the answer labels used in F&Y, allowing direct comparison. There were no missing values, as all demographics were provided for each respondent.
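The aggregation amounts to mapping each fine-grained census category onto the coarser F&Y answer label and counting. The sketch below assumes hypothetical age-band codes for both sources; the real category codes have to be read from the census and F&Y user guides.

```python
from collections import Counter

# Hypothetical mapping from finer census age bands to F&Y age bands;
# the real codes must be taken from the census and F&Y documentation.
CENSUS_TO_FY_AGE = {
    "16-17": "16-24",
    "18-24": "16-24",
    "25-34": "25-34",
    "35-44": "35-44",
    "45-54": "45-54",
    "55-64": "55-64",
    "65-74": "65+",
    "75+": "65+",
}


def aggregate_to_fy(census_values, mapping):
    """Count census respondents per F&Y category.

    Values with no entry in `mapping` (e.g. ages outside the survey's
    range) are dropped rather than guessed.
    """
    return Counter(mapping[v] for v in census_values if v in mapping)
```

Dropping unmapped values keeps the census counts restricted to the population the survey can represent.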
Two dictionaries were built. The first one translates question names from their short version to their longer, comprehensible, version. The second is a nested dictionary that translates individual answers to each question from their numeric code to a meaningful answer. Their input data is provided in this repository.
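One way to build the two dictionaries from the guide files, assuming (hypothetically) that survey_guide_variables.csv has `variable` and `description` columns and survey_guide_values.csv has `variable`, `code` and `label` columns. The actual column names in the repository's files may differ.

```python
import csv


def load_question_dict(lines):
    """Map short question names to their full wording.

    `lines` is any iterable of CSV lines (e.g. an open file) with the
    hypothetical columns 'variable' and 'description'.
    """
    return {row["variable"]: row["description"] for row in csv.DictReader(lines)}


def load_answer_dict(lines):
    """Build a nested {question: {numeric code: answer label}} dictionary.

    Assumes hypothetical columns 'variable', 'code' and 'label'.
    """
    answers = {}
    for row in csv.DictReader(lines):
        answers.setdefault(row["variable"], {})[int(row["code"])] = row["label"]
    return answers
```

Usage: `with open("survey_guide_values.csv", newline="") as f: values = load_answer_dict(f)`. Keying answers by question first means the same numeric code can carry different labels in different questions.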
custom_barplots is a custom plotting function that outputs horizontal barplots showing the percentage of people giving each answer, with 95% confidence interval error bars. Question and answer names are filled in automatically in each plot's title, axis labels and legend using the two dictionaries.
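The numbers behind each bar can be sketched as follows. The source does not state which confidence-interval method custom_barplots uses; this sketch assumes the normal-approximation (Wald) interval for a proportion, and the function name `answer_percentages` is hypothetical.

```python
import math
from collections import Counter


def _is_missing(value):
    # Treat None and NaN as missing (NaN is how 'Not known' etc. are recoded).
    return value is None or (isinstance(value, float) and math.isnan(value))


def answer_percentages(answers, z=1.96):
    """Return {answer: (percent, ci_half_width_percent, n)}.

    Uses the Wald interval p +/- z * sqrt(p * (1 - p) / n), an assumption
    for illustration; z = 1.96 gives an approximate 95% interval.
    """
    valid = [a for a in answers if not _is_missing(a)]
    n = len(valid)
    result = {}
    for answer, count in Counter(valid).items():
        p = count / n
        half_width = 100.0 * z * math.sqrt(p * (1 - p) / n)
        result[answer] = (100.0 * count / n, half_width, n)
    return result
```

The half-width is what would be passed to matplotlib's `xerr` argument for a horizontal bar chart. For small groups or proportions near 0 or 100%, a Wilson interval would be a safer choice.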
custom_lineplots is a custom plotting function that outputs lineplots showing the temporal evolution of the F&Y survey demographics for Wales, England and Northern Ireland, displayed side by side for easy comparison of the trends across these UK countries. 95% confidence interval error bars are also displayed, together with the number of respondents (n) and the percentage represented by each category. As with custom_barplots, question and answer names are filled in automatically in each plot's title, axis labels and legend using the two dictionaries.
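Each point on such a line is the share of one demographic category within a (country, wave) group, annotated with its n. A minimal sketch of that aggregation, assuming respondents are dicts with hypothetical `country` and `wave` keys; error bars are left out here.

```python
import math
from collections import defaultdict


def category_share_by_wave(records, column, category):
    """Return {(country, wave): (percent in `category`, n)}.

    Hypothetical helper. Respondents whose `column` value is missing
    (None or NaN) are excluded from n, so percentages use valid answers only.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        value = record.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        key = (record["country"], record["wave"])
        totals[key] += 1
        if value == category:
            hits[key] += 1
    return {key: (100.0 * hits[key] / n, n) for key, n in sorted(totals.items())}
```

Plotting one line per country, with wave on the x axis, then gives the side-by-side comparison described above.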
Principal Component Analysis (PCA) is used to explore the raw data in order to understand global patterns present within the whole F&Y dataset for Wales.
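A compact PCA via singular value decomposition of the column-centred data is sketched below with NumPy. It only centres each question; whether the project also scaled the variables (or used scikit-learn's `PCA`) is not stated, so treat those choices as assumptions.

```python
import numpy as np


def pca(X, n_components=2):
    """Principal component analysis via SVD of the column-centred matrix.

    X: (n_respondents, n_questions) array of numeric answer codes.
    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each question; no scaling applied here
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = S ** 2 / (X.shape[0] - 1)  # variance along each component
    ratio = variance / variance.sum()
    components = Vt[:n_components]  # rows are principal axes
    scores = Xc @ components.T  # respondents projected onto the axes
    return scores, components, ratio[:n_components]
```

The explained-variance ratio indicates how much of the overall answer pattern the first few components capture; the component loadings show which questions drive them.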
A timeline of the evolution of the F&Y survey demographics is plotted using custom_lineplots.
Demographic variables (age, gender, marital status, religion, health status, work status, deprivation) are compared between the F&Y survey and the census using custom_barplots.
Demographic variables are also taken into consideration for the analysis of questions of interest related to food safety, using the F&Y survey data and custom_barplots.
To evaluate the significance of the differences under study, chi-square tests are carried out; a non-parametric test is appropriate because the survey and census variables are categorical.
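As an illustration, one such comparison can be framed as a chi-square goodness-of-fit test: survey counts per category against the counts expected from census proportions. The framing, the function name and the use of tabulated 5% critical values (instead of an exact p-value) are assumptions of this sketch; `scipy.stats.chisquare` would give the p-value directly.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_gof(observed, expected_props):
    """Compare survey counts with census proportions.

    observed: {category: survey count}; expected_props: {category: census
    proportion}, summing to 1. Returns (statistic, df, significant_at_5pct).
    """
    n = sum(observed.values())
    stat = 0.0
    for category, p in expected_props.items():
        expected = n * p
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    df = len(expected_props) - 1
    if df not in CHI2_CRIT_05:
        raise ValueError("critical value table only covers df = 1..10")
    return stat, df, stat > CHI2_CRIT_05[df]
```

For example, 70 survey respondents in one category and 30 in another, against census proportions of 0.5 each, give a statistic of 16.0 with 1 degree of freedom, well above 3.841.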
Correlation analysis is performed on the F&Y survey data to identify which features (questions and their respective answers) correlate most strongly, positively or negatively, with the risk of food poisoning. A preliminary predictive model is also developed, which again points to the set of features most relevant for determining food poisoning risk. Further work is required to fully assess the predictive capability of this model.
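A minimal version of the correlation step: compute Pearson's r between each numeric feature and a binary food-poisoning indicator (this is the point-biserial correlation) and rank features by absolute value. It assumes complete cases and numerically coded answers; the feature names and helper functions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_features(features, outcome):
    """Rank features by |r| with a 0/1 outcome, strongest first.

    features: {name: list of numeric values}; outcome: list of 0/1 flags
    (e.g. 1 = reported food poisoning). Sign of r is kept for interpretation.
    """
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Correlation only screens features one at a time; the predictive model is what accounts for features acting jointly.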
The dashboard components can be executed within the notebook, where further instructions are included.
In order to access the dashboard online:
- go to Binder
- select GitHub under GitHub repository name or URL and insert the appropriate path name (for example, my-repository/app) of the public remote repository where the dashboard code is hosted (such as the app folder here).
- select URL under Path to a notebook file (optional) and insert voila/render/Food-and-You-survey_risks.ipynb
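The same settings can be expressed as a single Binder link, which is handy to share. This sketch assumes the standard `https://mybinder.org/v2/gh/<user>/<repo>/<branch>?urlpath=...` link format; the repository name and branch in the example are placeholders, and the notebook path must match where the notebook actually sits in the repository.

```python
from urllib.parse import quote


def binder_voila_url(github_repo, notebook_path, ref="master"):
    """Build a Binder link that opens a notebook as a Voila dashboard.

    github_repo: "user/repo" (placeholder); ref: branch name, which must
    match the repository's actual default branch.
    """
    # Encode the Voila render path, including slashes, as Binder's form does.
    urlpath = quote(f"voila/render/{notebook_path}", safe="")
    return f"https://mybinder.org/v2/gh/{github_repo}/{ref}?urlpath={urlpath}"
```

For example, `binder_voila_url("example-user/example-repo", "Food-and-You-survey_risks.ipynb")` returns a link ending in `?urlpath=voila%2Frender%2FFood-and-You-survey_risks.ipynb`.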