Individual Analysis Project for GEO448: Spatial Data Science at DePaul University
Highlights: Spatial Join, Spatial Clustering, Spatial Outlier Detection, Agglomerative Clustering
This analysis explores the spatial relationship between HIV and COVID-19 in California at the county level. It compares new infection rates for the two epidemics and their impact on racial and ethnic minority groups. Spatial clustering and outlier detection techniques identified areas of California that are more vulnerable to HIV, COVID-19, or both. An interactive LISA (Local Indicators of Spatial Association) map shows the significant clusters for both infection rates and social vulnerability. Agglomerative clustering was then performed to show areas with higher social vulnerability related to racial and ethnic minority status.
- HIV Data: Three datasets were used, all from AIDSVu, a partnership between Gilead Sciences, Inc. and the Center for AIDS Research at Emory University: 2020 National New Infections, 2020 National Prevalence, and 2020 National PrEP.
- COVID-19 Data: Two datasets were used, both from the California Department of Public Health Open Data database: Statewide COVID-19 Cases Deaths Tests and COVID-19 Vaccine Progress.
- Shapefiles: The CDC's 2020 Social Vulnerability Index (SVI) for California by county, which contains census tract and spatial geometry data.
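Because the sources above all describe California counties, one plausible way to combine them is an attribute join on a shared county identifier. The sketch below is illustrative only and is not the project's code: the `fips` key, the field names, and every value are hypothetical placeholders, and the real datasets may need a different key or cleanup first.

```python
# Hypothetical county-level records; field names and values are placeholders,
# not figures from AIDSVu, CDPH, or the CDC.
svi_rows = [
    {"fips": "06037", "county": "Los Angeles", "population": 1_000_000},
    {"fips": "06073", "county": "San Diego", "population": 500_000},
    {"fips": "06029", "county": "Kern", "population": 200_000},
]
hiv_rows = [
    {"fips": "06037", "new_hiv_cases": 120},
    {"fips": "06073", "new_hiv_cases": 45},
]
covid_rows = [
    {"fips": "06037", "covid_cases": 9_000},
    {"fips": "06073", "covid_cases": 3_100},
    {"fips": "06029", "covid_cases": 1_500},
]


def join_on_key(base, *others, key="fips"):
    """Left-join each table in `others` onto `base` using `key`."""
    # Build one lookup dict per table so each match is a constant-time lookup.
    lookups = [{row[key]: row for row in table} for table in others]
    joined = []
    for row in base:
        merged = dict(row)  # copy so the input rows are not mutated
        for lookup in lookups:
            match = lookup.get(row[key], {})
            merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined


counties = join_on_key(svi_rows, hiv_rows, covid_rows)
for row in counties:
    print(row)
```

Counties with no match in a table (Kern has no HIV record in this toy data) keep their base fields and simply lack the missing column, which makes gaps easy to spot before computing rates.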
Methodology:
- Data Preprocessing
- Exploratory Data Analysis
- Exploratory Spatial Data Analysis
- Agglomerative Cluster Analysis

For this analysis, rates instead of case counts provided more useful insights, given the volume of HIV and COVID-19 cases.
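The preference for rates over raw counts can be illustrated with a small calculation. This is a minimal sketch that assumes rates are expressed per 100,000 residents, a common convention; the denominator actually used in the project is not stated here, and the counts below are hypothetical.

```python
def rate_per_100k(cases, population):
    """Return the number of cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Hypothetical counties: one large with many cases, one small with few.
large_county = rate_per_100k(cases=5_000, population=10_000_000)
small_county = rate_per_100k(cases=100, population=50_000)

print(large_county)  # 50.0
print(small_county)  # 200.0
```

The larger county has 50 times as many cases, yet the smaller county's rate is four times higher. Normalizing by population is what makes counties of very different sizes, and two epidemics of very different scale, comparable.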
Summary of Results:
Significant spatial clusters and outliers appear in different areas for the HIV infection rate than for the COVID-19 infection rate. Significant HIV clusters appear in Southern California counties such as Los Angeles, Ventura, Orange, San Bernardino, Riverside, and San Diego, as well as in Bay Area counties including Marin, Contra Costa, and Alameda. In contrast, COVID-19 clusters appear in Central California, including Kern, Tulare, Madera, and Mariposa counties. The analysis found no significant spatial clusters shared by the two epidemics and no spatial interaction between them. However, significant clusters for racial and ethnic minority status appear in exactly the same counties as those for HIV infection, showing substantial spatial overlap between the two. Moreover, by applying spatial weights, key areas in California can be visualized and identified as experiencing higher social vulnerability with respect to racial and ethnic minority status, socioeconomic status, and public health epidemics.
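The cluster and outlier labels behind results like these come from a local Moran's I statistic. The following is a minimal pure-Python sketch of one common form of it, using row-standardized weights on a toy chain of five areas. It is not the project's code: the values and the neighbor structure are hypothetical, scaling constants differ between implementations, and the permutation-based significance test that a LISA map relies on is omitted.

```python
from statistics import mean, pstdev


def local_moran(values, neighbors):
    """Return (local_I, quadrant) for each area.

    Assumes every area has at least one neighbor and the values are not
    all identical. Weights are row-standardized, so the spatial lag is
    the mean of the neighbors' standardized values.
    """
    mu = mean(values)
    sd = pstdev(values)
    z = [(v - mu) / sd for v in values]
    results = []
    for i, z_i in enumerate(z):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        local_i = z_i * lag
        # Quadrant: the area's own value first, its neighbors' average second.
        if z_i > 0:
            label = "HH" if lag > 0 else "HL"
        else:
            label = "LH" if lag > 0 else "LL"
        results.append((local_i, label))
    return results


# Hypothetical rates for five areas arranged in a line: 0-1-2-3-4.
rates = [10, 9, 1, 8, 9]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

results = local_moran(rates, chain)
for area, (local_i, label) in enumerate(results):
    print(area, round(local_i, 3), label)
```

In this toy data, area 2 has a low value surrounded by high ones, so it is labeled `LH` with a negative local I, which is the signature of a spatial outlier. Areas whose values resemble their neighbors' get a positive local I and an `HH` or `LL` label, which is the signature of a cluster.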