GenTech2025/affymetrix-microarray-data-analysis

Project Overview

This project reanalyzes the Affymetrix microarray data from a study titled "The genomic response of the retinal pigment epithelium to light damage and retinal detachment", originally conducted in 2008. The goal of that study was to understand the transcriptional changes that take place in the retinal pigment epithelium (RPE) in response to ocular light damage. The study's goal was not the motivation for this project, however. The project is a demonstration of how to reliably and meaningfully analyze and interpret Affymetrix microarray data.

Reproduced Results

Quality Control plots

Before normalization and background correction

1) Density Plot for all six samples

Ideally, good-quality Affymetrix microarray samples have similar or near-identical log-intensity density curves. In the plot above, however, two of the six samples have noticeably different peak heights from the rest.

2) Box Plot of all six samples

The box plot above shows the distribution of expression values for each sample. The medians vary from sample to sample. Ideally, members of the same replicate group would have similar medians, but DARCR1 and LDRR1 have similar medians despite belonging to different replicate groups, and both differ markedly from the other members of their respective groups.

3) MA Plot before normalization
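
As a reminder of what an MA plot displays, here is a minimal base R sketch. It assumes one common convention, in which each array is compared against a pseudo-reference built from the per-probe median across all arrays. The matrix `log_expr` and the sample names are simulated placeholders, not the study's data.

```r
# Simulated log2 intensities: 1000 probes x 6 arrays (hypothetical data,
# not the study's CEL files).
set.seed(1)
log_expr <- matrix(rnorm(6000, mean = 8, sd = 2), nrow = 1000,
                   dimnames = list(NULL, paste0("sample", 1:6)))

# Pseudo-reference array: per-probe median across all arrays.
reference <- apply(log_expr, 1, median)

# M is the log ratio (difference of log2 values),
# A is the average log2 intensity.
ma_values <- function(sample, reference) {
  data.frame(A = (sample + reference) / 2,
             M = sample - reference)
}

ma <- ma_values(log_expr[, "sample1"], reference)
plot(ma$A, ma$M, pch = ".",
     xlab = "A (mean log2 intensity)", ylab = "M (log2 ratio)")
abline(h = 0, col = "red")  # a well-behaved array scatters around M = 0
```

A cloud that bends away from the M = 0 line, especially at low or high A, suggests an intensity-dependent bias that normalization is expected to reduce.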

After Normalization and Background correction

1) Box plot of normalized expression values

The data were normalized with the Robust Multi-array Average (RMA) method, as implemented in the rma() function of the affy package. Compared with the pre-normalization box plot, all samples now have comparable distributions, which is a prerequisite for an unbiased differential gene expression analysis.
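
RMA combines background correction, quantile normalization and probe-set summarization. To illustrate why the box plots line up afterwards, the sketch below implements only the quantile normalization step in base R on simulated data. It is not the affy implementation; `quantile_normalize` and the sample names are hypothetical.

```r
# Quantile normalization: force every column (array) to share the same
# empirical distribution.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  target <- rowMeans(sorted)                          # mean of each quantile
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Simulated log2 intensities for six arrays with different offsets
# (hypothetical data, not the study's samples).
set.seed(42)
raw <- sapply(c(7, 7.5, 8, 9, 7.2, 9.1),
              function(mu) rnorm(500, mean = mu))
colnames(raw) <- paste0("sample", 1:6)

normalized_expr <- quantile_normalize(raw)

# Medians differ before normalization and coincide afterwards.
print(round(apply(raw, 2, median), 2))
print(round(apply(normalized_expr, 2, median), 2))
boxplot(normalized_expr, main = "After quantile normalization")
```

In the actual analysis this step is handled inside rma(), so the function above is only useful for understanding the effect on the distributions.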

2) MA Plot after normalization

Clustering and PCA

1) Principal Component Analysis plot

The PCA plot shows that the first members of the two replicate groups are separated from the other members of their own groups by more than the groups are separated from each other, i.e. the within-group variance exceeds the between-group variance. This raised further questions about the quality of those two samples. To confirm our suspicion, we proceeded with hierarchical clustering.
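
A sample-level PCA of this kind can be sketched in base R with prcomp(). The expression matrix below is simulated with two clean replicate groups, so it shows the expected pattern (groups separating on the first component) rather than the problem seen in this dataset. All object and sample names are hypothetical.

```r
# Simulated normalized expression: 200 genes x 6 samples, two groups of
# three replicates (hypothetical data, not the study's samples).
set.seed(7)
n_genes     <- 200
baseline    <- rnorm(n_genes, mean = 8)
group_shift <- rnorm(n_genes)
expr <- sapply(1:6, function(i) {
  baseline + (i > 3) * group_shift + rnorm(n_genes, sd = 0.2)
})
colnames(expr) <- c("groupA_1", "groupA_2", "groupA_3",
                    "groupB_1", "groupB_2", "groupB_3")

# prcomp() expects observations in rows, so samples go in rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2],
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = colnames(expr), pos = 3)
```

When replicates are sound, samples from the same group sit close together on PC1. A sample that lands nearer to the other group than to its own replicates is a candidate for closer quality control.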

2) Hierarchical Clustering

The hierarchical clustering confirmed our suspicion that something was wrong with samples DARCR1 and LDRR1: the two clustered together despite coming from different replicate groups. The quality control step was therefore revisited to uncover the reason.
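
For reference, sample-level hierarchical clustering can be sketched in base R as follows. The choice of a correlation-based distance and average linkage is an assumption made for illustration, since the settings used for the plot are not stated here. The data are simulated and the sample names are hypothetical.

```r
# Simulated normalized expression: 200 genes x 6 samples in two groups
# (hypothetical data, not the study's samples).
set.seed(11)
n_genes     <- 200
baseline    <- rnorm(n_genes, mean = 8)
group_shift <- rnorm(n_genes)
expr <- sapply(1:6, function(i) {
  baseline + (i > 3) * group_shift + rnorm(n_genes, sd = 0.2)
})
colnames(expr) <- c("groupA_1", "groupA_2", "groupA_3",
                    "groupB_1", "groupB_2", "groupB_3")

# Distance between samples: 1 - Pearson correlation of their profiles.
sample_dist <- as.dist(1 - cor(expr))
hc <- hclust(sample_dist, method = "average")
plot(hc, main = "Sample clustering", xlab = "", sub = "")

# Cut the tree into two clusters and compare with the known groups.
clusters <- cutree(hc, k = 2)
print(clusters)
```

With clean replicates the two clusters coincide with the two groups. Samples from different groups falling into the same cluster, as happened here, point to a technical effect that outweighs the biological one.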

Comprehensive Quality Control

The simpleaffy package was used to perform a more comprehensive quality control procedure.

Note that simpleaffy is an older package that was removed from Bioconductor in version 3.13, so an earlier Bioconductor release must be installed to use it. This can be challenging: the latest version of R (4.4.2 at the time of writing) is not compatible with those earlier Bioconductor releases, which require R 4.0.x. One workaround is to use the R Installation Manager (rig) on Ubuntu to install an earlier version of R. The renv package can then manage the earlier Bioconductor version in a project-local library without interfering with the system-wide package library.
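
A possible renv setup is sketched below. It assumes the commands are run from the project directory inside an R 4.0.x session, and that Bioconductor 3.12 (the release immediately before the removal) is the version to pin. Adjust the version string if your setup differs.

```r
# Run inside an R 4.0.x session started in the project directory.
install.packages("renv")

# Create a project-local library pinned to an older Bioconductor release.
# "3.12" is an assumption: the last release before simpleaffy was removed.
renv::init(bioconductor = "3.12")

# Install simpleaffy from Bioconductor into the project library.
renv::install("bioc::simpleaffy")

# Record the exact package versions in renv.lock for reproducibility.
renv::snapshot()
```

Because the library lives inside the project, other projects on the same machine can keep using a current R and Bioconductor.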

1) RNA degradation plot

The RNA degradation plot shows a steep slope for every sample in the dataset. Two samples have an even steeper slope than the rest, and both are the first members of their replicate groups.
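
The slope in such a plot summarizes how mean probe intensity changes from the 5' end to the 3' end of the probe sets. The base R sketch below fits that slope with a simple linear regression on simulated values. The number of probes per probe set (11) and all intensities are assumptions for illustration, and dedicated functions in the Bioconductor packages apply additional scaling that is omitted here.

```r
set.seed(3)
probe_position <- 1:11  # probes ordered 5' to 3'; 11 per set is an assumption

# Hypothetical mean log2 intensities by probe position for two samples.
typical_sample  <- 8 + 0.3 * probe_position + rnorm(11, sd = 0.05)
degraded_sample <- 8 + 0.6 * probe_position + rnorm(11, sd = 0.05)

# Slope of the fitted line: a larger value means a stronger
# 5'-to-3' intensity gradient.
degradation_slope <- function(mean_intensity, position) {
  unname(coef(lm(mean_intensity ~ position))[2])
}

slopes <- c(typical  = degradation_slope(typical_sample, probe_position),
            degraded = degradation_slope(degraded_sample, probe_position))
print(round(slopes, 2))
```

Comparing slopes across samples matters more than the absolute values: a sample whose slope stands out from the rest of the batch is the one to investigate.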

2) QC Statistics plot

The QC statistic plot clearly shows that the present call percentages for all the samples in the dataset are very low. However, the samples DARCR1 and LDRR1 have much lower present calls relative to the other members of their respective replicate groups.

This indicates that while RNA degradation occurred in all the samples of the dataset, the first members of the two replicate groups were heavily affected by RNA degradation. This resulted in them having a similar transcriptional profile despite belonging to different replicate groups.

Note that the original publication neither presented any quality control plots nor reported any low-quality samples. As we will see later, this may explain why the differential expression results are not completely reproducible: only three of the five genes reported as significantly upregulated remained significant after the p-values were adjusted for multiple testing.
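
The effect of multiple-testing adjustment can be illustrated with base R's p.adjust(). The Benjamini-Hochberg method is used here as an assumption, since the adjustment method is not named above. The gene labels and p-values are invented for the example.

```r
# Hypothetical raw p-values for five genes (invented for illustration).
p_raw <- c(geneA = 0.001, geneB = 0.008, geneC = 0.039,
           geneD = 0.041, geneE = 0.200)

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_adj <- p.adjust(p_raw, method = "BH")

print(round(p_adj, 5))
sum(p_raw < 0.05)  # 4 genes pass before adjustment
sum(p_adj < 0.05)  # 2 genes pass after adjustment
```

Genes with raw p-values just under the threshold are the first to drop out after adjustment, which is one way a published list of significant genes can shrink on reanalysis.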

Differential Gene Expression Analysis

Volcano Plot

Of the four genes (Mmp3, Serpina3n, Serpinb1a, Osmr) reported in the original study, three can be seen in the volcano plot, which validates the reproduction of previous work. Additionally, the comprehensive quality control analysis explains why the samples from different replicate groups exhibited similar transcriptional profiles.

About

Part of Functional Genomics Technology Course
