Skip to content

ejeej/ImmuneSpace_SDY984

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Identification of genes associated with immune response using open database ImmuneSpace

Study SDY984 (immune response after varicella zoster vaccine)

Authors: Olga Mironenko, Shakir Suleymanov

Bioinformatics Institute, Biostatistics and Analisys of Medical Data, Final project

Fall 2022


Introduction


Vaccination leads to the development of the immunity against specific pathogen. This process is associated with the activation of various genes expression and molecular pathways. The quality of vaccination might be assessed by measuring the titer of specific antibodies. However, in some cases, the antibody titer does not increase after vaccination, which may indicate a lack of immunity against the pathogen. Recent advances in transcriptomics technologies allowed scientists and clinicians to collect high-throughput immunological data, for example, in order to conduct controlled vaccinational studies. These methods will help to study the molecular mechanisms of the vaccine-induced immunity, as well as to search for molecular signatures to predict the quality of immunity against particular pathogens.

Varicella zoster is a commonly known virus from the Herpesviridae family, which is capable to cause such diseases as shingles and postherpetic neuralgia. Risk of these diseases increases with the age, so vaccination is highly recommended for people who is older than 60 years. Zostavax is a live attenuated vaccine which is used to treat a latent virus that has remained dormant in cells since chicken pox infection.

In this study we analysed transcriptomics data from the research conducted by the Human Immunology Project Consortium among volunteers after vaccination with Zostavax vaccine (SDY984 study) using open-source ImmuneSpace database. The aim of our project was to compare the gene expression profiles of patients with high and low levels of post-vaccination immunity and to identify early transcriptional markers that can be used to evaluate the quality of the immunity.


Aim, tasks and data


The aim of our project is to find genes with expression significantly related to the strong immune response to Zoster vaccination, as well as those biological processes which are overrepresented by these genes.

This aim can be achieved bt solving the following tasks:

  • Literature review on the basics of RNA transcription, process of immune response to vaccination and results of other studies realized on the same dataset.

  • Descriptive statistics for the baseline characteristics of the study participants.

  • Analysis of differential gene expression (revealing genes with expression that differs significantly between participants with the strong and weak response, before vaccination and over time after it).

  • Analysis of the relation between the probability of the strong immune response and gene expression at different time points (revealing genes with expression associated with the probability of the strong response).

  • Gathering annotations for biological processes represented by gene sets revealed at the previous two steps, listing the most represented biological processes.


Dataset for the project was obtained within the study SDY984 realized under the program of the Human Immunology Project Consortium (HIPC) and is available at HIPC-II Immune Signatures Data Resource and Analysis (IS2) (can be downloaded after registration). Results of the original study made on the full data set were published in (Li et al. 2017).

Gene annotations will be obtained from the Gene Ontology (GO) - a special knowledge database containing the most up-to-date information concerning annotations on genes, gene products' molecular functions, biological processes accomplished by these products and cellular components inside which these functions are performed. In our research we will be mainly interested in biological processes represented by genes with differential expression.


In the dataset for the study SDY984 gene expression was assessed for 35 volunteers just before vaccination and in the several points (1, 3, 7 days) after it.

Response to vaccination was evaluated at 30 days after vaccination. All participants were divided into 3 groups by the strength of their response: low, moderate and high responders. There were several alternative criteria (measures) for this division within the whole range of IS2 studies, but for our project we will use just one of the two available approaches - the MFC_p40 measure. MFC is the maximum fold change of the antibody titer after vaccination in comparison with its value before it (maximum - because some vaccines in some of the IS2 studies contained several strains of virus or atibody titer was measured by several tests for them (Fourati et al. 2022)). There was just one strain of Zoster in Zostavax vaccine and antibody titer in the SDY984 study was evaluated with just one test (IgG measurement with ELISA (enzyme-linked immunosorbent assay)), therefore in our case MFC is actually just singular fold change of the antibody titer at 30 days after vaccination (precisely, log2(FC)). As for labelling the participants by the strength of their response, thresholds for their division into low, moderate and high responders were chosen as the 40th and 60th percentiles of MFC (Fourati et al. 2022), correspondingly (detailed information about that can be found, inter alia, inside the scripts which were used by researchers for preprocessing the data for the study we have chosen).

For simplification we exclude moderate responders (7 participants) from our analysis.


It should be noted that the original study (Li et al. 2017) utilized the broader data set in comparison with the one available to us (with respect to the greater number of volunteers, number of indicators, including those from metabolomics and flow cytometry, as well as number of time points when these variables were measured) and aimed at wider range of tasks with a bit different (in comparison with ours) emphasis, namely the estimate of the correlation between changes in metabolic indicators and changes of genes' and their modules' expression up to 180 days from vaccination. Therefore, the authors succeed in the detailed description of the immune response to Zostavax vaccine, without it dichotomization. In our research we set a narrower goal - to reveal those genes that are differentially expressed (DEGs) in participants depending on the strength of their immune response or expression of which is related (positively or negatively) to the probability of the strong response, and to find out those paths which are represented by these genes.


Literature review


Process of the immune response generation can be described as follows (Pollard and Bijker 2021):


Or as follows (Desmet and Ishii 2012):


Vaccine contains patogene-associated molecular patterns (PAMPs) or can induce local reaction with releasing damage-associated molecular patterns (DAMPs). These patterns are recognized by pattern recognition receptors (PRRs), expressed by the cells of the innate immune system: macrophages, mast cells, neutrophils, dendritic cells (DCs). These cells do not reveal which particular infection or another pathogen has infiltrated the organism, but, depending on the certain cell, should destroy or absorb and split them into fragments and/ or signal about them to the cells of adpative immunity. Adaptive immunity system will then work to form/ initialize specific, target immune response against this certain pathogen, by antibody production, in particular.

In addition, PAMPs and DAMPs can be detected by lymphoid cells of the innate immunity (e.g., NK cells), which induce the release of cytokines, that may cooperate in the activation and orientation of the dendritic cells (DCs). DCs plays the crucial role in transferring the signal from the innate to the adaptive immune system. Activated DCs migrate to lymphoid nodes where they represent fragments of the absorbed antigen to T-cells through special mechanism of co-stimulation of the T-cells receptors (TCR). At this stage one type of T-cells (T-killers, or CD8+ lymphocytes) interact with the type I molecules of the major histocompatibility complex (MHC-I) and, upon recognition of the alien MHC-I, release proteins that dissolve the pathogen, while another type of T-cells (T-helpers, or CD4+ lymphocytes) interact with the type II molecules of the MHC (MHC-II) and, upon recognition of the alien MHC-II, produce cytokines and chemokines. Depending on the cytokines mileau, CD4+ lymphocites can differentiate into different subtypes of T-helpers (Th) and T-follicular helpers (Tfh) - the latter activates B-cells which differentiate into plasma cells that produce antigen-specific antibodies. B-cells and CD8+ lymphocytes can differentiate into memory cells, which allow immune system more quickly recognize and response to familiar pathogen.

The given algorithm of the immune response to vaccination gives a rough notion about those biological processes which can be activated after vaccination. Based on that, in our analysis we will pay special attention to those paths which are directly related to the immune response, namely the offsprings of the following bioilgical processes from Gene Ontology (on the pictures below only their childrens are shown, click on pictures to enlarge them):

  • immune system process (GO:0002376) - any process involved in the development or functioning of the immune system, an organismal system for calibrated responses to potential internal or invasive threats.
  • defense response (GO:0006952) - reactions, triggered in response to the presence of a foreign body or the occurrence of an injury, which result in restriction of damage to the organism attacked or prevention/recovery from the infection caused by the attack.
  • cytokine production (GO:0001816) - appearance of a cytokine due to biosynthesis or secretion following a cellular stimulus, resulting in an increase in its intracellular or extracellular levels.
  • response to cytokine (GO:0034097) - any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a cytokine stimulus.
  • MHC protein complex assembly (GO:0002396) - the aggregation, arrangement and bonding together of a set of components to form an MHC protein complex.

It is obvious that immune response activates other biological processes, besides mentioned above, e.g. cell activation, cell differentiation, cell-cell interaction, etc., but these paths can work not only in response to vaccination, therefore we will pay attention to them, but will not include them into the group of direct immune-related processes.


Descriptive statistics


Participants' characteristics at the Baseline


Descriptive statistics for participants' characteristics at the baseline (before vaccination) are presented in Table 1. As we are going to compare gene expression between participants with strong and weak response to Zostavax vaccine in our study, we showed statistics for these two groups in Table 1. Young are volunteers aged 25 years old, Elderly are volunteers aged 60 years old.

Table 1. Baseline characteristics of the study participants.
Characteristic Low Responder
N = 14
High Responder
N = 14
p-value
Study arm, n (%) 0.023
Young 4 (29%) 10 (71%)
Elderly 10 (71%) 4 (29%)
Gender, n (%) 0.430
Female 8 (57%) 10 (71%)
Male 6 (43%) 4 (29%)
Race, n (%) 0.516
White 11 (79%) 8 (57%)
Black or African American 2 (14%) 5 (36%)
Unknown 1 (7.1%) 1 (7.1%)
Ethnicity, n (%) >0.999
Not Hispanic or Latino 12 (86%) 13 (93%)
Hispanic or Latino 2 (14%) 1 (7.1%)
log2(IgG), ELISA 0.012
Mean (SD) 14.2 (1.0) 13.0 (1.3)
Median (Q1-Q3) 14.0 (13.4-15.1) 13.0 (12.6-13.7)
Range 12.8-15.9 10.6-15.4
1 p-value: Study arm, Gender, Race - Pearson's Chi-squared test;
Ethnicity - Fisher's exact test; IgG - Mann-Whitney test

Gene expression


The initial dataset contained data in 26925 genes' expression. For the purposes of our research we excluded those genes which had missings on their expression for all participants - it resulted in the data for 16146 genes. We estimate median absolute deviation (MAD) of the expression for these genes (without differentiation by time points) and leave 5 thousand genes with the maximum MAD (genes with the largest variation in expression).


Below histograms for expression are given for 10 randomly selected genes.


Methods


We will use several althernative approaches for identifying genes related to immune response:

  1. Compare gene expressions between low and high responders using the Mann-Whitney test at every time point (0, 1, 3 and 7 days) with FDR-adjusted p-values (Benjamini & Hochberg adjustment), and mark those with adjusted p-value < 0.05 as differentially expressed (DEGs).

  2. For every gene out of 5 thousand genes with the largest MAD we will estimate linear mixed effects model of the following kind:

exprit = β0 + β0i + β1*timej + β2*responsei + β3*timej*responsei + εij, where

  • exprit - i-th individual's gene expression at the j-th time point; we will estimate separate specifications for expression measured as:

    • the initially given data,

    • ranks of genes by expression, calculated for each participant at each time point as the quantile of the empirical distribution function for all genes' expression for this participant at this time point,

  • timej - time from vaccination (days),

  • responsei - i-th participant's response to Zostavax vaccine measured at 30 days from vaccination (1 - high responder, 0 - low responder),

  • β0 - regression intercept (global average expression for this particular gene among all participants and time points),

  • β0i - individual random effect (modeled for each participant as normally distributed variable with zero mean and variance which characterize the between-subject variation of the mean gene expression around the global average gene expression),

  • β1,2,3 - regression coefficients (fixed effects for time, response and their interaction),

  • εij - residuals (it is assumed that they are normally distributed with zero mean and variance which characterize the within-subject variation of the gene expression (in time)).

    This model allows to estimate whether the response effect changes significantly over time (or, equivalently, whether the change in expression differs significantly depending on the response to vaccination). β2 estimate in this model can be interpreted as the average treatment effect (ATE) of the high response at the Baseline (time=0), i.e. the baseline difference in the average gene expression between high and low responders. For other time points the ATE can be calculated as β2 + β3*time. Therefore, β3 estimate can be interpreted as the average change in the ATE of the high response with every additional day after vaccination.

    Linear mixed models will be estimated with the lmer function out of the lme4 package (Bates et al. 2015) for R (version 4.1.1), average treatment effects will be estimated with the means of the marginaleffects function out of the package with the same name (Arel-Bundock 2023). It can be noted that if we give particular time points to the latter function after the mixed model with the interaction term (e.g., time=0,1,3,7), then near the response variable we will get the desired values for the ATE of the high response at these points (β2 + β3*time) with the corresponding p-values. If to not pass the time points into this function, near the response variable we will get the estimate for the ATE of the high response from the model without the interaction term.

  1. For every gene out of 5 thousand genes with the largest MAD we will estimate logistic regression of the following kind:

log(odds(responsei)) = γ0 + γ1*exprij, where:

  • responsei - response to vaccination for the i-th participant (1 - high responder, 0 - low responder),

  • exprij - gene expression for the i-th participant at the j-th time point, j=0,1,3,7. As for linear mixed models we will estimate separate specifications for:

    • the initially given data,

    • ranks of genes by expression, calculated for each participant at each time point as the quantile of the empirical distribution function for all genes' expression for this participant at this time point (for interpretability of results in logistic regressions we transformed ranks into the scale from 0 to 100),

  • γ0 - regression intercept,

  • γ1 - regression coefficient.

    Estimates of the regression coefficient for each gene and time point will be exponentiated, and after that we will be able to interpret them as the odds ratios (ORs) of response in respect to the increase in the gene expression by 1 at the corresponding time point.


Every specification of the linear mixed models and logistic regressions will give us a gene set with statistically significant coefficients (p < 0.05) or average treatment effects at some time point - we will see how much these gene sets are intersected between each other using upset plots (a variant of Venn diagrams) to understand whether there are some changes in them over time or differences depending on expression measurement or model type. In addition, we will match statistical significance with the size of effects (ATEs and ORs) using volcano plots to identify genes with the largest significant effects, i.e. genes for which expression is highly positively or negatively correlated with the high response.


Gene ontology (GO) annotations for genes and bilogical processes (BPs) will be obtained with the means of the AnnotationDbi package for R (Pagès H, Carlson M, Falcon S, Li N 2022).

As there can be many findings (i.e.many DEGs), and products of the same gene can participate in different paths, as well as products from different genes can be involved into the same process, the naive estimate of BPs' representations in each gene set with the number of representative genes (which we also will use, however, due to the simplicity and visibility of this approach) can give biased results. Approaches from the singular enrichment analysis (SEA) (Tipney and Hunter 2010) allows to determine which particular processes are overrepresented by the genes in some set in comparison with the full gene set (the latter comprises 5 thousand genes in our case).

At the final stage of our analysis we will try to reveal such overrepresented BPs using p-values from the hypergeometric distribution, i.e. probabilities of getting the same or even more representation of the given BP in the set of $n$ genes in comparison with the full set of $N$ genes (Boyle et al. 2004). These p-values can be calculated with the hyperGtest function from the Category package in R (Gentleman R 2022). We will set the option conditional to TRUE in it (more information on that can be found in the vignette, Falcon and Gentleman 2007), use 0.05 as a p-value cut-off and set minimum and maximum number of genes in BPs from the gene universe to 10 and 500, correspondingly. After estimating the hypergeometric test we will exclude BPs with less than 5 genes from the chosen gene set. For the p-values of the remained BPs we will use two approaches to FDR control: p-value adjustment under Benjamini & Hochberg method and q-values estimations (Storey and Tibshirani 2003).

SEA will be held separately for each set of significant genes, revealed after linear mixed models and logistic regressions.


Results


DEGs: pairwise Mann-Whitney tests


At every time point for every gene we compared its expression between low and high responders using the Mann-Whitney test, correct the obtained p-values with Benjamini-Hochberg method (for the point-wise control of the FDR) and count the number of genes with unadjusted and adjusted p-value less than 0.05. The results are given in Table 2.

Table 2. P-values of the Mann-Whitney tests before and after
adjustment under Benjamini & Hochberg method
,
number of genes (%)
Baseline 1 d. 3 d. 7 d.
P-value
< 0.05 356 (7.1%) 423 (8.5%) 353 (7.1%) 426 (8.5%)
$\geq$ 0.05 4644 (93%) 4577 (92%) 4647 (93%) 4574 (91%)
Adjusted p-value
< 0.05 0 (0%) 0 (0%) 0 (0%) 2 (<0.1%)
$\geq$ 0.05 5000 (100%) 5000 (100%) 5000 (100%) 4998 (100%)

There are just 2 genes with statistically significant different expression (at 5% significance level) after adjustment. Both of these cases are observed at 7 days after vaccination. These genes are are IL2RA, RIC3. Data on their expression is given in Table 3.

Table 3. Descriptive statistics for genes with expressions, significantly different between low and high responders.
Baseline
1 d.
3 d.
7 d.
Gene/ Statistic Low Responder
N = 14
High Responder
N = 14
p-value Low Responder
N = 14
High Responder
N = 14
p-value Low Responder
N = 14
High Responder
N = 14
p-value Low Responder
N = 14
High Responder
N = 14
p-value
IL2RA 0.914 0.902 0.893 0.017
Mean (SD) 4.9 (0.4) 5.1 (0.4) 5.0 (0.4) 5.1 (0.4) 5.1 (0.4) 5.0 (0.5) 4.8 (0.3) 5.4 (0.4)
Median (Q1-Q3) 5.1 (4.7-5.1) 4.9 (4.8-5.4) 5.1 (4.7-5.2) 5.2 (4.8-5.4) 5.1 (4.8-5.4) 4.9 (4.6-5.3) 4.8 (4.7-5.0) 5.3 (5.3-5.5)
Range 4.3-5.3 4.5-5.9 4.3-5.6 4.7-6.0 4.3-5.8 4.4-5.8 4.1-5.2 4.8-6.2
RIC3 0.752 0.401 0.739 0.017
Mean (SD) 4.2 (0.3) 4.4 (0.3) 4.1 (0.3) 4.4 (0.5) 4.2 (0.5) 4.5 (0.4) 3.8 (0.3) 4.5 (0.3)
Median (Q1-Q3) 4.1 (4.0-4.4) 4.3 (4.2-4.6) 4.1 (3.9-4.2) 4.4 (4.2-4.7) 4.1 (3.8-4.6) 4.5 (4.3-4.6) 3.7 (3.6-3.9) 4.4 (4.2-4.7)
Range 3.7-4.7 3.8-5.0 3.5-4.6 3.1-5.1 3.4-5.0 3.7-5.2 3.3-4.4 4.1-4.9
1 p-value: Mann-Whitney test with Benjamini & Hochberg correction

As an illustration:


IL2RA states for interleukin 2 receptor subunit alpha, it codes Interleukin-2 receptor alpha chain (CD25), Interleukin-2 (IL-2) which participates in regulating the activity of immune-related leukocytes. Here are biological processes for this genes' products (from Gene Ontology): inflammatory response to antigenic stimulus; regulation of T cell tolerance induction; apoptotic process; activation-induced cell death of T cells; inflammatory response; immune response; cell surface receptor signaling pathway; Notch signaling pathway; interleukin-2-mediated signaling pathway; positive regulation of activated T cell proliferation; negative regulation of T cell proliferation; positive regulation of T cell differentiation; regulation of T cell homeostatic proliferation; negative regulation of inflammatory response.

RIC3 states for RIC3 acetylcholine receptor chaperone, it codes chaperon protein RIC-3, which is responsible for resistance to cholinesterase 3 inhibitors. Chaperon proteins participate in folding and unfolding of the large proteins or macromolecular protein complexes. Here are biological processes for this genes' products (from Gene Ontology): protein folding; positive regulation of cytosolic calcium ion concentration; synaptic transmission, cholinergic; protein localization to cell surface; cellular protein-containing complex assembly; positive regulation of protein localization to cell surface.


DEGs: linear mixed models


DEGs sets


Upon estimation of the linear mixed models we got 49 genes with significant coefficients for both the main response effect and its interaction with time in the specifications with the initial data for expression (42 in the specification with the ranks) - these genes were not only differentially expressed at the Baseline, but differed in the dynamics of the mean expression after vaccination in dependence of the response strength. In addition, 277 (260) genes had significant main effect for the response only, i.e. were differentially expressed at the Baseline, but got similar dynamics of the expression among low and high responders. For 318 (337) genes the main effect of the response was insignificant, while the interaction term was significant, i.e. these genes got similar average expression before vaccination, but diverged in its dynamics afterwards.

We can compare p-values for the main effect of the response and response-time interaction for each gene using the scatter diagram, showing p-value for the main effect (β2 estimate) on the X axis and p-value for the interaction term (β3 estimate) on the Y axis. For better clarity, we use $-log_{10}$-transformation for both of p-values. In each quadrant except the lower left (here are genes with both insignificant effects), we label 5 genes with the lowest p-values.

In addition, we can see how much significant gene sets, revealed by the significance of the coefficients near response and its interaction with time, are intersected (Figure 4).


Combining the results for the both specifications of gene expression we found 334 unique genes with the significant main effect for response (in any of specifications) and its insignificant interaction with time (in both specifications), and 370 unique genes with significant interaction (in any specification) and insignificant main term (in both specifications). We can say that the average expression for the former genes was significantly different between high and low responders before vaccination, but its dynamics was similar for these groups after vaccination. On the contrary, the latter genes got similar average expression at the Baseline for both groups, but differed in its change over time.

Using information from Gene Ontology for each of these two gene sets we extract those signalling paths (biological processes) into which the products of these genes are involved. Figure 5 (1) represents the revealed processes with the largest number of genes. Figure 5 (2) lists the most represented immune-related processes (those of them which were missing among ones with the significant timre-response interaction are marked in blue color).


Besides regression coefficients, we can also estimate the average treatment effect (ATE) of the response at particular time points. Figures 6(1) and 6(2) shows the intersection of the genes sets with the significant ATE between time points.

The noticeably small number of genes have the significant ATE over the whole range of time points when expression was assessed (0-7 days). There were also quite few genes with insignificant baseline ATE of response and significant ATE from 1 to 7 days, as well genes with significant ATE at the Baseline and insignificant subsequent ATEs. On the contrast, there are two modes: genes for which the ATE became significant only at 7 days after vaccination and genes for which the ATE turned from significant at 0-3 days to insignificant at 7 days.

After combining the results for the specifications with the initial data on expression and ranks of genes by expression we obtain 310 genes with significant 7-day ATE (in any specification) and insignificant 0-1-3-day ATEs (in both specifications), as well as 216 genes with significant 0-1-3-day ATEs (in any specification) and insignificant 7-day ATE (in both specifications). The most represented by these two gene sets biological processes are shown in Figures 7 (1) and 7(2).


Significance vs. size of the average response effect


At the next step in the analysis of the results obtained from the linear mixed models we can match the statistical significance of the average response effect to its size.

Figure 8 contains the volcano plots for the ATEs at particular time points (on the X axis) and -log10-transformed p-values for them (on the Y axis). Points for 5 genes with the largest negative and largest positive significant ATEs are highlighted at every point.

The same plots are drawn for the average response effect at particular time points estimated after linear mixed models with the response-time interaction (Figure 9).


On the one side, it can be seen that points have been "stretched" to the left and right by 7 days after vaccination. On the other hand, it is noticeable that the absolute value of the ATEs for genes with the largest ones at the Baseline has been shrunk over time.


Let's call genes with ATE > 0.25 (in the intitial measirements units for expressiob) as upregulated and genes with ATE < -0.25 as downregulated at the corresponding time point. The average expression for every gene in the former gene set is significantly higher among high responders in comparison to the low ones, for the latter it is significantly lower. Using Gene Ontology we obtained annotations for the corresponding biological processes for these sets, chose 10 of the most represented of them for each point and showed them at Figure 9.


In addition, we filtered only immune-related processes and showed the most represented of them at Figure 10.


Probability of high response vs. gene expression: logistic regression


Sets of genes with expression related to high response


Figures 11(1) and 11(2) shows how much the sets of genes with significant coefficients in logistic regressions are intersected.

Also for every time point we can compare gene sets with significant coefficient for expression in the logistic regression and gene sets with significant ATE at the corresponding time point as a result of the linear mixed model - results are shown at Fugures 12.:

We see that the gene sets are much more intersected at day 7, than before vaccination and up to 3 days after it.


Significance vs. size of the response odds ratio


We can estimate the size of the effect of the gene expression at the particular point of time on the probability of high response using odds rations, OR (exponentiated coefficients from the logistic regression).

Figure 12 contains volcano plot matching the effect size (log10 of the OR on the X asis) with its statistical significance (-log10-transformed p-value on the Y axis). For the purpose of visualiazation in the results for regressions with the initial data on expressions we replaced log10 values of ORs out of the interval [-10, 10] with the nearest border of this interval, we did the same with the log10 values of ORs out of the interval [-4, 4] from the results in regressions with ranks of genes by expression. We colored points for 5 genes with the lowest and largest significant ORs in orange.


Functional analysis


The number of the overrepresented BPs obtained in every model with different cut-offs for the adjusted p-values and q-values after the hypergeometric test is listed in Table 4 (empty cells state for zero BPs).

Table 4. Number of significant biological processes (GO) found in hypergeometric tests
Adjusted p-value
q-value
Gene set source < 0.1 < 0.05 < 0.01 < 0.001 < 0.1 < 0.05 < 0.01 < 0.001
Expr., sig. ATE (Baseline) 3 10 4
Expr., sig. ATE (1 d.) 2
Expr., sig. ATE (3 d.)
Expr., sig. ATE (7 d.)
Expr., sig. b2 (Baseline) 8 20 12
Expr., sig. b3 (Change)
Ranks, sig. ATE (Baseline) 3 2 1 39 8 2
Ranks, sig. ATE (1 d.) 10 8 38 10 1
Ranks, sig. ATE (3 d.)
Ranks, sig. ATE (7 d.)
Ranks, sig. b2 (Baseline) 6 2 2 55 7 2
Ranks, sig. b3 (Change)
Expr., sig. OR (Baseline) 1 1 6 1 1
Expr., sig. OR (1 d.) 6 115 14
Expr., sig. OR (3 d.)
Expr., sig. OR (7 d.)
Ranks, sig. OR (Baseline)
Ranks, sig. OR (1 d.) 1 1 33 14
Ranks, sig. OR (3 d.)
Ranks, sig. OR (7 d.)

Mostly all overrepresented biological processes are found for the Baseline and 1-day post-vaccination.

Table 5 contains all BPs with q-value < 0.05, grouped by time points when they were overrepresented at least in one model. There are 3 among them which are directly related to the immune system (particularly to the innate immunity), namely: regulation of natural killer cell mediated immunity, positive regulation of innate immune response, positive regulation of natural killer cell mediated cytotoxicity - they are marked in bold in the table.

Table 5. Overrepresented BPs, by time
Time GOBPID BP (GO term) Definition (GO)
1 d. GO:0002715 regulation of natural killer cell mediated immunity Any process that modulates the frequency, rate, or extent of natural killer cell mediated immunity.
1 d. GO:0045089 positive regulation of innate immune response Any process that activates or increases the frequency, rate or extent of the innate immune response, the organism's first line of defense against infection.
1 d. GO:0045954 positive regulation of natural killer cell mediated cytotoxicity Any process that activates or increases the frequency, rate or extent of natural killer cell mediated cytotoxicity.
1 d. GO:0010562 positive regulation of phosphorus metabolic process Any process that increases the frequency, rate or extent of the chemical reactions and pathways involving phosphorus or compounds containing phosphorus.
1 d. GO:0010611 regulation of cardiac muscle hypertrophy Any process that modulates the rate, frequency or extent of the enlargement or overgrowth of all or part of the heart due to an increase in size (not length) of individual cardiac muscle fibers, without cell division.
1 d. GO:0014068 positive regulation of phosphatidylinositol 3-kinase signaling Any process that activates or increases the frequency, rate or extent of signal transduction mediated by the phosphatidylinositol 3-kinase cascade.
1 d. GO:0021602 cranial nerve morphogenesis The process in which the anatomical structure of the cranial nerves are generated and organized. The cranial nerves are composed of twelve pairs of nerves that emanate from the nervous tissue of the hindbrain. These nerves are sensory, motor, or mixed in nature, and provide the motor and general sensory innervation of the head, neck and viscera. They mediate vision, hearing, olfaction and taste and carry the parasympathetic innervation of the autonomic ganglia that control visceral functions.
1 d. GO:0030003 cellular cation homeostasis Any process involved in the maintenance of an internal steady state of cations at the level of a cell.
1 d. GO:0030198 extracellular matrix organization A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of an extracellular matrix.
1 d. GO:0035025 positive regulation of Rho protein signal transduction Any process that activates or increases the frequency, rate or extent of Rho protein signal transduction.
1 d. GO:0042327 positive regulation of phosphorylation Any process that activates or increases the frequency, rate or extent of addition of phosphate groups to a molecule.
1 d. GO:0043393 regulation of protein binding Any process that modulates the frequency, rate or extent of protein binding.
1 d. GO:0048015 phosphatidylinositol-mediated signaling A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives.
1 d. GO:0050801 ion homeostasis Any process involved in the maintenance of an internal steady state of ions within an organism or cell.
1 d. GO:0051347 positive regulation of transferase activity Any process that activates or increases the frequency, rate or extent of transferase activity, the catalysis of the transfer of a group, e.g. a methyl group, glycosyl group, acyl group, phosphorus-containing, or other groups, from a donor compound to an acceptor.
1 d. GO:0051480 regulation of cytosolic calcium ion concentration Any process involved in the maintenance of an internal steady state of calcium ions within the cytosol of a cell or between the cytosol and its surroundings.
1 d. GO:0055065 metal ion homeostasis Any process involved in the maintenance of an internal steady state of metal ions within an organism or cell.
1 d. GO:0060349 bone morphogenesis The process in which bones are generated and organized.
1 d. GO:0071398 cellular response to fatty acid Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a fatty acid stimulus.
1 d. GO:0072507 divalent inorganic cation homeostasis Any process involved in the maintenance of an internal steady state of divalent cations within an organism or cell.
1 d. GO:2000772 regulation of cellular senescence
Baseline GO:0002067 glandular epithelial cell differentiation The process in which a relatively unspecialized cell acquires specialized features of a glandular epithelial cell. A glandular epithelial cell is a columnar/cuboidal epithelial cell found in a two dimensional sheet with a free surface exposed to the lumen of a gland.
Baseline GO:0007187 G protein-coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger The series of molecular signals generated as a consequence of a G protein-coupled receptor binding to its physiological ligand, where the pathway proceeds with activation or inhibition of a nucleotide cyclase activity and a subsequent change in the concentration of a cyclic nucleotide.
Baseline GO:0015872 dopamine transport The directed movement of dopamine into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Dopamine is a catecholamine neurotransmitter and a metabolic precursor of noradrenaline and adrenaline.
Baseline GO:0044262 cellular carbohydrate metabolic process The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y, as carried out by individual cells.
Baseline GO:0046661 male sex differentiation The establishment of the sex of a male organism by physical differentiation.
Baseline GO:0048568 embryonic organ development Development, taking place during the embryonic phase, of a tissue or tissues that work together to perform a specific function or functions. Development pertains to the process whose specific outcome is the progression of a structure over time, from its formation to the mature structure. Organs are commonly observed as visibly distinct structures, but may also exist as loosely associated clusters of cells that work together to perform a specific function or functions.
Baseline GO:0051148 negative regulation of muscle cell differentiation Any process that stops, prevents, or reduces the frequency, rate or extent of muscle cell differentiation.
Baseline GO:0051154 negative regulation of striated muscle cell differentiation Any process that stops, prevents, or reduces the frequency, rate or extent of striated muscle cell differentiation.
Baseline GO:0051482 positive regulation of cytosolic calcium ion concentration involved in phospholipase C-activating G protein-coupled signaling pathway Any process that increases the concentration of calcium ions in the cytosol that occurs as part of a PLC-activating G protein-coupled receptor signaling pathway. G-protein-activated PLC hydrolyses phosphatidylinositol-bisphosphate (PIP2) to release diacylglycerol (DAG) and inositol trisphosphate (IP3). IP3 then binds to calcium release channels in the endoplasmic reticulum (ER) to trigger calcium ion release into the cytosol.
Baseline GO:0060348 bone development The process whose specific outcome is the progression of bone over time, from its formation to the mature structure. Bone is the hard skeletal connective tissue consisting of both mineral and cellular components.
Baseline, 1 d. GO:0003416 endochondral bone growth The increase in size or mass of an endochondral bone that contributes to the shaping of the bone.
Baseline, 1 d. GO:0007569 cell aging An aging process that has as participant a cell after a cell has stopped dividing. Cell aging may occur when a cell has temporarily stopped dividing through cell cycle arrest (GO:0007050) or when a cell has permanently stopped dividing, in which case it is undergoing cellular senescence (GO:0090398). May precede cell death (GO:0008219) and succeed cell maturation (GO:0048469).
Baseline, 1 d. GO:0007605 sensory perception of sound The series of events required for an organism to receive an auditory stimulus, convert it to a molecular signal, and recognize and characterize the signal. Sonic stimuli are detected in the form of vibrations and are processed to form a sound.
Baseline, 1 d. GO:0051057 positive regulation of small GTPase mediated signal transduction Any process that activates or increases the frequency, rate or extent of small GTPase mediated signal transduction.
Baseline, 1 d. GO:0051928 positive regulation of calcium ion transport Any process that activates or increases the frequency, rate or extent of the directed movement of calcium ions into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore.
Baseline, 1 d. GO:0098868 bone growth The increase in size or mass of a bone that contributes to the shaping of that bone.

Discussion


We analysed transcriptomics data from the research conducted by the Human Immunology Project Consortium among volunteers after vaccination with the live attenuated vaccine against Varicella zoster (Zostavax). The dataset used for the analysis contained data on 5 thousand genes' expression at 0, 1, 3 and 7 days after vaccination, and indicator for the immune response status according to categorized fold change in antibodies to Zoster from 0 to 30 days (14 high and 14 low responders).

We used several alternative approaches to identify genes with expression related to the immune response.

Comparison of the gene expressions between low and high responders using Mann-Whitney tests with p-values adjustment under Benjamini & Hochberg method discovered 2 genes with statistically significant (at 5% significance level) different expression at 7 days after vaccination: IL2RA and RIC3.

After estimation of the linear mixed models for every gene' expression over individual random intercept, time, response and time-response interaction we got two largest gene sets: one of them consists of 334 genes with the significant main effect for response and its insignificant interaction with time, the other one includes 370 genes with significant interaction and insignificant main term. We can say that the average expression for the former genes was significantly different between high and low responders before vaccination, but its dynamics was similar after vaccination. The latter genes got similar average expression at the baseline for both groups, but differed in its change over time. The most representative biological processes (BPs, annotated using Gene Ontology) for the both of these sets include inflammatory, innate immunity and adaptive immunity responses, but only the first of them has genes involved into neutrophil chemostaxis, chemokine-mediated signalling pathway, regulation of interleukin-8 production and natural killer cell mediated cytoxicity.

In addition, estimation of the average treatment effects (ATEs) for response after linear mixed models gave us another pair of the largest gene sets with significant ATEs: 310 genes for which the ATE became significant only at 7 days after vaccination and 216 genes for which the ATE turned from significant at 0-3 days to insignificant at 7 days. Again, both of them represented, among others, BPs for inflammatory, innate immunity and adaptive immunity responses, and, which is more interesting, genes involved into hemokine-mediated signalling pathway, regulation of interleukins 6 and 8, cellular response to interleukin-1 or interferon-gamma, B-cell differentiation ocurred only in the latter set (significant before vaccination and 3 days after it), while representatives of thymus development, regulation of T-cell proliferation or cytokine production, and lymph node development ocurred only in the former one (had significant ATE only at 7 days).

Genes with the largest significant positive ATE for response (i.e., on average, having higher expression among high responders) included NEBL, CRIP2, CCR3, LILRA3, TSIX, SLC38A11, with the largest negative ATE for response (i.e., on average, having lower expression among high responders) - OLFM4, MMP8, CEACAM8, BNC2, GZMH, USP9Y, KDM5D, RPS4Y1, EIF1AY, DDX3Y.

We showed that over time the number of genes with significant ATE of response which were related to the immune response BP increased - mainly owing to genes regulating inflammatory response. The number of genes with significant ATEs involved into the innate immune response were higher before vaccination and up to 3 days after it, when, among others, paths for response to fungus, mucosa and bacterium were active (perhaps, immune system tried to identify the type of pathogen). T-cell costimulation, regulation of cytokine production and response to cytokine, regulation of B-cell and T-cell proliferation showed up among significant gene sets only at 7 days after vaccination.

Results for logistic regressions for immune response over each gene's expression at each time point demonstrated that gene sets with significant response odds ratio (OR) with respect to 1-point increase in expression are not perfectly intersected with gene sets with significant ATE at the corresponding time point: at 7 days they are intersected by the largest amount of genes, although the set of ATE-significant genes still contains much more unique items than the corresponding set with OR-significant genes.

We can list genes with the largest positive log(OR), i.e. with expression strongly positively related to the probability of the high immune response: STX3, IL2RA, RIC3, TREML5P, IRGQ, CROCCP2, SYNGAP1, and genes with the largest negative log(OR), i.e. i.e. with expression strongly negatively related to the probability of the high immune response: CCL5, SNX33, RAP2A.

Finally, for every set of genes identified as related to the immune response after linear mixed models and logistic regressions, we realized functional (singular enrichment) analysis by estimating p-values from the hypergeometric distribution and discovering overrepresented BPs as ones with q-value < 0.05. There were no BPs overrepresented by gene sets obtained from any model using data on their expressions at 3 or 7 days after vaccination. Totally, we got 37 overrepresented BPs (mostly at 1 day after vaccination), 3 of them are directly related to immunity: regulation of innate immune response, regulation of natural killer cell mediated immunity, regulation of natural killer cell mediated cytotoxicity - all three of them were overrepresented at 1 day post vaccination.


Conclusion


We analysed transcriptomics data from the research conducted by the Human Immunology Project Consortium among volunteers after vaccination with the live attenuated vaccine against Varicella zoster (Zostavax). The dataset is freely available form the ImmuneSpace database. The aim of our project was to compare the gene expression profiles between patients with high and low levels of post-vaccination immunity and to identify early transcriptional markers that can be used to evaluate the quality of the immunity.

As we got data on gene expression only up to 7 days after vaccination, while immune response to vaccine was assessed using ELISA-measured IgG for Zoster antigen at 30 days, we were able to analyze gene expression profiles only for the very early stage of the immune response. We found some signs of activation in the signal paths related to inflammatory response, as well as to innate and adaptive immune response among high responders.

Thus, at 1 day after vaccination paths for positive regulation of the innate immune response (particularly, for the natural killer cell mediated immunity) were overrepresented by gene sets with expression significantly related to the strength of immune response in study participants. In addition, at the early stage there were differentially expressed genes involved into processes of response to fungus, mucosa and bacterium, while T-cell costimulation, regulation of cytokine production and response to cytokine, regulation of B-cell and T-cell proliferation showed up among significant gene sets only at 7 days after vaccination.


References


Arel-Bundock, Vincent. 2023. Marginaleffects: Marginal Effects, Marginal Means, Predictions, and Contrasts. https://vincentarelbundock.github.io/marginaleffects/.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67 (1). https://doi.org/10.18637/jss.v067.i01.

Boyle, E. I., S. Weng, J. Gollub, H. Jin, D. Botstein, J. M. Cherry, and G. Sherlock. 2004. “GO::TermFinder–Open Source Software for Accessing Gene Ontology Information and Finding Significantly Enriched Gene Ontology Terms Associated with a List of Genes.” Bioinformatics 20 (18): 3710–15. https://doi.org/10.1093/bioinformatics/bth456.

Desmet, Christophe J., and Ken J. Ishii. 2012. “Nucleic Acid Sensing at the Interface Between Innate and Adaptive Immunity in Vaccination.” Nature Reviews Immunology 12 (7): 479–91. https://doi.org/10.1038/nri3247.

Falcon, S., and R. Gentleman. 2007. “Using GOstats to Test Gene Lists for GO Term Association.” Bioinformatics (Oxford, England) 23 (2): 257–58. https://doi.org/10.1093/bioinformatics/btl567.

Fourati, Slim, Lewis E. Tomalin, Matthew P. Mulè, Daniel G. Chawla, Bram Gerritsen, Dmitry Rychkov, Evan Henrich, et al. 2022. “Pan-Vaccine Analysis Reveals Innate Immune Endotypes Predictive of Antibody Responses to Vaccination.” Nature Immunology 23 (12): 1777–87. https://doi.org/10.1038/s41590-022-01329-5.

Gentleman R. 2022. Category: Category Analysis. R Package Version 2.65.0.

Li, Shuzhao, Nicole L. Sullivan, Nadine Rouphael, Tianwei Yu, Sophia Banton, Mohan S. Maddur, Megan McCausland, et al. 2017. “Metabolic Phenotypes of Response to Vaccination in Humans.” Cell 169 (5): 862–877.e17. https://doi.org/10.1016/j.cell.2017.04.026.

Pagès H, Carlson M, Falcon S, Li N. 2022. AnnotationDbi: Manipulation of SQLite-Based Annotations in Bioconductor. R Package Version 1.60.0. https://bioconductor.org/packages/AnnotationDbi.

Pollard, Andrew J., and Else M. Bijker. 2021. “A Guide to Vaccinology: From Basic Principles to New Developments.” Nature Reviews Immunology 21 (2): 83–100. https://doi.org/10.1038/s41577-020-00479-7.

Storey, John D., and Robert Tibshirani. 2003. “Statistical Significance for Genomewide Studies.” Proceedings of the National Academy of Sciences 100 (16): 9440–45. https://doi.org/10.1073/pnas.1530509100.

Tipney, Hannah, and Lawrence Hunter. 2010. “An Introduction to Effective Use of Enrichment Analysis Software.” Human Genomics 4 (3): 202. https://doi.org/10.1186/1479-7364-4-3-202.

About

Final study project, Biostatistics program at the Bioinformatics Institute

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •