Gene expression regulation in lungs of COPD patients
Roberta Minotti and Philip Zimmermann
© NEBION AG. July 31, 2015
GENEVESTIGATOR is a powerful tool
for combined exploration of many transcriptomic datasets (microarray
and RNA-seq). At the time of this analysis, its database contained
over 120,000 extensively curated, quality controlled and globally
normalized microarrays from human, mouse and rat. Thanks to several
academic and industrial collaborations, it has particularly rich content
on respiratory diseases, in particular COPD, smoking, IPF and PAH.
In this example, we searched for genes specifically up-regulated in
lung tissues of chronic obstructive pulmonary disease (COPD)
patients. Using GENEVESTIGATOR, we identified candidate genes
related to COPD (in smoker populations), of which some were already
known to be associated with the onset of the disease but several
others have never been published in this context. The results from
the human platform were subsequently confirmed in the mouse
datasets. We present a workflow and demonstrate how
GENEVESTIGATOR makes it possible, in a few clicks, to identify genes highly
specific for a certain disease and to confirm the results in other organisms.
IDENTIFICATION OF GENES SPECIFICALLY UP-REGULATED IN COPD
For our analysis, we selected all curated datasets from
the Affymetrix Human Genome U133 Plus 2.0 Array platform (a curated
compendium of 49,191 samples). Then we used the Perturbation tool
from the GENE SEARCH toolset to compare samples from COPD patients
with those of healthy individuals. In most studies, the majority of
COPD patients were smokers, while a large proportion of healthy
controls were non-smokers. The effect measured therefore results to
a large extent from the combination of smoking and COPD. To start
the analysis, we chose a single study comparing COPD vs. normal
small airway epithelial cell samples as "target" and all the other
perturbations as "base" (default setting). This allowed searching
for the top 50 genes most specifically up-regulated in small airway
epithelial cells of COPD patients. The result of this initial search
revealed four other conditions causing similar responses, all of
which also investigate small airway epithelial cells in the context
of COPD. We therefore refined our query by adding these four
conditions to our "target" set of categories and repeated the
analysis (see Figure 1).
Figure 1. Identification of the
transcripts most specifically up-regulated in the small airway
epithelium of smoking COPD patients as compared to all other
perturbations in the database.
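The idea of a "target" versus "base" search can be illustrated with a minimal sketch. It is not GENEVESTIGATOR's actual scoring method, which is not described in this note. The function name `rank_specific_genes`, the score (mean log2 ratio in the target set minus mean absolute log2 ratio in the base set), and all gene names and values are assumptions made for illustration only.

```python
from statistics import mean


def rank_specific_genes(log_ratios, target, top_n=50):
    """Rank genes by how specifically they are up-regulated in the target set.

    log_ratios: {gene: {perturbation: log2 ratio}}
    target:     set of perturbation names forming the "target" set;
                every other perturbation is treated as "base".
    The score is a hypothetical one chosen for this sketch.
    """
    scores = {}
    for gene, by_perturbation in log_ratios.items():
        target_vals = [v for p, v in by_perturbation.items() if p in target]
        base_vals = [abs(v) for p, v in by_perturbation.items() if p not in target]
        if not target_vals:
            continue  # gene was not measured in any target perturbation
        base_level = mean(base_vals) if base_vals else 0.0
        # High score: strong up-regulation in target, little response elsewhere.
        scores[gene] = mean(target_vals) - base_level
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data: gene names, perturbation names and values are made up.
toy = {
    "GENE_A": {"COPD_study_1": 2.5, "COPD_study_2": 2.1, "exercise": 0.1, "neoplasm": -0.2},
    "GENE_B": {"COPD_study_1": 2.4, "COPD_study_2": 2.0, "exercise": 1.9, "neoplasm": 2.2},
    "GENE_C": {"COPD_study_1": 0.2, "COPD_study_2": 0.1, "exercise": 0.0, "neoplasm": 0.1},
}
ranking = rank_specific_genes(toy, target={"COPD_study_1", "COPD_study_2"})
print(ranking)
```

In this toy example GENE_A ranks first because it responds strongly in the target studies and hardly at all elsewhere, whereas GENE_B responds just as strongly to unrelated perturbations. Adding further studies to `target`, as was done with the four additional COPD conditions, simply enlarges the set over which the target mean is computed.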
Of the top 50 up-regulated genes identified, several have
previously been associated with COPD and/or smoking, for example:
- UCHL1: ubiquitin carboxyl-terminal esterase L1
(ubiquitin thiolesterase) (Carolan BJ et al., 2006)
- AKR1B10: aldo-keto reductase family 1, member B10 (Pastor
MD et al., 2013)
- MUC5AC: mucin 5AC (Caramori G et al., 2004)
- CYP1B1: cytochrome P450, family 1, subfamily B,
polypeptide 1 (Kaur-Knudsen D et al., 2012)
By contrast, several genes identified in this analysis have not yet
been associated with the onset of the disease and are therefore new
candidate genes for the study of COPD. Five out of the 50 identified
genes have not yet been characterized at all. COPD occurs primarily
as a pathological consequence of smoking. To pinpoint which genes are
up-regulated independently of smoking status, we compared COPD
patients (smokers) with healthy smokers. From this search, 5 genes (ELFN2,
GAD1, CEACAM5, PRR4 and CYP1B1) matched the previous analysis.
ARE THE TOP 50 COPD/SMOKING UP-REGULATED GENES AFFECTED BY
To see how the genes are
regulated in general, we used the Perturbation tool from the
CONDITION SEARCH toolset and looked at other perturbations that also
affect their expression. To be more stringent, we used a filter to
visualize only significant changes with fold change > 2 and p-value
< 0.05 (in this case relative to the first gene/probeset 241764_at).
Interestingly, besides the different COPD studies, many smoking
experiments in healthy individuals turned out to affect the
expression of these genes in a similar manner (Figure 2).
Furthermore, a number of other conditions appeared in this search
(e.g. exercise, various neoplasms). To better understand the
specific behavior of each gene, we performed a hierarchical
clustering of these 50 genes across the relevant perturbations.
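The significance filter used here (fold change > 2 and p-value < 0.05) can be written down as a short sketch. It assumes that fold change is given as a linear ratio and that the threshold applies in both directions, so that a ratio of 0.5 counts as a two-fold down-regulation; whether GENEVESTIGATOR applies the filter exactly this way is an assumption. The function name and all values below are hypothetical.

```python
def significant_perturbations(results, min_fold_change=2.0, max_p_value=0.05):
    """Return the perturbations that pass the fold-change and p-value filter.

    results: list of (perturbation, fold_change, p_value) tuples, where
             fold_change is a positive linear ratio (0.5 = two-fold down).
    """
    kept = []
    for name, fold_change, p_value in results:
        # Treat up- and down-regulation symmetrically (assumption).
        magnitude = max(fold_change, 1.0 / fold_change)
        if magnitude > min_fold_change and p_value < max_p_value:
            kept.append(name)
    return kept


# Toy results for a single probeset; all numbers are made up.
toy_results = [
    ("COPD vs. normal", 3.1, 0.001),
    ("smoking vs. non-smoking", 2.4, 0.010),
    ("exercise study", 0.4, 0.030),   # 2.5-fold down-regulation
    ("treatment X", 1.3, 0.200),      # fails both criteria
    ("treatment Y", 2.8, 0.300),      # large change but not significant
]
print(significant_perturbations(toy_results))
```

Only perturbations that satisfy both criteria are kept, which is why "treatment Y" is dropped despite its large fold change.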
Figure 2. Selection of
perturbations significantly regulating gene 241764_at, the most COPD
specific up-regulated gene in our analysis. From 3,230 perturbations
tested, 45 were found to significantly regulate this gene (using
fold-change > 2 and p < 0.05). All genes from this cluster appear to
have a similar expression signature across different COPD studies
(red arrows) and the majority of the genes are also up-regulated by
smoking (blue bars). Color scale: red represents up-regulated,
green represents down-regulated genes.
HIERARCHICAL CLUSTERING OF RELEVANT PERTURBATIONS
From the 3,230 perturbations present in GENEVESTIGATOR on the
Affymetrix Human Genome U133 Plus 2.0 Array platform, only 45 caused
significant expression changes in the most COPD-specific
up-regulated gene (241764_at). To identify other potentially
co-regulated genes, we created a new data selection containing only these
45 conditions and ran a two-way hierarchical clustering of the 50
COPD-up-regulated genes (Figure 3).
Figure 3. Hierarchical clustering of the top 50 COPD
up-regulated genes across relevant conditions (Euclidean Distance,
The tree highlighted in red represents the COPD and
Smoking perturbations. From this cluster it is clear that all 50
genes which are up-regulated by COPD are also up-regulated by
smoking. The blue-marked gene cluster represents genes that are
additionally strongly down-regulated in breast cancer.
The yellow-marked gene cluster represents genes that are
additionally up-regulated during renal cell carcinoma. The
violet-marked gene cluster represents genes with no response to any
perturbations besides COPD or smoking.
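For readers unfamiliar with two-way hierarchical clustering, the following sketch shows the principle on toy data. It uses Euclidean distance, as in Figure 3, together with average linkage; the linkage method is an assumption, since it is not stated above. The function name `hierarchical_tree`, the gene labels and all values are hypothetical, and this is not the implementation used by GENEVESTIGATOR.

```python
from math import dist


def hierarchical_tree(vectors):
    """Agglomerative clustering with Euclidean distance and average linkage.

    vectors: {label: [values]}. Returns the merge tree as nested tuples.
    """
    # Each cluster is (tree, members); members lists the original labels.
    clusters = [(label, [label]) for label in vectors]

    def average_distance(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Toy log2 ratios; columns follow the order given in `conditions`.
conditions = ["COPD", "smoking", "breast_cancer"]
genes = {
    "G1": [2.0, 2.1, 0.0],
    "G2": [2.1, 2.0, 0.1],
    "G3": [2.0, 2.0, -2.5],
    "G4": [1.9, 2.2, -2.4],
}

# "Two-way" means clustering the genes and, separately, the conditions.
gene_tree = hierarchical_tree(genes)
by_condition = {
    c: [profile[k] for profile in genes.values()] for k, c in enumerate(conditions)
}
condition_tree = hierarchical_tree(by_condition)
print(gene_tree)
print(condition_tree)
```

In this toy example the COPD and smoking conditions are merged first, and the genes split into one group without and one group with a breast cancer response, mirroring the kind of structure described for Figure 3.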
CROSS SPECIES VALIDATION
The function of many of the 50 genes that were identified
in the previous analysis with human datasets as COPD/Smoking
specific is still unknown. To validate these putative biomarkers,
animal models may be helpful. We therefore selected the Affymetrix
Mouse Genome 430 2.0 Array platform (a compendium of 6,873 samples)
and using the Perturbation tool from the CONDITION SEARCH toolset we
looked at the responses of the above COPD responsive genes across
tobacco smoking studies. For seven genes (10 probesets), the
results observed in human were confirmed in a COPD animal model (Figure 4).
Figure 4. Regulation of some of the previously
identified genes by tobacco smoking in mouse. As in the human
studies, the analyzed genes are up-regulated under smoking
conditions (red bars). Interestingly, this pathological signature
can be counteracted by smoking cessation or by switching to aerosol
smoke (blue bars).
Using GENEVESTIGATOR we could
easily screen across 3634 human perturbations to identify candidate
biomarkers or targets for COPD. Some of the genes identified were
already known to be associated with the disease and could therefore
serve as positive controls. We observed that the majority of the
COPD related genes were up-regulated already in healthy individuals
exposed to cigarette smoke and that some of these genes may also be
involved in other diseases such as breast cancer or renal carcinoma.
Similar queries could also be used to discover genes specifically
down-regulated in COPD. Working with GENEVESTIGATOR allowed us to
easily confirm our results in other organisms, such as mouse.
Similar analyses could be performed to find specific biomarkers for
other diseases or conditions, to discover new targets for
therapeutic areas of interest, or for drug repositioning.
REFERENCES
Carolan BJ, Heguy A, Harvey BG, Leopold PL, Ferris B, Crystal RG
(2006) Up-regulation of expression of the ubiquitin carboxyl-terminal
hydrolase L1 gene in human airway epithelium of cigarette smokers.
Cancer Res. 2006 Nov 15;66(22):10729-40.

Pastor MD, Nogal A, Molina-Pinelo S, Meléndez R, Salinas A, González
De la Peña M, Martín-Juan J, Corral J, García-Carbonero R, Carnero A,
Paz-Ares L (2013) Identification of proteomic signatures associated
with lung cancer and COPD. J Proteomics. 2013 Aug 26;89:227-37.

Caramori G, Di Gregorio C, Carlstedt I, Casolari P, Guzzinati I,
Adcock IM, Barnes PJ, Ciaccia A, Cavallesco G, Chung KF, Papi A
(2004) Mucin expression in peripheral airways of patients with chronic
obstructive pulmonary disease.

Kaur-Knudsen D, Bojesen SE, Nordestgaard BG (2012) Cytochrome P450
1B1 and 2C9 genotypes and risk of ischemic vascular disease, cancer,
and chronic obstructive pulmonary disease. Curr Vasc Pharmacol. 2012

Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L,
Widmayer P, Gruissem W, Zimmermann P (2008) Genevestigator V3: a
reference expression database for the meta-analysis of transcriptomes.
Advances in Bioinformatics 2008, 420747.