Gene expression regulation in lungs of COPD patients

Roberta Minotti and Philip Zimmermann
© NEBION AG. July 31, 2015


GENEVESTIGATOR is a powerful tool for the combined exploration of many transcriptomic datasets (microarray and RNA-seq). At the time of this analysis, its database contained over 120,000 extensively curated, quality-controlled and globally normalized microarrays from human, mouse and rat. Thanks to several academic and industrial collaborations, it has very rich content on respiratory diseases, in particular chronic obstructive pulmonary disease (COPD), idiopathic pulmonary fibrosis (IPF) and pulmonary arterial hypertension (PAH), as well as on smoking. In this example, we searched for genes specifically up-regulated in lung tissues of COPD patients. Using GENEVESTIGATOR, we identified candidate genes related to COPD (in smoker populations), some of which were already known to be associated with the onset of the disease, while several others had not previously been reported in this context. The results from the human platform were subsequently confirmed in mouse datasets. We present a particular workflow and demonstrate how GENEVESTIGATOR makes it possible to identify, in a few clicks, genes highly specific for a given disease and to confirm the results in other organisms.


For our analysis, we selected all curated datasets from the Affymetrix Human Genome U133 Plus 2.0 Array platform (a compendium of 49,191 samples). We then used the Perturbation tool from the GENE SEARCH toolset to compare samples from COPD patients with those from healthy individuals. In most studies, the majority of COPD patients were smokers, while a large proportion of healthy controls were non-smokers. The measured effect therefore reflects, to a large extent, the combination of smoking and COPD. To start the analysis, we selected a single study comparing COPD vs. normal small airway epithelial cell samples as "target" and all other perturbations as "base" (the default setting). This allowed us to search for the 50 genes most specifically up-regulated in small airway epithelial cells of COPD patients. This initial search revealed four other conditions causing similar responses, all of which also investigated small airway epithelial cells in the context of COPD. We therefore refined our query by adding these four conditions to the "target" set and repeated the analysis (see Figure 1).

Figure 1. Identification of the transcripts most specifically up-regulated in the small airway epithelium of smoking COPD patients as compared to all other perturbations in the database.
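The idea of ranking genes by how specifically they respond to a "target" set of perturbations compared with a "base" set can be sketched in a few lines of Python. This is not GENEVESTIGATOR's actual scoring, which is not described here: the score used below (mean log2 fold change in the target perturbations minus the mean in the base perturbations), the function name `rank_specific_genes` and the toy data are all hypothetical.

```python
from statistics import mean

def rank_specific_genes(log2fc, target, top_n=50):
    """Rank genes by how much more strongly they are up-regulated in the
    target perturbations than in all other (base) perturbations.

    log2fc: dict mapping gene -> {perturbation name: log2 fold change}
    target: set of perturbation names; every other perturbation is "base"
    """
    scores = {}
    for gene, by_pert in log2fc.items():
        tgt = [v for p, v in by_pert.items() if p in target]
        base = [v for p, v in by_pert.items() if p not in target]
        if not tgt or not base:
            continue  # a contrast needs values in both groups
        # Hypothetical specificity score: target mean minus base mean
        scores[gene] = mean(tgt) - mean(base)
    # Highest score first; keep only the top_n genes
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data (invented for illustration): log2 fold changes per perturbation
toy = {
    "GENE_A": {"copd_1": 3.0, "copd_2": 2.5, "smoking_1": 0.2, "other_1": 0.1},
    "GENE_B": {"copd_1": 1.0, "copd_2": 1.2, "smoking_1": 1.1, "other_1": 0.9},
    "GENE_C": {"copd_1": -0.5, "copd_2": 0.0, "smoking_1": 0.3, "other_1": 0.0},
}

print(rank_specific_genes(toy, {"copd_1", "copd_2"}, top_n=2))
# ['GENE_A', 'GENE_B']
```

In this sketch, refining the query simply means adding further perturbation names to the `target` set and re-running the ranking.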

From the top 50 up-regulated genes identified, several have previously been associated with COPD and/or smoking, for example:
  • UCHL1: ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase) (Carolan BJ et al., 2006)
  • AKR1B10: aldo-keto reductase family 1, member B10 (Pastor MD et al., 2013)
  • MUC5AC: mucin 5AC (Caramori G et al., 2004)
  • CYP1B1: cytochrome P450, family 1, subfamily B, polypeptide 1 (Kaur-Knudsen D et al., 2012)
By contrast, several genes identified in this analysis have not yet been associated with the onset of the disease and are therefore new candidate genes for the study of COPD. Five of the 50 identified genes have not been characterized at all. Because COPD occurs primarily as a pathological consequence of smoking, we next asked which genes are up-regulated independently of smoking status by comparing COPD patients (smokers) with healthy smokers. Five genes from this search (ELFN2, GAD1, CEACAM5, PRR4 and CYP1B1) overlapped with the results of the previous analysis.
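Checking which candidates recur in both comparisons is a simple intersection of ranked lists. The Python sketch below keeps the genes that appear in the top of both rankings, in the order of the first one; the function name `shared_candidates` and the placeholder gene names are hypothetical and do not correspond to the actual results.

```python
def shared_candidates(primary, secondary, top_n=50):
    """Return genes found in the top_n of both ranked lists, in the order
    in which they appear in the primary list."""
    secondary_top = set(secondary[:top_n])
    return [g for g in primary[:top_n] if g in secondary_top]


# Placeholder rankings (invented for illustration)
copd_vs_all = ["GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"]
copd_vs_healthy = ["GENE_4", "GENE_9", "GENE_1", "GENE_7"]

print(shared_candidates(copd_vs_all, copd_vs_healthy))
# ['GENE_1', 'GENE_4']
```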


To see how these genes are regulated in general, we used the Perturbation tool from the CONDITION SEARCH toolset and looked at other perturbations that also affect their expression. To increase stringency, we applied a filter to display only significant changes with fold change > 2 and p-value < 0.05 (in this case relative to the first gene/probeset, 241764_at). Interestingly, besides the different COPD studies, many smoking experiments in healthy individuals also affected the expression of these genes in a similar manner (Figure 2). A number of other conditions also appeared in this search (e.g. exercise and various neoplasms). To better understand the specific behavior of each gene, we performed a hierarchical clustering of these 50 genes across the relevant perturbations.

Figure 2. Selection of perturbations significantly regulating probeset 241764_at, the most COPD-specific up-regulated gene in our analysis. Of the 3,230 perturbations tested, 45 were found to significantly regulate this gene (fold change > 2 and p < 0.05). All genes in this cluster appear to have a similar expression signature across different COPD studies (red arrows), and the majority of the genes are also up-regulated by smoking (blue bars). Color scale: red represents up-regulation, green represents down-regulation.
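The filter described above (fold change > 2 and p-value < 0.05) can be expressed as a simple predicate over per-perturbation statistics. The Python sketch below assumes that each perturbation is summarized by a log2 fold change and a p-value, and that the fold-change threshold applies to both up- and down-regulation; the function name `significant_perturbations` and the records are hypothetical.

```python
import math

def significant_perturbations(records, min_fold_change=2.0, max_p=0.05):
    """Return names of perturbations passing both thresholds.

    records: iterable of (name, log2_fold_change, p_value) tuples.
    The fold-change threshold is applied in both directions (assumption),
    so fold change > 2 corresponds to |log2 fold change| > 1.
    """
    min_abs_log2 = math.log2(min_fold_change)
    return [
        name
        for name, log2_fc, p_value in records
        if abs(log2_fc) > min_abs_log2 and p_value < max_p
    ]


# Toy records for one probeset (values invented for illustration)
records = [
    ("pert_1", 2.3, 0.001),   # strong up-regulation, significant -> kept
    ("pert_2", 1.4, 0.01),    # kept
    ("pert_3", -1.2, 0.03),   # down-regulation beyond 2-fold -> kept
    ("pert_4", 0.8, 0.001),   # below 2-fold -> dropped
    ("pert_5", 1.6, 0.2),     # not significant -> dropped
]
print(significant_perturbations(records))
# ['pert_1', 'pert_2', 'pert_3']
```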


Of the 3,230 perturbations present in GENEVESTIGATOR for the Affymetrix Human Genome U133 Plus 2.0 platform, only 45 caused significant expression changes in the most specific COPD up-regulated gene (241764_at). To identify other potentially co-regulated genes, we created a new data selection containing only these 45 conditions and ran a two-way hierarchical clustering of the 50 COPD up-regulated genes (Figure 3).

Figure 3. Hierarchical clustering of the top 50 COPD up-regulated genes across relevant conditions (Euclidean Distance, no leaf-ordering).

The tree highlighted in red represents the COPD and smoking perturbations. This cluster shows that all 50 genes up-regulated in COPD are also up-regulated by smoking. The blue-marked gene cluster represents genes that are additionally strongly down-regulated in breast cancer. The yellow-marked gene cluster represents genes that are additionally up-regulated in renal cell carcinoma. The violet-marked gene cluster represents genes that respond to no perturbation other than COPD or smoking.
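Two-way hierarchical clustering of the kind shown in Figure 3 can be sketched as agglomerative clustering applied once to the genes (rows) and once to the perturbations (columns). The pure-Python version below uses Euclidean distance, as in Figure 3, but average linkage is an assumption, since the linkage method is not stated; without leaf ordering, each merge simply places the left cluster before the right one. The function names and the toy matrix are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def leaf_order(vectors):
    """Agglomerative clustering; returns the leaf order of the final tree.

    Each cluster is stored as its list of leaf indices, so concatenating two
    clusters on merge yields the dendrogram order with no leaf reordering.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.insert(a, merged)
    return clusters[0]

def two_way_order(matrix):
    """Cluster rows (genes) and columns (perturbations) independently."""
    row_order = leaf_order(matrix)
    col_order = leaf_order([list(col) for col in zip(*matrix)])
    return row_order, col_order


# Toy matrix: rows are genes, columns are perturbations (log2 fold changes)
matrix = [
    [2.0, 2.1, 0.0],
    [0.1, 0.0, -2.0],
    [1.9, 2.2, 0.1],
    [0.0, 0.2, -1.8],
]
print(two_way_order(matrix))
# ([0, 2, 1, 3], [0, 1, 2])
```

This naive approach recomputes distances at every step, which is acceptable for a 50 × 45 matrix; larger data sets would call for a dedicated clustering library.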


The function of many of the 50 genes identified as COPD/smoking-specific in the human datasets is still unknown. Animal models may help to validate these putative biomarkers. We therefore selected the Affymetrix Mouse Genome 430 2.0 Array platform (a compendium of 6,873 samples) and, using the Perturbation tool from the CONDITION SEARCH toolset, looked at the responses of the above COPD-responsive genes across tobacco smoking studies. For seven genes (10 probesets), the results observed in human were also confirmed in this COPD animal model (Figure 4).

Figure 4. Regulation of some of the previously identified genes by tobacco smoking in mouse. As in the human studies, the analyzed genes are up-regulated under smoking conditions (red bars). Interestingly, this pathological signature can be counteracted by smoking cessation or by switching to aerosol smoke (blue bars).
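The cross-species check amounts to mapping each human candidate to its mouse ortholog and asking whether any probeset of that ortholog is up-regulated in the mouse smoking studies. The Python sketch below is illustrative only: the ortholog mapping, the log2 fold-change threshold of 1, the function name `confirmed_in_mouse` and all names and values are assumptions. It returns both the confirmed genes and the responding probesets, since one gene can be represented by several probesets on the array.

```python
def confirmed_in_mouse(human_genes, orthologs, mouse_probe_lfc, min_log2fc=1.0):
    """Return (genes, probesets) for human genes whose mouse ortholog has at
    least one probeset up-regulated by >= min_log2fc in mouse smoking data.

    orthologs: dict human gene -> mouse gene (hypothetical mapping)
    mouse_probe_lfc: dict mouse gene -> {probeset: log2 fold change}
    """
    genes, probesets = [], []
    for gene in human_genes:
        mouse_gene = orthologs.get(gene)
        if mouse_gene is None:
            continue  # no known ortholog: cannot be checked in mouse
        hits = [
            probe
            for probe, lfc in mouse_probe_lfc.get(mouse_gene, {}).items()
            if lfc >= min_log2fc
        ]
        if hits:
            genes.append(gene)
            probesets.extend(hits)
    return genes, probesets


# Toy inputs (invented for illustration)
human = ["GENE_1", "GENE_2", "GENE_3"]
orthologs = {"GENE_1": "Gene1", "GENE_2": "Gene2"}
mouse_lfc = {"Gene1": {"probe_a": 1.5, "probe_b": 1.1}, "Gene2": {"probe_c": 0.3}}

print(confirmed_in_mouse(human, orthologs, mouse_lfc))
# (['GENE_1'], ['probe_a', 'probe_b'])
```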


Using GENEVESTIGATOR, we could easily screen across 3,634 human perturbations to identify candidate biomarkers or targets for COPD. Some of the genes identified were already known to be associated with the disease and could therefore serve as positive controls. We observed that the majority of the COPD-related genes were already up-regulated in healthy individuals exposed to cigarette smoke, and that some of these genes may also be involved in other diseases such as breast cancer or renal cell carcinoma. Similar queries could be run to discover COPD-specific down-regulated genes. GENEVESTIGATOR also allowed us to easily confirm our results in another organism, the mouse. Similar analyses could be performed to find specific biomarkers for other diseases or conditions, to discover new targets in therapeutic areas of interest, or for drug repositioning.



Carolan BJ, Heguy A, Harvey BG, Leopold PL, Ferris B, Crystal RG (2006) Up-regulation of expression of the ubiquitin carboxyl-terminal hydrolase L1 gene in human airway epithelium of cigarette smokers. Cancer Res. 2006 Nov 15;66(22):10729-40.

Pastor MD, Nogal A, Molina-Pinelo S, Meléndez R, Salinas A, González De la Peña M, Martín-Juan J, Corral J, García-Carbonero R, Carnero A, Paz-Ares L (2013) Identification of proteomic signatures associated with lung cancer and COPD. J Proteomics. 2013 Aug 26;89:227-37.

Caramori G, Di Gregorio C, Carlstedt I, Casolari P, Guzzinati I, Adcock IM, Barnes PJ, Ciaccia A, Cavallesco G, Chung KF, Papi A (2004) Mucin expression in peripheral airways of patients with chronic obstructive pulmonary disease. Histopathology. 2004 Nov;45(5):477-84.

Kaur-Knudsen D, Bojesen SE, Nordestgaard BG (2012) Cytochrome P450 1B1 and 2C9 genotypes and risk of ischemic vascular disease, cancer, and chronic obstructive pulmonary disease. Curr Vasc Pharmacol. 2012 Jul;10(4):512-20.

Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P (2008) Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics. 2008;2008:420747.