Microarray analysis with 40 000 cDNA gene chip arrays determined differential gene expression profiles (GEPs) in CD34+ marrow cells from myelodysplastic syndrome (MDS) patients compared with healthy persons. Using focused bioinformatics analyses, we found 1175 genes significantly differentially expressed by MDS versus normal, requiring a minimum of 39 genes to separately classify these patients. Major GEP differences were demonstrated between healthy and MDS patients and between several MDS subgroups: (1) those whose disease remained stable and those who subsequently transformed (tMDS) to acute myeloid leukemia; (2) between del(5q) and other MDS patients. A 6-gene “poor risk” signature was defined, which was associated with acute myeloid leukemia transformation and provided additive prognostic information for International Prognostic Scoring System Intermediate-1 patients. Overexpression of genes generating ribosomal proteins and for other signaling pathways was demonstrated in the tMDS patients. Comparison of del(5q) with the remaining MDS patients showed 1924 differentially expressed genes, with underexpression of 1014 genes, 11 of which were within the 5q31-32 commonly deleted region. These data demonstrated (1) GEPs distinguishing MDS patients from healthy and between those with differing clinical outcomes (tMDS vs those whose disease remained stable) and cytogenetics [eg, del(5q)]; and (2) molecular criteria refining prognostic categorization and associated biologic processes in MDS.

The myelodysplastic syndromes (MDS) are a spectrum of clonal myeloid hemopathies with inherent hematopoietic precursor cell (HPC; ie, inclusive of primitive hematopoietic stem cells [HSCs] and committed progenitor cells) anomalies and abnormal hematopoietic regulation.1,2  Heterogeneous subsets of MDS patients have been defined by their clinical (percentage marrow blasts, number of cytopenias) and biologic (specific cytogenetic and molecular lesions) abnormalities.3  Use of these features has provided methods (eg, the International Prognostic Scoring System [IPSS]) to help define the patients' prognoses, including their relative risk of evolving to acute myeloid leukemia (AML) or to have shortened survival.3  However, these approaches are limited in predicting clinical course, and management of patients remains challenging given the uncertainty of the time course of disease progression. Broad-based molecular and cellular analyses are potentially valuable to improve prognostication and the understanding of the mechanisms underlying the defective hematopoietic cell differentiation and abnormal clone expansion in those patients who undergo progression to AML.

Specific gene expression profiles (GEPs) and differentially expressed cellular pathways have been defined and provide insights into the molecular biology of AML and its subtypes.4-6  However, in contrast to the relatively homogeneous marrow population of blasts present in AML for which several well-defined microarray studies have been reported, analysis of MDS marrow is more complex as it contains heterogeneous populations of cells with various degrees of cellular differentiation. Studies evaluating data from enriched marrow HPCs from a variety of MDS patients have been reported,7-10  as have reports for those predominantly with del(5q) MDS.11-14  However, with one exception,11  prior microarray studies in MDS analyzed a limited number of non-del(5q) subjects (10-22 patients). Differing GEPs were described from each study. No association has been reported in these investigations indicating the relationship between GEPs and the long-term outcome of MDS patients. To further evaluate the molecular nature of stable MDS patients as contrasted to those who progressed to AML, using microarray analysis we assessed the GEPs and their functional correlates from CD34+ marrow cells from such patients after prolonged follow-up.

Patients, bone marrow samples

For microarray analysis of GEPs from patients, CD34+ bone marrow mononuclear cells were obtained by magnetic bead separation (Miltenyi Biotec)15  from 35 MDS patients and 6 age-matched healthy persons. The CD34+ purity of these samples was more than 90%, as verified by flow cytometry. CD34+ cells thus obtained were pelleted, frozen in liquid nitrogen, and kept frozen at −80°C until use. MDS patients were categorized by the French-American-British classification, which was the morphologic basis for the IPSS prognostic classification, incorporating refractory anemia with excess blasts in transformation (RAEB-T) patients. AML transformation was thus defined as the development of more than 30% marrow blasts. Marrow samples and clinical information were obtained from patients after informed consent in accordance with the Declaration of Helsinki, with the approval of the Stanford Institutional Review Board.

RNA isolation and amplification

RNA was isolated using the RNeasy kit (QIAGEN). We amplified RNA by the method of Wang et al,16  which optimizes amplification of low-abundance RNA samples with high fidelity by combining antisense RNA (aRNA) amplification with a template-switching effect (Clontech). The concentration and quality of aRNA were monitored spectrophotometrically at optical density (OD) 260/280 and 260/230 and with 1% agarose gels. RNA purity and quality were evaluated using the Bioanalyzer 2100 (Agilent Technologies). Cy3-conjugated nucleotide for aRNA from healthy and Cy5-conjugated nucleotide for aRNA from MDS were hybridized to 40 000 gene chip microarrays obtained from the Stanford Functional Genomics Microarray Facility.17  The Gene Expression Omnibus accession number for the deposited microarray data is GSE18366.

Data acquisition and analysis

The microarrays were scanned with an Axon GenePix scanner (Axon Instruments) and software. High-resolution scans (10 microns per pixel) were performed to compile a raw dataset for each microarray. Files were submitted to the Stanford Microarray Database,18  and the data were normalized by computer-generated normalization values. From the 40 000 gene chips, 11 000 genes expressed with high quality and intensity levels more than 1.5-fold background were used for further analysis. The gene expression data discussed in this article have been deposited in the NCBI Gene Expression Omnibus website (http://www.ncbi.nlm.nih.gov/geo/).
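The intensity criterion described above can be illustrated with a minimal sketch. It assumes each spot carries a foreground intensity and a local background value; the field names and toy numbers are hypothetical, and the additional quality criteria applied by the Stanford Microarray Database are not modeled.

```python
def passes_filter(intensity, background, min_ratio=1.5):
    """Return True when the spot intensity is more than min_ratio times background."""
    return background > 0 and intensity > min_ratio * background


def filter_spots(spots, min_ratio=1.5):
    """Keep spots (dicts with 'gene', 'intensity', 'background') above the threshold."""
    return [s for s in spots if passes_filter(s["intensity"], s["background"], min_ratio)]


# Toy spots (hypothetical values, not data from this study).
spots = [
    {"gene": "geneA", "intensity": 900.0, "background": 300.0},  # 3.0-fold: kept
    {"gene": "geneB", "intensity": 400.0, "background": 300.0},  # about 1.3-fold: dropped
    {"gene": "geneC", "intensity": 450.0, "background": 300.0},  # exactly 1.5-fold: dropped
]
kept = [s["gene"] for s in filter_spots(spots)]
print(kept)
```

The comparison is strict, so a spot at exactly 1.5-fold background is excluded, matching the wording "more than 1.5-fold."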

GEPs from CD34+ marrow cells from MDS patients were compared with those from age-matched CD34+ healthy marrow cells. aRNA from CD34+ pooled normal marrow cells was used as a reference standard.

SAM

Significance analysis of microarrays (SAM) software was used to measure the strength of the statistical relationship between differentially expressed genes and response variables within our microarray dataset.19  The response variables we used included: unpaired groupings (eg, MDS vs normal, those whose disease subsequently transformed [tMDS] vs normal; those whose disease remained stable [sMDS] vs normal, del(5q) MDS vs normal), multiclass grouping (normal vs sMDS vs tMDS), and censored time to leukemia. A false discovery rate (FDR) was generally set to 10% or less.
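As a rough illustration of the statistic underlying a two-class unpaired SAM comparison, the sketch below computes a per-gene relative difference: the difference in group means divided by a pooled standard error plus a small constant. The value of `s0` and the toy expression values are assumptions for illustration; the SAM software chooses its constant from the data and estimates the FDR by permutation, neither of which is reproduced here.

```python
import math


def relative_difference(group_a, group_b, s0=0.1):
    """SAM-style relative difference d = (mean_a - mean_b) / (s + s0) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


# Toy log-ratios for one gene (hypothetical values).
mds = [2.0, 2.2, 1.8, 2.0]
normal = [0.0, 0.1, -0.1, 0.0]
d = relative_difference(mds, normal)
print(round(d, 2))
```

Adding `s0` to the denominator keeps genes with very small variance from producing inflated scores.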

Hierarchical cluster dendrograms

Supervised and unsupervised hierarchical clustering methods were used to generate dendrograms from the gene list obtained by SAM analysis.5,17  The dendrogram is a graphically ordered tree whose branch lengths indicate the degree of similarity between genes. The computed tree thus places genes with similar expression patterns adjacent to one another and is displayed together with the arrays from each patient.

PAM

The prediction analysis of microarrays (PAM) methodology is a class predictor for gene expression profiling based on the “nearest shrunken centroid method,” which identified subsets of genes that best characterized each class of samples.20  For example, samples from normal persons and MDS patients as subcategories were compared.
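The core idea of the nearest shrunken centroid method can be sketched as follows: each class centroid is pulled toward the overall centroid by a threshold, and genes whose centroids all collapse onto the overall value drop out of the classifier. This is a simplified sketch; it omits the per-gene standardization used by PAM, and the gene names, means, and threshold are hypothetical.

```python
def soft_threshold(value, delta):
    """Shrink value toward zero by delta; values within delta of zero become 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0


def surviving_genes(class_means, overall_means, delta):
    """Return genes with at least one nonzero shrunken class difference."""
    kept = []
    for gene, means in class_means.items():
        diffs = [soft_threshold(m - overall_means[gene], delta) for m in means.values()]
        if any(diff != 0.0 for diff in diffs):
            kept.append(gene)
    return kept


# Toy class centroids per gene (hypothetical values).
class_means = {
    "geneA": {"MDS": 2.0, "normal": -1.0},
    "geneB": {"MDS": 0.2, "normal": -0.1},
}
overall_means = {"geneA": 0.5, "geneB": 0.05}
kept = surviving_genes(class_means, overall_means, delta=1.0)
print(kept)
```

Raising `delta` removes more genes; in practice the threshold is chosen by cross-validation, which is how a minimal classifier size is identified.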

Gene function annotation

Gene functions were assessed using SOURCE, a unified genomic resource provided by the Stanford Microarray Database (http://source.stanford.edu)21  and Gene Ontology (http://www.geneontology.org).22 

Gene set enrichment analysis

We subjected our 11 000 gene sets to gene set enrichment analysis (GSEA), a computational supervised analysis methodology that uses aggregated public gene sets (1892 gene sets within a molecular signature database; http://www.broadinstitute.org/gsea/)23  to identify biologic processes present across phenotypes in our microarray dataset. (For a list of biologic processes, see Table 1.) GSEA assigns an enrichment score, which represents the difference between the observed and expected rankings (based on correlation with the chosen phenotype). These enrichment scores are normalized based on the number of genes in the gene set. The gene sets were weighted according to each included gene's correlation with the phenotype.
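The enrichment score can be illustrated with a minimal running-sum sketch: walking down the ranked gene list, the score rises for genes in the set (weighted by their correlation with the phenotype) and falls for genes outside it, and the score reported is the maximum deviation from zero. The function name, gene labels, and correlations are hypothetical, and neither the normalization by gene set size nor the permutation-based FDR is implemented.

```python
def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set."""
    in_set = [g in gene_set for g in ranked_genes]
    hit_total = sum(abs(r) ** weight for r, hit in zip(correlations, in_set) if hit)
    miss_count = len(ranked_genes) - sum(in_set)
    if hit_total == 0 or miss_count == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        if hit:
            running += abs(r) ** weight / hit_total  # step up at a set member
        else:
            running -= 1.0 / miss_count  # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Genes ranked by correlation with the phenotype (hypothetical values).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.2, -0.5, -0.8]
es_top = enrichment_score(ranked, corr, {"g1", "g2"})
es_bottom = enrichment_score(ranked, corr, {"g5", "g6"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score.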

Table 1

MDS versus normal: biologic processes engaged in by the differentially expressed genes

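| Code | Biologic process | No. of genes* MDSvNl | Percentage of genes | Percentage of genes in overrepresented biologic process |
|---|---|---|---|---|
| 1 | Cell adhesion | 27§ | | |
| 2 | Cytoskeleton + structural | 179 | 19 | |
| 3 | Metabolism | 96 | 10 | 31 |
| 4 | Oxidative phosphorylation§ | 34§ | | 100 |
| 5 | Apoptosis | 47 | | |
| 6 | Signaling, transport§ | 177§ | 18 | 13 |
| 7 | Proliferation | 28 | | 14 |
| 8 | Cell cycle | 41 | | |
| 9 | Differentiation | 66 | | |
| 10 | Transcription§ | 182§ | 19 | 49 |
| 11 | Translational§ | 59§ | | 29 |
| 12 | Protein ubiquitination | 11 | | |
| 13 | Other | 18 | | |
| | Total | 965 | 100 | 64 |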
* Significantly differentially expressed genes in our dataset (FDR = 10%) within biologic processes defined by Gene Ontology.

Percentage of genes within each process.

Percentage of genes within biologic processes, which were overrepresented and significant by hypergeometric analysis (P < .05).

§ Significant in GSEA, FDR ≤ 0.05.

Hypergeometric analysis

This analysis was performed with OntoExpress software (http://vortex.cs.wayne.edu/projects.htm#Onto-Express),24  which evaluated which of the 250 metabolic and signaling pathways from the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg) database were significantly overrepresented when assessing differentially expressed genes from our database. This method evaluated the statistically significant probability (P < .05) of having the observed number of differentially expressed genes within a given biologic process, using the Fisher exact test to determine these probabilities. Gene Ontology was the basis for Kyoto Encyclopedia of Genes and Genomes pathways.
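The overrepresentation test can be sketched as an upper-tail hypergeometric probability, which is equivalent to a one-sided Fisher exact test. This is an illustrative calculation and not the OntoExpress implementation; the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total_genes, pathway_genes, selected_genes, overlap):
    """P(X >= overlap) when selected_genes are drawn without replacement."""
    max_overlap = min(pathway_genes, selected_genes)
    denom = comb(total_genes, selected_genes)
    numer = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, selected_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numer / denom


# Toy example: 20 genes on the array, 5 in the pathway,
# 5 differentially expressed, 4 of which fall in the pathway.
p = hypergeom_upper_tail(20, 5, 5, 4)
print(p < 0.05)
```

A pathway would be called overrepresented when this probability is below the chosen cutoff (P < .05 in the analysis described above).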

Kaplan-Meier curves

Kaplan-Meier plots were generated using R software (www.r-project.org). These curves were generated based on either clinical features or gene expression values and the patients' freedom from development of AML transformation.

For assessment of degrees of expression of groups of genes (ie, from gene signatures) associated with AML transformation (ie, the “poor risk” signature as indicated in “Results”), we determined the median value for the combined means of the signature genes for each patient and scored the patients as having overexpression or underexpression of these genes. We then generated Kaplan-Meier curves for the patients based on the combination of the significant genes possessing these dichotomous features.
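The two steps described above, dichotomizing patients by signature expression and estimating freedom from AML transformation, can be sketched in a few lines. The study used R; this Python version is only an illustration, and the patient labels, expression values, and follow-up times are hypothetical. The median split reflects one reading of the scoring rule (a patient's mean signature expression compared with the cohort median).

```python
from statistics import mean, median


def signature_groups(expression_by_patient):
    """Label each patient 'over' or 'under' relative to the cohort median score."""
    scores = {p: mean(values) for p, values in expression_by_patient.items()}
    cutoff = median(scores.values())
    return {p: ("over" if s > cutoff else "under") for p, s in scores.items()}


def kaplan_meier(times, events):
    """Return [(time, event-free probability)] at each time with an event."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_t = sum(1 for ti in times if ti == t)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= n_t  # events and censored patients both leave the risk set
    return curve


# Toy data: months to AML transformation (True) or last follow-up (False).
groups = signature_groups({"p1": [3.0, 2.0], "p2": [0.5, 0.5], "p3": [1.0, 1.0]})
curve = kaplan_meier([2, 5, 5, 8, 12], [True, True, False, True, False])
print(groups, curve)
```

In the analysis itself, one curve would be computed per group and the groups compared, for example with a log-rank test, which is not shown here.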

Real-time polymerase chain reaction

Real-time quantitative polymerase chain reaction (RT-PCR) was used to validate expression data for selected genes.25  The expression level of the aRNA from the CD34+ pooled marrow cell reference standard was used to normalize for differences in input cDNA. Predeveloped TaqMan Assays were used (Assays-on-Demand; Applied Biosystems). Each sample was run in triplicate, and a reverse-transcriptase negative control was also tested to exclude contaminating DNA amplification. The expression ratio was calculated as 2^n, where n is the C(T) value difference for each patient (selected gene minus the reference standard).26 

Patient information

GEP analyses were performed on RNA obtained from CD34+ marrow cells from 35 MDS patients and 6 age-matched healthy persons. The clinical and cytogenetic details of the patients classified by French-American-British (but with RAEB patients being subdivided into RAEB-1 and RAEB-2 based on whether they had less than 10% or 10% to 20%, respectively, marrow blasts) and IPSS are described in Table 2. There were 24 IPSS Low/Intermediate-1 (Int-1) [10 with del(5q) cytogenetics] and 11 Int-2/High patients analyzed. The patients had not received disease-specific treatment other than 3 del(5q) patients who had received lenalidomide. Most patients had received recombinant erythropoietin therapy. The patients were monitored clinically, with a median follow-up time of 4.3 years (range, 0.2-8.5 years) from the time the bone marrow sample was obtained. During this follow-up period, 12 of the 35 patients transformed to AML (termed tMDS), all within 14 months, whereas the patients' diseases remained stable in the remaining 23 patients (termed sMDS), at least beyond this time period. The flow cytometric characteristics (forward and side scatter) within the blast gate for the CD34+ cells were similar for the MDS and normal cells.

Table 2

Clinical features of patients with myelodysplastic syndromes

Clinical feature | IPSS Low; Int-1 (n = 24) | IPSS Int-2; High (n = 11) | Stable MDS | Developed AML | Total
No. of patients 12; 12 7; 4 23 12 35 
Median age, y (range) 68 (45-88) 72 (57-81) 71 (51-88) 67 (45-77) 69 (45-88) 
FAB      
    RA; RARS 21; 2 0; 0 18; 2 3; 0 21; 2 
    RAEB-1 
    RAEB-2; RAEBT 3; 6 1; 0 2; 6 3; 6 
Cytogenetics      
    Normal 12 
    del(5q) 10 
    Abnormal, non-del(5q) 13 
Median survival, mo (range) 45 (12-102) 11 (2-27) 46 (5-102) 12 (2-27) 35 (2-102) 
AML evolution      
    Yes 12 12 
    No 21 23 23 
IPSS Low; Int-1 12; 12 12; 9 0; 3 12; 12 
IPSS Int-2; High 7; 4 2; 0 5; 4 7; 4 

Median follow-up was 4.3 years (range, 0.2-8.5 years).

GEPs

MDS versus normal.

Using SAM evaluation, 1175 genes were found to be significantly differentially expressed (FDR = 10%). Of these, 953 genes were overexpressed in MDS and 222 were underexpressed. The median fold change was 2.2 (1.3 to 36) and −2.24 (−1.56 to −33), respectively.

Unsupervised hierarchical clustering using this gene set (Figure 1) clearly separated normal from MDS-derived CD34+ cells and separated MDS patients into 2 major branches, with distinctive signatures derived from their respective GEP clusters. One branch of the MDS patients, highly enriched for those patients who subsequently transformed to AML (tMDS) during follow-up, had a distinctive cluster of overexpressed genes. This patient subgroup, grouped farthest from normal, was composed of 14 patients (top dendrites), within which 10 of the 12 tMDS patients were present. Of interest, only 9 of these patients were classified clinically as having higher risk disease (ie, IPSS Int-2 or High). One of these patients, who did not progress to AML, had subsequently received an allogeneic marrow transplantation 4 months after his marrow sample was obtained. In the other major branch were located the remaining 21 patients, 19 of whose disease remained stable (sMDS), adjacent in the dendrogram, closer to the normals (18 of whom clinically had lower risk IPSS status). Within this patient group, the subgroup of 10 patients with del(5q) abnormalities were present and separable by their distinctive GEP.

Figure 1

GEPs of MDS versus normal CD34+ marrow cells. This unsupervised hierarchical cluster dendrogram depicts differential branches and GEPs from normal and MDS patients (FDR = 10%). Indicated are the clinical and cytogenetic characteristics of these patients as well as whether they subsequently developed AML (purple) or remained stable (blue or brown). Brown dendrites from the patient arrays were from patients with del(5q) MDS.

We used PAM (Figure 2) to identify a minimal classifier distinguishing MDS from normal requiring 39 genes, 26 of which were overexpressed and 13 underexpressed in MDS (Table 3). In cross-validation, 5 of 6 healthy persons were classified correctly as were all 35 of the MDS samples. All of the PAM significant classifier genes resided within the most significantly expressed group of SAM significant genes (ie, at FDR = 1%, seen in supplemental Table 1, available on the Blood website; see the Supplemental Materials link at the top of the online article).

Figure 2

Distinctive classification of MDS from normal using PAM. As indicated (top panel), the classifier distinguishing these groups of persons required a minimum number of 39 genes (the arrow shows the inflection point, below which the misclassification error increases). The specific genes are listed in Table 3. In cross-validation (bottom panel), 5 of 6 healthy persons were classified correctly, as were all 35 MDS samples.

Table 3

MDS versus normal PAM significant genes

| Symbol | Cytogenetic band | Fold change |
|---|---|---|
| Overexpressed (n = 26) | | |
| PLG (4) | 6q26 | 5.58 |
| GSPT1 (11)* | 16p13.1 | 5.09 |
| DNAJB6 (5) | | 4.89 |
| IL8 (6) | 17q21.31 | 4.64 |
| SSB (10) | 2q31.1 | 4.35 |
| GARS (11) | 7p15 | 4.33 |
| C9orf114 (13) | 9q34.1 | 4.21 |
| ZNF32 (10)* | 10q22-25 | 4.10 |
| SS18L2 (10) | 3p21 | 4.05 |
| RPS13 (11)* | 11p15 | 3.96 |
| CPE (3)* | 4q32.3 | 3.95 |
| FUSIP1 (10)* | 1p36.11 | 3.54 |
| SAP30BP (10)* | 17q25.1 | 3.53 |
| KRT18 (2)* | 12q13 | 3.52 |
| DDX17 (11)* | 22q13.1 | 3.38 |
| CPM (3) | 2q14.3 | 3.37 |
| MKNK2 (6) | 19p13.3 | 3.32 |
| RFNG (13)* | 17q25 | 3.25 |
| SLC25A37 (2) | 8p21.2 | 3.18 |
| IMPDH1 (3)* | 7q31.3-32 | 3.17 |
| PPP2R2B (6) | 5q31-32 | 2.96 |
| RGS16 (6) | 1q25-31 | 2.92 |
| HTR2A (2) | 13q14-21 | 2.92 |
| ENSA (2) | 1q21.2 | 2.86 |
| DLG2 (2) | 11q14.1 | 2.84 |
| TBC1D13 (6) | 9q34.11 | 2.44 |
| Underexpressed (n = 13) | | |
| ADD2 (2) | 2p13-14 | −30.47 |
| OPHN1 (9) | Xq12 | −28.70 |
| COBL (9) | 7p12.1 | −18.21 |
| ENG (1)* | 9q33-34.1 | −11.78 |
| COG3 (2)* | 13q14.12 | −11.34 |
| CLID:2030430 (13) | — | −11.29 |
| CLID:433230 (13) | — | −10.21 |
| LEPREL1 (3,4)* | 3q28 | −8.56 |
| ANGEL2 (9) | 1q32.3 | −7.52 |
| CLID:826137 (13) | — | −4.67 |
| RIMS2 (6) | 8q22.3 | −3.13 |

Numbers in parentheses correspond to the biologic process codes in Table 1.

— indicates not applicable.

* In significant biologic process using hypergeometric analysis.

tMDS and sMDS versus normal.

Analysis of GEPs between tMDS and normal and between sMDS and normal demonstrated 1008 and 1052 significantly differentially expressed genes, respectively (FDR = 10%). PAM analysis was used to determine highly differentially expressed gene subset classifiers for tMDS versus normal and sMDS versus normal. This analysis showed distinct segregation of both tMDS and sMDS from normal. The classifier distinguishing tMDS from normal required a minimum of 19 genes, and that distinguishing sMDS from normal required 49 genes (6 and 36 genes were unique to each classifier, respectively, and 13 were concordant). In cross-validation, 5 of 6 normal persons were classified correctly, as were all 12 of the tMDS and all 23 of the sMDS samples. The specific classifier genes for these 2 MDS subgroups are shown in supplemental Tables 2 and 3. For tMDS versus normal, SAM analysis identified 1008 differentially expressed genes at FDR less than 10% and 96 genes at 1%; these also encompassed all of the PAM significant genes. SAM analysis of tMDS versus sMDS revealed 11 highly differentially expressed genes (q value < 10%), overexpressed in tMDS, 5 of which coded for ribosomal proteins (RPs): RPS4X, RPS19, RPS20, RPL6, and RPL23. The remaining 6 were kallikrein-related peptidase 3 (KLK3), tripeptidyl-peptidase II (TPP2), COPB1, SHKBP1, CLID:307029, and CLID:897670.

To determine genes potentially involved with disease progression, we performed 2 further statistical analyses. A “time-dependent AML evolution” analysis was performed using SAM to identify the differentially expressed genes that related to the patients' leukemic transformation. This analysis demonstrated 12 significantly differentially expressed genes at FDR less than or equal to 10%, 7 of which coded for RPs (Table 4). In addition, a multiclass progression analysis was performed comparing concordantly expressed genes, which were increased more than or equal to 1.5-fold in sMDS versus normal and a further more than or equal to 1.5-fold increment in tMDS versus sMDS. This analysis demonstrated 26 differentially overexpressed genes (including 8 coding for RPs) to be highly significantly associated (FDR = 1%) with potential for disease progression (Figure 3; Table 5) and 174 genes at less than or equal to 10% FDR.
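The stepwise fold-change criterion of the multiclass progression analysis can be sketched as a simple filter. It assumes linear-scale group means per gene; the gene names and values are hypothetical, and the SAM multiclass significance testing that accompanied the filter is not reproduced.

```python
def progressive_genes(mean_expr, min_fold=1.5):
    """Select genes rising >= min_fold from normal to sMDS and again from sMDS to tMDS."""
    selected = []
    for gene, m in mean_expr.items():
        step1 = m["sMDS"] / m["normal"]  # first increment: sMDS versus normal
        step2 = m["tMDS"] / m["sMDS"]    # further increment: tMDS versus sMDS
        if step1 >= min_fold and step2 >= min_fold:
            selected.append(gene)
    return selected


# Toy group means on a linear scale (hypothetical values).
mean_expr = {
    "geneA": {"normal": 1.0, "sMDS": 2.0, "tMDS": 4.0},  # 2.0-fold, then 2.0-fold
    "geneB": {"normal": 1.0, "sMDS": 2.0, "tMDS": 2.5},  # second step only 1.25-fold
    "geneC": {"normal": 1.0, "sMDS": 1.2, "tMDS": 3.0},  # first step only 1.2-fold
}
selected = progressive_genes(mean_expr)
print(selected)
```

Requiring both increments keeps only genes whose expression climbs at each stage, rather than genes that differ only between the extremes.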

Table 4

Time-dependent AML evolution analysis: highly significant differentially expressed genes

| Symbol | Cytoband | q value, percentage |
|---|---|---|
| RPS19 (11)* | 19q13.2 | |
| RPS20 (11)* | 8q12 | |
| RPS4X (11)* | Xq13.1 | |
| CLID:307029 (13) | — | |
| TPP2 (3) | 13q32-q33 | |
| KLK3 (2) | 19q13.41 | |
| CLID:2467016 (13) | — | |
| RPL23 (11)* | 17q | 8.74 |
| RPS25 (11)* | 11q23.3 | 8.74 |
| RPL23 (11)* | 17q | 8.74 |
| SHKBP1 (6) | 19q13.2 | 8.74 |
| RPL6 (11)* | 12q24.1 | 8.74 |
| RPL5 (11)* | 1p22.1 | 12.10 |
| ANKRD13D (2) | 11q13.1 | 19.67 |
| CLID:810600 (13) | — | 19.67 |
| COPB1 (2) | 11p15.2 | 19.67 |
| EEF2 (11)* | 19pter-q12 | 22.48 |
| GMFG (2) | 19q13.2 | 22.48 |
| MAN2A1 (3) | 5q21-q22 | 22.48 |
| C7orf49 (13) | 7q33 | 22.48 |
| SLC25A6 (2) | Xp22.32 and Yp11.3 | 22.48 |

Numbers in parentheses correspond to the biologic process codes in Table 1.

— indicates not applicable.

* In significant biologic process using hypergeometric analysis.

Figure 3

Multiclass analysis of gene expression in normal persons, sMDS, and tMDS. Comparison of concordantly expressed genes, which were increased more than or equal to 1.5-fold in sMDS versus normal and a further more than or equal to 1.5-fold increment in tMDS versus sMDS demonstrated 26 differentially overexpressed genes to be highly significantly associated with potential for disease progression (FDR = 1%).

Table 5

Progressively overexpressed genes in multiclass analysis of tMDS versus sMDS versus normal marrow CD34+ cells (depicted in Figure 3): n = 26, FDR = 1%

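| Symbol | Cytogenetic band | Biologic process* |
|---|---|---|
| RPS28 | 19p13.2 | 11 |
| RPS24 | 10q22-q23 | 11 |
| RPS19 | 19q13.2 | 11 |
| RPS16 | 19q13.1 | 11 |
| RPS13 | 11p15 | 11 |
| RPS4X | Xq13.1 | 11 |
| RPL36AL | 14q21 | 11 |
| RPL23 | 17q | 11 |
| GSPT1 | 16p13.1 | 11 |
| RBBP6 | 16p12.2 | 11 |
| NR4A2 | 2q22-23 | 10 |
| PSMC3 | 11p12-13 | 10 |
| STRA13 | 17q25.3 | 10 |
| EEF2 | 19pter-q12 | 10 |
| FOXK2 | 17q25 | 10 |
| MKNK2 | 19p13.3 | |
| GAL3ST1 | 22q12.2 | |
| TPP2 | 13q32-33 | |
| UQCRB | 8q22 | |
| IMPDH1 | 7q31.3-32 | |
| KRT18 | 12q13 | |
| PARVB | 22q13.2-13.33 | |
| CLID:2566815 | — | — |
| CLID:344589 | — | — |
| CLID:80265 | — | — |
| CLID:969844 | — | — |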

— indicates not applicable.

* Numbers correspond to the biologic process codes in Table 1.

Association of clinical and molecular features with AML transformation

We then determined a “poor risk” gene signature by including genes demonstrated to be highly significantly differentially expressed (FDR < 10%) in all of the 3 following analyses (as described in “tMDS and sMDS versus normal”): time-dependent AML evolution analysis, tMDS versus normal, and multiclass progression analysis (supplemental Table 4, methodologic details). This evaluation demonstrated a group of 6 overexpressed genes (Table 6). Of note, included in this list were 4 coding for RP genes.
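The concordance requirement amounts to a set intersection across the three significant gene lists. The sketch below uses hypothetical placeholder gene names and stands in for the fuller procedure described in supplemental Table 4.

```python
def concordant_signature(*gene_lists):
    """Return the genes present in every supplied list, sorted by name."""
    common = set(gene_lists[0])
    for genes in gene_lists[1:]:
        common &= set(genes)
    return sorted(common)


# Placeholder gene lists (hypothetical names, not the study's results).
time_dependent = ["geneA", "geneB", "geneC"]
tmds_vs_normal = ["geneA", "geneC", "geneD"]
multiclass = ["geneC", "geneA", "geneE"]
signature = concordant_signature(time_dependent, tmds_vs_normal, multiclass)
print(signature)
```

Requiring agreement across independent analyses trades sensitivity for specificity, which is consistent with the small size of the resulting signature.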

Table 6

“Poor risk” gene signature: concordant in time-dependent AML evolution analysis, tMDS versus normal, and multiclass MDS progression analysis

| Symbol | Cytogenetic band | Fold change* | Biologic process |
|---|---|---|---|
| RPL23§ | 17q | 17.0 | 11 |
| RPS4X§ | Xq13.1 | 16.9 | 11 |
| RPS25 | 11q23.3 | 15.9 | 11 |
| RPS19§ | 19q13.2 | 7.8 | 11 |
| KLK3 | 19q13.41 | 5.9 | |
| TPP2 | 13q32-33 | 2.6 | |
* Fold expression change in tMDS versus normal, 1% or 10% FDR.

Numbers correspond to the biologic process codes in Table 1.

Gene present in significant biologic process using hypergeometric analysis.

§ tMDS versus normal, 1% FDR.

tMDS versus sMDS, SAM significant at < 10%.

Kaplan-Meier curves are shown, which evaluated freedom from AML evolution for patients classified clinically (using IPSS categories; Figure 4A), by their subgrouping in the unsupervised GEP Figure 1 dendrogram (Figure 4B; ie, evaluating the 14 distal patients vs remaining 21 MDS patients in Figure 1), and categorized by their overexpressing (or not) genes comprising the “poor risk” gene signature (Figure 4C; “Methods”). In Figure 4B, analysis of the 14 distal patients in this gene set (group 2) versus the remaining stable (sMDS) patients (group 1) was prognostic, showing increased leukemic transformation in group 2. The GEP was distinct from clinical evaluation (Figure 4A), as only 9 of the 14 GEP high risk subgroup patients (group 2) were clinically higher risk, ie, IPSS Int-2 or High; only 8 were RAEB-2 or RAEB-T.

Figure 4

Freedom from AML evolution for MDS patients classified by clinical and molecular features. Evaluation was performed using (A) clinical features (ie, IPSS categories, P < .001), (B) subgrouping in the unsupervised GEP Figure 1 dendrogram (the 14 distal patients [group 2] vs the remaining 21 MDS patients [group 1]; P = .005), and (C) subgrouping by the overexpression (or not) of genes composing the poor risk gene signature (PRS; P = .01). The Kaplan-Meier curves show significant differences in AML progression using each of these analyses, with significant separation of the IPSS Int-1 subgroup using the poor risk signature (C).

Of note, analysis of the impact of the poor risk signature on clinical outcome in the 12 patients having the IPSS Int-1 subtype indicated that 3 of the 6 patients in this clinical group who overexpressed the poor risk signature transformed to AML, whereas all 6 of the patients who lacked this overexpression remained stable (Figure 4C). Those patients in the IPSS Int-2/High risk groups overexpressed the poor risk signature genes, whereas this molecular feature was not present in the low-risk patient group (Figure 4C). These curves all demonstrated significant differences in freedom from AML evolution.

del(5q) MDS versus normal

GEP analysis performed on the dataset comprising del(5q) MDS (n = 10) versus normal (n = 6) demonstrated 540 genes to be significantly differentially expressed (FDR = 10%; supplemental Figure 1). Of these genes, 506 were overexpressed in del(5q) patients and 34 were underexpressed. The median fold change was 3.0 (1.4 to 40) and −3.2 (−2.27 to −71), respectively. The genes that were most significantly overexpressed included GSPT1, ENDOG, ENSA, HCNGP, and SS18L2; those significantly underexpressed were ENG, COG3, COBL, HBA2, and R19275. No clear GEP differences were found between those del(5q) patients before (n = 7) versus after (n = 3) lenalidomide treatment or those with cytogenetic lesions in addition to del(5q) (n = 4). PAM analysis classified del(5q) versus normal well, with the classifier requiring a minimum of 33 genes, 27 overexpressed and 6 underexpressed in del(5q) (Table 7).

Table 7

del(5q) MDS versus normal PAM significant genes

| Symbol | Cytogenetic band | Fold change |
| --- | --- | --- |
| Overexpressed (n = 27) | | |
| DFFA (5)* | 1p36.2-36.3 | 18.49 |
| CLID:72441 (13) | — | 10.80 |
| AKT3 (6)* | 1q43-44 | 9.41 |
| KIAA1715 (13) | 2q31 | 6.40 |
| CLID:725978 (13) | — | 5.67 |
| CPE (3) | 4q32.3 | 5.57 |
| CCL20 (6)* | 2q33-37 | 5.45 |
| SLC37A4 (6) | 11q23.3 | 5.22 |
| ENDOG (1) | 9q34.11 | 4.89 |
| GSPT1 (11) | 16p13.1 | 4.71 |
| KTI12 (13) | 1p32.3 | 4.64 |
| SAP30BP (10) | 17q25.1 | 4.64 |
| SS18L2 (10) | 3p21 | 4.55 |
| CNDP2 (3) | 18q22.3 | 4.43 |
| FUSIP1 (10)* | 1p36.11 | 4.35 |
| GARS (11) | 7p15 | 4.35 |
| TMEM170 (2) | 16q23.1 | 3.82 |
| ENSA (2) | 1q21.2 | 3.79 |
| CLID:868199 (13) | — | 3.79 |
| CLID:53203 (13) | — | 3.70 |
| TITF1 (10) | 14q13 | 3.59 |
| CLID:85580 (13) | — | 3.55 |
| SPATA17 (13) | 1q41 | 3.55 |
| CLID:1087325 (13) | — | 3.51 |
| DDX17 (11) | 22q13.1 | 3.40 |
| WDR1 (2) | 4p16.1 | 3.23 |
| Underexpressed (n = 6) | | |
| CLID:129986 (13) | — | −71.03 |
| CLID:124661 (13) | — | −38.47 |
| CLID:207558 (13) | — | −22.13 |
| COBL (9) | 7p12.1 | −18.66 |
| ENG (1)* | 9q33-34.1 | −18.10 |
| COG3 (2)* | 13q14.1 | −14.38 |

Numbers in parentheses correspond to the biologic process codes in Table 1.

— indicates not applicable.

*In significant biologic process using hypergeometric analysis.

del(5q) MDS versus non-del(5q) MDS

GEP analysis was performed comparing the dataset composed of del(5q) MDS (n = 10) versus non-del(5q) MDS patients (n = 25). A total of 1924 genes were found to be significantly differentially expressed (FDR = 10%); 1014 were underexpressed and 901 were overexpressed in del(5q) MDS. The median fold change was 2.28 (1.4 to 54) and −1.89 (−1.11 to −18), respectively. An unsupervised hierarchical clustering dendrogram using these genes showed distinct differences in the GEPs between del(5q) and non-del(5q) MDS patients (supplemental Figure 2). The 10 underexpressed genes within the CDR were AFF4, KIF3A, TGFBI, VDAC1, TCF7, GFRA3, HARSL, ATOX1, FBXO38, and FGFR4.

Functional analyses

The functional categories and biologic processes in which the differentially expressed genes were engaged in MDS versus normal persons (SAM analysis, Figure 1), as determined by Gene Ontology (Table 1) and GSEA, demonstrated a predominance of genes (66%) involved with transcription, cytoskeletal, metabolism, and signaling/transport (at FDR = 10%). Analysis of the most highly differentially expressed genes (ie, at FDR = 1%) demonstrated 96 genes, of which 59% were involved in these same biologic processes (supplemental Table 1). In addition, the genes within these processes were also overrepresented in our dataset compared with the total genes present within the process (using hypergeometric analysis).
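The hypergeometric overrepresentation analysis mentioned above can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the study: `hypergeom_upper_tail` is an illustrative name, and the process size (300 genes) and overlap (60 genes) are hypothetical, whereas the 11 000-gene background and 1175 differentially expressed genes are taken from the article.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N : genes on the array (background)
    K : background genes annotated to the biologic process
    n : differentially expressed genes
    k : differentially expressed genes annotated to the process
    """
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Hypothetical process: 300 of 11 000 genes annotated, 60 of them among
# the 1175 differentially expressed genes (about 32 expected by chance).
p_value = hypergeom_upper_tail(11000, 300, 1175, 60)
print(f"overrepresentation P = {p_value:.2e}")
```

A small P value indicates that the process contains more differentially expressed genes than would be expected from random sampling of the array.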

GSEA

We subjected our 11 000-gene set to GSEA to identify highly represented differentially expressed genes within our dataset that were common to those in gene sets present within curated public databases. We compared our rank-ordered list of MDS versus normal genes to 412 gene sets obtained from the Molecular Signature Database, a database detailing which genes are involved in specific biologic processes. These gene sets were associated with the 12 distinct cellular processes relevant to our MDS dataset (Table 1). Significantly increased numbers of genes involved with RP biosynthesis and with the Myc and Wnt signaling pathways were present in tMDS patients compared with normal (Table 8; supplemental Figure 3). This contrasted with increased levels of apoptosis-related genes present in sMDS compared with normal persons (supplemental Figure 3). Further, in contrast to the relative overexpression of the ribosomal, Myc, and Wnt target genes in tMDS versus normal, these genes were relatively underexpressed in del(5q) MDS versus other MDS patients (Table 8). Table 9 shows the representative gene sets within the public databases related to our tMDS versus sMDS dataset, also demonstrating the predominantly enriched translational (including ribosomal)-, Myc-, and Wnt-related gene sets in tMDS versus sMDS.
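The enrichment score underlying GSEA is a running-sum statistic over the rank-ordered gene list. The sketch below shows a simplified, unweighted form of that statistic; the published GSEA method additionally weights each hit by its correlation with the phenotype and assesses significance by permutation, neither of which is reproduced here. The ranking and the placeholder genes `G1` to `G5` are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (maximum deviation from zero)."""
    members = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for g in ranked_genes:
        # Step up at each gene-set member, down at every other gene.
        running += 1.0 / n_hits if g in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking (most overexpressed first) with three RP genes near the top.
ranked = ["RPL23", "RPS4X", "G1", "RPS25", "G2", "G3", "G4", "G5"]
rp_set = {"RPL23", "RPS4X", "RPS25"}
es = enrichment_score(ranked, rp_set)
es_reversed = enrichment_score(ranked[::-1], rp_set)
print(es, es_reversed)
```

A positive score reflects a gene set concentrated at the top of the ranked list (relative overexpression), and a negative score reflects concentration at the bottom.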

Table 8

GSEA analysis: proportions of overexpressed ribosomal genes and Myc and Wnt target genes in specified datasets

| Gene sets | Genes within biologic process of the gene set | tMDS vs normal, no. (%) | del(5q) MDS vs other MDS, no. (%) |
| --- | --- | --- | --- |
| “Ribosomal”* | 492 | 51 (94) | 80 (18) |
| Myc targets | 189 | 20 (100) | 23 (17) |
| Wnt targets | 224 | 15 (80) | 23 (22) |
*Includes all genes involved in translational processes.

P < .001, χ2 statistic.
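As a rough illustration of the χ2 statistic cited for Table 8, the sketch below compares the proportion of overexpressed “ribosomal” genes in the two datasets. Two assumptions are made: the overexpressed counts are back-calculated from the rounded percentages in the table, and the 2 × 2 layout is a guess at the comparison, because the exact contingency table is not given here. `chi_square_2x2` is an illustrative helper.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Counts back-calculated from Table 8 percentages (an assumption).
over_tmds = round(0.94 * 51)    # overexpressed in tMDS vs normal
over_del5q = round(0.18 * 80)   # overexpressed in del(5q) vs other MDS
stat, p_value = chi_square_2x2(over_tmds, 51 - over_tmds,
                               over_del5q, 80 - over_del5q)
print(f"chi-square = {stat:.1f}, P = {p_value:.1e}")
```

Under these assumptions the difference in proportions is far beyond the P < .001 threshold quoted in the footnote.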

Table 9

Representative gene sets within public databases related to our tMDS versus normal dataset (11 000 genes): GSEA

| Gene set name | Genes in curated gene set | Genes in our gene set | ES | FDR q value | Description of gene set |
| --- | --- | --- | --- | --- | --- |
| Myc-related | | | | | |
| LEE_MYC_UP | 54 | 30 | 0.49 | 0.023 | Genes Myc up-regulated in mouse and human hepatocellular carcinoma |
| FERNANDEZ_MYC_TARGETS | 180 | 105 | 0.41 | 0.008 | Regulatory and biologic diversity among Myc-target genes |
| MENSSEN_MYC_UP | 34 | 17 | 0.60 | 0.011 | Genes up-regulated by MYC in human umbilical vein endothelial cells |
| MYC_TARGETS | 42 | 20 | 0.57 | 0.008 | Regulatory networks of Myc-responsive genes |
| SCHUMACHER_MYC_UP | 54 | 30 | 0.52 | 0.011 | Multiple functions of Myc and its target genes |
| Translational | | | | | |
| BRENTANI_PROTEIN_MODIFICATION | 150 | 78 | 0.32 | 0.094 | Cancer-related genes involved in protein modification |
| MRNA_PROCESSING | 47 | 32 | 0.64 | 0.000 | Genes involved in mRNA processing (Broad Institute)* |
| MRNA_SPLICING | 58 | 32 | 0.51 | 0.012 | Genes involved in mRNA splicing (Broad Institute)* |
| TRNA_SYNTHETASES | 20 | 16 | 0.59 | 0.014 | tRNA synthetases (Broad Institute)* |
| RIBOSOMAL_PROTEINS | 123 | 39 | 0.78 | 0.000 | Genes curated by GenMapp2.1 |
| MRNA_PROCESSING_REACTOME | 121 | 66 | 0.53 | 0.000 | Genes curated by GenMapp2.1 |
| TRANSLATION_FACTORS | 52 | 27 | 0.59 | 0.002 | Genes curated by GenMapp2.1 |
| Wnt-related | | | | | |
| KENNY_WNT_UP | 51 | 19 | 0.43 | 0.142 | Genes deregulated by Wnt in murine mammary epithelial cells |
| LIN_WNT_UP | 56 | 31 | 0.36 | 0.182 | Wnt target genes associated with MLL in human leukemia up-regulated in human colon cancer |

This list includes gene sets identified as strongly correlated Myc-, Wnt-, and translational-activated genes by GSEA. The 14 curated gene sets listed are highly enriched relevant datasets with FDR < 0.20. Enrichment score (ES) is a statistical measure reflecting the degree of correlation between genes within specific public datasets to those genes highly represented within the tMDS versus normal dataset. NES is an ES normalized for the size of the gene set.

*The tMDS versus normal dataset was compared with curated database of gene sets published in the Molecular Signature Database (MsigDB; Broad Institute; http://www.broadinstitute.org/gsea/msigdb/index.jsp).

Gladstone Institutes, University of California at San Francisco, http://www.wikipathways.org/index.php.

Biologic processes

To further clarify the differential expression of specific groups of genes within patient subgroups, we analyzed the genes that were represented in the “poor risk” signature. Because RPs were overrepresented within the signature and in GSEA, we assessed the representation of the entire group of RPs (70 total) and found 37 of them to be differentially expressed. Of interest, these ribosomal genes were all overexpressed in comparisons of MDS versus normal and tMDS versus sMDS (Figure 5A-B), whereas they were underexpressed in the del(5q) group versus the remainder of MDS (Figure 5C). The relative expression of 14 RPs concordantly expressed in the 3 compared subgroups is shown in Figure 5D, including 3 of the 4 RPs within the poor risk signature (RPS4X, RPS25, and RPL23). These genes were also overrepresented as determined by hypergeometric analysis.

Figure 5

Differential RP expression in MDS subsets. (A) MDS versus normal. (B) tMDS versus sMDS. (C) del(5q) versus non-del(5q) MDS. (D) Comparative expression of 14 RPs in these MDS subsets. These data demonstrated increased RP expression in MDS versus normal and in tMDS versus sMDS in contrast to their underexpression in del(5q) MDS versus other MDS patients.

Quantitative RT-PCR

RT-PCR analysis of 7 representative genes, including 5 from the “poor risk” signature (Figure 6), showed relative levels of altered gene expression similar to those generated by the microarray determinations. Samples were obtained from the 9 patients (5 tMDS, 4 sMDS) and 4 healthy persons for whom there was adequate remaining material. Noteworthy are the differing expression levels of relevant genes between tMDS and sMDS patients, as also demonstrated by microarray analysis. For example, expression of the genes in the poor risk signature (RPL23, RPS4X, RPS19, RPS25, and TPP2) was higher in tMDS than in sMDS patients. For GARS and GSPT1, tMDS and sMDS patients had similar expression levels, which were higher than those in healthy persons (supplemental Table 5).
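Relative expression from quantitative RT-PCR is commonly derived with the 2^(−ΔΔCT) method of Livak and Schmittgen.26 A minimal sketch follows; the threshold-cycle (Ct) values and the use of a single control gene are hypothetical, and the function name is illustrative.

```python
from math import log2

def relative_expression(ct_target, ct_control, ct_target_ref, ct_control_ref):
    """Fold change by the 2^(-ddCt) method.

    ct_target, ct_control         : Ct of target and control gene in the sample
    ct_target_ref, ct_control_ref : the same two Ct values in the reference standard
    """
    delta_sample = ct_target - ct_control
    delta_reference = ct_target_ref - ct_control_ref
    return 2.0 ** -(delta_sample - delta_reference)

# Hypothetical Ct values: the target amplifies 2 cycles earlier in the
# sample than in the reference, with the control gene unchanged.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
log2_fold = log2(fold)
print(fold, log2_fold)
```

Expressing the result in log2 scale, as in Figure 6, places RT-PCR and microarray ratios on a common axis.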

Figure 6

Expression of representative genes assessed by quantitative RT-PCR. Comparison of the relative expression levels obtained from RT-PCR and cDNA microarray experiments for 5 genes present in the “poor risk” signature from CD34+ marrow cells from patients with MDS and healthy persons. Demonstrated are the similar degrees of expression for these genes using both analytic methods, as related to the reference standard (mean ± SEM in log2 scale). Also shown are the differing levels of expression of these genes in tMDS (n = 5, increased) versus sMDS (n = 4, decreased) patients, which are further decreased in healthy persons (n = 4).

Discussion

Our study is the initial report evaluating GEPs from CD34+ marrow cells of MDS patients with prolonged clinical follow-up. SAM was used to differentiate MDS from normal, and unsupervised hierarchical clustering using this gene set then demonstrated 2 major MDS subgroups: those with a high potential to develop AML (tMDS) within 14 months and those whose disease remained stable (sMDS) (Figure 1). These 2 unsupervised GEP subgroups were prognostic and distinct from the clinical evaluation of the patients (Figure 4A-B). This finding led to our subsequent supervised analyses of the tMDS and sMDS patients.

Using a variety of bioinformatic methods and comparative analyses, we demonstrated GEPs valuable for classifying these 2 subgroups of patients and defined a “poor risk” signature of 6 genes, which correlated with their subsequent development of leukemia within 14 months (Table 6). This signature also correlated with GEP differences between tMDS and sMDS and showed progressive alterations of expression with more advanced disease status. We demonstrated that patients with overexpression of the genes within the poor risk signature had adverse clinical outcomes (ie, AML transformation). As “controls,” those patients in the IPSS Low and Int-2/High categories had gene signature findings consistent with outcomes generally associated with these features (Figure 4C). Further, of particular note, this association was also evident within the IPSS Int-1 patient group (ie, 3 of 6 such patients overexpressing the signature genes developed AML, whereas all 6 Int-1 patients lacking such expression remained stable). Because clinical determination of prognosis in Int-1 patients remains somewhat problematic, these molecular findings may provide a useful approach to aid evaluation of prognostic features for these patients.

Our data using hierarchical clustering algorithms and dendrograms, obtained from a heterogeneous group of MDS patients, also demonstrated substantial differences in GEPs from their marrow CD34+ cells compared with those from normal persons. These data confirm and extend those from prior studies, which generally had smaller numbers of patients and used a different molecular platform (oligonucleotide arrays rather than the cDNA arrays we used).7-14  These prior studies varied in the identity, number, and functional correlates of the differentially expressed genes found in marrow cells from MDS patients, with generally different genes being reported in each study.7-14 

A primary biologic observation in our study was the consistent differential expression of ribosomal transcripts in MDS. We demonstrated that a substantial number of RP genes were overexpressed in MDS, and more prominently in tMDS, versus normal (Figure 5), in distinction to the decreased ribosomal expression in del(5q) MDS (later in “Discussion”). In addition, 4 of the overexpressed genes within the 6-gene poor risk signature were those generating RPs (RPL23, RPS4X, RPS19, and RPS25; Table 6). These findings are consistent with substantial prior data from several other neoplastic conditions in which RP overexpression is associated with disease progression and aggressiveness.27-30  Data are accumulating regarding the extraribosomal functions of RPs, with reports showing relationships between overexpression of genes encoding RPs and cancer.31,32  RPs play a direct role in growth regulation. Of note, RPL23, a negative regulator of a Myc antagonist, promotes cell proliferation.33 

In addition to the overexpressed RPs represented in the poor risk signature are 2 other genes, which generate known proteolytic enzymes: KLK3 and TPP2. Human tissue kallikreins (KLKs or kallikrein-related peptidases) are a family of extracellular serine proteases that act on a wide variety of physiologic substrates, display aberrant expression patterns in several neoplasms, and have been reported as potential cancer biomarkers.34,35  Prostate-specific antigen (also known as KLK3) is the most widely recognized member of this family.34,35 TPP2 is a protease involved in intracellular proteolysis that is up-regulated in irradiated glioblastoma cells, enhancing tumor cell survival and radio-resistance.36 

PCR analysis confirmed the relative increases in expression, as assessed by microarray, of specific genes, including those in the poor risk signature. Noteworthy are the differing expression levels of relevant genes between tMDS and sMDS patients, as shown by the 2 independent methods.

Using analytic methods accessing public databases (GSEA), we determined functional correlates of these genes and found that the Myc-, Wnt-, and translational-related genes (including RPs) in MDS (particularly tMDS) compared with normal were highly enriched for gene sets in previously curated datasets (Tables 8 and 9). Further, using additional methods (hypergeometric analyses) to define the relative overrepresentation of the specific genes, we demonstrated that these intracellular hematopoietic signaling pathways known to be associated with leukemic progression or AML (Myc and Wnt)37-41  were also overexpressed in the tMDS patients relative to those with sMDS. The Wnt pathway plays important roles in hematopoiesis and stem cell biology.42  Myc is linked to the Wnt pathway by being an important transcriptional target of β-catenin43  and mediates the hyperproliferative effects of Wnt activation.44  In contrast, genes involved with apoptosis-related pathways were more prominent in the sMDS patient subgroup compared with tMDS. These findings are consistent with prior studies indicating enhanced apoptosis and Myc:Bcl2 oncoprotein expression within the CD34 cell compartment early in MDS, with a switch to a decrease in this process concomitant with disease progression.45  Consistent with these findings, recent data have indicated that deregulation of protein translation (including sustained translation of the Myc oncogene) is critical for leukemic cell survival in AML.46 

For the more global analysis of the functional correlates of the differentially expressed genes from evaluation of MDS versus normal marrow CD34+ cells, 12 general processes were overrepresented, with the main functions (66%) being those involved with transcription, cytoskeletal metabolism, and signaling/transport (Table 1). We also demonstrated groups of novel genes associated with potential for disease progression (ie, tMDS). Prior MDS microarray studies evaluating marrow cells indicated differentially expressed genes to be associated with stress-related protectors, immune processes, signaling, or the cell cycle or apoptotic inhibitors.7-14 

Patients with del(5q) chromosomal abnormalities are a separable MDS subgroup with anemia who have selective beneficial responses to the drug lenalidomide compared with the remainder of MDS patients.47  We found that these patients' marrow CD34+ GEPs were also distinctive; but in contrast to the tMDS patients, these persons generally had decreased RP expression. Prior studies have shown decreased levels of other genes (SPARC)14  and a specific RP (RPS14)48  in this disorder. In addition, consistent with our findings, Pellagatti et al have also shown ribosomal- and translation-related probe sets to be significantly differentially expressed, with approximately 90% of these showing lower expression levels in the 5q− syndrome patient group compared with normal and other MDS patients.12  Several congenital types of anemias (eg, Diamond-Blackfan anemia [DBA], dyskeratosis congenita), which have a propensity to develop leukemia, also have mutations or decreases in RP synthesis or expression, including RPS19, RPS20, RPL5, and RPL11.49,50  Of note, the del(5q) patients in our study had decreased expression of genes coding for RPs, including RPS20, RPL5, and RPL11 (Figure 5). These are potentially relevant ribosomal relationships between congenital types of anemias (which initially are not clonal hemopathies and have highly vulnerable and noncompetitive stem cells) and MDS patients (whose disease is clonal, with the clones having out-competed the nonclonal stem cells). A possible explanation for these findings is that initially the del(5q) patients may be comparable with the pre-MDS DBA or dyskeratosis patients, whereas the del(5q) MDS patients have evolved clonally such that the defect is altered in a maladaptive way. This is consistent with the differentially expressed genes in the del(5q) MDS versus normal and the non-del(5q) MDS comparisons.

Our findings of distinctive marrow CD34+ cell GEPs in MDS patient subsets provide molecular insight into mechanisms underlying the disease and its propensity to progress to a more aggressive stage. Of particular importance in our study was the definition of a “poor risk” signature associated with the propensity of MDS patients to undergo AML transformation. Assessing the impact of multiple molecular abnormalities on disease phenotype, particularly of RPs, by this gene array technology supplements those studies evaluating single-gene analyses and clinical features. These findings, if verified, should prove to be valuable in the future for diagnostically and prognostically classifying such patients. It will be important to expand the number of patients analyzed by these methods and to validate the poor risk signature and GEPs found in this study against an additional cohort of MDS patients. Such studies are ongoing in our laboratory and those of others.

The online version of this article contains a data supplement.

Presented in part at the American Society of Hematology 47th Annual Meeting, Atlanta, GA, December 12, 2005.51 

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

After completion of our study and just before submission of our manuscript, we noted the article by Mills K, Kohlmann A, Williams PM, et al. Microarray-based classifiers and prognosis models identify subgroups with distinct clinical outcomes and high risk of AML transformation of myelodysplastic syndrome. Blood. 2009;114(5):1063-1072. These authors used nonenriched mononuclear cells for their analyses rather than CD34+ cells as used in our study.

This study was supported by the Muriel and Ira Coleman Leukemia Research Fund, the William E. Walsh Leukemia Research Fund, the Eugene, Elizabeth and Christina Cronkite Fund for Hematology, California Cancer Research Program (grant 99-00520V-10144), the Leukemia & Lymphoma Society (SCOR grant), Veterans Administration Palo Alto Health Care System (resources and use of facilities), and the National Institutes of Health (R01 grant LM009719; A.J.B.).

Contribution: K.S. performed and designed the research and assisted in writing the manuscript and analyzing the data; D.T.R. analyzed the data and reviewed the manuscript; R.T. provided statistical analysis; A.J.B. provided bioinformatics and statistical analyses; and P.L.G. designed the research, analyzed the data, and wrote the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

The current address of D.T.R. is Applied Genomics Inc, Burlingame, CA.

Correspondence: Peter L. Greenberg, Hematology Division, Stanford University Medical Center, 875 Blake Wilbur Dr, Rm 2335, Stanford, CA 94305; e-mail: peterg@stanford.edu.

1
Greenberg
 
P
Greenberg
 
PL
Pathogenetic mechanisms underlying myelodysplastic syndrome.
Myelodysplastic Syndromes: Clinical and Biological Advances
2006
Cambridge, United Kingdom
Cambridge University Press
(pg. 
63
-
93
)
2
Greenberg
 
P
Hoffman
 
RBE
Shattil
 
S
Cohen
 
H
The myelodysplastic syndromes.
Hematology: Basic Principals and Practice
2000
3rd ed
New York, NY
Churchill Livingstone
(pg. 
1106
-
1129
)
3
Greenberg
 
P
Cox
 
C
Le Beau
 
MM
, et al. 
International scoring system for evaluating prognosis in myelodysplastic syndromes.
Blood
1997
, vol. 
89
 
6
(pg. 
2079
-
2088
)
4
Ebert
 
B
Golub
 
TR
Genomic approaches to hematologic malignancies.
Blood
2004
, vol. 
104
 
4
(pg. 
923
-
932
)
5
Bullinger
 
L
Dohner
 
K
Bair
 
E
, et al. 
Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia.
N Engl J Med
2004
, vol. 
350
 
16
(pg. 
1605
-
1616
)
6
Marcucci
 
G
Maharry
 
K
Radmacher
 
MD
, et al. 
Prognostic significance of, and gene and microRNA expression signatures associated with, CEBPA mutations in cytogenetically normal acute myeloid leukemia with high-risk molecular features: a Cancer and Leukemia Group B Study.
J Clin Oncol
2008
, vol. 
26
 
31
(pg. 
5078
-
5087
)
7
Miyazato
 
A
Ueno
 
S
Ohmine
 
K
, et al. 
Identification of myelodysplastic syndrome-specific genes by DNA microarray analysis with purified hematopoietic stem cell fraction.
Blood
2001
, vol. 
98
 
2
(pg. 
422
-
427
)
8
Hofmann
 
WK
de Vos
 
S
Komor
 
M
, et al. 
Characterization of gene expression of CD34+ cells from normal and myelodysplastic bone marrow.
Blood
2002
, vol. 
15
 
10
(pg. 
3553
-
3560
100
9
Ueda
 
M
Ota
 
J
Yamashita
 
Y
, et al. 
DNA microarray analysis of stage progression mechanism in myelodysplastic syndrome.
Br J Haematol
2003
, vol. 
123
 
2
(pg. 
288
-
296
)
10
Chen
 
G
Zeng
 
W
Miyazato
 
A
, et al. 
Distinctive gene expression profiles of CD34 cells from patients with myelodysplastic syndrome characterized by specific chromosomal abnormalities.
Blood
2004
, vol. 
104
 
13
(pg. 
4210
-
4218
)
11
Pellagatti
 
A
Cazzola
 
M
Giagounidis
 
AA
, et al. 
Gene expression profiles of CD34+ cells in myelodysplastic syndromes: involvement of interferon-stimulated genes and correlation to FAB subtype and karyotype.
Blood
2006
, vol. 
108
 (pg. 
337
-
345
[erratum: Blood. 2006;108:1128]
12
Pellagatti
 
A
Hellström-Lindberg
 
E
Giagounidis
 
A
, et al. 
Haploinsufficiency of RPS14 in 5q− syndrome is associated with deregulation of ribosomal- and translation-related genes.
Br J Haematol
2008
, vol. 
142
 
1
(pg. 
57
-
64
)
13
Pellagatti
 
A
Jädersten
 
M
Forsblom
 
AM
, et al. 
Lenalidomide inhibits the malignant clone and up-regulates the SPARC gene mapping to the commonly deleted region in 5q− syndrome patients.
Proc Natl Acad Sci U S A
2007
, vol. 
104
 
27
(pg. 
11406
-
11411
)
14
Boultwood
 
J
Pellagatti
 
A
Cattan
 
H
, et al. 
Gene expression profiling of CD34+ cells in patients with the 5q− syndrome.
Br J Haematol
2007
, vol. 
139
 
4
(pg. 
578
-
589
)
15
Miltenyi
 
S
Guth
 
S
Radbruch
 
A
Pfluger
 
E
Thiel
 
A
Isolation of CD34+ hematopoietic progenitor cells by high-gradient magnetic cell sorting (MACS).
Hematopoietic Stem Cells: The Mulhouse Manual
1994
Dayton, OH
AlphaMed Press
(pg. 
201
-
213
)
16
Wang
 
E
Miller
 
L
Ohnmacht
 
GA
, et al. 
High-fidelity mRNA amplification for gene profiling.
Nat Biotechnol
2000
, vol. 
18
 
4
(pg. 
457
-
459
)
17
Eisen
 
MB
Spellman
 
P
Brown
 
PO
Botstein
 
D
Cluster analysis and display of genome-wide expression patterns.
Proc Natl Acad Sci U S A
1998
, vol. 
95
 
25
(pg. 
14863
-
14868
)
18
Sherlock
 
G
Hernandez-Boussard
 
T
Kasarskis
 
A
, et al. 
The Stanford Microarray Database.
Nucleic Acids Res
2001
, vol. 
29
 
1
(pg. 
152
-
155
)
19
Tusher
 
VG
Tibshirani
 
R
Chu
 
G
Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci U S A
2001
, vol. 
98
 
9
(pg. 
5116
-
5121
)
20
Tibshirani
 
R
Hastie
 
T
Narasimhan
 
B
Chu
 
G
Diagnosis of multiple cancer types by shrunken centroids of gene expression.
Proc Natl Acad Sci U S A
2002
, vol. 
99
 
10
(pg. 
6567
-
6572
)
21
Diehn
 
M
Sherlock
 
G
Binkley
 
G
, et al. 
SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data.
Nucleic Acids Res
2003
, vol. 
31
 
1
(pg. 
219
-
223
)
22
Ashburner
 
M
Ball
 
CA
Blake
 
JA
, et al. 
Gene Ontology: tool for the unification of biology.
Nat Genet
2000
, vol. 
25
 
1
(pg. 
25
-
29
)
23
Subramanian
 
A
Tamayo
 
P
Mootha
 
VK
, et al. 
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.
Proc Natl Acad Sci U S A
2005
, vol. 
102
 
43
(pg. 
15545
-
15550
)
24
Draghici
 
S
Khatri
 
P
Martins
 
RP
, et al. 
Global functional profiling of gene expression.
Genomics
2003
, vol. 
81
 
2
(pg. 
98
-
104
)
25
Holland
 
PM
Abramson
 
RD
Watson
 
R
Gelfand
 
DH
Detection of specific polymerase chain reaction product by utilizing the 5′—3′ exonuclease activity of Thermus aquaticus DNA polymerase.
Proc Natl Acad Sci U S A
1991
, vol. 
88
 
16
(pg. 
7276
-
7280
)
26
Livak
 
KJ
Schmittgen
 
TD
Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method.
Methods
2001
, vol. 
25
 
4
(pg. 
402
-
408
)
27
Ruggero
 
D
Pandolfi
 
PP
Does the ribosome translate cancer?
Nat Rev Cancer
2003
, vol. 
3
 
3
(pg. 
179
-
192
)
28
Zhang
 
L
Zhou
 
W
Velculescu
 
VE
, et al. 
Gene expression profiles in normal and cancer cells.
Science
1997
, vol. 
276
 
5316
(pg. 
1268
-
1272
)
29
Bassoe
 
CF
Bruserud
 
O
Pryme
 
IF
Vedeler
 
A
Ribosomal proteins sustain morphology, function and phenotype in acute myeloid leukemia blasts.
Leuk Res
1998
, vol. 
22
 
4
(pg. 
329
-
339
)
30
Uechi
 
T
Tanaka
 
T
Kenmochi
 
N
A complete map of the human ribosomal protein genes: assignment of 80 genes to the cytogenetic map and implications for human disorders.
Genomics
2001
, vol. 
72
 
3
(pg. 
223
-
230
)
31
Warner
 
JR
McIntosh
 
KB
How common are extraribosomal functions of ribosomal proteins?
Mol Cell
2009
, vol. 
34
 
1
(pg. 
3
-
11
)
32
Wang
 
M
Hu
 
Y
Stearns
 
M
RPS2: a novel therapeutic target in prostate cancer.
J Exp Clin Cancer Res
2009
, vol. 
28
 
1
pg. 
6
 
33
Wanzel
 
M
Russ
 
AC
Kleine-Kohlbrecher
 
D
, et al. 
A ribosomal protein L23-nucleophosmin circuit coordinates Mizl function with cell growth.
Nat Cell Biol
2008
, vol. 
10
 
9
(pg. 
1051
-
1061
)
34
Pampalakis
 
G
Sotiropoulou
 
G
Tissue kallikrein proteolytic cascade pathways in normal physiology and cancer.
Biochim Biophys Acta
2007
, vol. 
1776
 
1
(pg. 
22
-
31
)
35
Emami
 
N
Diamandis
 
EP
Utility of kallikrein-related peptidases (KLKs) as cancer biomarkers.
Clin Chem
2008
, vol. 
54
 
10
(pg. 
1600
-
1607
)
36
Bassi
 
C
Mello
 
S
Cardoso
 
R
, et al. 
Transcriptional changes in U343 MG: a glioblastoma cell line exposed to ionizing radiation.
Hum Exp Toxicol
2008
, vol. 
27
 
12
(pg. 
919
-
929
)
37
Coller
 
HA
Grandori
 
C
Tamayo
 
P
, et al. 
Expression analysis with oligonucleotide microarrays reveals that MYC regulates genes involved in growth, cell cycle, signaling, and adhesion.
Proc Natl Acad Sci U S A
2000
, vol. 
97
 
7
(pg. 
3260
-
3265
)
38. Boon K, Caron HN, van Asperen R, et al. N-myc enhances the expression of a large set of genes functioning in ribosome biogenesis and protein synthesis. EMBO J. 2001;20(6):1383-1393.
39. Menssen A, Hermeking H. Characterization of the c-MYC-regulated transcriptome by SAGE: identification and analysis of c-MYC target genes. Proc Natl Acad Sci U S A. 2002;99(9):6274-6279.
40. Mikesch JH, Steffen B, Berdel WE, et al. The emerging role of Wnt signaling in the pathogenesis of acute myeloid leukemia. Leukemia. 2007;21(8):1638-1647.
41. Majeti R, Becker MW, Tian Q, et al. Dysregulated gene expression networks in human acute myelogenous leukemia stem cells. Proc Natl Acad Sci U S A. 2009;106(9):3396-3401.
42. Malhotra S, Kincade PW. Canonical Wnt pathway signaling suppresses VCAM-1 expression by marrow stromal and hematopoietic cells. Exp Hematol. 2009;37(1):19-30.
43. He TC, Sparks AB, Rago C, et al. Identification of c-MYC as a target of the APC pathway. Science. 1998;281(5382):1509-1512.
44. Sansom OJ, Meniel VS, Muncan V, et al. Myc deletion rescues Apc deficiency in the small intestine. Nature. 2007;446(7136):676-679.
45. Rajapaksa R, Ginzton N, Rott L, Greenberg PL. Altered oncogene expression and apoptosis in myelodysplastic syndrome marrow cells. Blood. 1996;88(11):4275-4287.
46. Carroll M. Taking aim at protein translation in AML. Blood. 2009;114(8):1458-1459.
47. List A, Dewald G, Bennett J, et al. Hematologic and cytogenetic response to lenalidomide in myelodysplastic syndrome with chromosome 5q deletion. N Engl J Med. 2006;355(14):1456-1465.
48. Ebert BL, Pretz J, Bosco J, et al. Identification of RPS14 as a 5q− syndrome gene by RNA interference screen. Nature. 2008;451(7176):335-339.
49. Dokal I. Dyskeratosis congenita in all its forms. Br J Haematol. 2000;110(4):768-779.
50. Gazda HT, Sheen MR, Vlachos A, et al. Ribosomal protein L5 and L11 mutations are associated with cleft palate and abnormal thumbs in Diamond-Blackfan anemia patients. Am J Hum Genet. 2008;83(6):769-780.
51. Sridhar K, Brown PO, Tibshirani R, et al. Differential gene expression profiles in CD34+ myelodysplastic syndrome marrow cells [abstract]. Blood. 2005;106(11):954a.