Transcription factors (TFs) and the regulatory proteins that control them play key roles in hematopoiesis, controlling basic processes of cell growth and differentiation; disruption of these processes may lead to leukemogenesis. Here we attempt to identify functionally novel and partially characterized TFs/regulatory proteins that are expressed in undifferentiated hematopoietic tissue. We surveyed our database of 15 970 genes/expressed sequence tags (ESTs) representing the normal human CD34+ cells transcriptosome (http://westsun.hema.uic.edu/cd34.html), using the UniGene annotation text descriptor, to identify genes with motifs consistent with transcriptional regulators; 285 genes were identified. We also extracted the human homologues of the TFs reported in the murine stem cell database (SCdb; http://stemcell.princeton.edu/), selecting an additional 45 genes/ESTs. An exhaustive literature search of each of these 330 unique genes was performed to determine if any had been previously reported and to obtain additional characterizing information. Of the resulting gene list, 106 were considered to be potential TFs. Overall, the transcriptional regulator dataset consists of 165 novel or poorly characterized genes, including 25 that appeared to be TFs. Among these novel and poorly characterized genes are a cell growth regulatory with ring finger domain protein (CGR19, Hs.59106), an RB-associated KRAB repressor (RBAK, Hs.7222), a death-associated transcription factor 1 (DATF1, Hs.155313), and a p38-interacting protein (P38IP, Hs.171185). The identification of these novel and partially characterized potential transcriptional regulators adds a wealth of information to understanding the molecular aspects of hematopoiesis and hematopoietic disorders.

Transcription factors (TFs) play a critical role in the process of lineage commitment and differentiation in hematopoietic tissue.1-4 Several such factors are known to control the basic molecular mechanisms that underlie this process, and their expression is tightly regulated in a stage- and lineage-specific manner.5 For example, the level of expression of PU.1 and GATA binding proteins plays a major regulatory role in myeloid development, with PU.1 being up-regulated with myeloid differentiation,6 whereas GATA1 and GATA2 are down-regulated.6,7 Disruption in the expression, sequence, or structure of critical TFs or their associated regulatory proteins can upset the delicate balance between proliferation and differentiation and lead to leukemogenesis. Most of the consistent translocations in myeloid leukemias that have been analyzed to date result in a fusion protein that alters the normal function of a TF or a related regulatory protein8,9; it is increasingly recognized that these genes might also contribute to leukemia by functional inactivation by mutation10 or chromosomal translocation.11-13 It has been speculated that the majority of translocations that have not yet been fully characterized probably also involve transcriptional regulatory proteins.14-16 Thus, the identification of novel transcriptional regulators, especially those that are located near translocation breakpoints, may help to specify new leukemia-related proteins, leading to better understanding and treatment of this disease.

In the present study, we took a global approach to identify novel and known transcriptional regulators that might participate in hematopoiesis and leukemogenesis by surveying databases of genes that are expressed in normal hematopoietic stem cells. We searched our previously reported database of 15 970 transcripts that are present in human bone marrow CD34 antigen–positive cells17 to identify those with functional motifs consistent with transcriptional regulators. We also searched a murine stem cell database18 to find the human homologues of TFs expressed in this tissue. Here we report the results of our search, which identified 330 genes that are potential transcriptional regulators, including 106 TFs, of which 25 are novel or poorly characterized. These TFs, especially those novel ones that have not been reported previously, may represent new pathways in hematopoiesis or leukemogenesis that have not yet been explored.

The human CD34+ transcriptosome database

The preparation of the human CD34+ transcriptosome database was previously reported.17 The database, which is available online at http://westsun.hema.uic.edu/cd34.html, contains 15 970 complementary DNAs (cDNAs) expressed in CD34+ cells, and includes the GenBank accession number, the UniGene cluster identification number (http://www.ncbi.nlm.nih.gov/UniGene/) to which the GenBank clone belongs, its relative expression in CD34+ cells, the gene name, a functional description of the gene (from the UniGene text descriptor, build version 129), and its chromosomal location. The UniGene text descriptors of this database were searched for the following terms: transcription factor, leucine zipper, zinc finger, ring finger, helix-loop-helix, PHD, POU, forkhead, bromodomain, homeobox, oncogene, nuclear, activator, and repressor. The dataset was then updated to the most recent UniGene build version (version 135, June 2001). Redundant cDNAs contained within the same UniGene cluster were removed, keeping only the clone with the highest expression level in the CD34+ database. Each cDNA sequence was then used to search the GenBank NR database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide) using BlastN alignment software, to verify homology to the predicted gene, using an arbitrary cut-off of E < e-40 to indicate sufficient sequence identity. If the E value was greater than e-40, then the cDNA sequence was also used to search the TIGR database (http://www.tigr.org/) to verify that it is represented by a TIGR contig containing the transcript of the predicted function. The cDNAs that did not pass these GenBank or TIGR screens were removed from the database, as were genes of known function that clearly did not represent the categories that were being sought.
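The screening steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure as it was actually run: the `Clone` record, the function names, and the word-start matching rule are assumptions introduced here, while the search terms and the e-40 cut-off are taken from the text.

```python
import re
from dataclasses import dataclass

# Search terms as listed in the text.
SEARCH_TERMS = (
    "transcription factor", "leucine zipper", "zinc finger", "ring finger",
    "helix-loop-helix", "PHD", "POU", "forkhead", "bromodomain", "homeobox",
    "oncogene", "nuclear", "activator", "repressor",
)

# Assumption: a term must start at a word boundary, case-insensitively.
# The text does not say how matching was performed.
TERM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SEARCH_TERMS) + r")",
    re.IGNORECASE,
)

E_VALUE_CUTOFF = 1e-40  # the "arbitrary cut-off" quoted in the text


@dataclass(frozen=True)
class Clone:
    """Hypothetical record for one cDNA in the CD34+ database."""
    accession: str     # GenBank accession number
    cluster: str       # UniGene cluster ID
    descriptor: str    # UniGene text descriptor
    expression: float  # fold over background in CD34+ cells


def matches_search_terms(descriptor: str) -> bool:
    """True if the descriptor contains any of the search terms."""
    return TERM_PATTERN.search(descriptor) is not None


def keep_highest_per_cluster(clones):
    """Collapse redundant clones, keeping the most highly expressed per cluster."""
    best = {}
    for clone in clones:
        current = best.get(clone.cluster)
        if current is None or clone.expression > current.expression:
            best[clone.cluster] = clone
    return list(best.values())


def passes_homology_screen(blast_e_value: float, has_tigr_contig: bool) -> bool:
    """Accept on a BlastN E value below the cut-off; otherwise require a TIGR contig."""
    if blast_e_value < E_VALUE_CUTOFF:
        return True
    return has_tigr_contig
```

In use, the descriptor filter would be applied first, the survivors collapsed to one clone per UniGene cluster, and each remaining clone passed through the homology screen. The visual inspection that removed genes of clearly unrelated function is a manual step and is not modelled here.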

The murine stem cell database

A murine stem cell database (http://stemcell.princeton.edu) consisting of expressed genes (devoid of housekeeping genes) in mouse fetal liver stem cells has been reported.18 The GenBank accession numbers of all 161 entries in the “transcription factor” category of this database were selected and used to identify the corresponding murine UniGene cluster for each entry (build version 129); the UniGene annotation was used to identify the human homologue, if available. All human genes that were also present in the CD34+ transcriptosome database were then selected for inclusion in the present study and updated to their corresponding entry in build version 135 of UniGene. GenBank and TIGR homology screens were performed as described above.
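The cross-referencing described here amounts to two lookups followed by a membership test. The sketch below assumes the mappings are available as plain dictionaries; the function and argument names are hypothetical and are not taken from the original databases.

```python
def human_homologues_in_cd34(murine_accessions, accession_to_mouse_cluster,
                             mouse_to_human_cluster, cd34_clusters):
    """Return the human UniGene clusters that are homologous to the murine
    entries and are also present in the CD34+ transcriptosome database.

    All four arguments are hypothetical in-memory stand-ins for the lookups
    described in the text.
    """
    selected = set()  # a set removes redundant hits to the same human cluster
    for accession in murine_accessions:
        mouse_cluster = accession_to_mouse_cluster.get(accession)
        if mouse_cluster is None:
            continue  # no murine UniGene cluster for this entry
        human_cluster = mouse_to_human_cluster.get(mouse_cluster)
        if human_cluster is None:
            continue  # no annotated human homologue
        if human_cluster in cd34_clusters:
            selected.add(human_cluster)
    return selected
```

Returning a set means that two murine entries pointing to the same human cluster are counted once, which mirrors the removal of redundant entries described in the text.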

Selection of genes from the human CD34+ transcriptosome database

The 15 970 genes in the human CD34+ transcriptosome database were searched for cDNAs that encode known TFs, and for those containing motifs that are frequently found in TFs and their interacting proteins. The analysis was based on a text search of the UniGene descriptor of the clones in the CD34+ database, rather than a direct homology search of the clone sequence. UniGene is a database that automatically collects and partitions GenBank and expressed sequence tag (EST) sequences into a nonredundant set of gene-oriented clusters by establishing sequence overlaps; each cluster represents a single potential transcript. Each cluster is annotated with a descriptor of the transcript that is the result of automated searches for sequence homologies to proteins from 8 organisms, using both nucleotide and protein sequence alignment; thus, a fair amount of functional prediction is available for each gene cluster even if it represents an EST sequence that has not been further studied. Each cluster is assigned a chromosomal location, based on sequence alignment. Details of the construction and updating of the UniGene database are available at http://www.ncbi.nlm.nih.gov/UniGene/.

The UniGene cluster descriptors contained in the CD34+ transcriptosome database were searched for terms that are thought likely to annotate TFs, corepressors or coactivators, nuclear factors, and other DNA-interacting proteins. The resulting genes were updated, corrected for redundancy, and verified through homology screens. The database was visually inspected, and an additional 6 genes of known function, which clearly did not contain transcriptional regulatory activity, were removed from the database. A total of 285 genes resulted. Table 1 presents these genes, categorized according to their function or functional motifs, with their UniGene number, chromosomal location, and UniGene descriptor. The cDNAs in each category are presented in order from highest abundance to lowest, based on the measured level of expression in CD34+ cells, as reported in the database.17

Selection of genes from the murine stem cell database

The TF category of the murine hematopoietic stem cell database was analyzed to identify the human homologues of known and novel TFs expressed in human bone marrow CD34+ cells, by cross-referencing the murine and human UniGene databases. The murine UniGene clusters corresponding to each of the 161 TFs listed in the murine database were matched with the human clusters in the UniGene database version 129, resulting in 155 homologous human clusters. A total of 145 human genes remained after updating to UniGene version 135 and removing redundant entries. Of these 145 clusters, 87 were represented in the human CD34+ transcriptosome database, including 30 that had already been identified by our search using text descriptors. These 30 clusters are indicated with an asterisk in Table 1. Analysis of the remaining 57 human genes for homology to their assigned UniGene cluster or to a corresponding TIGR entry, and excluding those whose known function was obviously not in the category of a transcriptional regulator, resulted in 45 additional genes/ESTs. These additional 45 genes are listed in Table 2, and each entry includes the murine gene and its presumed human counterpart, its human UniGene cluster ID and descriptors, its chromosomal location, and the level of expression in human CD34+ cells. Of the 58 clusters that are not present in the CD34+ transcriptosome database, 38 were thought to be unexpressed in human CD34+ cells, based on an expression level less than 3-fold over background in the CD34+ transcriptosome database, and the remaining 20 could not be evaluated because they had not been included in the original expression studies that resulted in the CD34+ transcriptosome database.
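The counts in this paragraph follow from one another by simple subtraction. As a worked check, the short Python sketch below recomputes the derived numbers from the values quoted in the text; the variable names are introduced here for illustration only.

```python
# Counts quoted in the text for the murine-to-human cross-referencing.
human_clusters_v135 = 145      # human clusters after updating to UniGene build 135
present_in_cd34_db = 87        # of those, represented in the CD34+ database
found_by_text_search = 30      # already identified by the text-descriptor search
added_after_screens = 45       # additional genes/ESTs retained (Table 2)
text_search_genes = 285        # genes from the text-descriptor search (Table 1)
below_threshold = 38           # less than 3-fold over background
not_assayed = 20               # not included in the original expression studies

# Derived values.
absent_from_cd34_db = human_clusters_v135 - present_in_cd34_db       # 58
candidates_for_screening = present_in_cd34_db - found_by_text_search  # 57
removed_by_screens = candidates_for_screening - added_after_screens   # 12
combined_total = text_search_genes + added_after_screens              # 330

print(absent_from_cd34_db, candidates_for_screening,
      removed_by_screens, combined_total)
```

The 12 clusters removed at the screening step are not stated explicitly in the text; the figure is simply the difference between the 57 candidates and the 45 retained genes.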

Literature analysis of the TF database

After combining the datasets mined from the human and murine databases, the total number of potential TFs or regulatory proteins was determined to be 330. This includes 106 genes that are recognized as TFs and 224 genes in other categories, which include zinc fingers (90 genes), enhancers (14 genes), activators (8 genes), forkhead (11 genes), oncogenes (20 genes), ring finger (16 genes), and the combination of helix-loop-helix, homeobox, leucine zipper, nuclear, PHD, POU, and repressor categories (21 genes). The remaining 44 cDNAs represent genes that are functionally characterized as transcriptional regulators but lacked any search terms used in our mining protocol. A literature search of each of these 330 genes was performed to determine what was known about each one, emphasizing the discovery of novel genes. The following convention was used to summarize our search results: K = known gene, well characterized; PC = partially characterized, the gene was reported and some preliminary studies have been performed to indicate its function; N = novel gene, no functional information other than its chromosomal location and sequence homology to a known gene or gene family has been reported. These summaries are given in Tables 1 and 2. As a result of the literature search, 165 (50%) of the 330 transcriptional regulators identified were found to be known genes, 86 (26%) have been partially characterized, and 79 (24%) are novel. The partially characterized and novel transcriptional regulators have been further categorized by their relative level of abundance in CD34+ cells, with 92 expressing at low level (≥ 3-fold to < 10-fold over background), 27 expressing at intermediate level (≥ 10-fold to < 25-fold), 28 at high level (≥ 25-fold to < 100-fold), and 18 expressing at very high levels (≥ 100-fold), using the conventions reported in the CD34+ transcriptosome database.17 
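The abundance classes and tallies in this paragraph can be expressed compactly in code. The following Python sketch encodes the fold-over-background thresholds given above and checks that the reported counts are mutually consistent; the function name and the "not expressed" label for values below 3-fold are illustrative assumptions.

```python
def expression_level(fold_over_background: float) -> str:
    """Classify relative abundance using the thresholds quoted in the text."""
    if fold_over_background >= 100:
        return "very high"
    if fold_over_background >= 25:
        return "high"
    if fold_over_background >= 10:
        return "intermediate"
    if fold_over_background >= 3:
        return "low"
    return "not expressed"  # assumed label for values below 3-fold


# Counts quoted in the text.
other_categories = {
    "zinc finger": 90, "enhancer": 14, "activator": 8, "forkhead": 11,
    "oncogene": 20, "ring finger": 16, "combined minor categories": 21,
    "no search term": 44,
}
literature_status = {"K": 165, "PC": 86, "N": 79}
abundance_of_pc_and_n = {"low": 92, "intermediate": 27, "high": 28, "very high": 18}

total_other = sum(other_categories.values())   # 224
total_genes = 106 + total_other                # 330
percentages = {
    status: round(100 * count / total_genes)
    for status, count in literature_status.items()
}
print(total_other, total_genes, percentages)
```

The four abundance classes sum to 165, the combined number of partially characterized and novel genes, so every such gene is assigned to exactly one class.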

In the current study, we emphasized the identification of novel TFs. Based on our literature search of the 106 identified TFs, 78 appear to be well characterized, known genes, whereas 18 have been partially characterized and 7 represent truly novel genes. These 25 partially characterized and novel genes are listed in Table 3 along with details of their presumed function and the supporting literature references.

The current report presents our initial attempts to describe the TFs and related regulatory proteins that are present in the human CD34+ transcriptosome. The study is based primarily on the survey of a human CD34+ transcriptosome database, supplemented by homologues identified in a murine stem cell database and cross-referenced to the CD34+ database.

The human CD34+ transcriptosome database was prepared by hybridization of filter arrays, selecting transcripts that are common to both human and baboon bone marrow CD34 antigen–positive cells.17 This database is considered to be an accurate portrayal of the transcriptosome of the CD34+ cell and was estimated to contain 50% to 75% of the transcripts expressed in this tissue. This database contains 15 970 genes/ESTs expressed in CD34+ cells, and lists their relative level of expression; random sampling of selected transcripts verified (by semiquantitative reverse transcriptase–polymerase chain reaction) that most were expressed at the predicted level.

The murine database (http://stemcell.princeton.edu/) was the result of a cDNA library study, subtracting a stem cell–depleted (AA4.1(neg)) cDNA library from a mouse fetal liver hematopoietic stem cell (Sca(pos) AA4.1(pos) Kit(pos) Lin(neg/lo)) cDNA library.18 The subtracted library represents genome-wide gene expression in mouse hematopoietic stem cells devoid of housekeeping genes. Sequence information on each of these clones was compared by BLAST against GenBank nonredundant protein and nucleotide databases, the EST database, Swissprot, and mouse and human DOTS contigs. Each clone was categorized according to its sequence homology to genes of known functions, resulting in a “transcription factor” category containing 161 entries.

The study reported here was based on a search of the UniGene text descriptors in the CD34+ transcriptosome database for domains generally present in TFs and their regulatory proteins, whereas the mining of the murine stem cell database relied on homology between the mouse TFs and human genes. The study resulted in the identification of 330 genes that are likely to be transcriptional regulators expressed in human CD34+ cells. Because this transcriptional regulator database was prepared using text descriptors rather than primary sequence analysis, it should only be regarded as a preliminary database survey, limited by the accuracy of the sequence searches compiled by UniGene and by the contents of the databases that were analyzed. Because this study relied heavily on the UniGene database, a considerable number of potential transcriptional regulators might have been missed because of the absence of search terms in the text descriptors, or the simple fact that the UniGene database does not contain complete cDNA sequences for all human genes. This explains in part why the additional 45 TFs from the murine database were not selected during the CD34+ transcriptosome database analysis.

Despite these limitations, we believe that this gene list will prove to be very useful for further studies of normal and malignant hematopoiesis. One of the most striking features of this list is that many of the genes have been assigned functional roles in numerous other tissues besides bone marrow. Also of note is the identification of 165 partially characterized and novel genes, 11 of which are expressed at a very high level in CD34+ cells, suggesting that they have an important role in this tissue but have not been previously recognized as such. Some of the interesting novel or partially characterized genes include zinc finger protein 161 (ZFP161, Hs.156000), a cell growth regulator protein with a ring finger domain (CGR19, Hs.59106), zinc finger protein 198 (ZNF198, Hs.109526), RB-associated KRAB repressor (RBAK, Hs.7222), death-associated transcription factor 1 (DATF1, Hs.155313), and a p38-interacting protein (P38IP, Hs.171185). The human ZFP161 protein is highly homologous (98%) to ZF5, a putative murine repressor for MYC, with a growth-inhibitory function.19 We anticipate that both ZFP161 and RBAK20 are associated factors for 2 functionally very important proteins, MYC and RB, respectively, and may play important regulatory roles in cellular functions such as proliferation, differentiation, and apoptosis; to our knowledge, these genes have not been previously evaluated in hematopoiesis or leukemia. Another interesting protein is zinc finger protein 198 (ZNF198). This gene has not been functionally characterized, but it is reported to be involved in the t(8;13) translocation,21 resulting in a fusion protein with fibroblast growth factor receptor 1 (FGFR1). Studies of these and other novel genes are underway to ascertain their potential role in cell proliferation, differentiation, and apoptosis in the hematopoietic system.

Detailed studies will be required to verify that each of these genes is indeed expressed in hematopoietic CD34+ cells at the predicted level, to obtain the complete coding sequence for the partial cDNAs/ESTs in the database, and to verify the assigned chromosomal location. We predict that some of these genes may be disrupted by chromosomal translocations, thereby contributing to leukemogenesis. Overall, the database presented here represents a wealth of potential new information to aid in understanding the molecular aspects of normal and malignant hematopoiesis.

Supported by Public Health Service grant P01-75606 (to C.A.W.).

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 U.S.C. section 1734.

References

1. Scott E, Simon M, Anastasi J, Singh H. Requirement of transcription factor PU.1 in the development of multiple hematopoietic lineages. Science. 1994;265:1573-1577.
2. Tenen D, Hromas RR, Licht J, Yamamishi D, Zhang D. Transcription factors, normal myeloid development, and leukemia. Blood. 1997;90:489-519.
3. van Oostveen J, Bijl J, Raaphorst F, Walboomers J, Meijer C. The role of homeobox genes in normal hematopoiesis and hematological malignancies. Leukemia. 1999;13:1675-1690.
4. Orkin SH. Transcription factors and hematopoietic development. J Biol Chem. 1995;270:4955-4958.
5. Lawrence H, Sauvageau G, Ahmadi N, et al. Stage- and lineage-specific expression of the HOXA10 homeobox gene in normal and leukemic hematopoietic cells. Exp Hematol. 1995;23:1160-1166.
6. Voso M, Burn T, Wulf G, et al. Inhibition of hematopoiesis by competitive binding of transcription factor PU.1. Proc Natl Acad Sci U S A. 1994;91:7932-7936.
7. Lee M, Temizer D, Clifford J, Quertermous T. Cloning of the GATA-binding protein that regulates endothelin-1 gene expression in endothelial cells. J Biol Chem. 1991;266:16188-16192.
8. de The H, Lavau C, Marchio A, et al. The PML-RAR alpha fusion mRNA generated by the t(15;17) translocation in acute promyelocytic leukemia encodes a functionally altered RAR. Cell. 1991;66:675-684.
9. McNeil S, Zeng C, Harrington K, et al. The t(8;21) chromosomal translocation in acute myelogenous leukemia modifies intranuclear targeting of the AML1/CBFalpha2 transcription factor. Proc Natl Acad Sci U S A. 1999;96:14882-14887.
10. Pabst T, Mueller B, Zhang P, et al. Dominant-negative mutations of CEBPA, encoding CCAAT/enhancer binding protein-alpha (C/EBPalpha), in acute myeloid leukemia. Nat Genet. 2001;27:263-270.
11. Brown D, Kogan S, Lagasse E, et al. A PMLRARalpha transgene initiates murine acute promyelocytic leukemia. Proc Natl Acad Sci U S A. 1997;94:2551-2556.
12. Golub T, Barker G, Bohlander S, et al. Fusion of the TEL gene on 12p13 to the AML1 gene on 21q22 in acute lymphoblastic leukemia. Proc Natl Acad Sci U S A. 1995;92:4917-4921.
13. Look A. Oncogenic transcription factors in the human acute leukemias. Science. 1997;278:1059-1064.
14. Ahuja H, Hong J, Aplan P, et al. t(9;11)(p22;p15) in acute myeloid leukemia results in a fusion between NUP98 and the gene encoding transcriptional coactivators p52 and p75-lens epithelium-derived growth factor (LEDGF). Cancer Res. 2000;60:6227-6229.
15. Kroon E, Thorsteinsdottir U, Mayotte N, Nakamura T, Sauvageau G. NUP98-HOXA9 expression in hemopoietic stem cells induces chronic and acute myeloid leukemias in mice. EMBO J. 2001;20:350-361.
16. Kulkarni S, Reiter A, Smedley D, Goldman J, Cross N. The genomic structure of ZNF198 and location of breakpoints in the t(8;13) myeloproliferative syndrome. Genomics. 1999;55:118-121.
17. Gomes I, Sharma T, Mahmud N, et al. Highly abundant genes in the transcriptosome of human and baboon CD34 antigen-positive bone marrow cells. Blood. 2001;98:93-99.
18. Phillips R, Ernst R, Brunk B, et al. The genetic program of hematopoietic stem cells. Science. 2000;288:1635-1640.
19. Sobek-Klocke I, Disque-Kochem C, Ronsiek M, et al. The human gene ZFP161 on 18p11.21-pter encodes a putative c-myc repressor and is homologous to murine Zfp161 (Chr 17) and Zfp161-rs1 (X Chr). Genomics. 1997;43:156-164.
20. Skapek S, Jansen D, Wei T, et al. Cloning and characterization of a novel Kruppel-associated box family transcriptional repressor that interacts with the retinoblastoma gene product, RB. J Biol Chem. 2000;275:7212-7223.
21. Xiao S, McCarthy J, Aster J, Fletcher J. ZNF198-FGFR1 transforming activity depends on a novel proline-rich ZNF198 oligomerization domain. Blood. 2000;96:699-704.
22. Wey E, Schafer BW. Identification of novel DNA binding sites recognized by the transcription factor mPOU (POU6F1). Biochem Biophys Res Commun. 1996;220:274-279.
23. Albert TK, Lemaire M, van Berkum NL, et al. Isolation and characterization of human orthologs of yeast CCR4-NOT complex subunits. Nucleic Acids Res. 2000;28:809-817.
24. Hopfner R, Mousli M, Garnier JM, et al. Genomic structure and chromosomal mapping of the gene coding for ICBP90, a protein involved in the regulation of the topoisomerase IIalpha gene expression. Gene. 2001;266:15-23.
25. Knoepfler PS, Kamps MP. The Pbx family of proteins is strongly upregulated by a post-transcriptional mechanism during retinoic acid-induced differentiation of P19 embryonal carcinoma cells. Mech Dev. 1997;63:5-14.
26. Teraoka Y, Naruse TK, Oka A, et al. Genetic polymorphisms in the cell growth regulated gene, SC1 telomeric of the HLA-C gene and lack of association of psoriasis vulgaris. Tissue Antigens. 2000;55:206-211.
27. Lu R, Misra V. Zhangfei: a second cellular protein interacts with herpes simplex virus accessory factor HCF in a manner similar to Luman and VP16. Nucleic Acids Res. 2000;28:2446-2454.
28. Garcia-Domingo D, Leonardo E, Grandien A, et al. DIO-1 is a gene involved in onset of apoptosis in vitro, whose misexpression disrupts limb development. Proc Natl Acad Sci U S A. 1999;96:7992-7997.
29. Robb L, Mifsud L, Hartley L, et al. Epicardin: a novel basic helix-loop-helix transcription factor gene expressed in epicardium, branchial arch myoblasts, and mesenchyme of developing lung, gut, kidney, and gonads. Dev Dyn. 1998;213:105-113.
30. Ottolenghi C, Veitia R, Barbieri M, et al. The human doublesex-related gene, DMRT2, is homologous to a gene involved in somitogenesis and encodes a potential bicistronic transcript. Genomics. 2000;64:179-186.
31. Di Rocco G, Pennuto M, Illi B, et al. Interplay of the E box, the cyclic AMP response element, and HTF4/HEB in transcriptional regulation of the neurospecific, neurotrophin-inducible vgf gene. Mol Cell Biol. 1997;17:1244-1253.
32. Prevot D, Morel AP, Voeltzel T, et al. Relationships of the antiproliferative proteins BTG1 and BTG2 with CAF1, the human homolog of a component of the yeast CCR4 transcriptional complex: involvement in estrogen receptor alpha signaling pathway. J Biol Chem. 2001;276:9640-9648.
33. Fletcher CF, Jenkins NA, Copeland NG, et al. Exon structure of the nuclear factor I DNA-binding domain from C. elegans to mammals. Mamm Genome. 1999;10:390-396.
34. Przyborski SA, Damjanov I, Knowles BB, et al. Differential expression of the zinc finger gene TCF17 in testicular tumors. Cancer Res. 1998;58:4598-4601.
35. Degar BA, Baskaran N, Hulspas R, et al. The homeodomain gene Pitx2 is expressed in primitive hematopoietic stem/progenitor cells but not in their differentiated progeny. Exp Hematol. 2001;29:894-902.
36. Yoshima T, Yura T, Yanagi H. Novel testis-specific protein that interacts with heat shock factor 2. Gene. 1998;214:139-146.
37. Horikawa I, Tanaka H, Yuasa Y, et al. Molecular cloning of a novel human cDNA on chromosome 1q21 and its mouse homolog encoding a nuclear protein with DNA-binding ability. Biochem Biophys Res Commun. 1995;208:999-1007.
38. Kiss H, Kedra D, Kiss C, et al. The LZTFL1 gene is a part of a transcriptional map covering 250 kb within the common eliminated region 1 (C3CER1) in 3p21.3. Genomics. 2001;73:10-19.

Author notes

Carol A. Westbrook, Department of Medicine, Section of Hematology and Oncology, 900 S Ashland Ave, M/C 734, Chicago, IL 60607; e-mail: cwcw@uic.edu.
