The immunoglobulin heavy chain variable (IGHV) gene repertoire in chronic lymphocytic leukemia (CLL) is biased and is uniquely characterized by subsets of cases with "stereotyped" heavy chain complementarity-determining region 3 (HCDR3) sequences. Previous studies have shown that HCDR3 stereotypy may have important pathogenetic and clinical implications, at least for certain subsets of CLL patients. However, the systematic detection of stereotypy has so far been hindered, mainly by the lack of suitable computational tools. We developed new methods for identifying sequence patterns within HCDR3 amino acid (AA) sequences, based on a combinatorial pattern discovery algorithm. The analysis included HCDR3 sequences from 2,845 CLL patients from our collaborating institutions and 5,344 non-CLL sequences from public databases, for a total of 8,189 sequences. The identified patterns were subjected to a strict, multi-step filtering process based on the following criteria:

  • sequence relatedness;

  • location (offset) within HCDR3; and

  • HCDR3 AA length.
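The three filtering criteria can be illustrated with a minimal Python sketch. It assumes patterns written in regular-expression notation ('.' as a wildcard residue, brackets for alternative residues), takes the leftmost occurrence in each HCDR3 as the pattern's location, and uses a crude ungapped identity as a stand-in for sequence relatedness. The function names, the thresholds (`min_support`, `max_offset_diff`, `max_length_diff`, `min_identity`) and the toy sequences are hypothetical; they are not the parameters or data used in the study.

```python
import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Match:
    seq_id: str
    offset: int  # 0-based start of the leftmost pattern occurrence in the HCDR3
    length: int  # HCDR3 length in amino acids


def find_matches(pattern: str, hcdr3s: dict[str, str]) -> list[Match]:
    """Return one Match per HCDR3 that contains `pattern` ('.' = any residue)."""
    rx = re.compile(pattern)
    hits = []
    for seq_id, seq in hcdr3s.items():
        m = rx.search(seq)  # leftmost occurrence only
        if m is not None:
            hits.append(Match(seq_id, m.start(), len(seq)))
    return hits


def identity(a: str, b: str) -> float:
    """Crude ungapped identity: position-wise matches divided by the longer length."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def passes_filters(pattern: str, hcdr3s: dict[str, str], *, min_support: int = 2,
                   max_offset_diff: int = 1, max_length_diff: int = 0,
                   min_identity: float = 0.5) -> bool:
    """Keep a pattern only if its supporting sequences satisfy all three criteria."""
    hits = find_matches(pattern, hcdr3s)
    if len(hits) < min_support:
        return False
    offsets = [h.offset for h in hits]
    lengths = [h.length for h in hits]
    # Location criterion: the pattern must sit at (nearly) the same offset.
    if max(offsets) - min(offsets) > max_offset_diff:
        return False
    # Length criterion: supporting HCDR3s must have (nearly) the same AA length.
    if max(lengths) - min(lengths) > max_length_diff:
        return False
    # Relatedness criterion: every pair of supporting HCDR3s must be similar enough.
    seqs = [hcdr3s[h.seq_id] for h in hits]
    return all(identity(a, b) >= min_identity for a, b in combinations(seqs, 2))


# Hypothetical toy HCDR3 sequences (not study data).
toy = {
    "s1": "ARDYYGSGSYYFDY",
    "s2": "ARDYYGSGSHYFDY",
    "s3": "VRDYYGSGTYYFDY",
    "s4": "ARGGNWFDP",
}
print(passes_filters("DYYGSG", toy))  # True
```

Because every supporting sequence must satisfy all three criteria, tightening any single threshold can only shrink the set of retained patterns; for instance, raising `min_identity` to 0.95 rejects the toy pattern above.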

Clustering of sequences based on the filtered HCDR3 patterns revealed that the CLL IG repertoire can be divided into two broad categories: the first comprises cases with heterogeneous B cell receptors (BCRs) (non-clustered cases), while the second is characterized by remarkable BCR stereotypy (clustered cases). Specifically, 783/2,845 CLL sequences (27.5%) were placed in 339 ground-level clusters. Sequences shared among these ground-level clusters allowed their progressive grouping into clusters at higher levels of the hierarchy. High-level clusters showed striking IGHV repertoire restriction, with only six IGHV genes (IGHV1-69, IGHV1-3, IGHV1-2, IGHV3-21, IGHV4-34, IGHV4-39) accounting for >80% (382/459) of cases. While ground-level clusters provided a high-resolution picture of HCDR3 stereotypy, high-level clusters were considerably larger (up to 86 sequences each) and captured more distant sequence relationships in the form of more widely shared sequence patterns. These patterns were characterized by just a few critically positioned residues, reminiscent of the receptors expressed by cells participating in innate immune responses; given this remarkable structural conservation, we consider them BCR "archetypes". To test the hypothesis that sequence relatedness between IGHV genes may have structural and functional significance for the IGHV repertoire in CLL, we also constructed sequence distance trees of functional human IGHV genes. Examination of the tree for the IGHV3 subgroup revealed a branching pattern that was reflected in the repertoires of clustered vs. non-clustered CLL sequences.
In particular, IGHV3-21, the foremost example of a gene with a propensity for clustered rearrangements in CLL, belongs to a branch clearly distinct from those that include, for instance, the IGHV3-23, IGHV3-30, IGHV3-33 and IGHV3-7 genes, which were essentially absent from CLL sequences in high-level clusters; a similar pattern was evident among IGHV1 subgroup genes. On these grounds, we argue that CLL cases with clustered (i.e. stereotyped) vs. non-clustered (i.e. heterogeneous) BCRs could derive from different progenitor cell populations, evolutionarily adapted to particular antigenic challenges. Moreover, since clustered cases were found to express a limited set of highly conserved BCRs, we propose that in such cases the clonogenic progenitors may originate from a B cell population intermediate between a true innate immune system and the conventional adaptive B cell immune system, similar to what has previously been suggested for mouse B1 cells.
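The progressive grouping of ground-level clusters through shared sequences can be sketched in Python, assuming (hypothetically) that each ground-level cluster is a set of sequence identifiers, that a sequence may belong to more than one ground-level cluster because it matches several patterns, and that two clusters are merged whenever they share at least `min_shared` sequences. This minimal sketch uses a union-find structure and returns only the top level of the hierarchy (the connected components); the merging rule, the threshold and the cluster names are illustrative assumptions, not the criteria used in the study.

```python
from itertools import combinations


def merge_clusters(ground: dict[str, set[str]], min_shared: int = 1) -> list[set[str]]:
    """Merge ground-level clusters that (transitively) share >= min_shared sequences."""
    parent = {cid: cid for cid in ground}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union every pair of clusters with enough sequences in common.
    for a, b in combinations(ground, 2):
        if len(ground[a] & ground[b]) >= min_shared:
            parent[find(a)] = find(b)

    # Collect the sequences of all ground-level clusters under each root.
    groups: dict[str, set[str]] = {}
    for cid, members in ground.items():
        groups.setdefault(find(cid), set()).update(members)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical ground-level clusters; s3 and s4 each belong to two clusters.
ground = {
    "g1": {"s1", "s2", "s3"},
    "g2": {"s3", "s4"},
    "g3": {"s4", "s5"},
    "g4": {"s6", "s7"},
}
print([len(g) for g in merge_clusters(ground)])  # [5, 2]
```

Here `g1` and `g3` share no sequence directly but end up in the same group through `g2`, which is how chains of shared sequences can yield high-level clusters much larger than any single ground-level cluster; requiring two shared sequences (`min_shared=2`) leaves all four ground-level clusters separate.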

Disclosures: No relevant conflicts of interest to declare.
