Aggressive B-cell non-Hodgkin lymphomas are haematological malignancies that account for significant morbidity and mortality worldwide. They encompass a wide range of histological and clinical features: diffuse large B-cell lymphoma (DLBCL) is the most frequent subtype, whereas Burkitt lymphoma (BL) occurs less frequently. Recent advances in molecular profiling have demonstrated considerable heterogeneity at the molecular level and have further classified DLBCL into germinal centre B-cell-like (GCB), activated B-cell-like (ABC), primary mediastinal B-cell lymphoma (PMBL) and type III subgroups, of which ABC is associated with the most unfavourable prognosis. Gene-expression profiling (GEP) has also confirmed a subgroup with features intermediate between BL and DLBCL; these cases, particularly those with concurrent chromosomal rearrangements of MYC and BCL2, are often associated with poor prognosis. Despite the progress made with GEP, the classification still has limited influence on clinical treatment decisions. In fact, as recent next-generation sequencing studies have uncovered further heterogeneity beyond the subtypes above, dividing cases into a limited number of subgroups makes less sense in clinical practice. It is clear that the subtypes overlap to some extent in the affected signalling and regulatory pathways, and that small groupings within subtypes differ clearly in mechanism and treatment response.

We describe an alternative approach, in which a large database of aggressive B-cell lymphomas is used in a similarity search to identify the cases most similar at a molecular level to a query case. The hypothesis is that the most similar cases provide the best guide to prognosis and treatment outcome in the query case, independent of any need to place the query case into a particular subtype. We used both large public datasets and data from our Haematological Malignancy Research Network (www.HMRN.org) to explore genes associated with pathogenic pathways and an unfavourable prognosis. We also defined similarity between cases according to their molecular features and treatment responses. We then trained the similarity search method using a distance metric learning approach that has been applied successfully in similar machine learning applications, and tested it on our HMRN dataset, which contains detailed clinical data with treatment and outcome information. Cross-validation on the public dataset achieved nearly 90% accuracy in recognising cases with similar overall survival, and initial test results on HMRN data also show that over 80% of cases can be correctly represented by similar cases. In summary, we present a similarity learning method as an alternative to the current subtype classification method. This mathematical method is able to accurately recognise cases with similar molecular features and provides important information on predicted treatment response for any given query case.
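
The idea of retrieving the most similar cases under a learned distance can be illustrated with a minimal sketch. The abstract does not state which distance metric learning algorithm was used, so the example below substitutes a deliberately simple stand-in: each feature is weighted by the inverse of its pooled within-group variance, which gives a diagonal Mahalanobis-style distance. The function names (`learn_feature_weights`, `weighted_distance`, `most_similar`), the `ridge` parameter, the group labels and the synthetic two-feature data are all assumptions made for illustration and are not taken from the study.

```python
import math
import random


def learn_feature_weights(cases, labels, ridge=1e-6):
    """Weight each feature by the inverse of its pooled within-group variance.

    Assumption: `labels` marks cases that should count as similar
    (for example, a shared outcome group). Features that vary widely
    inside a group receive a small weight.
    """
    n_features = len(cases[0])
    pooled = [0.0] * n_features
    for label in set(labels):
        members = [c for c, l in zip(cases, labels) if l == label]
        for j in range(n_features):
            mean_j = sum(m[j] for m in members) / len(members)
            pooled[j] += sum((m[j] - mean_j) ** 2 for m in members)
    # The ridge term keeps the weight finite if a feature is constant.
    return [1.0 / (p / len(cases) + ridge) for p in pooled]


def weighted_distance(a, b, weights):
    """Diagonal Mahalanobis-style distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def most_similar(query, cases, weights, k=5):
    """Return the indices of the k database cases closest to the query."""
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    return ranked[:k]


# Synthetic stand-in data (not real expression data): feature 0 separates
# the two hypothetical groups, feature 1 is high-variance noise.
random.seed(0)
labels = [0] * 50 + [1] * 50
cases = [
    [random.gauss(10.0 * label, 1.0), random.gauss(0.0, 20.0)]
    for label in labels
]

weights = learn_feature_weights(cases, labels)
neighbours = most_similar([0.0, 0.0], cases, weights, k=5)
print([labels[i] for i in neighbours])
```

In this toy setting the learned weights suppress the noisy feature, so the retrieved neighbours are governed by the informative one. A realistic application would involve many more features and, most likely, a full (non-diagonal) learned metric.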

Disclosures

Smith: Novartis: Research Funding; Celgene: Research Funding; Jansen Cilag: Research Funding; Amgen: Research Funding.

Author notes

* Asterisk with author names denotes non-ASH members.
