Abstract 5095

Method

The goal of this research is to assess inter-reader variability in identifying centroblast (CB) cells in digitized H&E-stained follicular lymphoma (FL) cases. We enrolled three board-certified hematopathologists experienced in FL grading to complete reading sessions on 500 high-power field (HPF; 40× magnification) images that three hematopathologists selected from 17 H&E digital slides. Each slide represents one patient, and the dataset comprises lymphoma cases of FL grades 1, 2, and 3. Each pathologist graded the same set of 500 images, examining the digital images and recording the spatial coordinates of CBs with in-house software that allowed CB cells to be marked using only a computer mouse.
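As a rough sketch of such an annotation step (assuming a matplotlib-based viewer and a hypothetical image file name; this is not the authors' in-house software), click coordinates can be captured and saved roughly as follows:

    import csv
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    marks = []  # collected (x, y) coordinates for the current image

    img = mpimg.imread("hpf_image_001.png")  # hypothetical file name
    fig, ax = plt.subplots()
    ax.imshow(img)

    def on_click(event):
        # Ignore clicks that fall outside the image axes.
        if event.xdata is not None and event.ydata is not None:
            marks.append((event.xdata, event.ydata))
            ax.plot(event.xdata, event.ydata, "r+")  # show the mark on screen
            fig.canvas.draw_idle()

    fig.canvas.mpl_connect("button_press_event", on_click)
    plt.show()  # blocks until the reader closes the window

    # Persist the marked CB coordinates for later analysis.
    with open("hpf_image_001_cb_marks.csv", "w", newline="") as f:
        csv.writer(f).writerows(marks)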

Experimental Results

The results from each reading session were analyzed in terms of FL grade, which was determined by averaging the centroblast counts across the 28–30 images for each patient and assigning a grade using the standard WHO guidelines: Grade I = 0–5, Grade II = 6–15, and Grade III = >15 centroblasts per image. We first used kappa statistics and p-values to measure inter-reader agreement on the three-level grade, and then computed the same metrics to measure agreement on a two-level diagnosis: Grade I or II (no chemoprevention assigned) versus Grade III (chemoprevention assigned).
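As an illustrative sketch (not the authors' code), the snippet below maps hypothetical per-patient mean CB counts to the WHO thresholds above and computes a weighted Cohen's kappa with scikit-learn; the linear weighting is an assumption, since the abstract does not state the weighting scheme.

    from sklearn.metrics import cohen_kappa_score

    def who_grade(mean_cb_per_image):
        # WHO thresholds from the text: Grade I = 0-5, Grade II = 6-15, Grade III = >15 CB/image
        if mean_cb_per_image <= 5:
            return 1
        elif mean_cb_per_image <= 15:
            return 2
        return 3

    # Hypothetical per-patient mean CB counts for two readers (illustration only).
    reader1_means = [2.1, 7.4, 18.0, 4.9, 12.3]
    reader2_means = [3.0, 9.1, 22.5, 6.2, 16.8]

    grades1 = [who_grade(m) for m in reader1_means]
    grades2 = [who_grade(m) for m in reader2_means]

    # Linear weights penalize disagreements by how many grade levels apart they fall.
    kappa = cohen_kappa_score(grades1, grades2, weights="linear")
    print(f"weighted kappa = {kappa:.3f}")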

Table 1. Kappa and p-values for three-level grade (Grades I, II, III)

    Pathologists Comparison    Kappa (1)    p-value (2)
    1&2                        0.514        0.0084
    1&3                        0.377        0.103
    2&3                        0.261        0.103

(1) Landis and Koch [1] guidelines for the degree of kappa agreement: < 0 poor, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect.

(2) Holm's method [2] used to correct for multiple tests.
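For footnote (2), a small worked example of Holm's step-down correction applied to three hypothetical uncorrected pairwise p-values (made-up values, not the study's raw p-values), using statsmodels:

    from statsmodels.stats.multitest import multipletests

    raw_p = [0.010, 0.030, 0.040]   # hypothetical uncorrected p-values, one per reader pair
    reject, p_holm, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

    # Holm by hand: sort ascending, multiply the k-th smallest p by (m - k + 1)
    # with m = 3 tests, then enforce that adjusted p-values are non-decreasing.
    print(p_holm)   # [0.03 0.06 0.06]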

Table 2. Kappa and p-values for two-level diagnosis (Grade I or II vs. III)

    Pathologists Comparison    Kappa (1)    p-value (2)
    1&2                        0.564        0.0822
    1&3                        0.290
    2&3                        0.191

(1) Landis and Koch [1] guidelines for the degree of kappa agreement: < 0 poor, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect.

(2) Holm's method [2] used to correct for multiple tests.

Table 3. Grade determination by pathologist (columns: Pathologist, Grade I, Grade II, Grade III)

Table 4. Summary of centroblast count determination by pathologist

    Pathologist    Mean    Standard deviation*
    1              12.2    16.1
    2              23.5    27.7
    3              6.7     4.6

* Standard deviation of the means for each slide.
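A brief sketch of how the starred quantity could be computed from per-image counts grouped by slide (the counts and slide names are made up, and treating the Mean column as the mean of the per-slide means is an assumption):

    import statistics

    # Hypothetical per-image CB counts, keyed by slide (each slide has 28-30 HPF images in the study).
    counts_by_slide = {
        "slide_01": [3, 5, 2, 4],
        "slide_02": [11, 9, 14, 12],
        "slide_03": [25, 30, 22, 27],
    }

    slide_means = [statistics.mean(v) for v in counts_by_slide.values()]
    overall_mean = statistics.mean(slide_means)          # reported "Mean" (assumed definition)
    sd_of_slide_means = statistics.stdev(slide_means)    # "Standard deviation of the means for each slide"
    print(f"mean = {overall_mean:.1f}, SD of slide means = {sd_of_slide_means:.1f}")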

Discussion

Table 1 provides the weighted kappa statistics based on the three-level grading system. There was significant, moderate agreement between pathologists 1 and 2 on grade. Pathologist 3, however, disagreed considerably with pathologists 1 and 2 on grade. We also examined agreement on the clinically significant diagnosis (Grade I or II versus III; see Table 2): the kappa statistics show that pathologists 1 and 2 moderately agreed in their diagnosis, though the agreement was only marginally significant. Again, pathologist 3 did not agree with pathologists 1 and 2; in these cases, the weighted kappas are equal to zero, suggesting that there is no agreement between pathologist 3 and pathologists 1 and 2. Table 3 summarizes the per-patient grade determinations for each pathologist, and Table 4 reports the mean and standard deviation of the per-patient centroblast counts for each pathologist. These tables demonstrate a large amount of variability in both grade and centroblast count: pathologist 2 identified the most centroblasts and consequently identified the highest percentage of grade 3 cases, whereas pathologist 3 was considerably more conservative than pathologists 1 and 2 in identifying centroblasts and did not identify any grade 3 cases.

Conclusion

In this study, we examined inter-reader variability in grading follicular lymphoma from digital images based on centroblast count. We found high variability in centroblast counts and grades across pathologists, with inter-reader agreement ranging from none to, at best, moderate. A larger data set and more pathologists will be considered in the near future to improve the generalizability of our results.

References

1. J. R. Landis and G. G. Koch, "The measurement of observer agreement for categorical data," Biometrics, vol. 33, pp. 159–174, 1977.

2. S. Holm, "A simple sequentially rejective multiple test procedure," Scandinavian Journal of Statistics, vol. 6, pp. 65–70, 1979.

Disclosures:

No relevant conflicts of interest to declare.

