The most commonly used grading system for acute graft-versus-host disease (aGVHD) was introduced 30 years ago by Glucksberg; a revised system was developed by the International Bone Marrow Transplant Registry (IBMTR) in 1997. To prospectively compare the 2 classifications and to evaluate the effect of duration and severity of aGVHD on survival, we conducted a multicenter study of 607 patients receiving T-cell-replete allografts, scored weekly for aGVHD in 18 transplantation centers. Sixty-nine percent of donors were HLA-identical siblings and 28% were unrelated donors. The conditioning regimen included total body irradiation in 442 (73%) patients. The 2 classifications performed similarly in explaining variability in survival by aGVHD grade, although the Glucksberg classification predicted early survival better. There was less physician bias or error in assigning grades with the IBMTR scoring system. With either system, only the maximum observed grade had prognostic significance for survival; neither time of onset nor progression from an initially lower grade of aGVHD was associated with survival once maximum grade was considered. Regardless of scoring system, aGVHD severity accounted for only a small percentage of observed variation in survival. Validity of these results in populations receiving peripheral blood transplants or nonmyeloablative conditioning regimens remains to be tested. (Blood. 2005;106:1495-1500)

Acute graft-versus-host disease (aGVHD) is a major complication of allogeneic hematopoietic cell transplantation (HCT), occurring after 40% to 60% of HLA-identical transplantations and more frequently after unrelated donor transplantations. In 1974, Glucksberg [1] published the first aGVHD classification, modified by Thomas in 1975, using data on a small number of patients receiving methotrexate alone for GVHD prophylaxis. The Glucksberg system remains the most commonly used system. Although this classification has prognostic value, it is complex. In fact, there are 125 possible combinations of organ involvement and severity in the Glucksberg grading system, where each of 3 organs (skin, gastrointestinal tract, and liver) is staged from 0 to 4; staging includes objective assessment of organ function and subjective assessment of performance status. These stages are combined to calculate an overall grade. However, in the overall grading system, 62 of the possible 125 combinations are not defined. Additionally, the grading system shows significant interobserver variability, especially for grade II aGVHD. Perhaps because of this, it is routine practice to dichotomize GVHD severity into clinically insignificant grades 0-I and clinically significant grades II-IV. In recognition of the system's limitations, a consensus workshop was held in 1995 [2] and a modified Glucksberg grading system was proposed. Additionally, Rowlings [3] and colleagues at the International Bone Marrow Transplant Registry (IBMTR) retrospectively analyzed 2129 adult patients receiving a non-T-cell-depleted marrow transplant with cyclosporine and methotrexate for GVHD prophylaxis between 1986 and 1992 and described a new prognostic model. The latter system retained the objective organ staging criteria of the Glucksberg system but excluded the subjective criteria of clinical performance and modestly revised the computation of overall grade to produce more homogeneity within each grade. This new score was intended to simplify GVHD grading and facilitate comparisons of data from different centers and studies.

To further explore the validity of the Glucksberg and IBMTR grading systems, we initiated a prospective study of weekly aGVHD grading and analyzed the impact of aGVHD on survival. This study also addressed previously unresolved questions about the time course of aGVHD and the impact of relapse or flare of aGVHD on transplantation outcome.

Patients

A cohort of 607 patients receiving myeloablative, allogeneic non-T-cell-depleted hematopoietic cell transplants between 1996 and 1999 was prospectively studied at 17 French transplant centers from the Société Française de Greffe de Moëlle et Thérapie Cellulaire (n = 478) and at the Dana Farber Cancer Institute (n = 129). At each transplantation center, a weekly aGVHD stage and grade were assigned by a trained investigator according to the modified Glucksberg and IBMTR grading systems [2,3]. Overall grade was calculated by computer using the pattern of organ stages (Table 1). Investigators were also asked to indicate overall Glucksberg and IBMTR grade on case report forms without specific instructions for computation other than reference to the published algorithms. All patients signed informed consent for use of their clinical data in research studies. Patients were followed for one year for the outcomes of survival and disease recurrence. At the Dana Farber Cancer Institute, patients were also enrolled on a randomized trial of interleukin 1 receptor antagonist (IL-1RA) for GVHD prophylaxis. No differences were observed between the active and placebo arms [4], so they were combined for the purposes of this analysis. The Institutional Review Board (IRB) of the Société Française de Greffe de Moëlle et Thérapie Cellulaire approved the study, as did the Dana-Farber Cancer/Harvard Cancer Center IRB.
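
As an illustration of what computing an overall grade from the pattern of organ stages can look like, the following is a minimal sketch for the IBMTR index only. Table 1 is not reproduced in this text, so the rule coded here is an assumption based on the commonly cited form of the published index (the index follows the most severely affected organ, with skin-only stage 1 disease graded A); the function name `ibmtr_index` is hypothetical and this is not the study's actual program.

```python
from itertools import product

def ibmtr_index(skin: int, liver: int, gut: int) -> str:
    """Return an IBMTR-style severity index ('0', 'A'-'D') from organ stages 0-4.

    Assumed rule (not taken from this paper's Table 1): stage 4 in any organ
    is D, stage 3 is C, stage 2 skin or any stage 1-2 liver/gut is B, and
    stage 1 skin without liver or gut involvement is A.
    """
    for stage in (skin, liver, gut):
        if stage not in range(5):
            raise ValueError("organ stages must be integers from 0 to 4")
    worst = max(skin, liver, gut)
    if worst == 0:
        return "0"
    if worst == 4:
        return "D"
    if worst == 3:
        return "C"
    if worst == 2 or liver >= 1 or gut >= 1:
        return "B"
    return "A"  # only possibility left: skin stage 1, liver 0, gut 0

# Every pattern of three organs staged 0-4: 5 * 5 * 5 = 125 combinations.
combos = list(product(range(5), repeat=3))
grades = {combo: ibmtr_index(*combo) for combo in combos}
```

Under this assumed rule every one of the 125 patterns maps to a defined index, which is one way a computed grade avoids the undefined combinations noted for the original Glucksberg scheme.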

Statistical methods

The relative risk of mortality for patients with versus those without aGVHD after transplantation was determined using Cox proportional hazards regression. The occurrence of aGVHD was entered in the Cox model as a time-dependent variable. Because age, disease state at transplantation, and type of donor are major prognostic factors for HCT outcome, all analyses were stratified on these covariates.
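
Entering aGVHD as a time-dependent variable usually means splitting each patient's follow-up at the moment of onset, so that time before onset counts as unexposed. The sketch below shows one common data layout for this (start-stop intervals); the names `Interval` and `split_follow_up` and the use of days as the time unit are assumptions for illustration, and the stratified Cox fit itself is not shown.

```python
from typing import List, NamedTuple, Optional

class Interval(NamedTuple):
    start: int   # day the interval opens
    stop: int    # day the interval closes
    agvhd: int   # 0 before onset, 1 from onset onward
    died: int    # 1 only if death occurred at `stop`

def split_follow_up(follow_up: int, died: bool,
                    onset: Optional[int]) -> List[Interval]:
    """Split one patient's follow-up at aGVHD onset (None if never observed)."""
    if onset is None or onset >= follow_up:
        # Never developed aGVHD during follow-up: one unexposed interval.
        return [Interval(0, follow_up, 0, int(died))]
    if onset <= 0:
        # aGVHD present from the start of follow-up.
        return [Interval(0, follow_up, 1, int(died))]
    return [
        Interval(0, onset, 0, 0),                   # at risk, no aGVHD yet
        Interval(onset, follow_up, 1, int(died)),   # at risk, with aGVHD
    ]
```

For example, a patient with onset on day 21 who died on day 100 contributes 21 days to the group without aGVHD and the remaining 79 days, including the death, to the group with aGVHD.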

We estimated the percentage of explained variability in predicting patient survival at 100, 182, and 365 days using the Brier score [5]. This score compares the amount of uncertainty in predicting survival to a specified end point for patients when information about aGVHD grade at a given week was included or ignored. The Brier score is a number between 0 and 100; a score of 100 indicates the ability to predict survival with 100% certainty if the aGVHD grade is known, whereas a score of 0 indicates that there is no prognostic benefit to knowing the aGVHD grade. It is a measure of how much our prognostic ability is improved by knowing the aGVHD grade at a given time point.
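
The idea of comparing prediction error with and without the grade can be sketched as follows. This is a simplified illustration, not the cited method: it assumes every patient's status at the end point is known (no censoring), predicts survival by the observed survival rate within each grade, and uses the hypothetical function names `brier` and `explained_variation`.

```python
from collections import defaultdict
from typing import Sequence

def brier(outcomes: Sequence[int], predictions: Sequence[float]) -> float:
    """Mean squared difference between outcome (1 = alive) and prediction."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, predictions)) / len(outcomes)

def explained_variation(alive: Sequence[int], grade: Sequence[str]) -> float:
    """Percent reduction in Brier score when survival is predicted per grade."""
    overall_rate = sum(alive) / len(alive)
    null_score = brier(alive, [overall_rate] * len(alive))  # grade ignored
    if null_score == 0:
        return 0.0  # everyone had the same outcome; nothing to explain
    by_grade = defaultdict(list)
    for y, g in zip(alive, grade):
        by_grade[g].append(y)
    rate = {g: sum(ys) / len(ys) for g, ys in by_grade.items()}
    model_score = brier(alive, [rate[g] for g in grade])    # grade included
    return 100.0 * (1.0 - model_score / null_score)
```

If grade separates survivors from nonsurvivors perfectly the result is 100; if survival rates are identical across grades the result is 0.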

Correlations between the computed IBMTR and Glucksberg scores were calculated using the Cohen κ statistic and a 95% confidence interval (CI) for the κ [6]. The κ is a measure of agreement between scores. In practice it usually falls between 0 and 1, with 1 representing complete agreement and 0 representing agreement no better than expected by chance.
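
A minimal sketch of the unweighted Cohen κ is given below. It assumes the two systems' grades have first been mapped onto a common set of levels (for example 0 to 4, pairing A with I through D with IV); that mapping, the choice of the unweighted rather than a weighted κ, and the function name `cohen_kappa` are assumptions, since the text does not specify them.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen kappa for two ratings of the same subjects."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of subjects given the same level by both ratings.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from the marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both ratings use a single identical level throughout
    return (observed - expected) / (1.0 - expected)
```

Values below 0 are possible when agreement is worse than chance, although that is not expected for two closely related grading systems.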

For each week after transplantation, the prevalence (number of patients with aGVHD divided by number of surviving patients) of various grades of aGVHD among survivors and the κ estimate of correlation between the 2 systems were described. Also at each week after transplantation, the percentage of uncertainty in predicting subsequent survival to 100 days, 182 days (6 months), and 365 days (12 months) after transplantation was calculated and compared using aGVHD information from each grading system.
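
The weekly prevalence defined above can be sketched as below. The `Patient` record, its field names, and the numeric coding of grades (0 = no aGVHD) are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Patient:
    grades: Dict[int, int]            # week -> numeric grade (0 = no aGVHD)
    death_week: Optional[int] = None  # None if alive at end of follow-up

def weekly_prevalence(patients: List[Patient], week: int,
                      min_grade: int = 1) -> Optional[float]:
    """Patients with grade >= min_grade at `week`, divided by survivors."""
    survivors = [p for p in patients
                 if p.death_week is None or p.death_week > week]
    if not survivors:
        return None  # prevalence is undefined with no surviving patients
    affected = sum(1 for p in survivors if p.grades.get(week, 0) >= min_grade)
    return affected / len(survivors)
```

Patients who died before the week in question are removed from both the numerator and the denominator, so the figure describes survivors only.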

To evaluate survival outcome using a dynamic aGVHD model, that is, a model that incorporated time to onset and changes in grade over time, we used a series of logistic regressions. These examined whether the time of onset of aGVHD affected survival and whether patients with a single episode had the same survival as those with 2 or more episodes of recorded aGVHD.

Patient characteristics are summarized in Table 2. Median patient age was 36.6 years (range, 1-65 years). A total of 435 patients was treated for leukemia (54% with early disease). Sixty-nine percent of donors were HLA-identical siblings and 28% were unrelated donors. All patients received non-T-cell-depleted grafts; most grafts were bone marrow. Total body irradiation was included in the conditioning regimen for 442 (73%) patients. GVHD prophylaxis was with cyclosporine and short-course methotrexate; some patients receiving unrelated donor transplants also received antithymocyte globulin.

Incidence and time course of aGVHD

The median time to onset of any clinically evident aGVHD was 3 weeks (range, 1-14 weeks). Tables 3 and 4 show times to onset of IBMTR grades B-D and Glucksberg grades II-IV aGVHD in study subjects with their subsequent outcomes to week 14; 4-week outcomes are summarized in Table 5. The 4-week complete response rate (defined as being alive with resolution of all GVHD symptoms) was 40% to 60%, with no clear relationship to time of onset. It should be noted that treatment given for aGVHD was not recorded, so response rate is defined simply as the presence or absence of aGVHD at various time points after initial onset, without regard to what treatment was given and when.

Correlation of the 2 grading systems

The Glucksberg and IBMTR grading systems were compared for highest grade of aGVHD reported in the first 100 days. Ignoring time of onset, the overall κ coefficient for agreement between the 2 systems was 0.78 (95% CI, 0.77-0.87), indicating strong agreement. Correlations between weekly grades were also high with values between 0.72 (0.67-0.78) at week 2 and 0.85 (0.80-0.90) at week 10, indicating good agreement whether the scales were applied early or late after transplantation.

Predictive ability of the 2 grading systems

Figure 1 shows the proportion of the population with each maximum GVHD grade, and their estimated survival probabilities after HCT. Table 6 shows the percentage variation in survival at these end points that is explained by the aGVHD grade assigned by each system for each weekly score and for maximum grade. Explained variations ranged from less than 1% to a high of only about 25%; most values were less than 10%. These calculations were also done for HLA-identical sibling transplants alone; patterns were similar.

The scores performed similarly at most time points. Using the current weekly score (ie, using the grade at a given week, ignoring previous or subsequent scores), Glucksberg grade appeared to be a better predictor than IBMTR grade for early (100-day) survival, particularly in the sixth to seventh week. Using the maximum score (ie, the highest observed grade in or before the indicated week, ignoring subsequent scores), the IBMTR grade was somewhat more predictive than Glucksberg grade for 6-month survival.
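
The distinction between the current weekly score and the maximum score can be made concrete with a running maximum. The sketch below assumes grades are coded numerically by week; the function name `maximum_to_date` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence

def maximum_to_date(weekly_grades: Sequence[int]) -> List[int]:
    """Highest grade observed in or before each week, ignoring later weeks."""
    return list(accumulate(weekly_grades, max))

current = [0, 1, 3, 2, 0]              # grade recorded at each weekly visit
maximum = maximum_to_date(current)     # [0, 1, 3, 3, 3]
```

A patient whose aGVHD peaks and then resolves has a falling current score but an unchanged maximum score, which is why the two can differ in prognostic value at later weeks.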

Figure 1. Probability of survival according to maximum GVHD score. (A) IBMTR grade. (B) Glucksberg grade.

Finally, reported (by the clinical investigator) and computed (from degree of recorded organ involvement) Glucksberg and IBMTR grades were compared to assess the degree of physician bias or error in assigning overall aGVHD grade. There were 221 (3.64%) cases where the physician-assigned grade and the computed grade differed when using the Glucksberg grade and 137 (1.88%) cases where the 2 grades differed when using the IBMTR grades. The κ statistics for correlations between assigned and computed grades were calculated using data from the 478 patients treated at French centers; they were 0.90 for Glucksberg grades and 0.95 for IBMTR grades.

Dynamic aspects of aGVHD

Results of logistic regression models evaluating the association between grade of aGVHD and survival are summarized in Table 7. Mortality at 100 days for patients with maximum grades A-B (IBMTR) or I-II (Glucksberg) aGVHD was approximately 9% and was lower than for patients with grade C (IBMTR) or III (Glucksberg). Patients with grades D (IBMTR) or IV (Glucksberg) fared worst of all.

One-year survival for patients with maximum grades A to C was comparable, whereas those with grade D fared badly. Using Glucksberg scores, patients with grades I-II aGVHD had similar 1-year survival rates. Patients with grade III aGVHD had lower survival and those with grade IV the lowest survival.

We found no difference in 100-day or 1-year survival between patients with single or multiple episodes of aGVHD; only the maximum grade was important.

Finally, we tested whether early-onset aGVHD was associated with a worse prognosis than late-onset aGVHD, using cut-points for time of onset that ranged from week 2 to week 10 and examining both 100-day and 1-year survival. Early aGVHD was defined as onset of aGVHD in or before the cut-point week. Late aGVHD was defined as onset of aGVHD in any week after the cut-point. After adjustment for multiple comparisons, no association between the timing of aGVHD onset and either 100-day or 1-year mortality was found with either scoring system.
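
The cut-point classification can be sketched as below. The text does not say which multiple-comparison adjustment was used; the Bonferroni threshold shown is only one common choice, and the count of comparisons assumes every cut-point, end point, and system was tested separately. All names are hypothetical.

```python
from typing import Optional

CUT_POINTS = range(2, 11)            # weeks 2 through 10
END_POINTS = ("100-day", "1-year")
SYSTEMS = ("IBMTR", "Glucksberg")

def onset_group(onset_week: Optional[int], cut_point: int) -> str:
    """Classify a patient as early, late, or never affected for one cut-point."""
    if onset_week is None:
        return "none"
    return "early" if onset_week <= cut_point else "late"

# Assumed adjustment: divide the nominal significance level by the number
# of early-versus-late comparisons (an illustration, not the study's method).
n_comparisons = len(CUT_POINTS) * len(END_POINTS) * len(SYSTEMS)
bonferroni_alpha = 0.05 / n_comparisons
```

With this many overlapping comparisons, an unadjusted threshold would be expected to flag some cut-points by chance alone, which is the reason an adjustment is needed at all.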

There are several uses for a valid scoring system for aGVHD. First, it may be used to estimate prognosis for individual patients at specific points in time. These estimates may use the current score and other aGVHD information (eg, prior scores, time of onset) shown to be of prognostic value. Second, related to the first, a scoring system may be used to select patients for particular treatments or clinical trials, targeting patients with poorer prognoses for more intensive or novel approaches. Third, a scoring system may be used to describe the outcome of transplantation strategies (eg, maximum grade of aGVHD associated with a particular GVHD prophylaxis regimen). To be useful, scoring systems must not only have sufficient prognostic value for the purposes described but they should also be relatively easy to use and reproducible. In the special case of clinical trials, the system should readily allow validation by examination of primary medical records. For these reasons, systems that rely on objective, readily measured criteria may be favored.

The current study shows that the IBMTR and Glucksberg grading systems perform similarly as prognostic tools. Both scales show increasing predictive value to a maximum at 8 weeks and then decrease in predictive value thereafter. However, with the exception of the 7- to 8-week time point, neither score explained much of the variation in either early or late survival. Using the maximum score at varying time points (ie, the maximum up to and including the week in which the prediction is made), the IBMTR grade was better early and the Glucksberg grade better later. To assess an individual's 100-day mortality using only his or her current aGVHD score, the Glucksberg grade was superior to IBMTR grade. Using the maximum grade, the IBMTR grade was somewhat better, especially for 6-month survival. However, differences were small and the 2 scores' explained variations (ie, the percentage of variability in survival explained by variability in aGVHD scores using the Brier statistics) were within 2% of each other in 62 of 84 (74%) of the comparisons made (Table 6). Comparison of computed versus reported grading showed less physician bias or error in assigning overall IBMTR grades but again the differences were small.

Few other studies have compared these 2 systems. In 1999, a Spanish group [7] reported a retrospective comparison of both scales and concluded that the IBMTR score appeared more predictive of transplantation outcome than the Glucksberg score. However, this study relied on retrospective assignment of severity and considered only maximum severity. Furthermore, the statistical tool used in the Spanish study (Cox proportional hazards model) does not allow direct comparison of different severity scales. More indirectly, one other study from the Minneapolis group used both scales to assess response to corticosteroids as primary therapy for aGVHD [8]. The overall grade using either system (IBMTR versus Glucksberg) was determined by a computer algorithm. In this study, the IBMTR severity index B and C better discriminated the likelihood of patients responding to corticosteroids.

In summary, our study and those from the Minneapolis and Spanish groups provide evidence that the IBMTR severity index might be a useful tool in assessing disease severity in patients with aGVHD. The advantage of the long experience with the Glucksberg system might lead some investigators to continue with this system, especially when focusing on early outcomes. Both scales share the disadvantage of investigator bias or error toward assigning a lower score (especially overestimate of grade II or B and underestimate of grade III or C), as recently reported by Weisdorf et al [9]. This supports collection of primary data to allow direct computation of scores in clinical trials evaluating aGVHD as a primary or important secondary end point. Because both systems use the same algorithm for organ staging, collection of primary data allows computation of either grade. The IBMTR scale also does not require assessment of performance status, a data field that may be difficult to validate from the primary medical record; this may be an advantage for multicenter clinical trials.

This prospective, multicenter study also provided a large cohort of patients to study the natural history of aGVHD and its impact as a dynamic process on survival. We were interested in determining whether early onset of aGVHD or multiple episodes of aGVHD conferred a particularly bad prognosis. We found that using either system, maximum grade was the primary determinant of survival with neither timing nor relapse having additional independent prognostic significance.

In summary, we failed to demonstrate a clear advantage of one system for grading aGVHD. To improve aGVHD grading will likely require new tools, perhaps including histologic and immunologic parameters. Furthermore, there is a need to evaluate the prognostic utility of aGVHD classifications in nonmyeloablative transplantation where the concomitant impact of regimen-related toxicities on comorbidity can largely be excluded.

Participating centers and investigators: Société Française de Greffe de Moëlle et de Thérapie Cellulaire: Angers (N. Ifrah); Besançon (J. Y. Cahn); Clermont-Ferrand (J. O. Bay); Grenoble (F. Garban); Lille (J. P. Jouet); Marseille (D. Blaise); Nancy (P. Bordigoni); Nantes (N. Milpied); Nice (N. Gratecos); Paris Hôtel-Dieu (B. Rio); Paris La Pitié Salpêtrière (V. Leblond); Paris Robert Debré (P. S. Rohrlich); Paris Saint Antoine (L. Fouillard); Paris Saint Louis (G. Socié); Rouen (H. Tilly); Toulouse (M. Attal); Villejuif Institut Gustave Roussy (J. H. Bouhris). Dana Farber Cancer Institute, Boston, MA (S. J. Lee, J. H. Antin).

Prepublished online as Blood First Edition Paper, May 5, 2005; DOI 10.1182/blood-2004-11-4557.

Supported, in part, by a grant from the Ligue Départementale Contre le Cancer du Doubs and by Public Health Service Grant U24-CA76518 from the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the National Heart, Lung and Blood Institute and the Children's Leukemia Research Association.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 U.S.C. section 1734.

The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute.

1. Glucksberg H, Storb R, Fefer A, et al. Clinical manifestations of graft-versus-host disease in human recipients of marrow from HLA-matched sibling donors. Transplantation. 1974;18:295-304.
2. Przepiorka D, Weisdorf D, Martin P, et al. Consensus conference on acute GVHD grading. Bone Marrow Transplant. 1995;15:825-828.
3. Rowlings PA, Przepiorka D, Klein JP, et al. IBMTR severity index for grading acute graft-versus-host disease: retrospective comparison with Glucksberg grade. Br J Haematol. 1997;97:855-864.
4. Antin JH, Weisdorf D, Neuberg D, et al. Interleukin-1 blockade does not prevent acute graft-versus-host-disease: results of a randomized, double-blind, placebo-controlled trial of interleukin-1 receptor antagonist in allogeneic bone marrow transplantation. Blood. 2002;100:3479-3483.
5. Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999;18:2529-2545.
6. Agresti A. Categorical Data Analysis. New York, NY: John Wiley & Sons; 2002.
7. Martino R, Romero P, Subira M, et al. Comparison of the classic Glucksberg criteria and the IBMTR Severity Index for grading acute graft-versus-host disease following HLA-identical sibling stem cell transplantation. Bone Marrow Transplant. 1999;24:283-287.
8. MacMillan ML, Weisdorf DJ, Wagner JE, et al. Response of 443 patients to steroids as primary therapy for acute graft-versus-host disease: comparison of grading systems. Biol Blood Marrow Transplant. 2002;8:387-394.
9. Weisdorf DJ, Hurd D, Carter S, et al. Prospective grading of graft-versus-host disease after unrelated donor marrow transplantation: a grading algorithm versus blinded expert panel review. Biol Blood Marrow Transplant. 2003;9:512-518.