In Reply
We thank Dr Chin-Yee and colleagues for their interest in our study. They raise several practical issues with applying the machine learning (ML) algorithm we describe for JAK2 decision support: hemoglobin (Hgb) cutoffs, erythropoietin (EPO) availability, and the model’s generalizability when trained on a predominantly male population.
Inclusion criteria were based on Hgb concentrations above high-normal reference ranges. These cutoffs differ somewhat from those endorsed by the World Health Organization (WHO), which are intended as diagnostic criteria for polycythemia vera (PV) and optimized to differentiate PV from essential thrombocythemia.1 Two points bear on this difference. First, Hgb was not used as a training parameter. Second, platelets and red cell distribution width (RDW), which are not affected by Hgb, had the greatest impact on model performance, while red blood cell count had only a marginal effect. In addition, WHO Hgb criteria were used to standardize comparisons between the algorithm and rule-based systems. Under these conditions, algorithm performance (88.5% potential reduction in JAK2 testing) was comparable to that for cases selected by reference range criteria (90.3%) and exceeded that of rule-based systems (50.0% to 62.3%).2
We agree that the availability of EPO results poses practical limitations. Of all parameters, EPO had the highest proportion of missing values: 26.8% for training/validation and 31.8% for out-of-sample data sets.2 EPO, which ranked third in impact on model performance, was not used in rule-based systems, nor was RDW, which ranked second. Use of these parameters in the algorithm likely contributed to its better performance compared with rule-based systems. However, 92% of false positives in the independent data set were associated with low EPO levels, which substantially lowered the algorithm’s positive predictive power.2 Additional comparisons could help identify the most effective and practical JAK2 decision support approach. These would test the ML algorithm described in our article with EPO inputs omitted, or other models trained only on blood count parameters, against current rule-based systems or new ones that might be improved by including RDW.
The validation data sets were heavily weighted toward male cases (96.6%)2; the JAKPOT study cited was less so (205 of 285; 71.9%). The subset of out-of-sample cases used to compare the algorithm with JAKPOT rules had a similar proportion of males (71%).2 Despite a male-dominant training set, there were only marginal differences in algorithm performance between male and female cases in the out-of-sample data set (see Table 1 and Figure 4 in Schifman et al.2). However, the sample size was small, and further evaluations with other defined populations and larger data sets will be necessary to resolve this concern and to better delineate differences in effectiveness between ML and rule-based systems for JAK2 decision support.
Contributor Notes
The author has no relevant financial interest in the products or companies described in this article.