Machine Learning for JAK2 Mutation Prediction in Erythrocytosis: Context Matters
To the Editor.—Schifman et al1 present a machine learning (ML) classifier using blood count parameters and erythropoietin (EPO) levels to predict JAK2 mutations in patients with elevated hemoglobin. Their contribution to this evolving area is commendable. However, we wish to highlight several methodologic and practical issues that limit the relevance and clinical applicability of this approach.
First, the authors define erythrocytosis using hemoglobin thresholds greater than 15 g/dL for females and greater than 17 g/dL for males—values that diverge from World Health Organization and International Consensus Classification criteria (>16.0 g/dL and >16.5 g/dL, respectively).2,3 The female threshold in particular may capture individuals who would not be evaluated for erythrocytosis under standard diagnostic frameworks. As a result, the model may be trained on a population that differs substantially from those typically referred to hematology clinics.
Second, the inclusion of EPO introduces practical constraints, with turnaround times exceeding 1 week in most laboratories. As the authors note, EPO has limited standalone diagnostic utility,4 a point further underscored by the recent discovery of hepatic-like EPO variants that cause erythrocytosis despite normal EPO levels.5 While EPO may add value when combined with blood count parameters, its use may necessitate additional clinic visits, introducing diagnostic delays and limiting timely decision support. In contrast, the JAKPOT rule6 relies only on parameters available at the initial clinic visit, enabling real-time decision-making about JAK2 testing.
A more fundamental concern lies in the nature of the training and validation data. The models were trained on Veterans Affairs registry data, which may represent an older population, and validated in an independent hospital laboratory system. While large, these data sets lack clinical context, particularly referral indication, inpatient versus outpatient setting, and final diagnosis. Further, the training cohort was overwhelmingly male (8190 of 8479; 96.6%), which, combined with the nonstandard hemoglobin threshold for women, raises questions about the model’s applicability to female patients in real-world practice.1 In contrast, the JAKPOT cohort comprised patients referred for elevated hemoglobin levels in outpatient internal medicine and hematology clinics,6 more closely reflecting the population in which such tools are likely to have the greatest clinical impact.
While both the ML model and the JAKPOT rule achieved 100% sensitivity and negative predictive value in validation, the authors cite greater test reduction with their model (89% versus 50%) as a key advantage. It should be noted that the population analyzed had a lower JAK2 mutation prevalence (2.7%) than that observed in real-world hematology clinics,7 raising further questions about the model’s applicability.
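The dependence of predictive values on prevalence can be made concrete with a standard Bayes calculation. The sketch below uses purely illustrative numbers (a hypothetical sensitivity of 98% and specificity of 60%, not figures from either study) to show that a near-perfect negative predictive value observed at 2.7% prevalence can erode when the same test is applied in a higher-prevalence referral population:

```python
def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity,
    and prevalence, via Bayes' theorem."""
    true_neg = spec * (1 - prev)          # P(test-, no mutation)
    false_neg = (1 - sens) * prev         # P(test-, mutation)
    return true_neg / (true_neg + false_neg)

# Illustrative only: suppose true sensitivity were 98% rather
# than the observed 100%, with a hypothetical specificity of 60%.
for prev in (0.027, 0.10, 0.25):
    print(f"prevalence {prev:.1%}: NPV = {npv(0.98, 0.60, prev):.4f}")
```

With 100% sensitivity there are no false negatives and NPV is 100% at any prevalence, but an observed 100% in a low-prevalence validation cohort carries sampling uncertainty; the calculation shows how even a small shortfall in true sensitivity lowers NPV as prevalence rises toward levels seen in hematology clinics.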
Machine learning holds promise to support clinical decision-making in hematology-oncology, where there is a need for tools to improve diagnostic stewardship.8 Schifman et al1 take an important step in this direction, and the presented results are promising, but any clinical benefit must be verified in larger data sets with age, sex, and JAK2 characteristics reflective of the anticipated clinical application. Further, to be truly useful, such tools must be developed and validated in clinically relevant populations and rely on variables that are readily available to support real-time decision-making. Toward these ends, simpler rules like JAKPOT—now undergoing prospective validation (NCT06785870), essential before adoption of any such tool—may offer a more practical path forward to support JAK2 testing decisions in everyday hematology practice.
Contributor Notes
The authors have no relevant financial interest in the products or companies described in this article.