OrdinalForest Classification

Definition

OrdinalForest (HOF, Hierarchical OrdinalForest) is a random forest-based machine learning algorithm that explicitly incorporates the ordinal structure of class labels into its optimization criterion. It differs from a standard Random Forest, which treats multi-class outcomes as nominal, and it avoids the implicit assumption that all class boundaries are equal.
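
One way to build ordinality into a forest, and as far as can be told the general idea behind the ordinalForest package, is to give each class an interval on a latent scale, regress on a per-class score, and map predicted scores back to classes through the interval borders. The Python sketch below illustrates only that mapping. The border values are hypothetical (the real package optimizes them), the function names are invented for illustration, and nothing here is taken from the ALBOM code, which was written in R.

```python
from bisect import bisect_right
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def class_scores(borders):
    """Score for each class: standard-normal quantile of its interval midpoint."""
    return [_STD_NORMAL.inv_cdf((lo + hi) / 2)
            for lo, hi in zip(borders, borders[1:])]

def score_to_class(score, borders):
    """Index of the class whose interval contains the predicted score."""
    # Only inner borders are transformed; 0 and 1 map to -inf / +inf.
    inner = [_STD_NORMAL.inv_cdf(b) for b in borders[1:-1]]
    return bisect_right(inner, score)

# Hypothetical borders partitioning [0, 1] into 6 ordered classes.
borders = [0.0, 0.15, 0.35, 0.55, 0.75, 0.9, 1.0]
scores = class_scores(borders)
recovered = [score_to_class(s, borders) for s in scores]
```

Because the scores increase with class rank, a regression forest trained on them is pulled toward neighbouring classes when it errs, which a nominal classifier has no reason to do.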

Used in el-balkhi-2025 for multi-class staging of chronic liver disease from HSA isoform spectral profiles.

Why ordinal classification for CLD staging?

Liver fibrosis stages (F0/F1, F2/F3, F4_A, F4_B, F4_C) are ordered — a misclassification of F4_B as F4_A is less costly than misclassifying F4_B as F0/F1. Standard accuracy metrics penalize all errors equally and are therefore misleading for staged disease.

Quadratic Weighted Kappa (QWK) is the appropriate primary metric: it penalizes each prediction in proportion to the squared ordinal distance between the predicted and the true class, so distant errors cost far more than adjacent ones.
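
To make the weighting concrete, here is a minimal pure-Python sketch of QWK using the standard definition (disagreement weights of (i − j)² / (K − 1)², observed versus chance-expected). The integer class coding, with Control as class 0 followed by the five fibrosis stages, is an assumption made for the example, and the toy labels are invented.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights for classes 0..K-1."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes))
           for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]          # observed weighted disagreement
            den += w * row[i] * col[j] / n     # disagreement expected by chance
    # den is 0 only when truth and predictions sit in one single class.
    return 1.0 - num / den if den else 1.0

# Assumed coding: 0=Control, 1=F0/F1, 2=F2/F3, 3=F4_A, 4=F4_B, 5=F4_C
truth = [0, 1, 2, 3, 4, 5]
near = [0, 1, 2, 3, 3, 5]   # F4_B predicted as F4_A (adjacent class)
far = [0, 1, 2, 3, 1, 5]    # F4_B predicted as F0/F1 (three classes away)

perfect_score = quadratic_weighted_kappa(truth, truth, 6)
near_score = quadratic_weighted_kappa(truth, near, 6)
far_score = quadratic_weighted_kappa(truth, far, 6)
```

Both `near` and `far` contain exactly one error, so plain accuracy cannot tell them apart, whereas `near_score` comes out higher than `far_score`.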

Three architectures evaluated in ALBOM

Model | Description | Key property
RF | Standard Random Forest, ordinal outcome treated as factor | Ignores ordinal structure
HRF | Hierarchical Random Forest: sequential binary decomposition of 6 classes | Captures hierarchy, not ordinality
HOF | Hierarchical OrdinalForest: incorporates ordinal loss into optimization | Best: explicit ordinal structure + hierarchy

HOF was selected as the final model.

Implementation in ALBOM study

Input features

  • Spectral features: 75 selected from the full albumin spectral region (m/z 66,000–68,000 Da) by permutation-importance feature selection
  • Clinical features: 4 routine variables (total protein, serum albumin, INR, total bilirubin)
  • Combined model: “LC-TOF + Clinical” — 75 spectral + 4 clinical

Feature selection pipeline

  1. Full normalized feature matrix → initial Random Forest fit
  2. Permutation importance scoring
  3. 5-fold cross-validated QWK over grid of k values (20, 30, 40, 50, 75, 100)
  4. Optimal: k = 75 (highest cross-validated QWK without overfitting)
  5. Applied exclusively within training partition to prevent data leakage
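
The pipeline above can be sketched generically: score each feature by how much the metric drops when its column is shuffled, rank the features, then pick the k whose top-k subset gives the best cross-validated score. This is an illustrative Python outline only. `predict`, `metric` and `cv_score` are hypothetical callbacks standing in for the study's random forest, QWK and 5-fold cross-validation, and the toy data are invented.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in the metric when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

def select_k(importances, k_grid, cv_score):
    """Return the best k and the indices of the top-k ranked features."""
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    scores = {k: cv_score(ranked[:k]) for k in k_grid if k <= len(ranked)}
    best_k = max(scores, key=scores.get)
    return best_k, ranked[:best_k]

# Toy setup: only feature 0 carries signal (training partition only).
rng = random.Random(1)
X_train = [[rng.random() for _ in range(3)] for _ in range(40)]
toy_predict = lambda X: [int(row[0] > 0.5) for row in X]
y_train = toy_predict(X_train)
accuracy = lambda y, p: sum(a == b for a, b in zip(y, p)) / len(y)

importances = permutation_importance(toy_predict, X_train, y_train, accuracy)
# Stand-in for cross-validated QWK: rewards signal, mildly penalizes size.
toy_cv_score = lambda feats: (0 in feats) - 0.01 * len(feats)
best_k, selected = select_k(importances, (1, 2, 3), toy_cv_score)
```

Running both functions on the training partition alone, as in step 5, is what keeps the held-out test set from influencing which features are chosen.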

Preprocessing

  • Total Ion Current (TIC) normalization: corrects for inter-instrument intensity differences
  • Probabilistic Quotient Normalization (PQN): corrects for dilution effects
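
As a rough illustration of the two steps, the Python sketch below applies TIC normalization (divide each spectrum by its summed intensity) followed by PQN in its usual form (divide each spectrum by the median of its feature-wise quotients against a reference spectrum). Using the feature-wise median across samples as the reference is an assumption, as are strictly positive intensities; the source does not describe these details, and the toy spectra are invented.

```python
from statistics import median

def tic_normalize(spectra):
    """Scale each spectrum so its intensities sum to 1."""
    return [[v / sum(s) for v in s] for s in spectra]

def pqn_normalize(spectra):
    """Probabilistic quotient normalization against a median reference."""
    n_features = len(spectra[0])
    # Assumed reference: feature-wise median across all spectra.
    reference = [median(s[j] for s in spectra) for j in range(n_features)]
    normalized = []
    for s in spectra:
        quotients = [v / r for v, r in zip(s, reference) if r > 0]
        factor = median(quotients)  # most probable dilution factor
        normalized.append([v / factor for v in s])
    return normalized

# Two invented spectra with the same profile; the second is twice as concentrated.
raw = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]]
pqn_only = pqn_normalize(raw)
tic_then_pqn = pqn_normalize(tic_normalize(raw))
```

After either route the two toy spectra coincide, which is the intended effect when samples differ only by an overall dilution factor.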

Train/test split

  • 80% stratified training / 20% held-out test (stratified by fibrosis class)
  • Same pipeline applied independently to Platform 1 and Platform 2 data
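
A stratified 80/20 split of this kind can be sketched in a few lines of Python: shuffle the sample indices within each class and send a fixed fraction of every class to the test set. The class counts below are invented for the example, and the rounding rule (at least one test sample per class) is an assumption rather than the study's documented behaviour.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Assumed rule: round per class, but keep at least one test sample.
        n_test = max(1, round(len(idxs) * test_fraction))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Hypothetical class counts, not the study's.
labels = ["Control"] * 50 + ["F2/F3"] * 30 + ["F4_C"] * 20
train_idx, test_idx = stratified_split(labels)
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in set(labels)}
```

Fixing the seed makes the partition reproducible, which matters when the same pipeline is rerun independently on each platform.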

Software

R v4.4.2, packages: ordinalForest, ranger, yardstick; RStudio 2025.05.1+513

Performance (ALBOM study)

Platform | n_test | QWK | 95% CI (bootstrap, 1000 iter.)
Bruker timsTOF Pro2 (P1) | 46 | 0.862 | 0.735–0.923
Sciex TripleTOF 5600+ (P2) | 49 | 0.916 | 0.822–0.964

FIB-4 (comparator): 0.188–0.229
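
The bootstrap intervals reported above can be approximated with a percentile bootstrap: resample the test set with replacement, recompute QWK each time, and take the 2.5th and 97.5th percentiles. The source states only "bootstrap, 1000 iter.", so the percentile method is an assumption, and the labels in this Python sketch are synthetic rather than the study's predictions.

```python
import random

def qwk(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer classes 0..K-1."""
    n = len(y_true)
    obs = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        obs[t][p] += 1
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * obs[i][j]
            den += w * row[i] * col[j] / n
    return 1.0 - num / den if den else 1.0

def bootstrap_ci(y_true, y_pred, n_classes, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for QWK."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(qwk([y_true[i] for i in idx],
                         [y_pred[i] for i in idx], n_classes))
    stats.sort()
    lower = stats[int(alpha / 2 * n_iter)]
    upper = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper

# Synthetic test set of 46 samples; every fifth prediction is off by one class.
y_true = [i % 6 for i in range(46)]
y_pred = [min(t + 1, 5) if i % 5 == 0 else t for i, t in enumerate(y_true)]
point = qwk(y_true, y_pred, 6)
lower, upper = bootstrap_ci(y_true, y_pred, 6)
```

With test sets of fewer than 50 samples the interval is necessarily wide, which is consistent with the spread of the reported confidence limits.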

Secondary metric: balanced accuracy (reported to account for class imbalance).
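
Balanced accuracy is the mean of the per-class recalls, so a class with four samples counts as much as a class with fifteen. A minimal Python sketch on invented, deliberately imbalanced labels:

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)

# Invented example: 90 majority-class samples, 10 minority-class samples.
y_true = ["major"] * 90 + ["minor"] * 10
# The classifier gets every majority sample right but only 1 of 10 minority ones.
y_pred = ["major"] * 90 + ["minor"] + ["major"] * 9

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

Here `plain_accuracy` looks strong because the majority class dominates, while `balanced` exposes the poor minority-class recall.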

Confusion matrix highlights (Platform 1)

  • Control class: 12/15 correctly classified
  • F2/F3 class: 8/10
  • F4_C class: 4/4 (perfect)
  • F4_A: 2/7 correctly classified, the most frequently misclassified class (errors assigned to F2/F3 or F4_B); this reflects the biological overlap of compensated cirrhosis

Feature importance findings (from Fig S3, Supplemental data)

Top 4 features: clinical variables — total protein, routine serum albumin, INR, total bilirubin.

Spectral albumin peaks cluster in two sub-regions:

  1. ~66,230–66,600 Da — native HSA and cysteinylated isoforms (HSA+CYS, HSA+CYS+GLYC range); early-to-mid disease signal
  2. ~67,024–67,457 Da — poly-glycated albumin adducts (HSA+2GLYC, HSA+CYS+2GLYC range); advanced/end-stage disease signal

This bimodal spectral importance pattern is fully consistent with the 3-pattern biological model: the cysteinylated region captures the biphasic early-to-mid disease signal, while the poly-glycated region captures the monotonically increasing end-stage signal.

Cross-platform equivalence

Same pipeline applied independently to both platforms; cross-platform agreement assessed by:

  1. McNemar’s test on paired predictions: p = 0.149 → no significant difference in classification decisions
  2. Jaccard Similarity Index of error matrices: 0.696 (>0.5 threshold → errors are biologically driven, not instrument-specific)
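
Both agreement checks can be sketched in Python under stated assumptions. The source does not say which McNemar variant was used or how the error matrices were turned into sets, so the sketch uses the exact (binomial) McNemar test on per-sample correctness and treats each error matrix as the set of (true, predicted) cells that contain at least one error. The inputs are invented.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags."""
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b          # discordant pairs
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def jaccard(cells_a, cells_b):
    """Jaccard index of two sets of (true, predicted) error cells."""
    a, b = set(cells_a), set(cells_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Invented paired outcomes for 10 samples measured on both platforms.
p1_correct = [True] * 6 + [True, True, True, False]
p2_correct = [True] * 6 + [False, False, False, True]
p_value = mcnemar_exact(p1_correct, p2_correct)

# Invented error cells (assumed set-based reading of "error matrices").
p1_errors = [("F4_A", "F4_B"), ("F4_A", "F2/F3"), ("Control", "F0/F1")]
p2_errors = [("F4_A", "F4_B"), ("F4_A", "F2/F3"), ("F4_B", "F4_C")]
overlap = jaccard(p1_errors, p2_errors)
```

McNemar's test only applies when the same samples are classified on both platforms, since it works on the discordant pairs.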

Generalizability notes

  • ⚠️ Training and test from same single-center cohort — reported accuracy is internal; external validation (MALAHBAR NCT06318949) is underway
  • Model currently requires LC-HR-MS input — not directly deployable to simpler assay formats
  • The 75-feature spectral approach could in principle be translated to targeted LC-MRM-MS if isoform ratios are validated as the key predictors

Key references

  • el-balkhi-2025 — primary application in CLD staging
  • R package: ordinalForest (Hornung R; CRAN)