OrdinalForest Classification
Definition
OrdinalForest (HOF, Hierarchical OrdinalForest) is a random forest-based machine learning algorithm that explicitly incorporates the ordinal structure of class labels into its optimization criterion. Unlike a standard Random Forest, which treats multi-class outcomes as nominal, it avoids the implicit assumption that all class boundaries are equal.
Used in el-balkhi-2025 for multi-class staging of chronic liver disease from HSA isoform spectral profiles.
Why ordinal classification for CLD staging?
Liver fibrosis stages (F0/F1, F2/F3, F4_A, F4_B, F4_C) are ordered: misclassifying F4_B as F4_A is less costly than misclassifying F4_B as F0/F1. Plain accuracy penalizes all errors equally and is therefore misleading for staged disease.
Quadratic Weighted Kappa (QWK) is the appropriate primary metric: it penalizes predictions proportionally to their ordinal distance from the true class.
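A minimal sketch of how QWK can be computed, assuming the classes are integer-coded in stage order (0 = lowest stage). The function name and the integer coding are illustrative assumptions and are not taken from the study, which reports using R with yardstick.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for labels coded 0..n_classes-1."""
    n = len(y_true)
    # Observed confusion matrix: rows = true class, columns = predicted class.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(n_classes))
               for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the squared ordinal distance.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two marginals.
            weighted_expected += w * row_tot[i] * col_tot[j] / n
    if weighted_expected == 0:
        raise ValueError("QWK is undefined when only one class is present")
    return 1.0 - weighted_observed / weighted_expected
```

With this weighting, an error one stage away from the true class costs far less than an error four stages away, which is the behavior described above.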
Three architectures evaluated in ALBOM
| Model | Description | Key property |
|---|---|---|
| RF | Standard Random Forest, ordinal outcome treated as factor | Ignores ordinal structure |
| HRF | Hierarchical Random Forest — sequential binary decomposition of 6 classes | Captures hierarchy, not ordinality |
| HOF | Hierarchical OrdinalForest — incorporates ordinal loss into optimization | Best: explicit ordinal structure + hierarchy |
HOF was selected as the final model.
Implementation in ALBOM study
Input features
- Spectral features: 75 selected from the full albumin spectral region (m/z 66,000–68,000 Da) by permutation-importance feature selection
- Clinical features: 4 routine variables (total protein, serum albumin, INR, total bilirubin)
- Combined model: “LC-TOF + Clinical” — 75 spectral + 4 clinical
Feature selection pipeline
- Full normalized feature matrix → initial Random Forest fit
- Permutation importance scoring
- 5-fold cross-validated QWK over grid of k values (20, 30, 40, 50, 75, 100)
- Optimal: k = 75 (highest cross-validated QWK without overfitting)
- Applied exclusively within training partition to prevent data leakage
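The grid search over k can be sketched as below. This is an illustration under stated assumptions, not the study's code: `choose_k`, `rank_features`, and `fit_and_score` are hypothetical names, the two callables stand in for the permutation-importance ranking and the cross-validated QWK evaluation, and re-ranking inside each training fold is an assumption made here to keep validation data out of the selection step.

```python
def choose_k(grid, folds, rank_features, fit_and_score):
    """Pick the number of top-ranked features with the best mean CV score.

    grid          -- candidate k values, e.g. [20, 30, 40, 50, 75, 100]
    folds         -- list of (train_indices, validation_indices) pairs
    rank_features -- hypothetical callable: train_indices -> feature ids,
                     ordered from most to least important
    fit_and_score -- hypothetical callable: (train_indices, validation_indices,
                     selected_features) -> score (e.g. QWK)
    """
    mean_scores = {}
    for k in grid:
        fold_scores = []
        for train_idx, val_idx in folds:
            # Rank using the training fold only, so validation samples
            # never influence which features are kept.
            ranked = rank_features(train_idx)
            fold_scores.append(fit_and_score(train_idx, val_idx, ranked[:k]))
        mean_scores[k] = sum(fold_scores) / len(fold_scores)
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores
```

Keeping the ranking and scoring as injected callables leaves the choice of model and metric open; only the leakage-avoiding loop structure is shown.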
Preprocessing
- Total Ion Current (TIC) normalization — correct inter-instrument intensity differences
- Probabilistic Quotient Normalization (PQN) — correct dilution effects
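The two normalization steps can be sketched as follows, assuming each spectrum is a list of non-negative intensities on a shared feature grid and that the PQN reference is the feature-wise median across samples. Both choices, and the function names, are assumptions for illustration; the source does not specify the reference spectrum.

```python
from statistics import median


def tic_normalize(spectrum):
    """Scale one spectrum so that its intensities sum to 1."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total ion current")
    return [x / total for x in spectrum]


def pqn_normalize(spectra):
    """Probabilistic quotient normalization against a median reference."""
    n_features = len(spectra[0])
    # Reference spectrum: feature-wise median across all samples (assumed).
    reference = [median(s[j] for s in spectra) for j in range(n_features)]
    normalized = []
    for s in spectra:
        # Quotients against the reference, skipping features where it is zero.
        quotients = [s[j] / reference[j]
                     for j in range(n_features) if reference[j] > 0]
        # The median quotient is the most probable dilution factor.
        factor = median(quotients)
        normalized.append([x / factor for x in s])
    return normalized
```

In a pipeline following the order listed above, each spectrum would pass through `tic_normalize` first and the resulting matrix through `pqn_normalize`.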
Train/test split
- 80% stratified training / 20% held-out test (stratified by fibrosis class)
- Same pipeline applied independently to Platform 1 and Platform 2 data
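A stratified 80/20 split can be sketched as below. The function name, the fixed seed, and the rounding rule for per-class test counts are illustrative assumptions; the source only states the split ratio and the stratification variable.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampled per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Each class contributes the same fraction to the test set.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Sampling within each class keeps rare stages represented in the held-out set, which matters when some classes have only a handful of samples.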
Software
R v4.4.2, packages: ordinalForest, ranger, yardstick; RStudio 2025.05.1+513
Performance (ALBOM study)
| Platform | n_test | QWK | 95% CI (bootstrap, 1000 iter.) |
|---|---|---|---|
| Bruker timsTOF Pro2 (P1) | 46 | 0.862 | 0.735–0.923 |
| Sciex TripleTOF 5600+ (P2) | 49 | 0.916 | 0.822–0.964 |
| FIB-4 (comparator) | — | 0.188–0.229 | — |
Secondary metric: balanced accuracy (reported to account for class imbalance).
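The sketch below shows balanced accuracy together with a bootstrap confidence interval of the kind reported in the table. The percentile method, the seed, and the function names are assumptions; the source states only that 1000 bootstrap iterations were used.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    correct, total = {}, {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def bootstrap_ci(y_true, y_pred, metric, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for any metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_iter):
        # Resample test cases with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[min(n_iter - 1, int((1 - alpha / 2) * n_iter))]
    return lower, upper
```

Any metric with the same signature, including a QWK function, could be passed as `metric`. With test sets of fewer than 50 samples, resamples can miss a small class entirely, so the interval should be read with that caveat.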
Confusion matrix highlights (Platform 1)
- Control class: 12/15 correctly classified
- F2/F3 class: 8/10
- F4_C class: 4/4 (perfect)
- F4_A class: 2/7 correctly classified; the most frequently misclassified class (errors assigned to F2/F3 or F4_B), reflecting the biological overlap of compensated cirrhosis
Feature importance findings (from Fig S3, Supplemental data)
Top 4 features: clinical variables — total protein, routine serum albumin, INR, total bilirubin.
Spectral albumin peaks cluster in two sub-regions:
- ~66,230–66,600 Da — native HSA and cysteinylated isoforms (HSA+CYS, HSA+CYS+GLYC range); early-to-mid disease signal
- ~67,024–67,457 Da — poly-glycated albumin adducts (HSA+2GLYC, HSA+CYS+2GLYC range); advanced/end-stage disease signal
This bimodal spectral importance pattern is fully consistent with the 3-pattern biological model: the cysteinylated region captures the biphasic early-middle disease signal, while the poly-glycated region captures the monotonically increasing end-stage signal.
Cross-platform equivalence
Same pipeline applied independently to both platforms; cross-platform agreement assessed by:
- McNemar’s test on paired predictions: p = 0.149 → no significant difference in classification decisions
- Jaccard Similarity Index of error matrices: 0.696 (>0.5 threshold → errors are biologically driven, not instrument-specific)
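Both agreement checks can be sketched as below. Two assumptions are made that the source does not confirm: the McNemar test is computed in its exact binomial form on paired correct/incorrect indicators, and the Jaccard index is taken over the sets of (true class, predicted class) cells in which each platform makes errors. Function names are illustrative.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correctness flags."""
    # Discordant pairs: one platform right, the other wrong.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(a_only, b_only)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def jaccard_index(errors_a, errors_b):
    """Overlap of two error sets: |A & B| / |A | B|."""
    union = errors_a | errors_b
    if not union:
        return 1.0  # neither platform makes any error
    return len(errors_a & errors_b) / len(union)
```

A high Jaccard value under this definition means the two platforms confuse the same pairs of stages, which is the sense in which shared errors point to biology rather than to the instrument.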
Generalizability notes
- ⚠️ Training and test from same single-center cohort — reported accuracy is internal; external validation (MALAHBAR NCT06318949) is underway
- Model currently requires LC-HR-MS input — not directly deployable to simpler assay formats
- The 75-feature spectral approach could in principle be translated to targeted LC-MRM-MS if isoform ratios are validated as the key predictors
Key references
- el-balkhi-2025 — primary application in CLD staging
- R package: ordinalForest (Hornung R; CRAN)