DriftMoE – A Mixture of Experts Approach to Handle Concept Drifts
Model weights for the DriftMoE paper.
This repository hosts weights only, so you can plug the model straight into your Python pipeline. These weights correspond to one training run on the LED_g stream. The full training code and utilities live in a separate GitHub repo: https://github.com/miguel-ceadar/drift-moe
📂 Files
There are two folders, one for the MoE-Task variant and one for the MoE-Data variant. Both share this file structure:
router.pth # PyTorch state_dict for the gating MLP
expert_0.pkl # pickled CapyMOA HoeffdingTree, one file per expert
expert_1.pkl
…
expert_{N‑1}.pkl
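Because the two variants ship different numbers of experts, it can help to derive the count from the folder instead of hard-coding it. The following is a minimal sketch that assumes the layout above; the helper name `count_experts` is hypothetical and not part of the DriftMoE code.

```python
import re
from pathlib import Path


def count_experts(folder):
    """Count expert_<i>.pkl files in `folder` and check they are numbered 0..N-1."""
    pattern = re.compile(r"expert_(\d+)\.pkl")
    indices = sorted(
        int(m.group(1))
        for p in Path(folder).iterdir()
        if (m := pattern.fullmatch(p.name))
    )
    # A gap or duplicate in the numbering usually means an incomplete download.
    if indices != list(range(len(indices))):
        raise ValueError(f"expert files are not numbered 0..N-1: {indices}")
    return len(indices)
```

The result can be used as the number of experts when loading the router and the expert files, for example `count_experts('path/to')` with the same placeholder path used in the loading snippet.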
⚡ Quick Start (CPU or GPU)
1 · Install runtime deps
CapyMOA needs a working Java Runtime Environment, so install Java first: https://openjdk.org/install/
python -m pip install torch capymoa numpy river
# git clone training repo – needed so we can recreate RouterMLP & Expert wrappers
git clone https://github.com/miguel-ceadar/drift-moe drift_moe
2 · Load the router & experts
import torch, pickle, numpy as np
from capymoa.misc import load_model
from drift_moe.driftmoe.moe_model import RouterMLP, Expert
INPUT_DIM = 24 # ↩ must match dimensions of LED_g stream
NUM_CLASSES = 10 # ↩ idem
N_EXPERTS = 12 # ↩ number of expert_*.pkl files (12 for MoE_Data and 10 for MoE_Task)
DEVICE = 'cpu' # or 'cuda'
# 2‑a) Router
router = RouterMLP(input_dim=INPUT_DIM, hidden_dim=256, output_dim=N_EXPERTS)
router.load_state_dict(torch.load('path/to/router.pth', map_location=DEVICE))
router = router.to(DEVICE).eval()
# 2‑b) Experts (pickled CapyMOA trees)
experts = []
for i in range(N_EXPERTS):
    with open(f'path/to/expert_{i}.pkl', 'rb') as f:
        ex = load_model(f) # HoeffdingTree object
    experts.append(ex)
Reference: the official CapyMOA save & load notebook.
3 · Single‑sample inference helper
def predict_one(instance) -> int:
    """Route a single feature vector through driftMoE and return class index."""
    x_vec = instance.x
    x_t = torch.tensor(x_vec, dtype=torch.float32).unsqueeze(0).to(DEVICE)
    with torch.no_grad():
        logits = router(x_t) # shape [1, N_EXPERTS]
    eid = int(torch.argmax(logits, 1).item())
    return experts[eid].predict(instance)
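`predict_one` uses hard top-1 routing: the expert with the largest router logit makes the prediction. Since softmax preserves ordering, the argmax of the logits selects the same expert as the argmax of the gate probabilities. The sketch below illustrates this with plain Python lists; the helper name `top1_route` is hypothetical and the logits are made-up numbers, not outputs of the released router.

```python
import math


def top1_route(logits):
    """Return (expert index, softmax probabilities) for one vector of router logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    eid = max(range(len(logits)), key=lambda i: logits[i])  # argmax over raw logits
    return eid, probs


eid, probs = top1_route([0.2, 1.5, -0.3])  # illustrative logits for three experts
assert eid == probs.index(max(probs))  # same expert either way
```

The probabilities are not needed for prediction, but they can be handy for inspecting how confidently the router prefers one expert over the others.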
🚰 Streaming usage
from capymoa.stream.generator import LEDGenerator
stream = LEDGenerator()
while stream.has_more_instances():
    inst = stream.next_instance()
    y_hat = predict_one(inst)
    print(y_hat)
    print(inst.y_index)
The experts are frozen at inference time; each prediction runs the router once and queries only the expert it selects.
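To get an accuracy figure instead of printed predictions, the loop can count how often the prediction matches `inst.y_index`. This is a minimal sketch, not the evaluation protocol from the paper. The helper name `stream_accuracy` and the `max_instances` cap are assumptions; the cap is there because a synthetic generator may keep producing instances indefinitely. The helper relies only on the `has_more_instances()`, `next_instance()` and `y_index` members already used in the streaming loop.

```python
def stream_accuracy(stream, predict_fn, max_instances=10_000):
    """Fraction of instances where predict_fn(inst) equals inst.y_index."""
    seen = correct = 0
    # Stop at the cap or when the stream runs out, whichever comes first.
    while seen < max_instances and stream.has_more_instances():
        inst = stream.next_instance()
        correct += int(predict_fn(inst) == inst.y_index)
        seen += 1
    return correct / seen if seen else 0.0
```

With the objects defined earlier, a call could look like `stream_accuracy(LEDGenerator(), predict_one)`.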
✏️ Citation
@misc{aspis2025driftmoemixtureexpertsapproach,
title={DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts},
author={Miguel Aspis and Sebastián A. Cajas Ordónez and Andrés L. Suárez-Cetrulo and Ricardo Simón Carbajo},
year={2025},
eprint={2507.18464},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2507.18464},
}
Questions or issues? Open an issue on the GitHub repo and we’ll be happy to help.