🚀 MobileTransformers: Full On-Device LLM Training, Inference and RAG Stack Running Natively on Android #26747
martinkorelic started this conversation in Show & Tell
Coming from a previous post, I wanted to fully share something I've been working on that pushes ONNX Runtime into territory I haven't seen much discussion about: complete on-device LLM fine-tuning on smartphones.
MobileTransformers (or ORTransformersMobile) is an end-to-end framework that enables:
- full on-device LLM fine-tuning (training)
- on-device inference
- on-device retrieval-augmented generation (RAG)
Everything runs natively on Android hardware (tested on Google Pixel 6) using ONNX Runtime's training and inference APIs. No cloud, no emulation, just pure on-device ML with ONNX Runtime.
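The RAG part of such a stack ultimately reduces to nearest-neighbour retrieval over embedding vectors before generation. As a rough illustration (not the project's actual implementation; all names here are hypothetical and embeddings are assumed to be precomputed), the retrieval step can be sketched in pure Python:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors;
    # returns 0.0 for degenerate (zero-norm) inputs.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=2):
    # Indices of the k document embeddings most similar to the query.
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

On device, the retrieved chunks would then be prepended to the prompt before running inference; a real implementation would typically swap the linear scan for an approximate index once the corpus grows.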
Appreciation and Plea to the ONNX Runtime Maintainers
I realize the training APIs have been largely deprecated and haven't seen much recent activity, but I wanted to share what's possible when they're pushed to their limits. This is both an appreciation post for the incredible foundation you've built and a respectful plea to consider the research and production use cases that still depend on it. In my experiments, training ran exclusively on the CPU execution provider. I am unsure whether on-device GPU/NPU training support exists or can be enabled, but it would dramatically accelerate on-device fine-tuning workflows.
The combination of ONNX Runtime's training API, cross-platform support, and hardware execution providers made this work possible. Without that foundation, there would be no practical path to on-device LLM fine-tuning on mobile hardware.
I also want to acknowledge onnxruntime-genai: it's excellent work, and I explored integrating it directly into the application. However, for this project I needed much deeper control than it exposes, which is why I implemented my own limited inference path that interoperates with the training and RAG components.
Code base
Full code, application and research is available at https://gitlab.fri.uni-lj.si/lrk/mobiletransformers
The framework is also designed with extensibility in mind for future on-device ML workflows (reinforcement learning, federated learning, continual learning, etc.) 🚀
Feel free to explore — there are definitely optimizations to make and bugs to squash 😅