🚀 MobileTransformers: Full On-Device LLM Training, Inference and RAG Stack Running Natively on Android #26747
martinkorelic started this conversation in Show & Tell
Coming from a previous post, I wanted to fully share something I've been working on that pushes ONNX Runtime into territory I haven't seen much discussion about: complete on-device LLM fine-tuning on smartphones.
MobileTransformers (or ORTransformersMobile) is an end-to-end framework that enables:
- full on-device LLM fine-tuning (training)
- on-device inference
- on-device retrieval-augmented generation (RAG)
Everything runs natively on Android hardware (tested on Google Pixel 6) using ONNX Runtime's training and inference APIs. No cloud, no emulation, just pure on-device ML with ONNX Runtime.
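The RAG part of such a stack ultimately reduces to nearest-neighbour retrieval over embedding vectors before generation. As a rough illustration (not the project's actual implementation; all names here are hypothetical and embeddings are assumed to be precomputed), the retrieval step can be sketched in pure Python:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors;
    # returns 0.0 for degenerate (zero-norm) inputs.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=2):
    # Indices of the k document embeddings most similar to the query.
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

On device, the retrieved chunks would then be prepended to the prompt before running inference; a real implementation would typically swap the linear scan for an approximate index once the corpus grows.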
Appreciation and Plea to the ONNX Runtime Maintainers
I realize the training APIs have been largely deprecated and haven't seen much recent activity, but I wanted to share what's possible when they're pushed to their limits. This is both an appreciation post for the incredible foundation you've built and a respectful plea to consider the research and production use cases that still depend on it. In my experiments, training ran exclusively on the CPU execution provider. I am unsure whether on-device GPU/NPU training support exists or can be enabled, but it would dramatically accelerate on-device fine-tuning workflows.
The combination of ONNX Runtime's training API, cross-platform support, and hardware execution providers made this work possible. Without that foundation, there would be no practical path to on-device LLM fine-tuning on mobile hardware.
I also want to acknowledge onnxruntime-genai: it's excellent work, and I explored integrating it directly into the application. However, for this project I needed much deeper control than it exposes, which is why I implemented my own limited inference path that interoperates with the training and RAG components.
Code base
Full code, application and research is available at https://gitlab.fri.uni-lj.si/lrk/mobiletransformers
The framework is also designed with extensibility in mind for future on-device ML workflows (reinforcement learning, federated learning, continual learning, etc.) 🚀
Feel free to explore — there are definitely optimizations to make and bugs to squash 😅