Labels: enhancement (New feature or request), question (Further information is requested)
Right now, if the detected VRAM is 0 MB, we only allow one model to run at a time. In workflows that need to query multiple models, this is quite a limitation: DMR constantly swaps the models in and out of memory even when they would both fit, thrashing the storage and hugely slowing down the UX.
In my specific case, I'm running on an AMD Strix Halo with 128 GB of RAM, using Vulkan with the llama.cpp backend.
I think it'd be nice if either:
- DMR didn't limit the number of model slots, or
- there was an option to configure this behavior system-wide.
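To make the second option concrete, the setting could live in the runner's configuration. This sketch is purely hypothetical: the file layout, key names, and semantics below are illustrative only and do not correspond to any existing DMR option:

```yaml
# Hypothetical DMR configuration sketch (illustrative only, not a real setting)
model-runner:
  # Maximum number of models kept resident at once.
  # 0 could mean "unlimited" (let the backend OOM-fail naturally),
  # while N > 0 would evict least-recently-used models beyond N.
  max-loaded-models: 0
```

On unified-memory machines like Strix Halo, where the GPU shares system RAM, a user-set limit like this would sidestep the 0 MB VRAM detection entirely.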
Any thoughts around this?