
Allow multiple model slots even when detected VRAM is 0 #453


Description

@krissetto

Right now, if the detected VRAM is 0 MB, we only allow one model to run at a time. In workflows that need to query multiple models this is quite a limitation: DMR constantly swaps the models in and out of memory even when they would both fit, thrashing storage and hugely slowing down the UX.

In my specific case, I'm on an AMD Strix Halo with 128 GB of RAM, using Vulkan with the llama.cpp backend.

I think it'd be nice if either:

  • DMR didn't limit the number of model slots; or
  • there was an option to configure this behavior system-wide (a rough sketch of what I mean is below).
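
To illustrate the second option only: the snippet below is purely hypothetical (the `DMR_MAX_MODEL_SLOTS` variable and both function names are made up, not existing DMR code), but it sketches how a system-wide override could take precedence when VRAM detection reports 0 instead of forcing a single slot.

```go
package scheduling

import (
	"os"
	"strconv"
)

// maxModelSlots decides how many models may be resident at once.
// detectedVRAMMB is whatever the backend probe reported (0 on Strix Halo
// with Vulkan today). DMR_MAX_MODEL_SLOTS and this function are
// hypothetical names used only to sketch the desired behavior.
func maxModelSlots(detectedVRAMMB uint64) int {
	// An explicit system-wide override always wins.
	if v, ok := os.LookupEnv("DMR_MAX_MODEL_SLOTS"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	// VRAM unknown (reported as 0): today this forces a single slot,
	// which causes constant model swapping on unified-memory machines.
	if detectedVRAMMB == 0 {
		return 1 // current behavior this issue questions
	}
	// Otherwise size the slot count from detected VRAM as before.
	return slotsFromVRAM(detectedVRAMMB)
}

// slotsFromVRAM is a stand-in for whatever heuristic DMR already uses.
func slotsFromVRAM(vramMB uint64) int {
	if vramMB < 8192 {
		return 1
	}
	return int(vramMB / 8192)
}
```

An environment variable is just one possibility; a field in the engine/runner configuration would work equally well, as long as it's a single system-wide knob.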

Any thoughts around this?


Labels: enhancement, question
