If the entire model plus the context doesn't fit in VRAM, could we have a feature such as -ngl -1 that automatically calculates the maximum number of layers that fit on the GPU and offloads the rest to the CPU?
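A rough sketch of the idea, purely illustrative and not based on llama.cpp internals: all names (estimate_gpu_layers, the byte-size parameters) are hypothetical, and the per-layer size and overhead would in practice have to be derived from the model's tensor metadata and the requested context length.

```cpp
#include <cstdint>
#include <algorithm>
#include <cstdio>

// Hypothetical inputs: free VRAM reported by the driver, the estimated
// size of one transformer layer's weights, and a reserved overhead for
// the KV cache / compute buffers that must also live on the GPU.
int estimate_gpu_layers(int64_t free_vram_bytes,
                        int64_t bytes_per_layer,
                        int64_t overhead_bytes,
                        int total_layers) {
    const int64_t usable = free_vram_bytes - overhead_bytes;
    if (usable <= 0 || bytes_per_layer <= 0) {
        return 0; // nothing fits; keep everything on the CPU
    }
    const int fit = static_cast<int>(usable / bytes_per_layer);
    return std::min(fit, total_layers); // never offload more layers than exist
}

int main() {
    // Example numbers (assumed): 8 GiB free, ~450 MiB per layer,
    // 1.5 GiB reserved, 32-layer model.
    const int ngl = estimate_gpu_layers(8LL << 30, 450LL << 20,
                                        3LL << 29, 32);
    std::printf("offload %d layers to the GPU\n", ngl);
    return 0;
}
```

The remaining total_layers - ngl layers (and anything else that doesn't fit) would then stay on the CPU, exactly as an explicit, smaller -ngl value does today.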