
Getting started. #2597

Answered by ianscrivener
arthurwolf asked this question in Q&A
Aug 13, 2023 · 2 comments · 5 replies

@arthurwolf,
llama.cpp can definitely do the job! For example: "I'm successfully running llama-2-70b-chat.ggmlv3.q3_K_S on my 32 GB RAM on CPU with a speed of 1.2 tokens/s without any GPU offloading (I don't have a discrete GPU), using the full 4k context and kobold.cpp on Windows 11 Pro", as mentioned here.

Were it me, I'd start my experimentation with the 38.8 GB llama-2-70b-chat.ggmlv3.q4_0.bin.
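If you'd rather drive it from a script than the CLI, here's a minimal sketch using the llama-cpp-python bindings (not mentioned in this thread; assumed installed via pip, and assumed to be an older build that can still load GGMLv3 files). The model path and prompt are placeholders:

```python
# Minimal CPU-only sketch, assuming `pip install llama-cpp-python` and a
# version that still supports the GGMLv3 format; the path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",  # hypothetical local path
    n_ctx=4096,      # full 4k context, as in the quoted report
    n_gpu_layers=0,  # pure CPU, no GPU offloading
    n_threads=8,     # tune to your physical core count
)

out = llm("Q: What hardware do I need to run a 70B model? A:", max_tokens=128)
print(out["choices"][0]["text"])
```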

We know a GPU is significantly faster than a CPU; I use RunPod for cloud GPU. An A6000 has 62 GB of GPU RAM, so it could run the above Llama-2-70B-Chat model entirely on the GPU. RunPod has a template, also by TheBloke, that is a good starting point; docs here.
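If you do rent an A6000, the same sketch only needs n_gpu_layers raised so layers are offloaded to VRAM. This assumes the bindings were built with CUDA support (e.g. installed with CMAKE_ARGS="-DLLAMA_CUBLAS=on"); the layer count below is an assumption, so lower it if memory runs out:

```python
# GPU-offload sketch; assumes llama-cpp-python was installed with cuBLAS
# support, e.g. CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",  # hypothetical local path
    n_ctx=4096,
    n_gpu_layers=83,  # assumption: offload all layers of the 70B model; reduce if VRAM is tight
)

print(llm("Q: Why offload layers to the GPU? A:", max_tokens=128)["choices"][0]["text"])
```

The CLI equivalent is the -ngl / --n-gpu-layers flag on llama.cpp's main binary.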

There seems to be far more discussion of the MLOps stuff (model selection, hardware sp…

Answer selected by arthurwolf