
Gives a different answer for the same question between llama.cpp's main.exe and this project #384

Open
@zhiyixu

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Please provide a detailed written description of what you were trying to do, and what you expected llama-cpp-python to do.
What I am trying to do: I want the model to translate a sentence from Chinese to English for me.
When I call the model with the original llama.cpp from the command line:

main -m ../llama.cpp/zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3

the model works fine and gives the right output:

(screenshot: good response)

Notice that the yellow line "Below is an ......" is the content of a prompt file; it is passed to the model with -f prompts/alpaca.txt. I can't find an equivalent parameter in this project, so I can't tell whether that is the reason for this issue.
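For reference, and purely as a sketch rather than anything this project documents: there is no direct -f parameter, but the same effect can be approximated by reading prompts/alpaca.txt yourself and prepending it, together with the Alpaca-style ### Instruction: / ### Response: markers that -ins wraps around the user input. The file path, model path, and exact template below are assumptions taken from the command line above.

# sketch: emulate `-f prompts/alpaca.txt` plus `-ins`-style prompting
# (path and Alpaca markers are assumptions based on the CLI command above)
from llama_cpp import Llama

with open("prompts/alpaca.txt", "r", encoding="utf-8") as f:
    system_prompt = f.read()

llm = Llama(model_path="./ggml-model-q5_1.bin", n_ctx=2048, n_threads=4)

# "Translate the sentence below into English: 'A cute girl is running on the beach'"
user_input = '将下边的句子翻译成英文"一个可爱的女孩在海滩上奔跑"'

prompt = system_prompt + "\n### Instruction:\n\n" + user_input + "\n\n### Response:\n\n"
output = llm(prompt, max_tokens=256, temperature=0.2, stop=["### Instruction:"])
print(output["choices"][0]["text"])

If the missing system prompt is indeed the cause, prepending it this way should bring the Python output closer to the CLI run.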

Current Behavior

When I run the same thing with llama-cpp-python like this:

# gpt-manager.py
from llama_cpp import Llama  # type:ignore


class GPTManager(object):

    def __init__(self, n_thread=4):
        self._n_thread = n_thread

    def gen_response(self, user_input: str, model_path: str):
        prompt = user_input.strip()
        if len(prompt) > 0:
            llm = Llama(model_path=model_path, n_threads=self._n_thread)
            user_ctx = "Q:" + prompt + " A: "
            output = llm(user_ctx, max_tokens=256, stop=["Q:"],
                         echo=True, temperature=0.2)
            print(output)
            # echo=True returns the prompt as part of the text, so strip it back out
            return output["choices"][0]["text"].replace(user_ctx, "")  # type:ignore
        else:
            return "Input Can Not Be Empty!"


if __name__ == "__main__":
    GPT = GPTManager()
    # "Translate the sentence below into English: 'A cute girl is running on the beach'"
    u_in = """
    将下边的句子翻译成英文"一个可爱的女孩在海滩上奔跑"
    """
    m_path = "./ggml-model-q5_1.bin"
    opt = GPT.gen_response(u_in, m_path)
    print(opt)

the output was:

(screenshot: bad response)

You can see that, called this way, the model just returns the content back to me instead of translating it.
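Another possible source of the difference, offered here only as a guess: the command line above sets -c 2048 and --repeat_penalty 1.3, while the Python call leaves n_ctx and repeat_penalty at their library defaults. A hedged, drop-in adjustment of gen_response that mirrors those flags might look like this (whether repeat_penalty is accepted as a keyword depends on the installed llama-cpp-python version):

    def gen_response(self, user_input: str, model_path: str):
        prompt = user_input.strip()
        if not prompt:
            return "Input Can Not Be Empty!"
        # mirror the CLI flags: -c 2048, --temp 0.2, -n 256, --repeat_penalty 1.3
        llm = Llama(model_path=model_path, n_ctx=2048, n_threads=self._n_thread)
        user_ctx = "Q: " + prompt + " A: "
        output = llm(
            user_ctx,
            max_tokens=256,
            temperature=0.2,
            repeat_penalty=1.3,   # assumed kwarg; check your installed version
            stop=["Q:"],
            echo=False,           # no echo, so the prompt never needs stripping
        )
        return output["choices"][0]["text"]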

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Linux xxxxx 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python3 --version `3.9.0`
$ make --version  `Make 4.2.1`
$ g++ --version `g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0`

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Just run the script above as shown.

It ran, just not in the way I want, so I don't think the remaining template questions would help; I have removed them.
I fully understand that these models are built on probabilities and may give slightly different answers from run to run, but I would still like some help here.
Thanks in advance.

Labels

quality (Quality of model output)