
Gives a different answer for the same question between llama.cpp's main.exe and this project #384

Open
@zhiyixu

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Please provide a detailed written description of what you were trying to do, and what you expected llama-cpp-python to do.
What I am trying to do: I want the model to translate a sentence from Chinese to English for me.
When I call the model with the original llama.cpp from the command line:

main -m ../llama.cpp/zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3

the model works fine and gives the right output:

(screenshot: good response)

Notice that the yellow line "Below is an ......" is the content of a prompt file; it is passed to the model with -f prompts/alpaca.txt. I can't find an equivalent parameter in this project, so I can't tell whether that is the reason for this issue.
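For reference, and purely as a sketch rather than anything this project documents: there is no direct -f parameter, but the same effect can be approximated by reading prompts/alpaca.txt yourself and prepending it, together with the Alpaca-style ### Instruction: / ### Response: markers that -ins wraps around the user input. The file path, model path, and exact template below are assumptions taken from the command line above.

# sketch: emulate `-f prompts/alpaca.txt` plus `-ins`-style prompting
# (path and Alpaca markers are assumptions based on the CLI command above)
from llama_cpp import Llama

with open("prompts/alpaca.txt", "r", encoding="utf-8") as f:
    system_prompt = f.read()

llm = Llama(model_path="./ggml-model-q5_1.bin", n_ctx=2048, n_threads=4)

# "Translate the sentence below into English: 'A cute girl is running on the beach'"
user_input = '将下边的句子翻译成英文"一个可爱的女孩在海滩上奔跑"'

prompt = system_prompt + "\n### Instruction:\n\n" + user_input + "\n\n### Response:\n\n"
output = llm(prompt, max_tokens=256, temperature=0.2, stop=["### Instruction:"])
print(output["choices"][0]["text"])

If the missing system prompt is indeed the cause, prepending it this way should bring the Python output closer to the CLI run.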

Current Behavior

When I run the same thing with llama-cpp-python like this:

# gpt-manager.py
from llama_cpp import Llama  # type:ignore


class GPTManager(object):

    def __init__(self, n_thread=4):
        self._n_thread = n_thread

    def gen_response(self, user_input: str, model_path: str):
        prompt = user_input.strip()
        if len(prompt) > 0:
            llm = Llama(model_path=model_path, n_threads=self._n_thread)
            user_ctx = "Q:" + prompt + " A: "
            output = llm(user_ctx, max_tokens=256, stop=["Q:"],
                         echo=True, temperature=0.2)
            print(output)
            # echo=True returns the prompt as part of the text, so strip it back out
            return output["choices"][0]["text"].replace(user_ctx, "")  # type:ignore
        else:
            return "Input Can Not Be Empty!"


if __name__ == "__main__":
    GPT = GPTManager()
    # "Translate the sentence below into English: 'A cute girl is running on the beach'"
    u_in = """
    将下边的句子翻译成英文"一个可爱的女孩在海滩上奔跑"
    """
    m_path = "./ggml-model-q5_1.bin"
    opt = GPT.gen_response(u_in, m_path)
    print(opt)

the output was:

(screenshot: bad response)

You can see that, called this way, the model just returns the content back to me instead of translating it.
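Another possible source of the difference, offered here only as a guess: the command line above sets -c 2048 and --repeat_penalty 1.3, while the Python call leaves n_ctx and repeat_penalty at their library defaults. A hedged, drop-in adjustment of gen_response that mirrors those flags might look like this (whether repeat_penalty is accepted as a keyword depends on the installed llama-cpp-python version):

    def gen_response(self, user_input: str, model_path: str):
        prompt = user_input.strip()
        if not prompt:
            return "Input Can Not Be Empty!"
        # mirror the CLI flags: -c 2048, --temp 0.2, -n 256, --repeat_penalty 1.3
        llm = Llama(model_path=model_path, n_ctx=2048, n_threads=self._n_thread)
        user_ctx = "Q: " + prompt + " A: "
        output = llm(
            user_ctx,
            max_tokens=256,
            temperature=0.2,
            repeat_penalty=1.3,   # assumed kwarg; check your installed version
            stop=["Q:"],
            echo=False,           # no echo, so the prompt never needs stripping
        )
        return output["choices"][0]["text"]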

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Linux xxxxx 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python3 --version `3.9.0`
$ make --version  `Make 4.2.1`
$ g++ --version `g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0`

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Just run the script above as shown.

It ran, just not in the way I want, so I don't think the remaining template questions would help; I have removed them.
I fully understand that these models are built on probabilities and may give slightly different answers from run to run, but I would still like some help here.
Thanks in advance.

Labels

quality (Quality of model output)