-
Try passing the `-cnv` flag so llama-cli runs in conversation (chat) mode:
./build/bin/llama-cli -m /root/autodl-fs/poc1k.gguf -n 512 --top-p 0.7 --temp 0.95 -t 50 -cnv
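Without `-cnv`, llama-cli treats the prompt as plain text completion, so a chat-tuned Llama-3 model never sees the special tokens it was trained on and tends to ramble. As a rough illustration (not code from this thread; the helper name is hypothetical, and the token layout follows the published Llama-3 chat format), this is the structure that conversation mode wraps around each message:

```python
def llama3_chat_prompt(user_msg: str,
                       system_msg: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the Llama-3 chat template.

    Chat-tuned checkpoints expect this token structure; a bare prompt
    (what llama-cli sends without -cnv) is effectively out-of-distribution
    input, which is one common cause of irrelevant replies.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_msg}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_chat_prompt("Give me the config as YAML."))
```

If the GGUF file carries the correct chat template metadata, `-cnv` applies this wrapping automatically.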
-
Environment
Model: shenzhi-wang/Llama3-8B-Chinese-Chat
Finetuning method: LoRA
Dataset: data in YAML format
Compute type: f16
Steps:
1. Run inference through Hugging Face: the output is YAML only, which is exactly the format I expect.
2. Convert the HF model (merged and exported with the LoRA adapter via LLaMA-Factory) to GGUF format.
3. Run it with llama-cli.
response:
As you can see, the replies are irrelevant and contain a lot of unrelated output.
Why does this happen, and how can I solve it?
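For reference, the merge-and-export step above can be sketched with a LLaMA-Factory export config like the one below. This is an assumption about the setup, not taken from the thread: the adapter path, export directory, and file names are placeholders, and the keys follow LLaMA-Factory's export config format.

```yaml
# merge_lora.yaml -- hypothetical LLaMA-Factory export config;
# adapter_name_or_path and export_dir are placeholder paths
model_name_or_path: shenzhi-wang/Llama3-8B-Chinese-Chat
adapter_name_or_path: saves/llama3-8b/lora   # path to the trained LoRA adapter
template: llama3            # must match the template used during finetuning
finetuning_type: lora
export_dir: merged-model
export_size: 2              # shard size in GB
export_legacy_format: false
```

One would then run `llamafactory-cli export merge_lora.yaml` and convert the merged model with llama.cpp's converter, e.g. `python convert_hf_to_gguf.py merged-model --outfile model-f16.gguf --outtype f16`. If the `template` here does not match the one used during finetuning, the merged model can produce off-format output even when the HF inference looked correct.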