This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Behavior when missing quantization version #447

@Reichenbachian

Description


The problem is reproduced below. It turns out the converted file didn't include the "general.quantization_version" metadata. When llama.cpp reads a file without a version, it assumes 2 (grep for the line gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);), so this model works with llama.cpp but fails with rustformers/llm.

    import os
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-chat-hf"
    local_dir = "models"  # the models/ directory passed to convert.py below
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.save_pretrained(local_dir)
    torch.save(model.state_dict(), os.path.join(local_dir, "pytorch_model.bin"))
python llm/crates/ggml/sys/llama-cpp/convert.py models/ --vocab-dir models/ --ctx 4096 --outtype q8_0
    let model = llm::load(
        path,
        llm::TokenizerSource::Embedded,
        parameters,
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

thread '<unnamed>' panicked at llm/inference/src/llms/local/llama2.rs:45:35:
Failed to load model: quantization version was missing, despite model containing quantized tensors

My solution was just to remove this whole block:

    let any_quantized = gguf
        .tensor_infos
        .values()
        .any(|t| t.element_type.is_quantized());
    // if any_quantized {
    //     match quantization_version {
    //         Some(MetadataValue::UInt32(2)) => {
    //             // Currently supported version
    //         }
    //         Some(quantization_version) => {
    //             return Err(LoadError::UnsupportedQuantizationVersion {
    //                 quantization_version: quantization_version.clone(),
    //             })
    //         }
    //         None => return Err(LoadError::MissingQuantizationVersion),
    //     }
    // }

I'm unsure how you want to handle this, since it does remove a check.
