A special token '\u0000' will cause an assert error in 'llm_load_vocab' #5111
SolenoidWGT asked this question in Q&A · Unanswered · 0 replies
I'm trying to adapt an InternLM2 model for llama.cpp, but I get an assertion error when running inference; the error stack is below. The llama.cpp commit is 77bc1bb.
I checked further and found that the problematic token is `\u0000` in the InternLM2 vocabulary. It is converted by `codepoints_from_utf8` into the string `"\u0000"`, which corresponds to the string terminator in C, so `word` ends up with size 0 and the assertion fails (I added some debug code, so the actual failing line in my build is llama.cpp:3053).

I tried commenting out the assertion at llama.cpp:3053, and the model then ran normally without any other errors. So I would like to ask about the significance of this assertion: can its condition be relaxed? If the assertion can't be removed, I'd appreciate advice on how to work around it. Thanks.
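For illustration, here is a minimal C++ sketch of the failure mode as I understand it, assuming the NUL-containing token text is read back through a C-string interface at some point during vocab loading (the `assert` below is only a hypothetical stand-in for the `GGML_ASSERT` at llama.cpp:3053):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

int main() {
    // The U+0000 token's text as raw bytes: a single NUL byte.
    const char raw[] = "\0";

    // Constructing a std::string from a const char* stops at the first
    // NUL terminator, so the resulting word is empty.
    std::string word(raw);
    std::printf("word.size() = %zu\n", word.size()); // prints 0

    // The vocab loader asserts that every token decodes to at least one
    // codepoint; for this token the check fails because word is empty.
    assert(!word.empty()); // fires for the U+0000 token
    return 0;
}
```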
I searched and found a discussion similar to my problem, but I didn't get much information from it.
Here is my sys & env info.