
returncode=0 during white-box distillation #4

@cathryn0131

Description

Running white-box distillation, `infer.py` loads the teacher model and reaches the generation loop, but the launcher reports `Command failed (returncode=0, errors detected)` and skips training:

```
easydistill --config /easydistill/configs/kd_white_box.json
2025-06-10 20:03:26,249 - INFO - Running command: python /easydistill/easydistill/kd/infer.py --config /easydistill/configs/kd_white_box.json
2025-06-10 20:03:36,784 - INFO - 2025-06-10 20:03:36.784495: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-06-10 20:03:36,784 - ERROR - Detected error in output: 2025-06-10 20:03:36.784495: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-06-10 20:03:36,806 - INFO - 2025-06-10 20:03:36.806417: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-06-10 20:03:36,833 - INFO - WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
2025-06-10 20:03:36,833 - INFO - E0000 00:00:1749557016.833063 814 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-06-10 20:03:36,841 - INFO - E0000 00:00:1749557016.841735 814 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-10 20:03:36,869 - INFO - 2025-06-10 20:03:36.869151: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
2025-06-10 20:03:36,869 - INFO - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-06-10 20:03:41,562 - INFO - INFO 06-10 20:03:41 [__init__.py:239] Automatically detected platform cuda.
2025-06-10 20:03:44,964 - INFO - 2025-06-10 20:03:44,964 - INFO - Generating distillation data from the teacher model!
2025-06-10 20:03:44,968 - INFO - 2025-06-10 20:03:44,968 - INFO - Loading ckpt and tokenizer: /mnt/bn/summary/model_hub/qwen/Qwen2-5-3B-Instruct/
2025-06-10 20:03:45,447 - INFO - 2025-06-10 20:03:45,447 - INFO - Initial eos_token_id 151645 from tokenizer
2025-06-10 20:03:45,448 - INFO - 2025-06-10 20:03:45,448 - INFO - tokenizer's eos_token: <|im_end|>, pad_token: <|im_end|>
2025-06-10 20:03:45,448 - INFO - 2025-06-10 20:03:45,448 - INFO - tokenizer's eos_token_id: 151645, pad_token_id: 151645
2025-06-10 20:04:06,308 - INFO - INFO 06-10 20:04:06 [config.py:689] This model supports multiple tasks: {'embed', 'score', 'generate', 'classify', 'reward'}. Defaulting to 'generate'.
2025-06-10 20:04:06,310 - INFO - INFO 06-10 20:04:06 [config.py:1901] Chunked prefill is enabled with max_num_batched_tokens=8192.
2025-06-10 20:04:08,608 - INFO - INFO 06-10 20:04:08 [core.py:61] Initializing a V1 LLM engine (v0.8.4) with config: model='/mnt/bn/summary/model_hub/qwen/Qwen2-5-3B-Instruct/', speculative_config=None, tokenizer='/mnt/bn/summary/model_hub/qwen/Qwen2-5-3B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/mnt/bn/summary/model_hub/qwen/Qwen2-5-3B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
2025-06-10 20:04:09,602 - INFO - WARNING 06-10 20:04:09 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7ff9479c48b0>
2025-06-10 20:04:11,176 - INFO - INFO 06-10 20:04:11 [parallel_state.py:959] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
2025-06-10 20:04:11,177 - INFO - INFO 06-10 20:04:11 [cuda.py:221] Using Flash Attention backend on V1 engine.
2025-06-10 20:04:11,220 - INFO - INFO 06-10 20:04:11 [gpu_model_runner.py:1276] Starting to load model /mnt/bn/summary/model_hub/qwen/Qwen2-5-3B-Instruct/...
2025-06-10 20:04:11,674 - INFO - WARNING 06-10 20:04:11 [topk_topp_sampler.py:69] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
2025-06-10 20:04:11,683 - INFO -
2025-06-10 20:04:11,683 - INFO - Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
2025-06-10 20:04:12,598 - INFO -
2025-06-10 20:04:12,598 - INFO - Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:00<00:00, 1.09it/s]
2025-06-10 20:04:13,325 - INFO -
2025-06-10 20:04:13,325 - INFO - Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00, 1.24it/s]
2025-06-10 20:04:13,326 - INFO -
2025-06-10 20:04:13,326 - INFO - Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00, 1.22it/s]
2025-06-10 20:04:13,326 - INFO -
2025-06-10 20:04:13,426 - INFO - INFO 06-10 20:04:13 [loader.py:458] Loading weights took 1.74 seconds
2025-06-10 20:04:14,005 - INFO - INFO 06-10 20:04:14 [gpu_model_runner.py:1291] Model loading took 5.7916 GiB and 2.211553 seconds
2025-06-10 20:04:27,097 - INFO - INFO 06-10 20:04:27 [backends.py:416] Using cache directory: /root/.cache/vllm/torch_compile_cache/20e2d2865e/rank_0_0 for vLLM's torch.compile
2025-06-10 20:04:27,100 - INFO - INFO 06-10 20:04:27 [backends.py:426] Dynamo bytecode transform time: 13.09 s
2025-06-10 20:04:28,228 - INFO - INFO 06-10 20:04:28 [backends.py:115] Directly load the compiled graph for shape None from the cache
2025-06-10 20:04:45,039 - INFO - INFO 06-10 20:04:45 [monitor.py:33] torch.compile takes 13.09 s in total
2025-06-10 20:04:46,273 - INFO - INFO 06-10 20:04:46 [kv_cache_utils.py:634] GPU KV cache size: 1,847,824 tokens
2025-06-10 20:04:46,274 - INFO - INFO 06-10 20:04:46 [kv_cache_utils.py:637] Maximum concurrency for 4,096 tokens per request: 451.13x
2025-06-10 20:05:29,238 - INFO - INFO 06-10 20:05:29 [gpu_model_runner.py:1626] Graph capturing finished in 43 secs, took 1.81 GiB
2025-06-10 20:05:29,251 - INFO - INFO 06-10 20:05:29 [core.py:163] init engine (profile, create kv cache, warmup model) took 75.25 seconds
2025-06-10 20:05:29,417 - INFO - INFO 06-10 20:05:29 [core_client.py:435] Core engine process 0 ready.
2025-06-10 20:05:29,418 - INFO - 2025-06-10 20:05:29,418 - INFO - vLLM model loaded successfully
2025-06-10 20:05:29,424 - INFO -
2025-06-10 20:05:29,425 - INFO - Generating responses: 0it [00:00, ?it/s]
2025-06-10 20:05:29,425 - INFO - Generating responses: 0it [00:00, ?it/s]
2025-06-10 20:05:32,717 - ERROR - Command failed (returncode=0, errors detected)
2025-06-10 20:05:32,718 - ERROR - Infer failed, skipping training
```

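The log suggests the launcher scans the child process's output for error-looking text rather than trusting the exit code: the only line it flags ("Detected error in output: ...") is TensorFlow's benign oneDNN notice, which happens to contain the word "errors" ("floating-point round-off errors"), and the run is then declared failed even though the subprocess exits with returncode=0. Below is a minimal sketch of this failure mode, not easydistill's actual code; the case-insensitive substring match is an assumption inferred from the log, and `run_step` / `ERROR_PATTERN` are illustrative names:

```python
# Minimal sketch of the suspected failure mode; NOT easydistill's actual code.
# The keyword check is an assumption inferred from the log above.
import re
import subprocess

ERROR_PATTERN = re.compile(r"error", re.IGNORECASE)  # assumed, overly broad

def run_step(cmd: list[str]) -> bool:
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    errors_detected = False
    for line in proc.stdout:
        line = line.rstrip("\n")
        print(line)
        # TensorFlow's benign oneDNN notice mentions "... floating-point
        # round-off errors from different computation orders", so it matches.
        if ERROR_PATTERN.search(line):
            errors_detected = True
            print(f"Detected error in output: {line}")
    returncode = proc.wait()
    # Failure is declared on any matched line, even when the child exits 0.
    if returncode != 0 or errors_detected:
        print(f"Command failed (returncode={returncode}, errors detected)")
        return False
    return True
```

Two possible directions, both tentative: as the flagged message itself suggests, setting `TF_ENABLE_ONEDNN_OPTS=0` in the environment before launching would suppress the oneDNN notice so the keyword check has nothing to match; a more robust fix would be for the launcher to trust the subprocess exit code (or match only `ERROR`-level records emitted by `infer.py` itself), since TensorFlow and vLLM routinely write informational messages to stderr.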