Description
Hi All,
I am trying to run `cargo run --release` and I get the following:
```
    Finished release profile [optimized] target(s) in 0.21s
     Running target/release/ai00_server
2024-05-04T04:35:17.007Z INFO  [ai00_server] reading config assets/configs/Config.toml...
2024-05-04T04:35:17.008Z INFO  [ai00_server::middleware] ModelInfo {
    version: V5,
    num_layer: 32,
    num_emb: 2560,
    num_hidden: 8960,
    num_vocab: 65536,
    num_head: 40,
    time_mix_adapter_size: 0,
    time_decay_adapter_size: 0,
}
2024-05-04T04:35:17.008Z INFO  [ai00_server::middleware] type: SafeTensors
2024-05-04T04:35:17.013Z WARN  [wgpu_hal::gles::egl] EGL_MESA_platform_surfaceless not available. Using default platform
2024-05-04T04:35:17.027Z WARN  [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.027Z WARN  [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.048Z INFO  [ai00_server] server started at 0.0.0.0:65530 with tls
2024-05-04T04:35:17.071Z ERROR [ai00_server::middleware] reload model failed: failed to request device
```
Currently, `embed_device` is set to `"Cpu"` in Config.toml. I also have an Nvidia 22C (24GB) GPU, and I get the same error when I set it to `"Gpu"`.
Result of `lsb_release -a`:

```
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic
```
Config.toml:

```toml
[model]
embed_device = "Cpu"     # Device to put the embed tensor ("Cpu" or "Gpu").
max_batch = 8            # The maximum batches that are cached on GPU.
model_name = "RWKV-5-World-3B-v2-20231113-ctx4096.st" # Name of the model.
model_path = "assets/models" # Path to the folder containing all models.
quant = 32               # Layers to be quantized.
quant_type = "Int8"      # Quantization type ("Int8" or "NF4").
stop = ["\n\n"]          # Additional stop words in generation.
token_chunk_size = 128   # Size of token chunk that is inferred at once. For high-end GPUs, this could be 64 or 128 (faster).
```
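Since ai00_server requests its device through wgpu, and the `wgpu_hal::gles::egl` warnings above suggest it fell back to the GL backend without finding a usable adapter, here is a sketch of the driver checks I can run (this assumes `nvidia-smi` and the `vulkan-tools` package; the commands are guarded in case either is missing):

```shell
# Check whether the NVIDIA kernel driver is loaded and sees the GPU.
command -v nvidia-smi >/dev/null && nvidia-smi \
  || echo "nvidia-smi not found (driver may be missing)"

# Check whether a Vulkan ICD exposes the GPU; wgpu prefers Vulkan on Linux
# and only falls back to GL/EGL when no Vulkan adapter is available.
command -v vulkaninfo >/dev/null && vulkaninfo 2>/dev/null | grep -i deviceName \
  || echo "vulkaninfo not found or no Vulkan device (install vulkan-tools)"
```

If both commands fail, that would explain "failed to request device" regardless of the `embed_device` setting.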
Can anyone help? Thanks