Server Startup Issue #110

@ItsCRC

Description

Hi All,

I am trying to run cargo run --release and I get the following:
Finished release profile [optimized] target(s) in 0.21s
Running target/release/ai00_server
2024-05-04T04:35:17.007Z INFO [ai00_server] reading config assets/configs/Config.toml...
2024-05-04T04:35:17.008Z INFO [ai00_server::middleware] ModelInfo {
version: V5,
num_layer: 32,
num_emb: 2560,
num_hidden: 8960,
num_vocab: 65536,
num_head: 40,
time_mix_adapter_size: 0,
time_decay_adapter_size: 0,
}
2024-05-04T04:35:17.008Z INFO [ai00_server::middleware] type: SafeTensors
2024-05-04T04:35:17.013Z WARN [wgpu_hal::gles::egl] EGL_MESA_platform_surfaceless not available. Using default platform
2024-05-04T04:35:17.027Z WARN [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.027Z WARN [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.048Z INFO [ai00_server] server started at 0.0.0.0:65530 with tls
2024-05-04T04:35:17.071Z ERROR [ai00_server::middleware] reload model failed: failed to request device

Currently, embed_device is set to Cpu in Config.toml. I also have an Nvidia 22C (24GB) GPU, and I get the same error when I set it to Gpu.
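
As a diagnostic idea, here is a minimal standalone sketch (not part of ai00_server; it assumes a throwaway Cargo project with wgpu ~0.19, which the server appears to use judging by the wgpu_hal messages above) that lists every adapter wgpu can see. If only a GL/EGL fallback adapter shows up and no Vulkan device, then "failed to request device" would be expected:

```rust
// Standalone diagnostic: enumerate the adapters wgpu can see on this machine.
// Assumes a Cargo project with `wgpu = "0.19"` as its only dependency.
fn main() {
    let instance = wgpu::Instance::default();
    for adapter in instance.enumerate_adapters(wgpu::Backends::all()) {
        let info = adapter.get_info();
        // Expect something like: Vulkan | DiscreteGpu | NVIDIA ...
        println!("{:?} | {:?} | {}", info.backend, info.device_type, info.name);
    }
}
```

If the NVIDIA card does not appear here with the Vulkan backend, the problem is likely the host's driver/Vulkan setup rather than the server configuration.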
Result of lsb_release -a:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic

Config.toml:

[model]
embed_device = "Cpu"                                       # Device to put the embed tensor ("Cpu" or "Gpu").
max_batch = 8                                              # The maximum batches that are cached on GPU.
model_name = "RWKV-5-World-3B-v2-20231113-ctx4096.st" # Name of the model.
model_path = "assets/models"                               # Path to the folder containing all models.
quant = 32                                                  # Layers to be quantized.
quant_type = "Int8"                                        # Quantization type ("Int8" or "NF4").
stop = ["\n\n"]                                            # Additional stop words in generation.
token_chunk_size = 128                                     # Size of token chunk that is inferred at once. For high end GPUs, this could be 64 or 128 (faster).
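
For completeness, a small sketch that parses this [model] table with serde/toml, just to rule out a config-parsing problem. The struct and field names simply mirror the keys above and are my assumption, not the server's actual config types:

```rust
use serde::Deserialize;

// Hypothetical mirror of the [model] table above; names are assumptions,
// not ai00_server's real config structs. Unknown keys in the file are ignored.
#[derive(Debug, Deserialize)]
struct ModelSection {
    embed_device: String,
    max_batch: usize,
    model_name: String,
    model_path: String,
    quant: usize,
    quant_type: String,
    stop: Vec<String>,
    token_chunk_size: usize,
}

#[derive(Debug, Deserialize)]
struct Config {
    model: ModelSection,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Requires `serde = { version = "1", features = ["derive"] }` and `toml = "0.8"`.
    let text = std::fs::read_to_string("assets/configs/Config.toml")?;
    let config: Config = toml::from_str(&text)?;
    println!("{config:#?}");
    Ok(())
}
```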

Can anyone help? Thanks
