Server Startup Issue #110

@ItsCRC

Description

Hi All,

I am trying to run cargo run --release and I get the following:
Finished release profile [optimized] target(s) in 0.21s
Running target/release/ai00_server
2024-05-04T04:35:17.007Z INFO [ai00_server] reading config assets/configs/Config.toml...
2024-05-04T04:35:17.008Z INFO [ai00_server::middleware] ModelInfo {
version: V5,
num_layer: 32,
num_emb: 2560,
num_hidden: 8960,
num_vocab: 65536,
num_head: 40,
time_mix_adapter_size: 0,
time_decay_adapter_size: 0,
}
2024-05-04T04:35:17.008Z INFO [ai00_server::middleware] type: SafeTensors
2024-05-04T04:35:17.013Z WARN [wgpu_hal::gles::egl] EGL_MESA_platform_surfaceless not available. Using default platform
2024-05-04T04:35:17.027Z WARN [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.027Z WARN [wgpu_hal::gles::egl] No config found!
2024-05-04T04:35:17.048Z INFO [ai00_server] server started at 0.0.0.0:65530 with tls
2024-05-04T04:35:17.071Z ERROR [ai00_server::middleware] reload model failed: failed to request device

Currently, embed_device is set to Cpu in Config.toml. I also have an Nvidia 22C (24GB) GPU, and I get the same error when I set it to Gpu.
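
As a diagnostic idea, here is a minimal standalone sketch (not part of ai00_server; it assumes a throwaway Cargo project with wgpu ~0.19, which the server appears to use judging by the wgpu_hal messages above) that lists every adapter wgpu can see. If only a GL/EGL fallback adapter shows up and no Vulkan device, then "failed to request device" would be expected:

```rust
// Standalone diagnostic: enumerate the adapters wgpu can see on this machine.
// Assumes a Cargo project with `wgpu = "0.19"` as its only dependency.
fn main() {
    let instance = wgpu::Instance::default();
    for adapter in instance.enumerate_adapters(wgpu::Backends::all()) {
        let info = adapter.get_info();
        // Expect something like: Vulkan | DiscreteGpu | NVIDIA ...
        println!("{:?} | {:?} | {}", info.backend, info.device_type, info.name);
    }
}
```

If the NVIDIA card does not appear here with the Vulkan backend, the problem is likely the host's driver/Vulkan setup rather than the server configuration.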
Result of lsb_release -a:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic

Config.toml:

[model]
embed_device = "Cpu"                                       # Device to put the embed tensor ("Cpu" or "Gpu").
max_batch = 8                                              # The maximum batches that are cached on GPU.
model_name = "RWKV-5-World-3B-v2-20231113-ctx4096.st" # Name of the model.
model_path = "assets/models"                               # Path to the folder containing all models.
quant = 32                                                  # Layers to be quantized.
quant_type = "Int8"                                        # Quantization type ("Int8" or "NF4").
stop = ["\n\n"]                                            # Additional stop words in generation.
token_chunk_size = 128                                     # Size of token chunk that is inferred at once. For high end GPUs, this could be 64 or 128 (faster).
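
For completeness, a small sketch that parses this [model] table with serde/toml, just to rule out a config-parsing problem. The struct and field names simply mirror the keys above and are my assumption, not the server's actual config types:

```rust
use serde::Deserialize;

// Hypothetical mirror of the [model] table above; names are assumptions,
// not ai00_server's real config structs. Unknown keys in the file are ignored.
#[derive(Debug, Deserialize)]
struct ModelSection {
    embed_device: String,
    max_batch: usize,
    model_name: String,
    model_path: String,
    quant: usize,
    quant_type: String,
    stop: Vec<String>,
    token_chunk_size: usize,
}

#[derive(Debug, Deserialize)]
struct Config {
    model: ModelSection,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Requires `serde = { version = "1", features = ["derive"] }` and `toml = "0.8"`.
    let text = std::fs::read_to_string("assets/configs/Config.toml")?;
    let config: Config = toml::from_str(&text)?;
    println!("{config:#?}");
    Ok(())
}
```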

Can anyone help? Thanks
