suggestions for performance improvements from claude #30

@earonesty

app.py

get_model_size() does a lot of regex matching on every request. This could be optimized by pre-compiling the regexes and/or caching the size lookups.
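
A minimal sketch of the pre-compile-and-cache idea for get_model_size(). The pattern, the function name `get_model_size_cached`, and the "size in billions" return value are assumptions for illustration; the real function in app.py may parse model names differently.

```python
import re
from functools import lru_cache

# Hypothetical pattern: pulls a parameter count such as "7b" or "13B" out of a model name.
# Compiled once at import time instead of on every request.
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*b\b", re.IGNORECASE)


@lru_cache(maxsize=1024)
def get_model_size_cached(model_name: str) -> float:
    """Return the size in billions of parameters, or 0.0 if none is found."""
    match = _SIZE_RE.search(model_name)
    return float(match.group(1)) if match else 0.0
```

Because the set of model names seen by a server is usually small, the cache should absorb nearly all repeated lookups; `get_model_size_cached.cache_info()` reports the hit rate.
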
augment_reply() is called multiple times per stream. The common logic can be refactored to run once.
check_bill_usage() makes an HTTP request to bill on every completion.

  • check_bill_usage is already asynchronous
  • Exception handling could be improved
  • Batching billing calls could further optimize
  • Reusing an HTTPX client avoids overhead
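
The batching and exception-handling points above could be sketched roughly as follows. `BillingBatcher` and the `send` callback are hypothetical names, not part of app.py; `send` stands in for whatever coroutine actually posts usage to the billing endpoint.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger(__name__)


class BillingBatcher:
    """Collects usage records and sends them in batches instead of one request each."""

    def __init__(self, send: Callable[[list], Awaitable[None]], max_batch: int = 50):
        self._send = send            # hypothetical coroutine that posts one batch
        self._max_batch = max_batch
        self._pending: list = []

    async def add(self, record: dict) -> None:
        self._pending.append(record)
        if len(self._pending) >= self._max_batch:
            await self.flush()

    async def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        try:
            await self._send(batch)
        except Exception:
            # Keep the records so a later flush can retry them,
            # and avoid failing the completion that triggered the flush.
            log.exception("billing batch of %d records failed", len(batch))
            self._pending = batch + self._pending
```

In a real service, `send` would presumably post through one long-lived `httpx.AsyncClient` created at startup, so connections are reused across batches, and `flush()` would also be called on a timer and at shutdown.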

alt_models() does repetitive regex searches. Can build a map once at startup.
get_reg_mgr() is called redundantly. The result can be stored in a local variable instead.
worker_stats() regenerates stats on every call. The result could be cached and refreshed periodically.
There are lots of log.debug calls, which can slow requests down through IO. Use them selectively.
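
One way to cache worker_stats() periodically is a small time-based cache. `TimedCache` and the 5-second default are assumptions for illustration, not existing code.

```python
import time
from typing import Any, Callable


class TimedCache:
    """Recomputes a value at most once per `ttl_seconds`."""

    def __init__(self, compute: Callable[[], Any], ttl_seconds: float = 5.0):
        self._compute = compute
        self._ttl = ttl_seconds
        self._value: Any = None
        self._expires_at = float("-inf")   # forces a compute on first use

    def get(self) -> Any:
        now = time.monotonic()
        if now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

Callers would read `cache.get()` instead of regenerating the stats, accepting that the numbers may be up to `ttl_seconds` old.
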
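
A short sketch of using log.debug selectively with the standard `logging` module. `describe` and `handle` are hypothetical stand-ins for an expensive formatting step and a request handler.

```python
import logging

log = logging.getLogger("app")


def describe(payload: dict) -> str:
    """Stand-in for an expensive formatting step (hypothetical)."""
    return ", ".join(f"{k}={v}" for k, v in sorted(payload.items()))


def handle(payload: dict) -> None:
    # %-style arguments are only formatted if the record is actually emitted.
    log.debug("received request with %d fields", len(payload))
    # Guard work that would otherwise run even when DEBUG is disabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("payload: %s", describe(payload))
```

With the level set to INFO or higher, both calls return almost immediately and `describe` is never invoked.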

stats.py

StatsContainer recalculates stats on every call. Can cache results, rebuild on update, or lazy rebuild as needed.
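
The lazy-rebuild option could look like the sketch below. `CachedStats` and its fields are illustrative only; the real StatsContainer tracks different data.

```python
from statistics import mean


class CachedStats:
    """Keeps a summary that is rebuilt only after new data arrives."""

    def __init__(self) -> None:
        self._samples: list = []
        self._summary = None   # None marks the cached summary as stale

    def add(self, seconds: float) -> None:
        self._samples.append(seconds)
        self._summary = None   # invalidate; rebuild happens on next read

    def summary(self) -> dict:
        if self._summary is None:
            self._summary = {
                "count": len(self._samples),
                "mean": mean(self._samples) if self._samples else 0.0,
            }
        return self._summary
```

Reads between updates return the same cached object, so the cost of recalculation is paid once per update rather than once per call.
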
pick_best() sorts every time it is called.

  • Store perf data separately from heap, to allow fast lookup in update step.
  • For tied perf, use gpu info as tie-breaker for deterministic results.
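
The two points above can be sketched with a dict for current perf data plus a heap with lazy invalidation. `BestPicker` is a hypothetical class, and the sketch assumes lower perf values are better and that the GPU name is a usable tie-breaker; the real pick_best() may rank differently.

```python
import heapq
from typing import Optional


class BestPicker:
    """Tracks worker performance and returns the best worker without re-sorting."""

    def __init__(self) -> None:
        self._perf: dict = {}   # worker id -> (perf, gpu name): fast lookup on update
        self._heap: list = []   # (perf, gpu name, worker id), may hold stale entries

    def update(self, worker: str, perf: float, gpu: str) -> None:
        self._perf[worker] = (perf, gpu)
        heapq.heappush(self._heap, (perf, gpu, worker))

    def remove(self, worker: str) -> None:
        self._perf.pop(worker, None)

    def pick_best(self) -> Optional[str]:
        while self._heap:
            perf, gpu, worker = self._heap[0]
            if self._perf.get(worker) == (perf, gpu):
                return worker
            heapq.heappop(self._heap)   # entry is stale: worker updated or removed
        return None
```

Ties on perf fall through to the GPU name and then the worker id, so the result is deterministic. Stale entries are only discarded when they reach the top, so a long-running process with frequent updates would want to rebuild the heap occasionally.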

StatsStore's separate thread may contend for the lock - profile to see whether there is lock contention.
Stats calculations under contention can use lock-free approaches (e.g. atomic counters).

General

Use a profiler to identify any other hot spots missed above.
Benchmark different approaches to quantify gains.
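
As a starting point, the standard library's `cProfile` and `timeit` cover both steps. The functions below are toy stand-ins (not code from app.py) comparing an uncompiled and a pre-compiled regex search.

```python
import cProfile
import io
import pstats
import re
import timeit

NAMES = [f"model-{n}b-chat" for n in range(1, 50)]
PATTERN = r"(\d+)b"
COMPILED = re.compile(PATTERN)


def uncompiled() -> int:
    return sum(1 for name in NAMES if re.search(PATTERN, name))


def precompiled() -> int:
    return sum(1 for name in NAMES if COMPILED.search(name))


def profile(func) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile(uncompiled))
    for func in (uncompiled, precompiled):
        print(func.__name__, timeit.timeit(func, number=2000))
```

Note that `re` keeps its own cache of recently compiled patterns, so the gain from pre-compiling may be smaller than expected; measuring is what settles it.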
