
Commit 59ab382

Updated Pkg. Version
1 parent dd6bbec commit 59ab382

File tree

3 files changed (+84, −2 lines)


CHANGELOG.md

Lines changed: 82 additions & 0 deletions
@@ -2,6 +2,88 @@
All notable changes to LocalLab will be documented in this file.

## [0.8.0] - 2025-07-04

### 🎉 Major Release - Comprehensive Model Loading Fixes

This release addresses critical issues that prevented text generation LLMs from working properly with LocalLab, particularly Qwen2.5-VL model loading and disk offloading errors.

### Fixed

#### 🔧 Critical Disk Offloading Issue

- **Fixed the "You are trying to offload the whole model to the disk" error** that prevented all text generation LLMs from loading
- Implemented an intelligent device mapping strategy that prevents disk offloading:
  - **GPU Memory Detection**: Automatically checks available GPU memory before device placement
  - **Safe Device Selection**: Uses explicit device assignments (`cuda:0` or `cpu`) instead of the problematic `device_map: "auto"`
  - **CPU Fallback Logic**: Automatically falls back to CPU when GPU memory is insufficient (<4GB)
  - **Error Recovery**: Detects disk offloading errors and automatically retries with a CPU-only configuration
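The device selection described above can be sketched as a small pure function (an illustrative simplification; the function name, arguments, and the 4 GB threshold mirror the bullets above, not the exact LocalLab API):

```python
def get_safe_device_map(gpu_available: bool, free_gpu_gb: float,
                        min_gpu_gb: float = 4.0) -> str:
    """Pick an explicit device instead of device_map='auto'.

    Returning a concrete device string ('cuda:0' or 'cpu') keeps all
    weights on a single device, so nothing gets offloaded to disk.
    """
    if gpu_available and free_gpu_gb >= min_gpu_gb:
        return "cuda:0"
    return "cpu"
```

With less than 4 GB of free GPU memory the model lands on the CPU, which is slower but loads reliably.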

#### 🔧 Qwen2.5-VL Model Loading

- **Fixed Qwen2.5-VL model loading errors** with proper model class detection
- Added comprehensive fallback logic for different model types:
  - `AutoModelForCausalLM` → `AutoModel` → `Qwen2_5_VLForConditionalGeneration` → `AutoModelForVision2Seq`
- Enhanced processor/tokenizer loading with smart detection:
  - **Vision-Language Models**: Use `AutoProcessor` for models with "vl", "vision", or "qwen2.5-vl" in the name
  - **Text-Only Models**: Use `AutoTokenizer` for all other models
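Generically, the fallback chain and the name-based detection amount to something like the following (hypothetical helper names; the real loader classes are the `transformers` classes listed above):

```python
def load_with_fallbacks(model_name, loaders):
    """Try each loader callable in order; return the first that succeeds."""
    errors = []
    for loader in loaders:
        try:
            return loader(model_name)
        except Exception as exc:  # record the failure and try the next class
            errors.append(f"{getattr(loader, '__name__', loader)}: {exc}")
    raise RuntimeError("all loaders failed: " + "; ".join(errors))


def needs_processor(model_name):
    """Name-based heuristic: vision-language models need AutoProcessor."""
    lowered = model_name.lower()
    return any(tag in lowered for tag in ("vl", "vision", "qwen2.5-vl"))
```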

#### 🔧 Server Stability Issues

- **Fixed repeated startup callbacks** that were spamming logs every 30 seconds
- Added a completion flag to prevent callback loops during server initialization
- Enhanced the server startup process for cleaner, one-time initialization
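The completion flag boils down to a run-once guard (a minimal sketch with illustrative names; the actual fix lives in `locallab/server.py`):

```python
class StartupGuard:
    """Run the startup callback at most once, even if re-triggered."""

    def __init__(self, callback):
        self._callback = callback
        self._startup_complete = False

    def on_startup(self):
        if self._startup_complete:
            return  # already initialized; ignore repeated triggers
        self._startup_complete = True
        self._callback()
```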

#### 🔧 Enhanced Error Recovery

- **CPU Retry Logic**: When GPU loading fails with disk offloading errors, automatically retry with a CPU-only configuration
- **Comprehensive Error Detection**: Intelligently detects various error patterns and triggers appropriate fallbacks
- **Memory-Aware Loading**: Considers available system resources when selecting the device mapping strategy
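A sketch of the retry path, assuming the error is detected by matching the message quoted earlier (helper names are hypothetical):

```python
def is_disk_offload_error(exc):
    """Detect the accelerate error raised when weights would hit disk."""
    return "offload the whole model to the disk" in str(exc)


def load_model_with_cpu_retry(load_fn, config):
    """Try the requested config; on a disk-offload error, retry on CPU."""
    try:
        return load_fn(config)
    except Exception as exc:
        if not is_disk_offload_error(exc):
            raise  # unrelated failure: surface it unchanged
        cpu_config = dict(config, device_map="cpu")
        return load_fn(cpu_config)
```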
38+
39+
### Added
40+
41+
#### 🚀 Smart Device Management
42+
- **New `_get_safe_device_map()` method** for intelligent device selection
43+
- **GPU Memory Inspection**: Checks GPU memory capacity before attempting GPU loading
44+
- **Adaptive Configuration**: Automatically adjusts quantization settings based on available hardware
45+
- **Multi-Level Fallbacks**: Multiple fallback strategies ensure models load successfully
46+
47+
#### 🚀 Enhanced Model Support
48+
- **Universal Text Generation Support**: All text generation LLMs now work properly with the package
49+
- **Vision-Language Model Support**: Proper handling of multimodal models like Qwen2.5-VL
50+
- **Cross-Platform Compatibility**: Works reliably across different hardware configurations
51+
52+
### Changed
53+
54+
#### ⚡ Improved Model Loading Process
55+
- **Updated quantization configuration** to use safe device mapping across all scenarios
56+
- **Enhanced model class detection** with comprehensive fallback chains
57+
- **Optimized memory usage** with intelligent device selection
58+
- **Better error messages** with clear guidance for troubleshooting
59+
60+
#### ⚡ Dependencies
61+
- **Updated transformers requirement** to `>=4.49.0` (minimum version for Qwen2.5-VL support)
62+
- Ensured compatibility with latest Hugging Face ecosystem
63+
64+
### Technical Details
65+
66+
#### Files Modified
67+
- `locallab/model_manager.py`: Core model loading logic with device mapping and error recovery
68+
- `locallab/server.py`: Fixed startup callback loop
69+
- `requirements.txt` & `setup.py`: Updated transformers version
70+
71+
#### Key Improvements
72+
- **Device Mapping Strategy**: Prevents disk offloading by using specific device assignments
73+
- **Error Recovery Mechanisms**: Multiple fallback strategies ensure successful model loading
74+
- **Memory Management**: Intelligent resource allocation based on available hardware
75+
- **Cross-Model Compatibility**: Universal support for text generation and vision-language models
76+
77+
### Impact
78+
79+
This release ensures that **ALL text generation LLMs work properly** with LocalLab:
80+
- ✅ Qwen2.5-VL models load successfully
81+
- ✅ Large models don't fail with disk offloading errors
82+
- ✅ GPU models use GPU when memory is sufficient
83+
- ✅ CPU fallback works seamlessly when needed
84+
- ✅ Server startup is clean and stable
85+
- ✅ Universal compatibility across different model types
86+
587
## [0.7.2] - 2025-05-19
688

789
### Fixed

locallab/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
 # This ensures Hugging Face's progress bars are displayed correctly
 from .utils.early_config import configure_hf_logging
 
-__version__ = "0.7.2"  # Fixed max_time parameter handling in generation endpoints
+__version__ = "0.8.0"  # Major fixes for model loading, device mapping, and disk offloading issues
 
 # Only import what's necessary initially, lazy-load the rest
 from .logger import get_logger
```

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -47,7 +47,7 @@
 
 setup(
     name="locallab",
-    version="0.7.2",
+    version="0.8.0",
     packages=find_packages(include=["locallab", "locallab.*"]),
     install_requires=install_requires,
     extras_require={
```
