All notable changes to LocalLab will be documented in this file.

## [0.8.0] - 2025-07-04

### 🎉 Major Release - Comprehensive Model Loading Fixes

This release addresses critical issues that were preventing text generation LLMs from working properly with LocalLab, particularly the Qwen2.5-VL model and disk offloading errors.

### Fixed

#### 🔧 Critical Disk Offloading Issue
- **Fixed the "You are trying to offload the whole model to the disk" error** that was preventing all text generation LLMs from loading
- Implemented an intelligent device mapping strategy that prevents disk offloading:
  - **GPU Memory Detection**: Automatically checks available GPU memory before device placement
  - **Safe Device Selection**: Uses specific device assignments (`cuda:0` or `cpu`) instead of the problematic `device_map="auto"`
  - **CPU Fallback Logic**: Automatically uses CPU when GPU memory is insufficient (<4GB)
  - **Error Recovery**: Detects disk offloading errors and automatically retries with a CPU-only configuration
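
As a rough sketch, the selection strategy above reduces to a small decision function. The name `get_safe_device_map` and its parameters are illustrative, not LocalLab's actual implementation; in real use the inputs would come from `torch.cuda.is_available()` and `torch.cuda.mem_get_info()`:

```python
MIN_GPU_MEMORY_GB = 4.0  # below this threshold, fall back to CPU


def get_safe_device_map(cuda_available: bool, free_gpu_memory_gb: float) -> str:
    """Pick an explicit device instead of device_map="auto", which can
    silently spill model weights to disk.

    The inputs are parameters (rather than read from torch directly)
    so the decision logic is easy to test in isolation.
    """
    if not cuda_available:
        return "cpu"
    if free_gpu_memory_gb < MIN_GPU_MEMORY_GB:
        return "cpu"  # insufficient GPU memory: avoid partial offload
    return "cuda:0"
```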

#### 🔧 Qwen2.5-VL Model Loading
- **Fixed Qwen2.5-VL model loading errors** with proper model class detection
- Added comprehensive fallback logic for different model types:
  - `AutoModelForCausalLM` → `AutoModel` → `Qwen2_5_VLForConditionalGeneration` → `AutoModelForVision2Seq`
- Enhanced processor/tokenizer loading with smart detection:
  - **Vision-Language Models**: Use `AutoProcessor` for models with "vl", "vision", or "qwen2.5-vl" in the name
  - **Text-Only Models**: Use `AutoTokenizer` for all other models
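
A minimal sketch of that fallback chain and the name-based detection; the helper names are hypothetical, and with transformers installed the `candidates` list would be the classes named above:

```python
def load_with_fallbacks(model_id: str, candidates, **kwargs):
    """Try each model class in order until one loads; re-raise the last
    error if none succeeds. In practice candidates would be
    [AutoModelForCausalLM, AutoModel,
     Qwen2_5_VLForConditionalGeneration, AutoModelForVision2Seq].
    """
    last_error = None
    for model_cls in candidates:
        try:
            return model_cls.from_pretrained(model_id, **kwargs)
        except Exception as err:  # e.g. unrecognized architecture
            last_error = err
    raise RuntimeError(f"No model class could load {model_id}") from last_error


VL_NAME_MARKERS = ("vl", "vision", "qwen2.5-vl")


def needs_processor(model_name: str) -> bool:
    """True if the model name suggests a vision-language model, in which
    case AutoProcessor is used instead of AutoTokenizer."""
    lowered = model_name.lower()
    return any(marker in lowered for marker in VL_NAME_MARKERS)
```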

#### 🔧 Server Stability Issues
- **Fixed repeated startup callbacks** that were spamming logs every 30 seconds
- Added a completion flag to prevent callback loops during server initialization
- Enhanced the server startup process for clean, one-time initialization
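
The completion-flag pattern is straightforward; the class and attribute names below are illustrative, not LocalLab's actual API:

```python
class ServerStartup:
    """Guard that makes the startup callback run exactly once."""

    def __init__(self):
        self._startup_complete = False
        self.initializations = 0

    def on_startup(self) -> None:
        if self._startup_complete:
            return  # already initialized; ignore repeated callbacks
        self._startup_complete = True
        self.initializations += 1  # one-time initialization work goes here
```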

#### 🔧 Enhanced Error Recovery
- **CPU Retry Logic**: When GPU loading fails with a disk offloading error, the load is automatically retried with a CPU-only configuration
- **Comprehensive Error Detection**: Detects known error patterns and triggers the appropriate fallback
- **Memory-Aware Loading**: Considers available system resources when selecting a device mapping strategy
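
The retry path can be sketched as a wrapper that inspects the error message; the pattern list and helper name are illustrative:

```python
DISK_OFFLOAD_PATTERNS = (
    "offload the whole model to the disk",
    "disk_offload",
)


def load_with_cpu_retry(load_fn, device_map: str):
    """Attempt a load with the preferred device map; on a recognized
    disk-offloading error, retry once with a CPU-only configuration."""
    try:
        return load_fn(device_map)
    except Exception as err:
        message = str(err).lower()
        offload_error = any(p in message for p in DISK_OFFLOAD_PATTERNS)
        if offload_error and device_map != "cpu":
            return load_fn("cpu")  # CPU-only loading avoids disk offload
        raise
```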

### Added

#### 🚀 Smart Device Management
- **New `_get_safe_device_map()` method** for intelligent device selection
- **GPU Memory Inspection**: Checks GPU memory capacity before attempting GPU loading
- **Adaptive Configuration**: Automatically adjusts quantization settings based on available hardware
- **Multi-Level Fallbacks**: Multiple fallback strategies ensure models load successfully

#### 🚀 Enhanced Model Support
- **Universal Text Generation Support**: All text generation LLMs now work properly with the package
- **Vision-Language Model Support**: Proper handling of multimodal models such as Qwen2.5-VL
- **Cross-Platform Compatibility**: Works reliably across different hardware configurations

### Changed

#### ⚡ Improved Model Loading Process
- **Updated quantization configuration** to use safe device mapping in all scenarios
- **Enhanced model class detection** with comprehensive fallback chains
- **Optimized memory usage** with intelligent device selection
- **Better error messages** with clear guidance for troubleshooting

#### ⚡ Dependencies
- **Updated transformers requirement** to `>=4.49.0` (the minimum version for Qwen2.5-VL support)
- Ensured compatibility with the latest Hugging Face ecosystem

### Technical Details

#### Files Modified
- `locallab/model_manager.py`: Core model loading logic with device mapping and error recovery
- `locallab/server.py`: Fixed the startup callback loop
- `requirements.txt` & `setup.py`: Updated the transformers version

#### Key Improvements
- **Device Mapping Strategy**: Prevents disk offloading by using specific device assignments
- **Error Recovery Mechanisms**: Multiple fallback strategies ensure successful model loading
- **Memory Management**: Intelligent resource allocation based on available hardware
- **Cross-Model Compatibility**: Universal support for text generation and vision-language models

### Impact

This release ensures that **all text generation LLMs work properly** with LocalLab:
- ✅ Qwen2.5-VL models load successfully
- ✅ Large models no longer fail with disk offloading errors
- ✅ GPU models use the GPU when memory is sufficient
- ✅ CPU fallback works seamlessly when needed
- ✅ Server startup is clean and stable
- ✅ Universal compatibility across different model types
## [0.7.2] - 2025-05-19

### Fixed