Releases · openvinotoolkit/openvino.genai
2025.1.0.0
What's Changed
- skip failing Chinese prompt on Win by @pavel-esir in #1573
- Bump product version 2025.1 by @akladiev in #1571
- Bump tokenizers submodule by @akladiev in #1575
- [LLM_BENCH] relax md5 checks and allow pass cb config without use_cb by @eaidova in #1570
- [VLM] Add Qwen2VL by @yatarkan in #1553 (usage sketch after this list)
- Fix links, remind about ABI by @Wovchena in #1585
- Add nightly to instructions similar to requirements by @Wovchena in #1582
- GHA: use nightly from 2025.1.0 by @ilya-lavrenov in #1577
- NPU LLM Pipeline: Switch to STATEFUL by default by @dmatveev in #1561
- Verify that the rendered chat template is not empty by @yatarkan in #1574
- [RTTI] Fix passes rtti definitions by @t-jankowski in #1588
- Test add_special_tokens properly by @pavel-esir in #1586
- Add indentation for llm_bench json report dumping by @nikita-savelyevv in #1584
- prioritize config model type during path-based task determination by @eaidova in #1587
- Replace openvino.runtime imports with openvino by @helena-intel in #1579
- Add tests for Whisper static pipeline by @eshiryae in #1250
- CB: removed handle_dropped() misuse by @ilya-lavrenov in #1594
- Bump timm from 1.0.13 to 1.0.14 by @dependabot in #1595
- Update samples readme by @olpipi in #1545
- [Speculative decoding][Prompt lookup] Enable Perf Metrics for assisting pipelines by @iefode in #1599
- [LLM] [NPU] StaticLLMPipeline: Export blob by @smirnov-alexey in #1601
- [llm_bench] enable prompt permutations to prevent prefix caching and fix vlm image load by @eaidova in #1607
- LLM: use set_output_seq_len instead of WA by @ilya-lavrenov in #1611
- CB: support different number of K and V heads per layer by @ilya-lavrenov in #1610
- LLM: fixed Slice / Gather of last MatMul by @ilya-lavrenov in #1616
- Switch to VS 2022 by @mryzhov in #1598
- Add Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1609
- Whisper pipeline: apply slice matmul by @as-suvorov in #1623
- GHA: use OV master in mac.yml by @ilya-lavrenov in #1622
- [Image Generation] Image2Image for FLUX by @likholat in #1621 (usage sketch after this list)
- add missing ignore_eos in generation config by @eaidova in #1625
- Increase rt_info priority to fix Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct (master) by @Wovchena in #1626
- Correct model name by @wgzintel in #1624
- Token rotation by @vshampor in #987
- Whisper pipeline: use Sampler by @as-suvorov in #1615 (usage sketch after this list)
- Fix setting eos_token_id with kwarg by @Wovchena in #1629
- Extract cacheopt E2E tests into separate test matrix field by @vshampor in #1630
- [CB] Split token streaming and generation to different threads for all CB based pipelines by @iefode in #1544
- Don't silence an error if a file can't be opened by @Wovchena in #1620
- [CMAKE]: use different version for macOS arm64 by @ilya-lavrenov in #1632
- Test invalid fields assignment raises in GenerationConfig by @Wovchena in #1633
- do_sample=False for NPU in chat_sample, add NPU to README by @helena-intel in #1637
- [JS] Add GenAI Node.js bindings by @vishniakov-nikolai in #1193
- CB: preparation for relying on KV cache precisions from plugins by @ilya-lavrenov in #1634
- [LLM bench] support providing adapter config mode by @eaidova in #1644
- Automatically apply chat template in non-chat scenarios by @sbalandi in #1533 (opt-out sketch after this list)
- beam_search_causal_lm.cpp: delete wrong comment by @Wovchena in #1639
- [WWB]: Fixed chat template usage in VLM GenAI pipeline by @AlexKoff88 in #1643
- [WWB]: Fixed nano-Llava preprocessor selection by @AlexKoff88 in #1646
- [WWB]: Added config to preprocessor call in VLMs by @AlexKoff88 in #1638
- CB: remove DeviceConfig class by @ilya-lavrenov in #1640
- [WWB]: Added initialization of nano-llava in case of Transformers model by @AlexKoff88 in #1649
- WWB: simplify code around start_chat / use_template by @ilya-lavrenov in #1650
- Tokenizers update by @ilya-lavrenov in #1653
- DOCS: reorganized support models for image generation by @ilya-lavrenov in #1655
- Fix using llm_bench/wwb with versions w/o apply_chat_template by @sbalandi in #1651
- Fix Qwen2VL generation without images by @yatarkan in #1645
- Parallel sampling with threadpool by @mzegla in #1252
- [Coverity] Enabling coverity scan by @akazakov-github in #1657
- [CB] Fix streaming in case of empty outputs by @iefode in #1647
- Allow overriding eos_token_id by @Wovchena in #1654
- CB: remove GenerationHandle:back by @ilya-lavrenov in #1662
- Fix tiny-random-llava-next in VLM Pipeline by @yatarkan in #1660
- [CB] Add KVHeadConfig parameters to PagedAttention's rt_info by @sshlyapn in #1666
- Bump py-build-cmake from 0.3.4 to 0.4.0 by @dependabot in #1668
- pin optimum version by @pavel-esir in #1675
- [LLM] Enabled CB by default by @ilya-lavrenov in #1455 (configuration sketch after this list)
- SAMPLER: fixed hang during destruction of ThreadPool by @ilya-lavrenov in #1681
- CB: use optimized scheduler config for cases when user explicitly asked CB backend by @ilya-lavrenov in #1679
- [CB] Return Block manager asserts to destructors by @iefode in #1569
- phi3_v: allow images, remove unused var by @Wovchena in #1670
- [Image Generation] Inpainting for FLUX by @likholat in #1685
- [WWB]: Added support for SchedulerConfig in LLMPipeline by @AlexKoff88 in #1671
- Add LongBench validation by @l-bat in #1220
- Fix Tokenizer for several added special tokens by @pavel-esir in #1659
- Unpin optimum-intel version by @ilya-lavrenov in #1680
- Image generation: proper error message when encode() is used w/o encoder passed to ctor by @ilya-lavrenov in #1683
- Fix excluding stop str from output for some tokenizers by @sbalandi in #1676
- [VLM] Fix chat template fallback in chat mode with defined system message by @yatarkan in https://github.com/openvinotoolkit/openvino.genai/pull/...
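Qwen2VL support (#1553) runs through the existing VLMPipeline API. A minimal sketch, assuming a Qwen2-VL model already exported with optimum-intel to ./Qwen2-VL-2B-Instruct-ov and a local cat.png; the keyword pattern follows the Python VLM samples:

```python
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image

# Load the image as a uint8 ov.Tensor, as the VLM samples do.
pic = Image.open("cat.png").convert("RGB")
image = ov.Tensor(np.array(pic))

# The model directory is an assumption: any Qwen2-VL export produced by optimum-intel.
pipe = openvino_genai.VLMPipeline("./Qwen2-VL-2B-Instruct-ov", "CPU")
print(pipe.generate("Describe the image.", image=image, max_new_tokens=100))
```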
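Image2Image for FLUX (#1621) extends image-to-image generation beyond the Stable Diffusion family (inpainting followed in #1685). A hedged sketch, assuming a FLUX model converted to OpenVINO IR in ./FLUX.1-schnell-ov; the call pattern mirrors the existing image2image sample:

```python
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image

# Input image as a uint8 tensor with a batch dimension, as the samples prepare it.
pic = Image.open("input.png").convert("RGB")
image = ov.Tensor(np.array(pic)[None])

pipe = openvino_genai.Image2ImagePipeline("./FLUX.1-schnell-ov", "CPU")
# strength controls how far the result may deviate from the input image.
result = pipe.generate("a watercolor landscape", image, strength=0.8)
Image.fromarray(result.data[0]).save("output.png")
```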
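Several entries above rework WhisperPipeline internals (#1615, #1623, #1250) without changing its surface. For context, a minimal transcription sketch, assuming a Whisper model exported to ./whisper-base-ov and a 16 kHz mono recording; librosa is just one way to obtain normalized float samples:

```python
import librosa
import openvino_genai

# WhisperPipeline expects 16 kHz mono float samples.
raw_speech, _ = librosa.load("sample.wav", sr=16000)

pipe = openvino_genai.WhisperPipeline("./whisper-base-ov", "CPU")
print(pipe.generate(raw_speech.tolist(), max_new_tokens=100))
```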
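#1533 changes a default: LLMPipeline now wraps plain generate() prompts in the model's chat template, not only prompts inside start_chat()/finish_chat(). A sketch of opting out, assuming an exported model in ./model-ov; the apply_chat_template field name comes from that change:

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("./model-ov", "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100
# New default in 2025.1: the model's chat template is applied automatically.
# Disable it to pass the prompt verbatim (e.g. for base, non-chat models).
config.apply_chat_template = False

print(pipe.generate("What is OpenVINO?", config))
```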
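With #1455 continuous batching became the default LLMPipeline backend, and #1679 picks an optimized scheduler config when CB is requested explicitly. A hedged sketch of such an explicit request with a bounded KV cache; the 2 GB value is an arbitrary example:

```python
import openvino_genai

# Passing a SchedulerConfig explicitly selects the continuous batching backend.
scheduler_config = openvino_genai.SchedulerConfig()
scheduler_config.cache_size = 2  # KV cache budget, in GB (example value)

pipe = openvino_genai.LLMPipeline("./model-ov", "CPU", scheduler_config=scheduler_config)
print(pipe.generate("Hello,", max_new_tokens=20))
```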
2025.0.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.6.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.5.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.4.1.0
Please check out the latest documentation pages related to the new openvino_genai package!
What's Changed
- Bump OV version to 2024.4.1 by @akladiev in #894
- Update requirements.txt and add requirements_2024.4.txt by @wgzintel in #893
Full Changelog: 2024.4.0.0...2024.4.1.0
2024.4.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
What's Changed
- Support chat conversation for StaticLLMPipeline by @TolyaTalamanov in #580
- Prefix caching by @popovaan in #639
- Allow building GenAI with OpenVINO via extra modules by @ilya-lavrenov in #726
- Simplified partial preemption algorithm by @popovaan in #730
- Add set_chat_template by @Wovchena in #734
- Detect KV cache sequence length axis by @as-suvorov in #744
- Enable u8 KV cache precision for CB by @ilya-lavrenov in #759
- Add test case for native pytorch model by @wgzintel in #722
- Prefix caching improvements by @popovaan in #758
- Add USS metric by @wgzintel in #762
- Prefix caching optimization by @popovaan in #785
- Transition to default int4 compression configs from optimum-intel by @nikita-savelyevv in #689
- Control KV-cache size for StaticLLMPipeline by @TolyaTalamanov in #795
- [2024.4] update optimum intel commit to include mxfp4 conversion by @eaidova in #828
- [2024.4] use perf metrics for genai in llm bench by @eaidova in #830
- Update Pybind to version 13 by @mryzhov in #836
- Introduce stop_strings and stop_token_ids sampling params [2024.4 base] by @mzegla in #817 (usage sketch after this list)
- StaticLLMPipeline: Handle single element list of prompts by @TolyaTalamanov in #848
- Fix Meta-Llama-3.1-8B-Instruct chat template by @pavel-esir in #846
- Add GPU support for continuous batching [2024.4] by @sshlyapn in #858
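#817 introduced stop_strings and stop_token_ids among the sampling parameters. A minimal sketch, assuming an exported model in ./model-ov; note that matched stop strings are excluded from the output unless include_stop_str_in_output is set:

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("./model-ov", "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 200
config.stop_strings = {"Observation:", "\n\n"}  # stop once any of these appears
config.stop_token_ids = {2}                     # e.g. an explicit EOS token id

print(pipe.generate("List three OpenVINO devices:", config))
```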
Full Changelog: 2024.3.0.0...2024.4.0.0
2024.3.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.2.0.0
Please check out the latest documentation pages related to the new openvino_genai package!