What's Changed
- skip failing Chinese prompt on Win by @pavel-esir in #1573
- Bump product version 2025.1 by @akladiev in #1571
- Bump tokenizers submodule by @akladiev in #1575
- [LLM_BENCH] relax md5 checks and allow passing CB config without use_cb by @eaidova in #1570
- [VLM] Add Qwen2VL by @yatarkan in #1553
- Fix links, remind about ABI by @Wovchena in #1585
- Add nightly to instructions similar to requirements by @Wovchena in #1582
- GHA: use nightly from 2025.1.0 by @ilya-lavrenov in #1577
- NPU LLM Pipeline: Switch to STATEFUL by default by @dmatveev in #1561
- Verify not empty rendered chat template by @yatarkan in #1574
- [RTTI] Fix passes rtti definitions by @t-jankowski in #1588
- Test `add_special_tokens` properly by @pavel-esir in #1586
- Add indentation for llm_bench json report dumping by @nikita-savelyevv in #1584
- Prioritize config model type during path-based task determination by @eaidova in #1587
- Replace openvino.runtime imports with openvino by @helena-intel in #1579
- Add tests for Whisper static pipeline by @eshiryae in #1250
- CB: removed handle_dropped() misuse by @ilya-lavrenov in #1594
- Bump timm from 1.0.13 to 1.0.14 by @dependabot in #1595
- Update samples readme by @olpipi in #1545
- [ Speculative decoding ][ Prompt lookup ] Enable Perf Metrics for assisting pipelines by @iefode in #1599
- [LLM] [NPU] StaticLLMPipeline: Export blob by @smirnov-alexey in #1601
- [llm_bench] enable prompt permutations to prevent prefix caching and fix vlm image load by @eaidova in #1607
- LLM: use set_output_seq_len instead of WA by @ilya-lavrenov in #1611
- CB: support different number of K and V heads per layer by @ilya-lavrenov in #1610
- LLM: fixed Slice / Gather of last MatMul by @ilya-lavrenov in #1616
- Switch to VS 2022 by @mryzhov in #1598
- Add Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1609
- Whisper pipeline: apply slice matmul by @as-suvorov in #1623
- GHA: use OV master in mac.yml by @ilya-lavrenov in #1622
- [Image Generation] Image2Image for FLUX by @likholat in #1621
- add missing ignore_eos in generation config by @eaidova in #1625
- Increase priority of rt_info to fix Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct (master) by @Wovchena in #1626
- Correct model name by @wgzintel in #1624
- Token rotation by @vshampor in #987
- Whisper pipeline: use Sampler by @as-suvorov in #1615
- Fix setting eos_token_id with kwarg by @Wovchena in #1629
- Extract cacheopt E2E tests into separate test matrix field by @vshampor in #1630
- [CB] Split token streaming and generation to different threads for all CB based pipelines by @iefode in #1544
- Don't silence an error if a file can't be opened by @Wovchena in #1620
- [CMAKE]: use different version for macOS arm64 by @ilya-lavrenov in #1632
- Test invalid fields assignment raises in GenerationConfig by @Wovchena in #1633
- do_sample=False for NPU in chat_sample, add NPU to README by @helena-intel in #1637
- [JS] Add GenAI Node.js bindings by @vishniakov-nikolai in #1193
- CB: preparation for relying on KV cache precisions from plugins by @ilya-lavrenov in #1634
- [LLM bench] support providing adapter config mode by @eaidova in #1644
- Automatically apply chat template in non-chat scenarios by @sbalandi in #1533 (see the usage sketch after this list)
- beam_search_causal_lm.cpp: delete wrong comment by @Wovchena in #1639
- [WWB]: Fixed chat template usage in VLM GenAI pipeline by @AlexKoff88 in #1643
- [WWB]: Fixed nano-Llava preprocessor selection by @AlexKoff88 in #1646
- [WWB]: Added config to preprocessor call in VLMs by @AlexKoff88 in #1638
- CB: remove DeviceConfig class by @ilya-lavrenov in #1640
- [WWB]: Added initialization of nano-llava in case of Transformers model by @AlexKoff88 in #1649
- WWB: simplify code around start_chat / use_template by @ilya-lavrenov in #1650
- Tokenizers update by @ilya-lavrenov in #1653
- DOCS: reorganized support models for image generation by @ilya-lavrenov in #1655
- Fix using llm_bench/wwb with versions w/o apply_chat_template by @sbalandi in #1651
- Fix Qwen2VL generation without images by @yatarkan in #1645
- Parallel sampling with threadpool by @mzegla in #1252
- [Coverity] Enabling coverity scan by @akazakov-github in #1657
- [ CB ] Fix streaming in case of empty outputs by @iefode in #1647
- Allow overriding eos_token_id by @Wovchena in #1654
- CB: remove GenerationHandle:back by @ilya-lavrenov in #1662
- Fix tiny-random-llava-next in VLM Pipeline by @yatarkan in #1660
- [CB] Add KVHeadConfig parameters to PagedAttention's rt_info by @sshlyapn in #1666
- Bump py-build-cmake from 0.3.4 to 0.4.0 by @dependabot in #1668
- pin optimum version by @pavel-esir in #1675
- [LLM] Enabled CB by default by @ilya-lavrenov in #1455
- SAMPLER: fixed hang during destruction of ThreadPool by @ilya-lavrenov in #1681
- CB: use optimized scheduler config for cases when user explicitly asked CB backend by @ilya-lavrenov in #1679
- [CB] Return Block manager asserts to destructors by @iefode in #1569
- phi3_v: allow images, remove unused var by @Wovchena in #1670
- [Image Generation] Inpainting for FLUX by @likholat in #1685
- [WWB]: Added support for SchedulerConfig in LLMPipeline by @AlexKoff88 in #1671
- Add LongBench validation by @l-bat in #1220
- Fix Tokenizer for several added special tokens by @pavel-esir in #1659
- Unpin optimum-intel version by @ilya-lavrenov in #1680
- Image generation: proper error message when encode() is used w/o encoder passed to ctor by @ilya-lavrenov in #1683
- Fix excluding stop str from output for some tokenizers by @sbalandi in #1676
- [VLM] Fix chat template fallback in chat mode with defined system message by @yatarkan in #1674
- Set stop token ids from default generation config by @yatarkan in #1612
- [Tokenizers] add max_length parametrisation to encode by @pavel-esir in #1518
- WWB: run FLUX inpainting test by @ilya-lavrenov in #1692
- Whisper pipeline: parallel streaming with async/wait by @as-suvorov in #1687
- GHA: fixed build on Windows by @ilya-lavrenov in #1694
- Fix Falcon validation by @AlexKoff88 in #1664
- Add a choice of how to end streaming from a callback: STOP or CANCEL by @sbalandi in #1476 (see the usage sketch after this list)
- Bump py-build-cmake from 0.4.0 to 0.4.1 by @dependabot in #1699
- Bump einops from 0.8.0 to 0.8.1 in /samples by @dependabot in #1696
- Fixed warnings by @ilya-lavrenov in #1695
- Add egg package name to avoid issues after pip freeze by @wkobielx in #1703
- fix llm bench config based model search for gptj by @eaidova in #1701
- Fixed typo by @ilya-lavrenov in #1708
- Added GIL release for time consuming methods. by @popovaan in #1673
- fix analyze_model return value by @eaidova in #1711
- Fix error after second start_chat() for StatefulLLMPipeline by @sbalandi in #1684
- [ Test ][ PR1 ] Splitting Common.py by @iefode in #1691
- [CPU] Remove memset WA for PagedAttention by @luo-cheng2021 in #1678
- Update to the latest tokenizers with StringPack/Unpack from opset by @pavel-esir in #1562
- GHA: update workflows by @ilya-lavrenov in #1720
- [ Test ][ PR2 ] Splitting Common.py by @iefode in #1702
- [Sampler] Fix stop strings offset for speculative decoding by @iefode in #1719
- Omit `--trust-remote-code` from export command for whisper in the README by @nikita-savelyevv in #1726
- tokenizer: read simplified_chat_template by @Wovchena in #1712
- CB pipelines: use threaded streamer by @as-suvorov in #1690
- Remove core_tokenizers unused code by @mryzhov in #1710
- Set default permission for job_vlm by @jszczepa in #1721
- [GHA] Win pipeline refactoring by @mryzhov in #1714
- Enable test_load_special_tokens_from_tokenizer_config_json() by @Wovchena in #1729
- [llm bench] fix prompt file parsing for vlm by @eaidova in #1727
- CMAKE: use object library between shared OpenVINO GenAI and tests by @ilya-lavrenov in #1705
- Add more samples to ci check by @olpipi in #1724
- docs: 2024-2025 by @Wovchena in #1733
- Make `TextStreamer` public & add unit-tests by @pavel-esir in #1700
- [GHA] macOS pipeline refactoring by @mryzhov in #1731
- Simplify installation verification by @Wovchena in #1739
- Update for LoRA Adapters: Derived adapters and support for FLUX (#1602) (master) by @slyalin in #1652
- Threaded streamer: add tests by @as-suvorov in #1715
- Revert "Threaded streamer: add tests" by @akladiev in #1741
- [llm bench] improve catching Unicode encoding errors when printing generated text by @eaidova in #1736
- tokenizer: don't store CompiledModel by @Wovchena in #1740
- Set English language by default for all the LLM models by @AlexKoff88 in #1686
- PYTHON: remove py::object as ov::Any by @ilya-lavrenov in #1745
- [LLM] [NPU] StaticLLMPipeline: support weightless caching by @smirnov-alexey in #1635
- [WWB]: Fixed internvl inference with Transformers lib by @AlexKoff88 in #1749
- do not convert tokenizer on the fly in llm bench by @eaidova in #1752
- TESTS: skip test_perf_metrics by @ilya-lavrenov in #1754
- [GHA][WIN] Use self-hosted runners by @mryzhov in #1732
- VLM: updated chat template mapping for llava-next by @ilya-lavrenov in #1756
- Add flag to use full history on each generation in chat mode by @sbalandi in #1750
- Align streamers output by @Wovchena in #1759
- Bump py-build-cmake from 0.4.1 to 0.4.2 by @dependabot in #1757
- [JS] Add a dependency on the openvino-node package by @Retribution98 in #1667
- Revert "VLM: updated chat template mapping for llava-next" by @ilya-lavrenov in #1760
- Add benchmark_genai python sample to precommit by @olpipi in #1747
- Clean up Static LLM Pipeline by @TolyaTalamanov in #1748
- [LLM bench]: remove convert.py and split requirements by @eaidova in #1734
- add performance statistics for image generation by @xufang-lisa in #1405
- get_lm_encoded_results: use remote tensor by @Wovchena in #1669
- [JS] Setup genai nodejs bindings compilation for windows by @Retribution98 in #1697
- phi3_v: apply chat template by default by @Wovchena in #1762
- [TESTS] Retry model downloading and conversion by @mryzhov in #1758
- Whisper samples: align timestamps precision by @as-suvorov in #1770
- Simplify python read_image() by @Wovchena in #1763
- typo fix and message improvement by @isanghao in #1773
- Update README.md to update OV2025.0 GenAI Whisper NPU requirement by @luke-lin-vmc in #1772
- [Tokenizer] Fix max_length, pad_to_max_length for models with 2 RaggedToDense ops by @pavel-esir in #1764
- Speculative decoding fix by @dkalinowski in #1767
- SD3 Img2Img and Inpainting by @likholat in #1737
- Fix "images" parameter in VLM to allow single image. by @popovaan in #1761
- [ Test ][ PR3 ] Splitting Common.py by @iefode in #1718
- Whisper pipeline: use remote tensor for encoder->decoder by @as-suvorov in #1723
- VLMPipeline: remove duplicate reset by @Wovchena in #1776
- Fix speculative decoding internal metrics by @sammysun0711 in #1771
- [GHA] Samples tests refactoring by @mryzhov in #1661
- Parsing model names of deepseek and flan-t5-xxl by @wgzintel in #1735
- GHA: pin OpenVINO commit by @ilya-lavrenov in #1781
- [ Test ][ PR4 ] Splitting & Refactoring Common.py by @iefode in #1722
- Move tokenized/templated history difference handling to KVCacheState by @sbalandi in #1716
- [JS] Preparing the JS package for preview release by @Retribution98 in #1775
- Bump timm from 1.0.14 to 1.0.15 in /samples by @dependabot in #1787
- Deprecated API usage by @ilya-lavrenov in #1785
- VLM: more informative error message by @ilya-lavrenov in #1786
- reduce sleep during memcomp measurement, handle unicode in input by @eaidova in #1792
- Update SUPPORTED_MODELS.md by @Huanli-Gong in #1784
- [JS] Basic configuration of TypeScript for nodejs package by @Retribution98 in #1790
- [GHA] replaced cpp-prompt_lookup_decoding_lm-ubuntu by @mryzhov in #1795
- [GHA] Replaced cpp-beam_search_causal_lm-ubuntu by @mryzhov in #1793
- StreamerBase: add write tokens vector by @as-suvorov in #1769
- [GHA][MAC] Samples tests by @mryzhov in #1782
- Add py bindings for encrypted models and sample by @olpipi in #1751
- Bump pybind11-stubgen from 2.5.1 to 2.5.3 by @dependabot in #1801
- Continuous Batching in VLM [Draft] by @popovaan in #1704
- Fixed path to the Supported Models Section by @Huanli-Gong in #1804
- [GHA] Replaced cpp-speculative_decoding_lm-ubuntu by @mryzhov in #1794
- Streaming: use write with vector by @ilya-lavrenov in #1807
- [GHA][WIN] Samples tests by @mryzhov in #1780
- VLM: create per-model folder with implementation details by @ilya-lavrenov in #1803
- Samples build instructions by @DimaPastushenkov in #1604
- StatefulLLMPipeline: Fix attention mask by @smirnov-alexey in #1812
- LLaVA: align the number of tokens in history and kv_cache by @Wovchena in #1788
- Text2ImagePipeline heterogeneous compile by @RyanMetcalfeInt8 in #1768
- Fixed VLM metrics test. by @popovaan in #1810
- [TEST][PR5] Implementing test infra by @iefode in #1797
- InputsEmbedderLLaVANext: push_back() embeddings by @Wovchena in #1813
- Added mutex to add_request() with images. by @popovaan in #1808
- Allow build w/o python by @ilya-lavrenov in #1822
- GHA: pinned OpenVINO by @ilya-lavrenov in #1821
- Tokenizer: fixed decode of special tokens during init stage by @ilya-lavrenov in #1823
- Add ov::Tensor from_npy(), remove duplicate print_tensor() by @Wovchena in #1824
- SD3 Reshape + Heterogeneous Compile by @RyanMetcalfeInt8 in #1818
- [ImageGeneration] FLUX pipeline assert for strength by @likholat in #1825
- [GHA][WIN] use azure runners by @mryzhov in #1819
- flux_pipeline: Add support for heterogeneous compile by @RyanMetcalfeInt8 in #1828
- update supported model_ids by @eaidova in #1834
- Switch NPU LLM execution to ov::genai::StatefulLLMPipeline by @TolyaTalamanov in #1677
- Revert "GHA: pinned OpenVINO" by @ilya-lavrenov in #1835
- [GHA] Use TinyLlama-1.1B-Chat-v1.0 instead of LaMini-GPT-124M by @mryzhov in #1816
- Add support for inpainting/image2image pipeline to llm_bench by @sbalandi in #1806
- Update README.md by @SunnyLi2015 in #1837
- [JS] Setup genai nodejs bindings compilation for macos by @Retribution98 in #1738
- [CB]: allow int8 KV cache precision for CPU by @ilya-lavrenov in #1552
- [JS] Fix NPM CPACK_GENERATOR by @Retribution98 in #1842
- Update heterogeneous_stable_diffusion.py by @ilya-lavrenov in #1844
- [GHA] reworked lcm_dreamshaper and stable_diffusion_1_5_cpp pipelines by @mryzhov in #1836
- [llm_bench] Fix way with relative path of media for json prompts by @sbalandi in #1843
- CB: rely on CPU logic for KV cache precision and shape by @ilya-lavrenov in #1838
- [GHA] replaced benchmark_genai-ubuntu by @mryzhov in #1817
- [GHA] Replaced visual_language_chat_sample-ubuntu-llava by @mryzhov in #1802
- Strengthen perf_metrics test, rename var by @Wovchena in #1847
- Use get_max_new_tokens() instead of max_new_tokens field when stopping… by @michalkulakowski in #1417
- Fixed inference of HF version of internvl by @AlexKoff88 in #1849
- Prefix caching for sequences with embeddings. by @popovaan in #1841
- Extend VLM to run LM on NPU by @TolyaTalamanov in #1783
- [PR6][Test infra] Starting to use of pipeline types by @iefode in #1814
- CB: added error messages when CB backend is explicitly asked, but not available by @ilya-lavrenov in #1693
- Fix running generate with encoded inputs one after another with the same input data by @sbalandi in #1850
- [WWB]: align phi3_v by @Wovchena in #1853
- [Docs] Add initial docs pages version by @yatarkan in #1765
- Static Whisper: transformations for dual-model whisper by @eshiryae in #1820
- Implement CANCEL for streaming with VLM Pipeline by @sbalandi in #1725
- Add C API for LLMPipeline by @apinge in #1778
- Use Sampler for StaticWhisperPipeline by @eshiryae in #1713
- CB: rely on GPU logic for KV cache precision and shape by @sshlyapn in #1848
- [Image Generation] Flux Fill inpainting pipeline by @likholat in #1857
- MiniCPM-V-2_6: add native tag by @Wovchena in #1858
- [tokenizer] Fix setting max_length and special tokens flags by @pavel-esir in #1860
- GHA pin OpenVINO by @ilya-lavrenov in #1865
- FLUX: move FluxPipeline with pipeline type to protected by @ilya-lavrenov in #1864
- [JS] Split building and testing NodeJS for Linux CI by @Retribution98 in #1859
- NPU LLM: Add prefill hint (dynamic/static) by @dmatveev in #1867
- Add averaged results dumping to llm_bench output json by @nikita-savelyevv in #1862
- Add heterogeneous compile API for image2image & inpainting by @RyanMetcalfeInt8 in #1868
- CB constructor from ModelsMap by @popovaan in #1863
- BUILD: set rpath for GenAI to OpenVINO C when building wheel by @ilya-lavrenov in #1869
- SD3: fix case w/o T5 by @ilya-lavrenov in #1876
- Revert "GHA pin OpenVINO" by @ilya-lavrenov in #1880
- [Docs] Add supported models & introduction pages by @yatarkan in #1839
- Qwen2-VL: add native tag by @Wovchena in #1884
- Update tests to check encoded inputs with chat by @sbalandi in #1866
- BUILD: second attempt to fix RPATHs by @ilya-lavrenov in #1886
- Adjust the LLM pipeline C API to ensure it can determine the required sufficient size for the output. by @apinge in #1871
- [GHA] replaced cpp-Phi-1_5 by @mryzhov in #1798
- Tokenizers update by @ilya-lavrenov in #1879
- [GHA] removed cpp-greedy_causal_lm-windows by @mryzhov in #1887
- Chat mode for VLM Continuous Batching by @popovaan in #1872
- Removed LMS Discrete by @ilya-lavrenov in #1892
- Dedicated library for OpenVINO GenAI C by @ilya-lavrenov in #1896
- [Docs] Align list of supported LLMs with Optimum-Intel by @yatarkan in #1893
- Update cpp sample CMakeLists.txt by @sammysun0711 in #1898
- Fixed stubgen issue by @ilya-lavrenov in #1894
- Bump prismjs from 1.29.0 to 1.30.0 in /site in the npm_and_yarn group across 1 directory by @dependabot in #1881
- [GHA] Mac pipeline fixes by @mryzhov in #1903
- Added C API to cpack by @ilya-lavrenov in #1908
- [GHA] Removed visual_language_chat_sample-ubuntu-internvl2 and visual_language_chat_sample-ubuntu-qwen2vl by @mryzhov in #1888
- VLM: fix image separators by @Wovchena in #1902
- Use circular buffer of infer requests in VLM components by @mzegla in #1833
- Bump the npm_and_yarn group across 1 directory with 3 updates by @dependabot in #1899
- Update perf metrics for image generation pipeline to use get_performance_metrics by @sbalandi in #1895
- [llm_bench] Allow to provide scheduler config for vlm by @sbalandi in #1906
- Store EncodedImage objects in VLM CB chat history by @popovaan in #1901
- Move speculative decoding from streamer to metrics benchmarking approach by @sbalandi in #1904
- Updated tokenizers by @ilya-lavrenov in #1912
- Add function to create usm host tensor by @ahnyoung-paul in #1900
- VisionEncoderPhi3V: fix multiple infers by @Wovchena in #1914
- GHA: switch to 2025.1 branches by @ilya-lavrenov in #1919
- Enhance the flexibility of the C streamer by @apinge in #1940
- Revert perf regression changes by @dkalinowski in #1944
- VLM: change infer to start_async/wait by @dkalinowski in #1947
- Added possibility to generate base text on GPU for text evaluation by @ljaljushkin in #1955
- SDL tokenizers fixes by @mryzhov in #1958
- Synchronize entire embeddings calculation phase by @mzegla in #1967
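For #1533 (automatic chat template application in non-chat scenarios), a minimal Python sketch of how the feature is expected to be used; the model directory is a placeholder and the `apply_chat_template` opt-out field on `GenerationConfig` is an assumption based on the PR title, so verify it against the 2025.1 API reference:

```python
import openvino_genai as ov_genai

# Placeholder directory containing an OpenVINO GenAI LLM export.
pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

# With #1533 the model's chat template is applied to plain generate() calls by default.
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))

# Assumed opt-out: disable the template to pass the raw prompt to the model instead.
config = ov_genai.GenerationConfig()
config.max_new_tokens = 64
config.apply_chat_template = False
print(pipe.generate("What is OpenVINO?", config))
```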
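For #1476 and #1725 (choosing how to end streaming from a callback: STOP or CANCEL), a minimal Python sketch assuming the callback may return a `StreamingStatus` value; the enum name and its exact semantics should be checked against the released bindings:

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

def streamer(subword: str):
    print(subword, end="", flush=True)
    # Assumed semantics per the PR titles: RUNNING continues generation,
    # STOP ends it but keeps the text produced so far, CANCEL ends it and
    # discards the partial result (and, in chat mode, the last turn).
    if "." in subword:
        return ov_genai.StreamingStatus.STOP
    return ov_genai.StreamingStatus.RUNNING

pipe.generate("Why is the Sun yellow?", max_new_tokens=64, streamer=streamer)
```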
New Contributors
- @t-jankowski made their first contribution in #1588
- @vishniakov-nikolai made their first contribution in #1193
- @akazakov-github made their first contribution in #1657
- @wkobielx made their first contribution in #1703
- @jszczepa made their first contribution in #1721
- @luke-lin-vmc made their first contribution in #1772
- @Huanli-Gong made their first contribution in #1784
- @SunnyLi2015 made their first contribution in #1837
- @michalkulakowski made their first contribution in #1417
- @ahnyoung-paul made their first contribution in #1900
Full Changelog: 2025.0.1.0...2025.1.0.0