Add EXAONE 4.0 model architecture #14630

Open
lgai-exaone wants to merge 13 commits into master

Conversation

lgai-exaone

Add EXAONE 4.0 modeling code in preparation for the official model release by LG AI Research.

This PR adds the modeling code for EXAONE 4.0, along with conversion code for HuggingFace transformers checkpoints. The feature was requested in issue #14474. The implementation is based on the modeling code from the corresponding PR in the transformers repository.

We have tested it internally with our checkpoints, and the relevant parts will be updated once our official checkpoints are released.

github-actions bot added the python (python script changes) label Jul 11, 2025
lgai-exaone (Author) commented Jul 14, 2025

Hello, maintainers!

We've encountered two issues when using EXAONE 4.0 models with this implementation:

  1. When using llama-cli with the -cnv argument, it appears that the reasoning token (<think>) is forcibly appended to the end of the prompt.
    • Is there a way to control the enable_thinking parameter in the tokenizer when using llama-cli in conversational mode? The /think and /no_think commands don't seem to be working.
  2. When requesting a chat completion from llama-server with enable_thinking=True, EXAONE 4.0 should close its reasoning block with </think>. Instead, it finishes generation with all content placed in reasoning_content rather than content.
    • We suspect this stems from the chat template logic in llama.cpp, but we haven't found a way to resolve it.

Could you help us solve these problems? We can provide test results and additional details if needed.
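
For reference, this is roughly how we trigger the second issue against llama-server's OpenAI-compatible endpoint (a sketch; the URL and prompt are placeholders, and we assume enable_thinking is forwarded to the template through chat_template_kwargs):

import requests

# Sketch of the request that reproduces issue 2.  The server URL and the
# prompt are placeholders; enable_thinking is assumed to be forwarded to
# the chat template via chat_template_kwargs.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "chat_template_kwargs": {"enable_thinking": True},
    },
)
msg = resp.json()["choices"][0]["message"]
print(msg.get("reasoning_content"))  # all generated text lands here
print(msg.get("content"))            # while this stays empty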

Our full chat template is shown below:

{%- if not skip_think is defined %}
  {%- set skip_think = true %}
{%- endif %}

{%- set role_indicators = {
    'user': '[|user|]\n',
    'assistant': '[|assistant|]\n',
    'system': '[|system|]\n',
    'tool': '[|tool|]\n'
} %}
{%- set end_of_turn = '[|endofturn|]\n' %}


{%- macro available_tools(tools) %}
    {{- "# Available Tools" }}
    {{- "\nYou can use none, one, or multiple of the following tools by calling them as functions to help with the user’s query." }}
    {{- "\nHere are the tools available to you in JSON format within <tool> and </tool> tags:\n" }}
    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson(ensure_ascii=False) | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

    {{- "\nFor each function call you want to make, return a JSON object with function name and arguments within <tool_call> and </tool_call> tags, like:" }}
    {{- "\n<tool_call>{\"name\": function_1_name, \"arguments\": {argument_1_name: argument_1_value, argument_2_name: argument_2_value}}</tool_call>" }}
    {{- "\n<tool_call>{\"name\": function_2_name, \"arguments\": {...}}</tool_call>\n..." }}
    {{- "\nNote that if no argument name is specified for a tool, you can just print the argument value directly, without the argument name or JSON formatting." }}
{%- endmacro %}


{%- set ns = namespace(last_query_index = messages|length - 1) %}
{%- for message in messages %}
    {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}

{%- for i in range(messages | length) %}
    {%- set msg = messages[i] %}
    {%- set role = msg.role %}
    {%- if role not in role_indicators %}
        {{- raise_exception('Unknown role: ' ~ role) }}
    {%- endif %}
    
    {%- if i == 0 %}
        {%- if role == 'system' %}
            {{- role_indicators['system'] }}
            {{- msg.content }}
            {%- if tools is defined and tools %}
                {{- "\n\n" }}{{- available_tools(tools) }}
            {%- endif %}
            {{- end_of_turn -}}
            {%- continue %}
        {%- elif tools is defined and tools %}            
            {{- role_indicators['system'] }}
            {{- available_tools(tools) }}
            {{- end_of_turn -}}            
        {%- endif %}
    {%- endif %}

    {%- if role == 'assistant' %}
        {{- role_indicators['assistant'] }}

        {%- if msg.content %}        
            {%- if "</think>" in msg.content %}
                {%- set content = msg.content.split('</think>')[-1].strip() %}
                {%- set reasoning_content = msg.content.split('</think>')[0].strip() %}
                {%- if reasoning_content.startswith("<think>") %}
                    {%- set reasoning_content = reasoning_content[9:].strip() %}
                {%- endif %}
            {%- else %}
                {%- set content = msg.content %}
            {%- endif %}

            {%- if msg.reasoning_content %}
                {%- set reasoning_content = msg.reasoning_content %}
            {%- endif %}

            {%- if (not skip_think and loop.last) and reasoning_content is defined %}
                {{- "<think>\n" }}
                {{- reasoning_content}}
                {{- "\n</think>\n\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {{- content }}
        {%- endif %}

        {%- if msg.tool_calls %}
            {%- if msg.content %}
                {{- "\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {%- for tool_call in msg.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}

                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                {%- elif tool_call.parameters is defined %}
                    {%- set arguments = tool_call.parameters %}
                {%- else %}
                    {{- raise_exception('arguments or parameters are mandatory: ' ~ tool_call) }}
                {%- endif %}

                {{- "<tool_call>" }}{"name": "{{- tool_call.name }}", "arguments": {{ arguments | tojson(ensure_ascii=False) | safe }}}{{- "</tool_call>" }}

                {%- if not loop.last %}
                    {{- "\n" }}
                {%- endif %}

            {%- endfor %}
        {%- endif %}
        {{- end_of_turn -}}

    {%- elif role == "tool" %}
        {%- if i == 0 or messages[i - 1].role != "tool" %}
            {{- role_indicators['tool'] }}
        {%- endif %}
        {%- if msg.content is defined %}            
            {{- "<tool_result>" }}{"result": {{ msg.content | tojson(ensure_ascii=False) | safe }}}{{- "</tool_result>" }}            
        {%- endif %}
        {%- if loop.last or messages[i + 1].role != "tool" %}
            {{- end_of_turn -}}
        {%- else %}
            {{- "\n" }}
        {%- endif %}

    {%- else %}
        {{- role_indicators[role] }}
        {{- msg.content }}
        {{- end_of_turn -}}
    {%- endif %}
{% endfor %}


{%- if add_generation_prompt %}
    {{- role_indicators['assistant'] }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- "<think>\n" }}
    {%- else %}
        {{- "<think>\n\n</think>\n\n" }}
    {%- endif %}
{%- endif %}
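
For anyone who wants to inspect the prompt this template produces outside llama.cpp, here is a minimal sketch that renders it with stock jinja2 (the file name and messages are illustrative; note that the tojson(ensure_ascii=...) call in the tools macro would additionally need transformers' extended tojson, but it is not reached when no tools are passed):

from jinja2 import Environment

def raise_exception(msg):
    # Chat-template runtimes provide this helper; it is not a jinja2 builtin.
    raise ValueError(msg)

# loopcontrols is required because the template uses {% continue %}.
env = Environment(extensions=["jinja2.ext.loopcontrols"])
env.globals["raise_exception"] = raise_exception

# Illustrative file name: the template above saved to disk.
tpl = env.from_string(open("exaone4_chat_template.jinja").read())

for thinking in (True, False):
    prompt = tpl.render(
        messages=[{"role": "user", "content": "Hello!"}],
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
    # Ends in "<think>\n" when thinking, else "<think>\n\n</think>\n\n".
    print(repr(prompt))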

Our simplified chat template for llama.cpp is shown below:

{%- set end_of_turn = '[|endofturn|]\n' %}

{%- macro available_tools(tools) %}
    {{- "# Available Tools" }}
    {{- "\nYou can use none, one, or multiple of the following tools by calling them as functions to help with the user’s query." }}
    {{- "\nHere are the tools available to you in JSON format within <tool> and </tool> tags:\n" }}
    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

    {{- "\nFor each function call you want to make, return a JSON object with function name and arguments within <tool_call> and </tool_call> tags, like:" }}
    {{- "\n<tool_call>{\"name\": function_1_name, \"arguments\": {argument_1_name: argument_1_value, argument_2_name: argument_2_value}}</tool_call>" }}
    {{- "\n<tool_call>{\"name\": function_2_name, \"arguments\": {...}}</tool_call>\n..." }}
    {{- "\nNote that if no argument name is specified for a tool, you can just print the argument value directly, without the argument name or JSON formatting." }}
{%- endmacro %}


{%- set ns = namespace(last_query_index = messages|length - 1) %}
{%- for message in messages %}
    {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}

{%- for i in range(messages | length) %}
    {%- set msg = messages[i] %}
    {%- set role = msg.role %}
    
    {%- if i == 0 %}
        {%- if role == 'system' %}
            {{- "[|system|]" }}
            {{- msg.content }}
            {%- if tools is defined and tools %}
                {{- "\n\n" }}{{- available_tools(tools) }}
            {%- endif %}
            {{- end_of_turn -}}
            {%- continue %}
        {%- elif tools is defined and tools %}            
            {{- "[|system|]" }}
            {{- available_tools(tools) }}
            {{- end_of_turn -}}            
        {%- endif %}
    {%- endif %}

    {%- if role == 'assistant' %}
        {{- "[|assistant|]" }}

        {%- if msg.content %}        
            {%- if "</think>" in msg.content %}
                {%- set content = msg.content.split('</think>')[-1].strip() %}
                {%- set reasoning_content = msg.content.split('</think>')[0].strip() %}
                {%- if reasoning_content.startswith("<think>") %}
                    {%- set reasoning_content = reasoning_content[9:].strip() %}
                {%- endif %}
            {%- else %}
                {%- set content = msg.content %}
            {%- endif %}

            {%- if msg.reasoning_content %}
                {%- set reasoning_content = msg.reasoning_content %}
            {%- endif %}

            {{- content }}
        {%- endif %}

        {%- if msg.tool_calls %}
            {%- if msg.content %}
                {{- "\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {%- for tool_call in msg.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}

                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                {%- elif tool_call.parameters is defined %}
                    {%- set arguments = tool_call.parameters %}
                {%- else %}
                    {{- raise_exception('arguments or parameters are mandatory: ' ~ tool_call) }}
                {%- endif %}

                {{- "<tool_call>" }}{"name": "{{- tool_call.name }}", "arguments": {{ arguments | tojson | safe }}}{{- "</tool_call>" }}

                {%- if not loop.last %}
                    {{- "\n" }}
                {%- endif %}

            {%- endfor %}
        {%- endif %}
        {{- end_of_turn -}}

    {%- elif role == "tool" %}
        {%- if i == 0 or messages[i - 1].role != "tool" %}
            {{- "[|tool|]" }}
        {%- endif %}
        {%- if msg.content is defined %}            
            {{- "<tool_result>" }}{"result": {{ msg.content | tojson | safe }}}{{- "</tool_result>" }}            
        {%- endif %}
        {%- if loop.last or messages[i + 1].role != "tool" %}
            {{- end_of_turn -}}
        {%- else %}
            {{- "\n" }}
        {%- endif %}

    {%- else %}
        {{- "[|user|]" }}
        {{- msg.content }}
        {{- end_of_turn -}}
    {%- endif %}
{% endfor %}


{%- if add_generation_prompt %}
    {{- "[|assistant|]" }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- "<think>\n" }}
    {%- else %}
        {{- "<think>\n\n</think>\n\n" }}
    {%- endif %}
{%- endif %}
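
For clarity, the assistant-branch handling of <think> blocks in both templates corresponds to roughly this Python paraphrase (illustrative only; it mirrors the template logic as written, including the 9-character slice used after matching the 7-character "<think>" prefix):

def split_reasoning(content: str):
    # Paraphrase of the assistant-branch logic above (illustrative only).
    if "</think>" in content:
        answer = content.split("</think>")[-1].strip()
        reasoning = content.split("</think>")[0].strip()
        if reasoning.startswith("<think>"):
            # The template slices at index 9 although "<think>" is 7
            # characters, presumably to also skip the newlines after it.
            reasoning = reasoning[9:].strip()
        return reasoning, answer
    return None, content

# Reasoning text before "</think>" is separated from the final answer.
print(split_reasoning("<think>\n\nLet me think step by step.\n</think>\n\nThe answer is 42."))
# -> ('Let me think step by step.', 'The answer is 42.')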

CISC (Collaborator) commented Jul 14, 2025

1. When using `llama-cli` with the `-cnv` argument, it appears that the reasoning token (`<think>`) is forcibly appended to the end of the prompt.

Strange, it might be a minja bug though.

   * Is there a way to control the `enable_thinking` parameter in the tokenizer when using `llama-cli` in conversational mode? The `/think` and `/no_think` commands don't seem to be working.

Unfortunately this is not supported by llama-cli yet; see #13196 (comment)

2. When requesting a chat completion from `llama-server` with `enable_thinking=True`, EXAONE 4.0 should close its reasoning block with `</think>`. Instead, it finishes generation with all content placed in `reasoning_content` rather than `content`.
   
   * We suspect this stems from the chat template logic in `llama.cpp`, but we haven't found a way to resolve it.

There are quite a few outstanding issues regarding thinking, so it's not unlikely that it is broken; try playing around with the --reasoning-format option though.

Also be aware that rewriting chat history between turns will confuse llama-cli; see #13404

    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson(ensure_ascii=False) | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

I see you use ensure_ascii=False and the safe filter everywhere; you shouldn't, though. ensure_ascii is disabled by default, and safe has the opposite effect in chat templates as opposed to normal jinja2 templates, generating unparseable JSON.
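
A quick way to sanity-check the rendered output is to try parsing a tool call back as JSON (illustrative; the payload below is hand-written to show the expected well-formed shape):

import json

# Illustrative check: the text between the <tool_call> tags must be valid
# JSON.  If the engine HTML-escapes quotes or brackets, json.loads fails.
rendered = '<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>'
payload = rendered.removeprefix("<tool_call>").removesuffix("</tool_call>")
print(json.loads(payload))  # {'name': 'get_weather', 'arguments': {'city': 'Seoul'}}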

lgai-exaone (Author) commented:

Thank you for your explanation.

Based on our understanding, there is no need to update EXAONE 4.0-specific code, and we believe this PR is ready to merge. 😀

CISC (Collaborator) commented Jul 14, 2025

Based on our understanding, there is no need to update EXAONE 4.0-specific code, and we believe this PR is ready to merge. 😀

Yes, as soon as the final repo URL is added to convert_hf_to_gguf_update.py (it does not have to be public, as long as someone has access and can verify the tokenizer with test-tokenizer-0).
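
The entry would look something like the existing ones in that script's models list (a sketch; the repo URL is a placeholder for the final one, and the BPE tokenizer type is an assumption):

# Sketch of a models-list entry in convert_hf_to_gguf_update.py.
# The URL is a placeholder and TOKENIZER_TYPE.BPE is an assumption about
# the tokenizer; TOKENIZER_TYPE is the enum defined in that script.
models = [
    # ... existing entries ...
    {"name": "exaone4", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/<final-repo-url>"},
]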

lgai-exaone (Author) commented:

We are happy to announce that our EXAONE 4.0 models have been released!
https://github.com/LG-AI-EXAONE/EXAONE-4.0

CISC (Collaborator) commented Jul 15, 2025

Looks like something is off; test-tokenizer-0 fails...

CISC requested a review from ggerganov on July 15, 2025
Comment on lines +545 to +546
} else if (role == "assistant_tool_call") {
ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";

CISC (Collaborator) commented:

Suggested change
-} else if (role == "assistant_tool_call") {
-    ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";
+} else if (role == "tool") {
+    ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";

This is most likely what you meant; however, I see in your chat template that you also have <tool_result>/</tool_result> wrapped around some additional JSON.

Either way, this probably does not matter, as I think --jinja is required for tool calling to work properly.
