Add EXAONE 4.0 model architecture #14630

Open
lgai-exaone wants to merge 13 commits into master

Conversation

lgai-exaone

Add EXAONE 4.0 modeling code in preparation for the official model release by LG AI Research.

This PR adds the modeling code for EXAONE 4.0, along with conversion code for HuggingFace transformers checkpoints. The feature was requested in issue #14474. The implementation is based on the modeling code from the corresponding PR in the transformers repository.

We have tested it internally with our checkpoints, and the relevant parts will be updated once our official checkpoints are released.

github-actions bot added the python (python script changes) label Jul 11, 2025
lgai-exaone (Author) commented Jul 14, 2025

Hello, maintainers!

We've encountered two issues when using EXAONE 4.0 models with this implementation:

  1. When using llama-cli with the -cnv argument, it appears that the reasoning token (<think>) is forcibly appended to the end of the prompt.
    • Is there a way to control the enable_thinking parameter in the tokenizer when using llama-cli in conversational mode? The /think and /no_think commands don't seem to be working.
  2. When requesting a chat completion from llama-server with enable_thinking=True, EXAONE 4.0 should close its reasoning block with </think>. Instead, it finishes generation with all content placed in reasoning_content rather than content.
    • We suspect this stems from the chat template logic in llama.cpp, but we haven't found a way to resolve it.

Could you help us solve these problems? We can provide test results and additional details if needed.
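
For reference, this is roughly how we trigger the second issue against llama-server's OpenAI-compatible endpoint (a sketch; the URL and prompt are placeholders, and we assume enable_thinking is forwarded to the template through chat_template_kwargs):

import requests

# Sketch of the request that reproduces issue 2.  The server URL and the
# prompt are placeholders; enable_thinking is assumed to be forwarded to
# the chat template via chat_template_kwargs.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "chat_template_kwargs": {"enable_thinking": True},
    },
)
msg = resp.json()["choices"][0]["message"]
print(msg.get("reasoning_content"))  # all generated text lands here
print(msg.get("content"))            # while this stays empty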

Our full chat template is shown below:

{%- if not skip_think is defined %}
  {%- set skip_think = true %}
{%- endif %}

{%- set role_indicators = {
    'user': '[|user|]\n',
    'assistant': '[|assistant|]\n',
    'system': '[|system|]\n',
    'tool': '[|tool|]\n'
} %}
{%- set end_of_turn = '[|endofturn|]\n' %}


{%- macro available_tools(tools) %}
    {{- "# Available Tools" }}
    {{- "\nYou can use none, one, or multiple of the following tools by calling them as functions to help with the user’s query." }}
    {{- "\nHere are the tools available to you in JSON format within <tool> and </tool> tags:\n" }}
    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson(ensure_ascii=False) | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

    {{- "\nFor each function call you want to make, return a JSON object with function name and arguments within <tool_call> and </tool_call> tags, like:" }}
    {{- "\n<tool_call>{\"name\": function_1_name, \"arguments\": {argument_1_name: argument_1_value, argument_2_name: argument_2_value}}</tool_call>" }}
    {{- "\n<tool_call>{\"name\": function_2_name, \"arguments\": {...}}</tool_call>\n..." }}
    {{- "\nNote that if no argument name is specified for a tool, you can just print the argument value directly, without the argument name or JSON formatting." }}
{%- endmacro %}


{%- set ns = namespace(last_query_index = messages|length - 1) %}
{%- for message in messages %}
    {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}

{%- for i in range(messages | length) %}
    {%- set msg = messages[i] %}
    {%- set role = msg.role %}
    {%- if role not in role_indicators %}
        {{- raise_exception('Unknown role: ' ~ role) }}
    {%- endif %}
    
    {%- if i == 0 %}
        {%- if role == 'system' %}
            {{- role_indicators['system'] }}
            {{- msg.content }}
            {%- if tools is defined and tools %}
                {{- "\n\n" }}{{- available_tools(tools) }}
            {%- endif %}
            {{- end_of_turn -}}
            {%- continue %}
        {%- elif tools is defined and tools %}            
            {{- role_indicators['system'] }}
            {{- available_tools(tools) }}
            {{- end_of_turn -}}            
        {%- endif %}
    {%- endif %}

    {%- if role == 'assistant' %}
        {{- role_indicators['assistant'] }}

        {%- if msg.content %}        
            {%- if "</think>" in msg.content %}
                {%- set content = msg.content.split('</think>')[-1].strip() %}
                {%- set reasoning_content = msg.content.split('</think>')[0].strip() %}
                {%- if reasoning_content.startswith("<think>") %}
                    {%- set reasoning_content = reasoning_content[9:].strip() %}
                {%- endif %}
            {%- else %}
                {%- set content = msg.content %}
            {%- endif %}

            {%- if msg.reasoning_content %}
                {%- set reasoning_content = msg.reasoning_content %}
            {%- endif %}

            {%- if (not skip_think and loop.last) and reasoning_content is defined %}
                {{- "<think>\n" }}
                {{- reasoning_content}}
                {{- "\n</think>\n\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {{- content }}
        {%- endif %}

        {%- if msg.tool_calls %}
            {%- if msg.content %}
                {{- "\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {%- for tool_call in msg.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}

                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                {%- elif tool_call.parameters is defined %}
                    {%- set arguments = tool_call.parameters %}
                {%- else %}
                    {{- raise_exception('arguments or parameters are mandatory: ' ~ tool_call) }}
                {%- endif %}

                {{- "<tool_call>" }}{"name": "{{- tool_call.name }}", "arguments": {{ arguments | tojson(ensure_ascii=False) | safe }}}{{- "</tool_call>" }}

                {%- if not loop.last %}
                    {{- "\n" }}
                {%- endif %}

            {%- endfor %}
        {%- endif %}
        {{- end_of_turn -}}

    {%- elif role == "tool" %}
        {%- if i == 0 or messages[i - 1].role != "tool" %}
            {{- role_indicators['tool'] }}
        {%- endif %}
        {%- if msg.content is defined %}            
            {{- "<tool_result>" }}{"result": {{ msg.content | tojson(ensure_ascii=False) | safe }}}{{- "</tool_result>" }}            
        {%- endif %}
        {%- if loop.last or messages[i + 1].role != "tool" %}
            {{- end_of_turn -}}
        {%- else %}
            {{- "\n" }}
        {%- endif %}

    {%- else %}
        {{- role_indicators[role] }}
        {{- msg.content }}
        {{- end_of_turn -}}
    {%- endif %}
{% endfor %}


{%- if add_generation_prompt %}
    {{- role_indicators['assistant'] }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- "<think>\n" }}
    {%- else %}
        {{- "<think>\n\n</think>\n\n" }}
    {%- endif %}
{%- endif %}
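
For anyone who wants to inspect the prompt this template produces outside llama.cpp, here is a minimal sketch that renders it with stock jinja2 (the file name and messages are illustrative; note that the tojson(ensure_ascii=...) call in the tools macro would additionally need transformers' extended tojson, but it is not reached when no tools are passed):

from jinja2 import Environment

def raise_exception(msg):
    # Chat-template runtimes provide this helper; it is not a jinja2 builtin.
    raise ValueError(msg)

# loopcontrols is required because the template uses {% continue %}.
env = Environment(extensions=["jinja2.ext.loopcontrols"])
env.globals["raise_exception"] = raise_exception

# Illustrative file name: the template above saved to disk.
tpl = env.from_string(open("exaone4_chat_template.jinja").read())

for thinking in (True, False):
    prompt = tpl.render(
        messages=[{"role": "user", "content": "Hello!"}],
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
    # Ends in "<think>\n" when thinking, else "<think>\n\n</think>\n\n".
    print(repr(prompt))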

Our simplified chat template for llama.cpp is shown below:

{%- set end_of_turn = '[|endofturn|]\n' %}

{%- macro available_tools(tools) %}
    {{- "# Available Tools" }}
    {{- "\nYou can use none, one, or multiple of the following tools by calling them as functions to help with the user’s query." }}
    {{- "\nHere are the tools available to you in JSON format within <tool> and </tool> tags:\n" }}
    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

    {{- "\nFor each function call you want to make, return a JSON object with function name and arguments within <tool_call> and </tool_call> tags, like:" }}
    {{- "\n<tool_call>{\"name\": function_1_name, \"arguments\": {argument_1_name: argument_1_value, argument_2_name: argument_2_value}}</tool_call>" }}
    {{- "\n<tool_call>{\"name\": function_2_name, \"arguments\": {...}}</tool_call>\n..." }}
    {{- "\nNote that if no argument name is specified for a tool, you can just print the argument value directly, without the argument name or JSON formatting." }}
{%- endmacro %}


{%- set ns = namespace(last_query_index = messages|length - 1) %}
{%- for message in messages %}
    {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}

{%- for i in range(messages | length) %}
    {%- set msg = messages[i] %}
    {%- set role = msg.role %}
    
    {%- if i == 0 %}
        {%- if role == 'system' %}
            {{- "[|system|]" }}
            {{- msg.content }}
            {%- if tools is defined and tools %}
                {{- "\n\n" }}{{- available_tools(tools) }}
            {%- endif %}
            {{- end_of_turn -}}
            {%- continue %}
        {%- elif tools is defined and tools %}            
            {{- "[|system|]" }}
            {{- available_tools(tools) }}
            {{- end_of_turn -}}            
        {%- endif %}
    {%- endif %}

    {%- if role == 'assistant' %}
        {{- "[|assistant|]" }}

        {%- if msg.content %}        
            {%- if "</think>" in msg.content %}
                {%- set content = msg.content.split('</think>')[-1].strip() %}
                {%- set reasoning_content = msg.content.split('</think>')[0].strip() %}
                {%- if reasoning_content.startswith("<think>") %}
                    {%- set reasoning_content = reasoning_content[9:].strip() %}
                {%- endif %}
            {%- else %}
                {%- set content = msg.content %}
            {%- endif %}

            {%- if msg.reasoning_content %}
                {%- set reasoning_content = msg.reasoning_content %}
            {%- endif %}

            {{- content }}
        {%- endif %}

        {%- if msg.tool_calls %}
            {%- if msg.content %}
                {{- "\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {%- for tool_call in msg.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}

                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                {%- elif tool_call.parameters is defined %}
                    {%- set arguments = tool_call.parameters %}
                {%- else %}
                    {{- raise_exception('arguments or parameters are mandatory: ' ~ tool_call) }}
                {%- endif %}

                {{- "<tool_call>" }}{"name": "{{- tool_call.name }}", "arguments": {{ arguments | tojson | safe }}}{{- "</tool_call>" }}

                {%- if not loop.last %}
                    {{- "\n" }}
                {%- endif %}

            {%- endfor %}
        {%- endif %}
        {{- end_of_turn -}}

    {%- elif role == "tool" %}
        {%- if i == 0 or messages[i - 1].role != "tool" %}
            {{- "[|tool|]" }}
        {%- endif %}
        {%- if msg.content is defined %}            
            {{- "<tool_result>" }}{"result": {{ msg.content | tojson | safe }}}{{- "</tool_result>" }}            
        {%- endif %}
        {%- if loop.last or messages[i + 1].role != "tool" %}
            {{- end_of_turn -}}
        {%- else %}
            {{- "\n" }}
        {%- endif %}

    {%- else %}
        {{- "[|user|]" }}
        {{- msg.content }}
        {{- end_of_turn -}}
    {%- endif %}
{% endfor %}


{%- if add_generation_prompt %}
    {{- "[|assistant|]" }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- "<think>\n" }}
    {%- else %}
        {{- "<think>\n\n</think>\n\n" }}
    {%- endif %}
{%- endif %}
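
For clarity, the assistant-branch handling of <think> blocks in both templates corresponds to roughly this Python paraphrase (illustrative only; it mirrors the template logic as written, including the 9-character slice used after matching the 7-character "<think>" prefix):

def split_reasoning(content: str):
    # Paraphrase of the assistant-branch logic above (illustrative only).
    if "</think>" in content:
        answer = content.split("</think>")[-1].strip()
        reasoning = content.split("</think>")[0].strip()
        if reasoning.startswith("<think>"):
            # The template slices at index 9 although "<think>" is 7
            # characters, presumably to also skip the newlines after it.
            reasoning = reasoning[9:].strip()
        return reasoning, answer
    return None, content

# Reasoning text before "</think>" is separated from the final answer.
print(split_reasoning("<think>\n\nLet me think step by step.\n</think>\n\nThe answer is 42."))
# -> ('Let me think step by step.', 'The answer is 42.')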

CISC (Collaborator) commented Jul 14, 2025

1. When using `llama-cli` with the `-cnv` argument, it appears that the reasoning token (`<think>`) is forcibly appended to the end of the prompt.

Strange, it might be a minja bug though.

   * Is there a way to control the `enable_thinking` parameter in the tokenizer when using `llama-cli` in conversational mode? The `/think` and `/no_think` commands don't seem to be working.

Unfortunately this is not supported by llama-cli yet; see #13196 (comment)

2. When requesting a chat completion from `llama-server` with `enable_thinking=True`, EXAONE 4.0 should close its reasoning block with `</think>`. Instead, it finishes generation with all content placed in `reasoning_content` rather than `content`.
   
   * We suspect this stems from the chat template logic in `llama.cpp`, but we haven't found a way to resolve it.

There are quite a few outstanding issues regarding thinking, so it's not unlikely that it is broken; try playing around with the --reasoning-format option though.

Also be aware that rewriting chat history between turns will confuse llama-cli; see #13404

    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson(ensure_ascii=False) | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

I see you use ensure_ascii=False and the safe filter everywhere; you shouldn't, though. ensure_ascii is disabled by default, and safe has the opposite effect in chat templates as opposed to normal jinja2 templates, generating unparseable JSON.
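
A quick way to sanity-check the rendered output is to try parsing a tool call back as JSON (illustrative; the payload below is hand-written to show the expected well-formed shape):

import json

# Illustrative check: the text between the <tool_call> tags must be valid
# JSON.  If the engine HTML-escapes quotes or brackets, json.loads fails.
rendered = '<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>'
payload = rendered.removeprefix("<tool_call>").removesuffix("</tool_call>")
print(json.loads(payload))  # {'name': 'get_weather', 'arguments': {'city': 'Seoul'}}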

lgai-exaone (Author) commented:

Thank you for your explanation.

Based on our understanding, there is no need to update EXAONE 4.0-specific code, and we believe this PR is ready to merge. 😀

CISC (Collaborator) commented Jul 14, 2025

Based on our understanding, there is no need to update EXAONE 4.0-specific code, and we believe this PR is ready to merge. 😀

Yes, as soon as the final repo URL is added to convert_hf_to_gguf_update.py (it does not have to be public, as long as someone has access and can verify the tokenizer with test-tokenizer-0).
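
The entry would look something like the existing ones in that script's models list (a sketch; the repo URL is a placeholder for the final one, and the BPE tokenizer type is an assumption):

# Sketch of a models-list entry in convert_hf_to_gguf_update.py.
# The URL is a placeholder and TOKENIZER_TYPE.BPE is an assumption about
# the tokenizer; TOKENIZER_TYPE is the enum defined in that script.
models = [
    # ... existing entries ...
    {"name": "exaone4", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/<final-repo-url>"},
]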

lgai-exaone (Author) commented:

We are happy to announce that our EXAONE 4.0 models have been released!
https://github.com/LG-AI-EXAONE/EXAONE-4.0

CISC (Collaborator) commented Jul 15, 2025

Looks like something is off; test-tokenizer-0 fails...

CISC requested a review from ggerganov on July 15, 2025
Comment on lines +545 to +546
} else if (role == "assistant_tool_call") {
ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";

CISC (Collaborator) commented:

Suggested change
-} else if (role == "assistant_tool_call") {
-    ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";
+} else if (role == "tool") {
+    ss << "[|tool|]" << trim(message->content) << "[|endofturn|]\n";

This is most likely what you meant; however, I see in your chat template that you also have <tool_result>/</tool_result> wrapped around some additional JSON.

Either way, this probably does not matter, as I think --jinja is required for tool calling to work properly.
