Skip to content

Enable tool use #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 29, 2025
Merged

Enable tool use #151

merged 5 commits into from
Jan 29, 2025

Conversation

DePasqualeOrg
Copy link
Contributor

@DePasqualeOrg DePasqualeOrg commented Jan 1, 2025

Once johnmai-dev/Jinja#8 gets merged, this will enable tool use.

#158 needs to be merged before this PR.

@DePasqualeOrg DePasqualeOrg force-pushed the images-and-tools branch 6 times, most recently from be1f482 to 706e9a1 Compare January 3, 2025 09:03
@pcuenca
Copy link
Member

pcuenca commented Jan 3, 2025

Happy to review this one when it's ready and #150 has been merged.

@DePasqualeOrg
Copy link
Contributor Author

@pcuenca, this is ready to go as soon as we can merge the update to Swift Jinja as well as the formatting in this repo. I'd love to move forward with this so that we can showcase the new capabilities in mlx-swift-examples.

@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review January 22, 2025 08:00
@DePasqualeOrg
Copy link
Contributor Author

@pcuenca, this is now using the latest release of Swift Jinja, so it's ready to review.

@pcuenca
Copy link
Member

pcuenca commented Jan 22, 2025

Can we please remove the formatting changes from this PR 🙏? I'm still going through all the settings in swift-format and will provide feedback on #158.

@DePasqualeOrg
Copy link
Contributor Author

DePasqualeOrg commented Jan 22, 2025

That would require a lot of tedious edits. If you changed your mind on your preferred formatting, a much easier solution would be for you to first merge your own formatting PR, and I'll rebase this PR using the new formatting. Or just use the formatting that we already agreed on.

I ran formatting first on this PR because formatting differences often cause merge conflicts, and my Xcode uses two spaces for indentation. It's much easier to simply run the formatting before committing, and it's difficult to contribute to a repo that doesn't have consistent formatting.

I'd also like to point out that automated formatting is an issue that other contributors have been asking for since April of last year, but you never followed through on that PR.

@DePasqualeOrg
Copy link
Contributor Author

DePasqualeOrg commented Jan 23, 2025

@pcuenca, can we please move forward with this? I have contributed a significant amount of free labor so that people can use the capabilities of the latest models in Swift. There's a lot of interest right now in using DeepSeek R1 in particular. The solution here is simple and doesn't require much effort from you.

You recently referred to this library as merely "experimental", but in fact it has become a critical piece of infrastructure for the Swift MLX ecosystem. As a contributor to this library as well as others downstream and upstream of it, I would appreciate it if you could be more responsive to PRs.

Cc @julien-c

@DePasqualeOrg
Copy link
Contributor Author

Here is my proposed path forward, which will make things very easy for everyone:

This repo needs some kind of automated formatting, which other contributors have been asking for since last year. Automated formatting is a part of nearly every major repository that people collaborate on, including nearly all of Hugging Face's public repos, because people have different IDE settings that result in different indentation and other differences.

Automated formatting allows people to keep their preferred IDE settings and run one command to eliminate merge conflicts and unnecessary diffs. It's also very useful when pasting in code from LLMs, which often use different formatting. Running automated formatting allows you to see a clean diff.

As for the specific rules, I will leave that entirely up to @pcuenca's judgement. Even an extremely minimal config with only rules for indentation and line length would already solve the problem.

You can merge your preferred formatting with swift-format (see my PR for an example), and then I will format, squash, and rebase, and you will see only the changes relevant to this PR in one single commit. That's the beauty of formatting.

@pcuenca
Copy link
Member

pcuenca commented Jan 26, 2025

Sure, we can do that, but:

  • Swift formatting, in my opinion, has less priority than fixing existing tokenizers or supporting the Deepseek tokenizer. These tasks unblock MLX and fix other downstream use cases. Hence Fix Phi 4 tokenization #153, Bert fixes #157, Deepseek R1 tokenization support #159, Upgrade jinja #160, Override Llama tokenizer post-processor if necessary #163 were all handled with urgency. Add metadata support with tests #155 is also super important for the long-term utility of the library, and I'd like to review it as soon as possible.
  • Tools support is also more important than agreeing on a syntax. I do appreciate your contribution a lot, as well as all your previous efforts!
  • I dislike some of the defaults applied by swift-format and spent some time trying to find a reasonable ground, but couldn't finish yet. I thought that it would only apply the rules that appear in the format file, but it applies several rules by default. I don't take source code changes lightly as they make history more difficult to browse.
  • I can't see why cherry-picking the relevant commits from this PR (is it just 24187a2?) into a separate branch would be complicated. Please, forgive me if I'm wrong, but that would have allowed us to review and test that feature while we work (more slowly) in the swift-format changes.

As I tried to explain offline, swift-format has not been a huge priority so far, but I'll happily adopt it seeing you are contributing regularly to this project and other related ones. However, I don't think it'll be beneficial for the project to hastily adopt the set of syntax changes that were included in the PR.

So my current thinking is to work on the following priorities this week:

@DePasqualeOrg
Copy link
Contributor Author

Thanks, I was able to cherry pick the key commit and resolve some conflicts. I didn't think of that possibility before, but I guess that's the beauty of git. Your plan sounds good to me. You can review this PR whenever you're able to.

It's very difficult to test the tool use templates in Swift because of Swift's non-deterministic key order in dictionaries. We're already testing this in Jinja. I added a disabled test as a starting point in case we want to do this later. That would require importing swift-collections only for the tests and using OrderedDictionary instead of native Swift dictionaries. I have already tested tool use in ml-explore/mlx-swift-examples#174, and it works as expected.

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DePasqualeOrg, thanks for the contribution and sorry for the delay in getting to it!

I think the changes are fine, but I think it'd still be useful to have a test. I propose to extract the tools section from the rendered string, as shown in a comment.

//"""
//
// XCTAssertEqual(decoded, target)
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit cumbersome, but given that you already have the tools dictionary, we could decode the tools section from the decoded string as a dictionary and compare that way:

    func testQwen2_5WithTools() async throws {
        let tokenizer = try await AutoTokenizer.from(pretrained: "mlx-community/Qwen2.5-7B-Instruct-4bit")

        let weatherQueryMessages: [[String: String]] = [
            [
                "role": "user",
                "content": "What is the weather in Paris today?",
            ]
        ]

        let getCurrentWeatherToolSpec: [String: Any] = [
            "type": "function",
            "function": [
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": [
                    "type": "object",
                    "properties": [
                        "location": [
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        ],
                        "unit": [
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        ]
                    ],
                    "required": ["location"]
                ]
            ]
        ]

        let encoded = try tokenizer.applyChatTemplate(messages: weatherQueryMessages, tools: [getCurrentWeatherToolSpec])
        let decoded = tokenizer.decode(tokens: encoded)

        func assertDictsAreEqual(_ actual: [String: Any], _ expected: [String: Any]) {
            for (key, value) in actual {
                if let nestedDict = value as? [String: Any], let nestedDict2 = expected[key] as? [String: Any] {
                    assertDictsAreEqual(nestedDict, nestedDict2)
                } else if let arrayValue = value as? [String] {
                    let expectedArrayValue = expected[key] as? [String]
                    XCTAssertNotNil(expectedArrayValue)
                    XCTAssertEqual(Set(arrayValue), Set(expectedArrayValue!))
                } else {
                    XCTAssertEqual(value as? String, expected[key] as? String)
                }
            }
        }

        if let startRange = decoded.range(of: "<tools>\n"),
           let endRange = decoded.range(of: "\n</tools>", range: startRange.upperBound..<decoded.endIndex) {
            let toolsSection = String(decoded[startRange.upperBound..<endRange.lowerBound])
            if let toolsDict = try? JSONSerialization.jsonObject(with: toolsSection.data(using: .utf8)!) as? [String : Any] {
                assertDictsAreEqual(toolsDict, getCurrentWeatherToolSpec)
            } else {
                XCTFail("Failed to decode tools section")
            }
        } else {
            XCTFail("Failed to find tools section")
        }
    }

I'd also suggest to add additional sanity checks that would be easier to implement: the string starts and ends with the known fragments, or something along those lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I've added the test as you suggested.

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I just realized about a small API breaking change. Would be interested to know. your thoughts.

More context: #148

/// A chat template can optionally be provided or specified by name when several templates are included in the tokenizer config. Normally this is not necessary.
chatTemplate: ChatTemplateArgument?,
addGenerationPrompt: Bool,
truncation: Bool,
maxLength: Int?,
tools: [[String: Any]]?
tools: [ToolSpec]?,
additionalContext: [String: Any]?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument actually introduces a breaking API change for users that provide their own Tokenizer implementations. Do you think we could introduce a default implementation somehow? For example, we could keep the previous declaration, and create a default implementation of this method that calls the old one when additionalContext is nil.

Otherwise we can always introduce this as part of 0.2.0, but I'd rather add it as an incremental update so most people can use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added an overload for the previous function signature.


/// The chat template is provided as a string literal
func applyChatTemplate(messages: [[String: String]], chatTemplate: String) throws -> [Int]
func applyChatTemplate(messages: [Message], chatTemplate: String) throws -> [Int]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func applyChatTemplate(messages: [Message], chatTemplate: String) throws -> [Int]
func applyChatTemplate(messages: [Message], chatTemplate: String) throws -> [Int]
func applyChatTemplate(
messages: [Message],
/// A chat template can optionally be provided or specified by name when several templates are included in the tokenizer config. Normally this is not necessary.
chatTemplate: ChatTemplateArgument?,
addGenerationPrompt: Bool,
truncation: Bool,
maxLength: Int?,
tools: [ToolSpec]?
) throws -> [Int]

We keep the previous declaration in the protocol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we also need something like this:

extension Tokenizer {
    /// Call previous signature for backwards compatibility
    func applyChatTemplate(
        messages: [Message],
        /// A chat template can optionally be provided or specified by name when several templates are included in the tokenizer config. Normally this is not necessary.
        chatTemplate: ChatTemplateArgument?,
        addGenerationPrompt: Bool,
        truncation: Bool,
        maxLength: Int?,
        tools: [ToolSpec]?,
        additionalContext: [String: Any]?
    ) throws -> [Int] {
        if additionalContext == nil {
            try applyChatTemplate(messages: messages, chatTemplate: chatTemplate, addGenerationPrompt: addGenerationPrompt, truncation: truncation, maxLength: maxLength, tools: tools)
        } else {
            throw TokenizerError.chatTemplate("Not implemented")
        }
    }
}

It's a bit confusing, but the idea is: if you have written your own Tokenizer implementation in your project, you will have implemented your own version of applyChatTemplate with the previous signature. But now there's a new protocol requirement that takes a new argument (the additionalContext) that does not exist in your code, because you implemented the old version. So your code will not compile. With the extension above, we call the "old" implementation, so the code will still compile and work.

I don't think this affects our implementation in PreTrainedTokenizer below, everything should just keep working.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier if you make the change yourself. I'm not sure if I fully understand the problem.

@pcuenca
Copy link
Member

pcuenca commented Jan 29, 2025

it would be easier if you make the change yourself

I can't. Can you check if you allowed "edits from maintainers"?

@pcuenca
Copy link
Member

pcuenca commented Jan 29, 2025

Otherwise we can merge and I'll open a followup pr.

@DePasqualeOrg
Copy link
Contributor Author

"Allow edits and access to secrets by maintainers" is checked.

@pcuenca
Copy link
Member

pcuenca commented Jan 29, 2025

Ah my bad, it worked now. Let's wait for the CI to clear 🤞

@pcuenca
Copy link
Member

pcuenca commented Jan 29, 2025

Merging, I hope I didn't break anything! Thanks @DePasqualeOrg!

@pcuenca pcuenca merged commit ff81749 into huggingface:main Jan 29, 2025
1 check passed
@DePasqualeOrg
Copy link
Contributor Author

Thank you! Whenever you're ready, could you create a new release tag so that this can get picked up in my PRs for tool use and vision chat templates in mlx-swift-examples?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants