Commit 22b85ac

Solve encoding issue of repocard.py (#3235)
I discovered a problem (in fact, several) when I tried to use push_to_hub on a smolagents agent. The problem comes from huggingface_hub, in repocard.py.

Bug #1: UnicodeEncodeError when creating README.md

Error (for example):
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f440' in position 32: character maps to <undefined>

Root cause:
Location: Path.write_text() in huggingface_hub/repocard.py:279
Problem: On Windows, Path.write_text() defaults to the locale encoding (CP1252 here) instead of UTF-8
Trigger: A Unicode emoji (for example \U0001f440 = 👀) in the README.md metadata
Context: smolagents automatically generates a README.md with emojis, but Windows cannot encode them using CP1252

Bug mechanism:
1. agent.push_to_hub() calls metadata_update()
2. metadata_update() creates a RepoCard with an emoji
3. RepoCard.push_to_hub() uses Path.write_text() without specifying UTF-8
4. Windows defaults to CP1252 → crash on emoji

Solution: I just pass encoding="utf-8" to write_text().
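The encoding failure described above can be reproduced on any platform by naming the codec explicitly. This is a minimal sketch, not code from the repository: the helper `can_encode` and the sample string are hypothetical and only illustrate that CP1252 has no mapping for the emoji while UTF-8 does.

```python
# Sample text containing the emoji from the error message (U+1F440).
emoji_text = "tags: \U0001f440"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# CP1252 is a single-byte code page and cannot represent the emoji.
cp1252_ok = can_encode(emoji_text, "cp1252")
# UTF-8 can represent every Unicode code point.
utf8_ok = can_encode(emoji_text, "utf-8")

print(f"cp1252: {cp1252_ok}, utf-8: {utf8_ok}")
```

Naming the codec explicitly makes the result independent of the machine's locale, which is the same idea behind the fix.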
1 parent 9e0493c commit 22b85ac

1 file changed: +1 -1 lines changed

src/huggingface_hub/repocard.py

Lines changed: 1 addition & 1 deletion
@@ -276,7 +276,7 @@ def push_to_hub(

         with SoftTemporaryDirectory() as tmpdir:
             tmp_path = Path(tmpdir) / constants.REPOCARD_NAME
-            tmp_path.write_text(str(self))
+            tmp_path.write_text(str(self), encoding="utf-8")
             url = upload_file(
                 path_or_fileobj=str(tmp_path),
                 path_in_repo=constants.REPOCARD_NAME,
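The effect of the one-line change can be sketched outside the library as follows. This is an illustration under assumptions, not the library's code: the standard `tempfile.TemporaryDirectory` stands in for `SoftTemporaryDirectory`, the card text is made up, and `encoding="cp1252"` is passed explicitly to simulate the Windows default on any platform.

```python
import tempfile
from pathlib import Path

# Hypothetical card content with an emoji in the metadata block.
card_text = "---\ntitle: demo \U0001f440\n---\n"

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = Path(tmpdir) / "README.md"

    # Simulate the old behavior on Windows by forcing CP1252.
    try:
        tmp_path.write_text(card_text, encoding="cp1252")
        legacy_write_failed = False
    except UnicodeEncodeError:
        legacy_write_failed = True

    # The fixed call: an explicit UTF-8 encoding works regardless of locale.
    tmp_path.write_text(card_text, encoding="utf-8")
    round_trip = tmp_path.read_text(encoding="utf-8")

print(legacy_write_failed, round_trip == card_text)
```

Reading the file back with the same explicit encoding avoids a matching decode problem on the consumer side.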
