UnicodeDecodeError for Pydantic models with Chinese characters #60

@lostmypillow

Description of problem

Given a Pydantic model with Chinese attributes:

from pydantic import BaseModel

class TrialRecord(BaseModel):
    學號: str

OR with a Chinese alias:

from pydantic import BaseModel, Field

class TrialRecord(BaseModel):
    student_id: str = Field(alias="學號")

pydantic2ts fails while cleaning the output file:

PS ...> pydantic2ts --module backend\src\models\trial_record.py --output backend\output.ts
2025-06-08 20:49:10,988 Finding pydantic models...
2025-06-08 20:49:11,032 Generating JSON schema from pydantic models...
2025-06-08 20:49:11,039 Converting JSON schema to typescript definitions...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "...\.venv\Scripts\pydantic2ts.exe\__main__.py", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "...\.venv\Lib\site-packages\pydantic2ts\cli\script.py", line 404, in main
    return generate_typescript_defs(
        args.module,
    ...<2 lines>...
        args.json2ts_cmd,
    )
  File "...\.venv\Lib\site-packages\pydantic2ts\cli\script.py", line 354, in generate_typescript_defs
    _clean_output_file(output)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "...\.venv\Lib\site-packages\pydantic2ts\cli\script.py", line 212, in _clean_output_file
    lines = f.readlines()
  File "...\AppData\Local\Programs\Python\Python313\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 165: character maps to <undefined>
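
The traceback shows the file being decoded with cp1252, which is what open() falls back to on this Windows setup when no encoding is passed. A minimal sketch of the same failure, assuming the generated file is UTF-8: the file name, the interface text, and the sample character 可 are my own choices for illustration (可 encodes to E5 8F AF in UTF-8, so it contains the byte 0x8f named in the error), not taken from the actual output.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the generated output file (assumed to be UTF-8).
content = "export interface TrialRecord {\n  可: string;\n}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.ts"
    path.write_bytes(content.encode("utf-8"))

    # Reading with cp1252 mimics open() without an explicit encoding
    # on a machine whose default encoding is cp1252.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.readlines()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading the same bytes as UTF-8 succeeds.
    with open(path, "r", encoding="utf-8") as f:
        utf8_lines = f.readlines()

print(cp1252_failed)
print(utf8_lines[1])
```

Byte 0x8f has no mapping in Python's cp1252 codec, which is why the decode raises instead of silently producing garbled text.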

Suggested Solution

Add encoding="utf-8" to the with open() calls at lines 211 and 236 of /pydantic2ts/cli/script.py.
I apply this as a workaround after installing the package via pip, but I'd like it to become a permanent change, so I've also opened a pull request.
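
For illustration, here is a rough sketch of the read-then-rewrite pattern with the encoding made explicit on both open() calls. The helper rewrite_without_blank_lines and its filtering step are hypothetical and are not the pydantic2ts implementation; only the encoding="utf-8" arguments correspond to the proposed change.

```python
import os
import tempfile


def rewrite_without_blank_lines(path: str) -> None:
    """Hypothetical helper (not pydantic2ts code): read a file, drop blank
    lines, and write it back, using UTF-8 explicitly for both steps."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    kept = [line for line in lines if line.strip()]

    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kept)


# Usage: round-trip a file containing the Chinese field name from this issue.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.ts")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("export interface TrialRecord {\n\n  學號: string;\n}\n")

    rewrite_without_blank_lines(demo_path)

    with open(demo_path, "r", encoding="utf-8") as f:
        result = f.read()

print(result)
```

Passing the encoding on both the read and the write keeps the result independent of the platform default. As a possible workaround that does not require editing the installed package, Python's UTF-8 mode (setting the environment variable PYTHONUTF8=1) also changes the default encoding used by open(); I have not verified it against this exact setup.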
