Feature translator #49

Merged
merged 4 commits into from
Apr 4, 2025

WangYuyang1013
Contributor

Project Environment
In package.json, new dependencies required for Rollup bundling as well as other newly used packages have been added. The build logic has been switched from tsc to Rollup to better support integration with the Source Academy frontend. A new Rollup configuration file was created to bundle src/index.ts into dist/index.js, making it easier to deploy and use.
In tsconfig.json, the module type was changed from commonjs to ESNext to support bundling, and the exclude list was updated to ignore directories and files such as dist, docs, and tests.

Codebase Changes
In the newly added file types.ts, two AST node types None and ComplexLiteral were introduced to represent Python’s None and complex numbers respectively. The class PyComplexNumber was also introduced to store and compute complex numbers. Additionally, some error-handling logic was added to the file.

In translator.ts, support was added for translating the new None and ComplexLiteral nodes.

In ast-types.ts, two new expression nodes were added: None and Complex. The Complex node stores its value using the PyComplexNumber class after parsing a complex number string during construction.
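As a rough illustration of what a class like PyComplexNumber has to do, here is a minimal sketch. The name ComplexSketch, its methods, and the choice to accept only imaginary literals such as 2j or 1.5e3J are assumptions made for this example; the actual class in types.ts may parse other forms and expose a different interface.

```typescript
// Hypothetical sketch, not the PyComplexNumber implementation from the PR.
class ComplexSketch {
  readonly real: number;
  readonly imag: number;

  constructor(real: number, imag: number) {
    this.real = real;
    this.imag = imag;
  }

  // Parses an imaginary literal such as "2j" or "1.5e3J" (assumed input shape).
  static fromLiteral(lexeme: string): ComplexSketch {
    const shape = /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?[jJ]$/;
    if (!shape.test(lexeme)) {
      throw new SyntaxError("not an imaginary literal: " + lexeme);
    }
    // Drop the trailing j/J and read the rest as a float.
    return new ComplexSketch(0, Number(lexeme.slice(0, -1)));
  }

  add(other: ComplexSketch): ComplexSketch {
    return new ComplexSketch(this.real + other.real, this.imag + other.imag);
  }

  // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
  mul(other: ComplexSketch): ComplexSketch {
    return new ComplexSketch(
      this.real * other.real - this.imag * other.imag,
      this.real * other.imag + this.imag * other.real
    );
  }
}
```

In Python itself only the imaginary part is a literal; an expression like 1+2j is an addition of a float and an imaginary literal, which is why the sketch parses just the j-suffixed form.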

In tokenizer.ts:
Enhanced string handling: multi-line strings enclosed in triple quotes are now parsed, and escape sequences such as \n, \\, and \" are handled accurately. lexemeBuffer records the raw content of each string. Illegal escape sequences are detected and reported (e.g., \z raises a SyntaxWarning).
Identifiers can now include legal Unicode characters, validated using the isLegalUnicode function.
Complex numbers: the characters j and J in a numeric context are recognized as marking an imaginary number.
Underscores in numeric literals: numbers like 1_000_000 are now parsed correctly.
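The escape handling described above can be sketched roughly as follows. The function name decodeEscapes, the returned shape, and the small escape table are assumptions for illustration only; the real tokenizer works on its own buffers and covers more escapes. The sketch follows Python's behaviour of keeping an unrecognized escape in the string while reporting a warning.

```typescript
// Hypothetical sketch of escape decoding; not the code from tokenizer.ts.
interface DecodedString {
  value: string;      // string contents after escape processing
  warnings: string[]; // one message per unrecognized escape
}

// Illustrative subset of Python's escape sequences.
const SIMPLE_ESCAPES = new Map<string, string>([
  ["n", "\n"],
  ["t", "\t"],
  ["r", "\r"],
  ["\\", "\\"],
  ['"', '"'],
  ["'", "'"],
]);

function decodeEscapes(body: string): DecodedString {
  let value = "";
  const warnings: string[] = [];
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    // Ordinary character, or a lone backslash at the very end.
    if (ch !== "\\" || i === body.length - 1) {
      value += ch;
      continue;
    }
    const next = body.charAt(i + 1);
    const mapped = SIMPLE_ESCAPES.get(next);
    if (mapped !== undefined) {
      value += mapped;
    } else {
      // Keep the backslash and the character, and record a warning.
      value += "\\" + next;
      warnings.push(`invalid escape sequence '\\${next}'`);
    }
    i++; // skip the character consumed by the escape
  }
  return { value, warnings };
}
```

Returning the warnings instead of throwing lets the caller decide whether an illegal escape is fatal, which mirrors how Python treats it as a warning rather than an error.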

In tokens.ts:
A new token type COMPLEX was added to represent complex number literals.
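To show how underscores and the j/J suffix might lead to a COMPLEX token, here is a simplified scanner sketch. scanNumber, the NumberToken shape, and the "NUMBER" type name are hypothetical; exponents and non-decimal bases are left out, and the real tokenizer is character-driven rather than regex-based.

```typescript
// Hypothetical sketch; token names other than COMPLEX are assumptions.
interface NumberToken {
  type: "NUMBER" | "COMPLEX";
  lexeme: string; // raw text, underscores included
  end: number;    // offset just past the literal
}

function scanNumber(source: string, start: number): NumberToken {
  // Digit groups separated by single underscores, an optional fraction,
  // and an optional imaginary suffix.
  const pattern = /^\d+(?:_\d+)*(?:\.\d+(?:_\d+)*)?[jJ]?/;
  const match = pattern.exec(source.slice(start));
  if (match === null) {
    throw new SyntaxError("expected a number at offset " + start);
  }
  const lexeme = match[0];
  const isComplex = /[jJ]$/.test(lexeme);
  return {
    type: isComplex ? "COMPLEX" : "NUMBER",
    lexeme: lexeme,
    end: start + lexeme.length,
  };
}
```

For example, scanning "1_000_000 + x" from offset 0 yields a NUMBER token with lexeme 1_000_000, while scanning "x = 3.5j" from offset 4 yields a COMPLEX token with lexeme 3.5j.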

In parser.ts:
Parsing logic was enhanced to handle integers with underscores and complex numbers.

In resolver.ts:
Built-in Python functions are now handled by registering their names into the environment (without handling function logic yet). Handling logic for the None and ComplexLiteral AST nodes was also added.
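Registering built-in names without implementing them can be pictured as pre-populating the outermost scope, so that references to those names pass name resolution. The sketch below assumes a simple scope chain; ScopeSketch, createGlobalScope, and the list of names are illustrative and not taken from resolver.ts.

```typescript
// Hypothetical sketch of name registration; not the resolver from the PR.
const BUILTIN_NAMES: string[] = ["print", "abs", "len", "max", "min", "round"];

class ScopeSketch {
  private readonly names = new Set<string>();
  private readonly parent: ScopeSketch | null;

  constructor(parent: ScopeSketch | null = null) {
    this.parent = parent;
  }

  define(name: string): void {
    this.names.add(name);
  }

  // A name resolves if this scope or any enclosing scope defines it.
  isDefined(name: string): boolean {
    if (this.names.has(name)) {
      return true;
    }
    return this.parent !== null && this.parent.isDefined(name);
  }
}

// Only the names are registered; no function bodies are attached here.
function createGlobalScope(): ScopeSketch {
  const scope = new ScopeSketch();
  for (const name of BUILTIN_NAMES) {
    scope.define(name);
  }
  return scope;
}
```

With this arrangement a call such as print(x) no longer fails resolution with an undefined-name error, even though the function's behaviour is supplied elsewhere.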

Contributor

@Fidget-Spinner Fidget-Spinner left a comment

Really great work! Just have one comment.

return "-inf";
}

if (Math.abs(num) >= 1e16 || (num !== 0 && Math.abs(num) < 1e-4)) {
Contributor

Can you add a comment stating how you derived these bounds? These magic numbers are a little confusing otherwise.

Contributor Author

You're right, I overlooked this part. I will add comments to this section in the next version.
This code is located in toPythonComplexFloat, where it converts the real and imaginary parts of a complex number into strings. In Python, the real and imag parts of a complex number are stored as floats, so this is essentially about converting floats to strings.
Both Python and TypeScript determine whether to use standard decimal notation or scientific notation when outputting floating-point numbers, but their decision criteria and triggering conditions differ. To ensure that py-slang accurately simulates Python’s behavior, this code forces TypeScript to adopt Python’s logic when deciding how to format floating-point numbers.
Simply put, Python uses scientific notation for numbers whose magnitude is less than 1e-4 or greater than or equal to 1e16 (see the format_float_short function in https://github.com/python/cpython/blob/main/Python/pystrtod.c), whereas TypeScript uses different thresholds. That is what 1e-4 and 1e16 in the code stand for.
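For illustration, the threshold logic can be sketched as a standalone formatter that mimics Python's repr for floats. The name formatPythonFloat is hypothetical and this is not the toPythonComplexFloat code from the PR. For comparison, JavaScript's default number-to-string conversion only switches to exponent form for magnitudes of 1e21 and above or below 1e-6, so inside Python's plain-decimal range String(num) never produces an exponent.

```typescript
// Hypothetical sketch of Python-style float formatting; assumptions are
// noted in the comments.
function formatPythonFloat(num: number): string {
  if (isNaN(num)) return "nan";
  if (num === Infinity) return "inf";
  if (num === -Infinity) return "-inf";
  if (num === 0) return 1 / num < 0 ? "-0.0" : "0.0"; // distinguishes -0 from 0

  const magnitude = Math.abs(num);
  if (magnitude >= 1e16 || magnitude < 1e-4) {
    // toExponential() with no argument uses as many digits as needed,
    // e.g. "1e-5". Python pads the exponent to two digits: "1e-05".
    const parts = num.toExponential().split("e");
    const mantissa = parts[0];
    const sign = parts[1].charAt(0); // always "+" or "-"
    const digits = parts[1].slice(1);
    return mantissa + "e" + sign + (digits.length < 2 ? "0" + digits : digits);
  }

  // Plain decimal range: Python's float repr always shows a fractional part.
  const text = String(num);
  return text.indexOf(".") === -1 ? text + ".0" : text;
}
```

Note that Python omits the trailing .0 when printing the parts of a complex number (for example (1+2j)), so a formatter for complex components would skip that last step.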

Contributor

This is super impressive. Great investigative work. I did not know that!

Contributor Author

Really appreciate that! I'm trying to fully replicate the behavior of the Python interpreter, but there are still some precision differences in the csemachine part, which will be covered later on.

Contributor

@Fidget-Spinner Fidget-Spinner left a comment

OK let's merge it after you add the comment. I'll leave it to you to merge.

@WangYuyang1013 WangYuyang1013 marked this pull request as ready for review April 4, 2025 07:52
@WangYuyang1013 WangYuyang1013 merged commit bd87748 into main Apr 4, 2025
4 checks passed
@WangYuyang1013 WangYuyang1013 deleted the feature-translator branch April 4, 2025 07:53
@WangYuyang1013
Contributor Author

Thanks for your review! I've added the comment and just merged the PR as discussed.
