
Cse machine core #52



Merged
merged 4 commits into from
Apr 13, 2025


WangYuyang1013
Contributor

This PR implements the core logic of the CSE machine.

First, source_1.pdf:
Defines the syntax rules of Python Chapter 1 in BNF notation, including dynamic type checking and implicit type conversion. This file was originally intended for docs/python/specs, but since the current update references its content, it is temporarily pushed here. It will be moved to the correct directory in a future PR.

Interpreter file:
addPrint: Appends a string to cseFinalPrint so that it can be collected and displayed later; use it whenever intermediate output produced during execution should appear in the final output. In Python Chapter 1, only the print function can output strings to the UI, so only print in stdlib calls this method. All stdlib functions will be added in the next PR.
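A minimal sketch of how print could feed addPrint, assuming cseFinalPrint is a module-level string buffer. The pyPrint helper is hypothetical (stdlib arrives in the next PR), and String() is only a stand-in for Python's str().

```typescript
// Hypothetical module-level buffer standing in for cseFinalPrint.
let cseFinalPrint = "";

// Append a string to the collected output; the caller handles formatting.
function addPrint(str: string): void {
  cseFinalPrint += str;
}

// Hypothetical stdlib print, the only intended caller of addPrint.
// Like Python's print, it joins arguments with a space and ends with a newline.
// String() is a simplification: Python would print True/None, not true/null.
function pyPrint(...args: unknown[]): void {
  addPrint(args.map(String).join(" ") + "\n");
}

pyPrint("hello", 42);
pyPrint("done");
```

After these two calls the buffer holds the text of both lines, ready to be returned by evaluate.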

CSEResultPromise: Returns a Promise for the corresponding result, based on the CSE Machine's evaluation outcome.
If the value is a CSEBreak, evaluation is suspended, and it resolves to { status: 'suspended-cse-eval' }.
If the value is a CseError, a runtime error was encountered.
Otherwise, it packages the status ('finished'), the context, and the result value together.
Currently, only error and finished results occur; CSEBreak is never triggered.
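The three outcomes can be sketched as a synchronous classifier wrapped in a resolved Promise. CSEBreak, CseError, Context and the Result union below are simplified stand-ins, not the PR's actual type definitions.

```typescript
// Simplified stand-ins for the PR's types (assumptions, not the real definitions).
class CSEBreak {}
class CseError {
  constructor(public readonly message: string) {}
}
interface Context {
  errors: unknown[];
}

type Result =
  | { status: "suspended-cse-eval" }
  | { status: "error" }
  | { status: "finished"; context: Context; value: unknown };

// Map the machine's final value to one of the three result shapes.
function toResult(context: Context, value: unknown): Result {
  if (value instanceof CSEBreak) return { status: "suspended-cse-eval" }; // paused (unused so far)
  if (value instanceof CseError) return { status: "error" };              // runtime error
  return { status: "finished", context, value };                          // normal completion
}

function CSEResultPromise(context: Context, value: unknown): Promise<Result> {
  return Promise.resolve(toResult(context, value));
}
```

Keeping the classification synchronous makes it easy to test; the Promise wrapper only exists to match an asynchronous runner interface.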

evaluate: This is the entry point for the explicit-control evaluator. It initializes the control stack and data stack in the Context, then calls runCSEMachine to actually execute the program and returns the final string cseFinalPrint.

evaluateImports: Scans and processes import declarations, loading the corresponding module functions or variables into the current environment.
It uses filterImportDeclarations to obtain the modules to be imported along with their ASTs, then declares and binds the module's functions or objects in the current environment via declareIdentifier and defineVariable.
It is not yet certain whether this function can be fully adapted to the CSE Machine, as adaptation to Source Academy modules is not yet complete.
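A rough sketch of the binding step, assuming a simple registry that maps module names to their exported members. The ImportDeclaration shape, the registry, and the collapsing of declareIdentifier/defineVariable into a single env.set are all hypothetical simplifications.

```typescript
// Hypothetical, simplified import shapes: `from source import imported as local`.
interface ImportSpecifier {
  imported: string;
  local: string;
}
interface ImportDeclaration {
  type: "ImportDeclaration";
  source: string;
  specifiers: ImportSpecifier[];
}
type Stmt = ImportDeclaration | { type: string };

// Hypothetical module registry: module name -> exported members.
type ModuleRegistry = Map<string, Map<string, unknown>>;

function evaluateImports(program: Stmt[], modules: ModuleRegistry, env: Map<string, unknown>): void {
  for (const stmt of program) {
    if (stmt.type !== "ImportDeclaration") continue; // only imports are processed here
    const decl = stmt as ImportDeclaration;
    const exported = modules.get(decl.source);
    if (!exported) throw new Error(`No module named '${decl.source}'`);
    for (const { imported, local } of decl.specifiers) {
      if (!exported.has(imported)) {
        throw new Error(`cannot import name '${imported}' from '${decl.source}'`);
      }
      // Stands in for declareIdentifier followed by defineVariable.
      env.set(local, exported.get(imported));
    }
  }
}
```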

runCSEMachine: Drives the main loop of the CSE Machine, executing the commands on the control stack. It calls generateCSEMachineStateStream to obtain an iterator and loops over it until all instructions have been executed or the step limit is reached. Finally, it returns the value at the top of the stash (the data stack). In practice, this return value is never read; it is kept only to preserve the basic structure inherited from js-slang.

generateCSEMachineStateStream: A generator function that yields the state of the CSE Machine after each step for debugging or step-by-step execution.
It maintains a steps counter and, after executing each command, yields { stash, control, steps }. It terminates when either envSteps or stepLimit is reached.
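The generator and driver loop can be sketched as follows, assuming commands are plain functions over the control and stash and that envSteps = -1 means "no early stop". The Command and State types are hypothetical; only the overall shape (yielding { stash, control, steps } after each step) comes from the description above.

```typescript
// Hypothetical minimal machine: control holds pending commands, stash holds values.
type Command = (control: Command[], stash: number[]) => void;
interface State {
  stash: number[];
  control: Command[];
  steps: number;
}

function* generateCSEMachineStateStream(
  control: Command[],
  stash: number[],
  envSteps: number, // stop after this many steps (-1 = no early stop)
  stepLimit: number, // hard cap on the total number of steps
): Generator<State> {
  let steps = 0;
  while (control.length > 0) {
    if (steps === envSteps || steps >= stepLimit) return;
    const command = control.pop()!; // the top of the control runs next
    command(control, stash);
    steps += 1;
    // Copy the arrays so each yielded snapshot stays independent of later steps.
    yield { stash: [...stash], control: [...control], steps };
  }
}

function runCSEMachine(control: Command[], stash: number[], envSteps: number, stepLimit: number): number | undefined {
  for (const state of generateCSEMachineStateStream(control, stash, envSteps, stepLimit)) {
    void state; // a stepper or visualiser could inspect each snapshot here
  }
  return stash[stash.length - 1];
}
```

Because the control is a stack, a program such as 2 + 3 is pushed as [add, push(3), push(2)], so push(2) runs first.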

printEnvironmentVariables: Used for debugging; prints the current values of all environments and their variables.

cmdEvaluators: This is a "dispatch table" that maps each type of AST node or instruction type to its corresponding processing logic.
Each key is a type (for example, 'Program', 'BinaryExpression', or InstrType.ASSIGNMENT), and its corresponding value is a function of the form (command, context, control, stash, isPrelude) => void.
Below are the main functions of each dispatch function:
Program:
Serves as the entry point of the program, ensuring that the current environment is programEnvironment or global/prelude. If not, it first cleans up any extra environments. Depending on the program's content, it pushes either the first statement or all statements onto the control stack for execution.
BlockStatement:
Declares any functions or variables inside the block and, if necessary, creates a new block environment (a new environment is only created for function bodies). Although an if statement may also produce this node, Python does not give if blocks their own scope, so no new environment is created in that case. The block's statements are pushed onto the control stack as a StatementSequence.
StatementSequence:
If there is only one statement, it executes it immediately; otherwise, it splits multiple statements and pushes them onto the control stack (possibly inserting a popInstr to pop values).
IfStatement:
Converts a Python if statement into a ternary expression or branch instruction (via reduceConditional), relying on subsequent instructions to evaluate the test and execute either the consequent or alternate branch.
ExpressionStatement:
Directly executes the corresponding method of the inner expression.
VariableDeclaration / FunctionDeclaration:
Creates assignment instructions or a new arrow function and pushes them onto the stack. Eventually, it calls defineVariable to declare them in the environment.
ReturnStatement:
Inserts an instruction of type InstrType.RESET to reset the function call and processes the return value.
ImportDeclaration:
Does not process anything, as the content of import statements has been processed before execution in the CSE Machine.
Literal:
Encapsulates literals (number, string, bool, bigint, complex, etc.) into a Value and pushes it onto the stack.
NoneType:
Constructs a Value representing None and pushes it onto the stack.
ConditionalExpression:
Converts Python’s ternary expression into condition branch instructions and test expressions on the control stack for the CSE Machine to execute.
Identifier:
For an identifier reference or call, if it is a built-in constant (from builtInConstants), it is pushed directly; otherwise, it searches for the variable’s value in the current or parent environments and pushes it onto the stack.
UnaryExpression / BinaryExpression / LogicalExpression:
Converts operators and operands into corresponding instructions (InstrType.UNARY_OP / BINARY_OP) and pushes them onto the stack, with subsequent instructions executing the actual operation (via evaluateUnaryExpression / evaluateBinaryExpression).
ArrowFunctionExpression:
Constructs a closure using Closure.makeFromArrowFunction and pushes it onto the stack.
CallExpression:
Creates an application instruction (InstrType.APPLICATION) based on the number of arguments and pushes the arguments onto the stack. Alternatively, if a special operator (such as __py_adder) is detected, it converts it into a binOpInstr for execution.
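A stripped-down sketch of the dispatch-table idea: handlers keyed by node or instruction type, with the control as a stack of pending items and the stash as a stack of values. The node shapes, the "+"/"*" operators (instead of __py_adder and friends), and the run loop are simplifications; the real handlers also receive context and isPrelude.

```typescript
// Hypothetical node and instruction shapes, loosely following ESTree.
type Node =
  | { type: "Literal"; value: number }
  | { type: "BinaryExpression"; operator: "+" | "*"; left: Node; right: Node };
type Instr = { instrType: "BINARY_OP"; symbol: "+" | "*" };
type ControlItem = Node | Instr;

const isInstr = (item: ControlItem): item is Instr => "instrType" in item;

// Dispatch table: node type or instruction type -> handler.
const cmdEvaluators: Record<string, (cmd: any, control: ControlItem[], stash: number[]) => void> = {
  Literal: (cmd, _control, stash) => {
    stash.push(cmd.value);
  },
  BinaryExpression: (cmd, control) => {
    // Pushed in reverse so the left operand is evaluated first, then the right, then the operator.
    control.push({ instrType: "BINARY_OP", symbol: cmd.operator });
    control.push(cmd.right);
    control.push(cmd.left);
  },
  BINARY_OP: (cmd, _control, stash) => {
    const right = stash.pop()!;
    const left = stash.pop()!;
    stash.push(cmd.symbol === "+" ? left + right : left * right);
  },
};

function run(program: Node): number {
  const control: ControlItem[] = [program];
  const stash: number[] = [];
  while (control.length > 0) {
    const cmd = control.pop()!;
    const key = isInstr(cmd) ? cmd.instrType : cmd.type;
    cmdEvaluators[key](cmd, control, stash);
  }
  return stash[stash.length - 1];
}
```

Splitting an expression into operand nodes plus a BINARY_OP instruction is what lets the machine pause between any two steps.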

For instructions, the dispatch functions include:
RESET: Clears or resets a function call.
ASSIGNMENT: Binds the top value from the stack to a variable in the environment.
UNARY_OP / BINARY_OP: Pops operands and executes the corresponding operation.
POP: Pops the top value off the stack.
APPLICATION: Handles function calls or built-in function calls, checks the number of arguments, and executes closures or built-in logic.
BRANCH: Evaluates the top Boolean value on the stack to choose between executing consequent or alternate.
ENVIRONMENT: Pops scopes based on a specified environment ID to restore a particular environment.
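A sketch of three of these instructions (ASSIGNMENT, POP, BRANCH) over a flat environment map. The PUSH item, the Item union, and the choice to have ASSIGNMENT peek at (rather than pop) the stash, leaving cleanup to a later POP, are assumptions modeled on how js-slang structures these instructions, not a copy of this PR.

```typescript
// Hypothetical instruction shapes; the PR's InstrType enum has more members.
type Value = number | boolean;
type Item =
  | { instr: "PUSH"; value: Value } // stand-in for an already-evaluated literal
  | { instr: "ASSIGNMENT"; symbol: string }
  | { instr: "POP" }
  | { instr: "BRANCH"; consequent: Item[]; alternate: Item[] };

function step(item: Item, control: Item[], stash: Value[], env: Map<string, Value>): void {
  switch (item.instr) {
    case "PUSH":
      stash.push(item.value);
      break;
    case "ASSIGNMENT":
      // Bind the top of the stash without popping it; a following POP discards it.
      env.set(item.symbol, stash[stash.length - 1]);
      break;
    case "POP":
      stash.pop();
      break;
    case "BRANCH": {
      const test = stash.pop();
      if (typeof test !== "boolean") throw new Error("branch test must be a bool");
      // Push the chosen branch in reverse so its first item runs first.
      const chosen = test ? item.consequent : item.alternate;
      for (let i = chosen.length - 1; i >= 0; i--) control.push(chosen[i]);
      break;
    }
  }
}

function runItems(items: Item[], env: Map<string, Value>): Value[] {
  const control = [...items].reverse();
  const stash: Value[] = [];
  while (control.length > 0) step(control.pop()!, control, stash, env);
  return stash;
}
```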

Operator file:
BinaryOperator: A type that lists every supported binary operator, so that later functions can match operators against a fixed set of names.

evaluateUnaryExpression: Evaluates unary operators.
For ! (logical NOT) when the value is of type 'bool', it returns the opposite Boolean value.
For - (negation), it returns the corresponding negative value object depending on whether the value is a bigint or a number.
For typeof, it returns a string description.
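The three unary cases can be sketched over { type, value } objects like the ones this PR describes. The exact Value union, the error messages, and the string returned by typeof (here simply the type tag) are assumptions.

```typescript
// Hypothetical Value representation mirroring the { type, value } objects described above.
type Value =
  | { type: "bool"; value: boolean }
  | { type: "number"; value: number }
  | { type: "bigint"; value: bigint }
  | { type: "string"; value: string };

function evaluateUnaryExpression(operator: string, v: Value): Value {
  if (operator === "!") {
    // Logical NOT is only defined for bools in this sketch.
    if (v.type !== "bool") throw new TypeError("'not' expects a bool here");
    return { type: "bool", value: !v.value };
  }
  if (operator === "-") {
    // Negation keeps the operand's kind: int stays int, float stays float.
    if (v.type === "bigint") return { type: "bigint", value: -v.value };
    if (v.type === "number") return { type: "number", value: -v.value };
    throw new TypeError(`bad operand type for unary -: '${v.type}'`);
  }
  if (operator === "typeof") {
    return { type: "string", value: v.type };
  }
  throw new Error(`unknown unary operator ${operator}`);
}
```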

evaluateBinaryExpression:
This is the core function that handles binary operations, covering the following scenarios:
String concatenation:
If both left and right are strings and the operator identifier is __py_adder, it concatenates the strings directly.
String comparison:
If the operator is a comparison rather than __py_adder (e.g., >, >=), it returns { type: 'bool', value: comparisonResult }.
Numeric operations (including number, bigint, complex):
For operators like __py_adder, __py_minuser, __py_multiplier, __py_divider, __py_modder, __py_powerer, it performs addition, subtraction, multiplication, division, modulo, and exponentiation accordingly.
It supports complex numbers by using PyComplexNumber for arithmetic (e.g., when left.type === 'complex').
Type conversion follows Python’s rules during numeric operations, as described in source_1.pdf.
Comparison operations (e.g., >, >=, ===, !==):
If both operands are numbers or both are complex, it directly compares their values.
If one operand is a bigint and the other is a number, or if there are extreme cases like Infinity/-Infinity, it uses pyCompare() for a more precise comparison.
For complex numbers, only === or !== are allowed; otherwise, it triggers an error.
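A condensed sketch of the dispatch order described above: strings first, then int-with-int, then a float fallback. It assumes Python's usual promotion rule (if either side is a float, the result is a float) and omits complex numbers and pyCompare; the operator names follow the PR, but the value shapes and error messages are hypothetical.

```typescript
// Hypothetical Value shape: "bigint" models Python int, "number" models Python float.
type Value =
  | { type: "string"; value: string }
  | { type: "bigint"; value: bigint }
  | { type: "number"; value: number }
  | { type: "bool"; value: boolean };

// bool is deliberately rejected in arithmetic, matching "only int and float, not bool".
function toFloat(v: Value): number {
  if (v.type === "number") return v.value;
  if (v.type === "bigint") return Number(v.value);
  throw new TypeError(`unsupported operand type: '${v.type}'`);
}

function evaluateBinaryExpression(identifier: string, left: Value, right: Value): Value {
  if (left.type === "string" && right.type === "string") {
    if (identifier === "__py_adder") return { type: "string", value: left.value + right.value };
    if (identifier === ">") return { type: "bool", value: left.value > right.value };
    throw new TypeError(`unsupported operator ${identifier} for strings`);
  }
  if (left.type === "bigint" && right.type === "bigint") {
    // int op int stays exact for these operators.
    if (identifier === "__py_adder") return { type: "bigint", value: left.value + right.value };
    if (identifier === "__py_multiplier") return { type: "bigint", value: left.value * right.value };
    if (identifier === ">") return { type: "bool", value: left.value > right.value };
  }
  // Float fallback. Mixed int/float comparisons in the PR go through pyCompare instead,
  // because Number(bigint) is inexact beyond 2^53.
  const a = toFloat(left);
  const b = toFloat(right);
  switch (identifier) {
    case "__py_adder":
      return { type: "number", value: a + b };
    case "__py_multiplier":
      return { type: "number", value: a * b };
    case "__py_divider": // true division always yields a float, as in Python
      if (b === 0) throw new Error("ZeroDivisionError: division by zero");
      return { type: "number", value: a / b };
    case ">":
      return { type: "bool", value: a > b };
    default:
      throw new Error(`operator ${identifier} not covered in this sketch`);
  }
}
```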

pyCompare:
Follows native Python rules to compare Python integers (big integers) and floats.
Its logic follows the implementation in https://github.com/python/cpython/blob/main/Objects/floatobject.c.
The function compares a Python integer (big integer) and a float and returns -1, 0, or 1 (for less than, equal, or greater than).
If the float is positive infinity, any finite integer is considered smaller.
If the float is negative infinity, any finite integer is considered larger.
It obtains the sign (negative, zero, or positive) for both numbers. If the signs differ, it returns the result based solely on the sign; if both are zero, it returns 0.
If the integer's value is within the safe range for a C double (|n| ≤ 2^53), CPython converts it to double for a direct floating-point comparison.
For larger integers, it cannot directly convert to double accurately, so it further calculates the float’s exponent (the number of digits in its integer part minus one) and converts the integer to a string to determine its digit count.
If the integer has more digits than the float’s integer part, it returns 1 or -1 based on the sign; if fewer, it returns the opposite.
If both have the same digit count, it uses approximateBigIntString to convert the float into an approximate big integer string; after trimming leading zeros, it compares the lengths and then lexicographically to decide the final order.
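As a cross-check for the digit-count approach, here is a different, exact technique (not the PR's method): Math.floor of a finite double is itself an integer-valued double, so BigInt() converts it without rounding, and comparing the int against floor(x) decides the order exactly. Rejecting NaN is an assumption of this sketch; in Python every ordering comparison with nan is False.

```typescript
// Returns -1, 0 or 1 as the Python int n is less than, equal to, or greater than the float x.
function pyCompareExact(n: bigint, x: number): -1 | 0 | 1 {
  if (Number.isNaN(x)) throw new Error("NaN is unordered");
  if (x === Infinity) return -1; // every finite int is smaller
  if (x === -Infinity) return 1; // every finite int is larger
  // floor(x) is an exactly representable integer, so this conversion is lossless.
  const fl = BigInt(Math.floor(x));
  if (n < fl) return -1; // n < floor(x) <= x
  if (n > fl) return 1; // n >= floor(x) + 1 > x
  // n equals floor(x): equal if x is integral, otherwise x is strictly larger.
  return Number.isInteger(x) ? 0 : -1;
}
```

The case 2^53 + 1 versus 2^53 is exactly where converting the int to a double fails: Number(BigInt("9007199254740993")) rounds to 9007199254740992, so a naive comparison would report equality.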

approximateBigIntString:
Converts a possibly very large float num into an approximate decimal string. This string is then used in pyCompare for comparison.
It uses scientific notation (via toExponential) to get a string like "3.333333e+49", splits it into mantissa and exponent, removes the decimal point, and then either pads or truncates the mantissa to achieve a rough big integer representation.
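A sketch of the string approximation described above. The precision parameter (30 fractional digits of the mantissa) is an assumed default, and the PR's handling of signs and values below 1 may differ.

```typescript
// Approximate the integer part of a (possibly huge) float as a decimal string.
function approximateBigIntString(num: number, precision = 30): string {
  // e.g. 1e20 -> "1.000000000000000000000000000000e+20"
  const [mantissa, expPart] = num.toExponential(precision).split("e");
  const exponent = parseInt(expPart, 10); // parseInt accepts the leading "+" or "-"
  // Keep the sign aside and drop the decimal point: "1.000..." -> "1000..."
  const negative = mantissa.startsWith("-");
  const digits = mantissa.replace("-", "").replace(".", "");
  // The integer part has exponent + 1 digits: truncate, or pad with zeros.
  const intLength = exponent + 1;
  let result: string;
  if (intLength <= 0) result = "0";
  else if (digits.length >= intLength) result = digits.slice(0, intLength);
  else result = digits + "0".repeat(intLength - digits.length);
  return negative && result !== "0" ? "-" + result : result;
}
```

Past the mantissa's digits the result is zero-padded, which is why it is only an approximation of the float's exact integer value.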

Index (IOptions):
Adds a configuration interface, IOptions, to control the execution behavior of the CSE (Control, Stash, Environment) machine.
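A hedged sketch of how such an options interface is typically consumed. The field names reuse isPrelude, envSteps and stepLimit from this description, but their presence in IOptions, their types, and the default values are assumptions.

```typescript
// Hypothetical option fields; the PR's IOptions may name or type these differently.
interface IOptions {
  isPrelude: boolean; // evaluating prelude code rather than user code
  envSteps: number; // stop after this many steps (-1 = run to completion)
  stepLimit: number; // hard cap on steps, guarding against infinite loops
}

// Assumed defaults, for illustration only.
const DEFAULT_OPTIONS: IOptions = {
  isPrelude: false,
  envSteps: -1,
  stepLimit: 1000000,
};

// Callers pass only the options they care about; the rest fall back to defaults.
function resolveOptions(overrides: Partial<IOptions> = {}): IOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```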

stdlib:
Adds two global mapping tables for built-in objects, used to store Python’s built-in constants and functions.
These constants and functions will be integrated in the next PR update.
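A sketch of the two tables together with the Identifier lookup order described earlier (built-in constants first, then the environment chain). The table entries, the Environment shape, and pre-binding built-in functions into the global environment are all hypothetical, since this PR only introduces the empty tables.

```typescript
type Value = { type: string; value: unknown };

// Hypothetical contents: the real entries arrive in a later PR.
const builtInConstants = new Map<string, Value>([["math_pi", { type: "number", value: Math.PI }]]);
const builtIns = new Map<string, Value>([
  ["print", { type: "builtin", value: (...args: unknown[]) => args.join(" ") }],
]);

interface Environment {
  name: string;
  head: Map<string, Value>;
  tail: Environment | null;
}

// Lookup order: built-in constants, then the current environment and its parents.
function lookupIdentifier(name: string, env: Environment | null): Value {
  const constant = builtInConstants.get(name);
  if (constant !== undefined) return constant;
  for (let e = env; e !== null; e = e.tail) {
    const v = e.head.get(name);
    if (v !== undefined) return v;
  }
  throw new Error(`NameError: name '${name}' is not defined`);
}

// Assumption: built-in functions are pre-bound in the global environment.
const globalEnv: Environment = { name: "global", head: new Map(builtIns), tail: null };
const programEnv: Environment = {
  name: "programEnvironment",
  head: new Map<string, Value>([["x", { type: "number", value: 1 }]]),
  tail: globalEnv,
};
```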

dict:
Implements a dictionary class (Python-style) and provides the import declaration filtering utility function filterImportDeclarations.
This is used to process Python import statements, facilitating later module functionality parsing and integration. It is one of the core utility modules for the static analysis (syntax analysis) and module import phase of the interpreter.
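A sketch of a Map-backed Dict and an import filter built on it. The get/set/has/keys methods and the filterImportDeclarations signature (grouped imports plus the remaining statements) are assumptions for illustration; only the Map-backed constructor appears in the PR.

```typescript
// Map-backed dictionary; the method set here is assumed, not copied from the PR.
class Dict<K, V> {
  constructor(private readonly internalMap = new Map<K, V>()) {}

  get(key: K): V | undefined {
    return this.internalMap.get(key);
  }
  set(key: K, value: V): void {
    this.internalMap.set(key, value);
  }
  has(key: K): boolean {
    return this.internalMap.has(key);
  }
  // Map iterates in insertion order, matching Python dict iteration order.
  keys(): K[] {
    return Array.from(this.internalMap.keys());
  }
}

// Hypothetical, simplified AST shapes.
interface ImportDeclaration {
  type: "ImportDeclaration";
  source: { value: string };
}
type Statement = { type: string };

// Assumed signature: group imports by module name (first-seen order), return other statements separately.
function filterImportDeclarations(body: Statement[]): [Dict<string, ImportDeclaration[]>, Statement[]] {
  const imports = new Dict<string, ImportDeclaration[]>();
  const others: Statement[] = [];
  for (const stmt of body) {
    if (stmt.type !== "ImportDeclaration") {
      others.push(stmt);
      continue;
    }
    const decl = stmt as ImportDeclaration;
    const existing = imports.get(decl.source.value);
    if (existing) existing.push(decl);
    else imports.set(decl.source.value, [decl]);
  }
  return [imports, others];
}
```

One difference worth keeping in mind: Map compares object keys by identity, whereas Python hashes keys by value, so composite keys would need to be encoded (for example as strings).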

@Fidget-Spinner
Contributor

I am busy this week, but I might get to it earlier. Please ping me to review this again next Monday if I don't reply by then and forget!

@WangYuyang1013
Contributor Author


Sure, thanks! I’ll ping you next Monday if I don’t hear back before then. Appreciate your time!

```typescript
/**
 * Python style dictionary
 */
export default class Dict<K, V> {
  constructor(private readonly internalMap = new Map<K, V>()) {}
}
```
Contributor


Note that Python dictionaries preserve insertion order: CPython 3.6 did so as an implementation detail, and the language documentation guarantees it since 3.7.

If I understand correctly, Map also maintains insertion order on iteration, so this should be fine, I hope.




```typescript
// export function evaluateBinaryExpression(operator: BinaryOperator, left: any, right: any) {
```
Contributor


Please remove this.

```typescript
};
}
} else {
// numbers: only int and float, not bool
```
Contributor


OK, this differs from Python, where bool is a subclass of int. However, I think this is a good choice for our teaching language.

```typescript
 * we achieve a Python-like ordering of large integers vs floats.
 */

function pyCompare(int_num: any, float_num: any) {
```
Contributor


Instead of having a single comparison function, we should follow the Python convention and call __eq__, __lt__, __gt__, and so on, so that each type overloads its own comparisons. However, I understand you won't have time to do that in this PR, so please add a comment somewhere saying so.

@WangYuyang1013
Contributor Author

Thanks for the feedback! I've added a comment above pyCompare to clarify that this is a temporary implementation, and noted the intention to replace it with proper __eq__, __lt__, etc. method dispatch in a future update.
Let me know if there's anything else you'd like adjusted!

WangYuyang1013 marked this pull request as ready for review April 13, 2025 14:18
WangYuyang1013 merged commit e6d8a95 into main Apr 13, 2025
4 checks passed
WangYuyang1013 deleted the cse-machine-core branch April 13, 2025 14:18
@Fidget-Spinner
Contributor

@WangYuyang1013 are there any more PRs left to merge?


2 participants