GSoC 2024 ‐ Snehil Shah

About me

Hey there! I am Snehil Shah, a computer science undergraduate (as of writing this) at the Indian Institute of Information Technology, Nagpur, India. Apart from my interest in computers and software, I have a dormant passion for audio DSP and synthesis.

Project overview

The read-eval-print loop (REPL) is a fixture of data analysis and numerical computing and provides a critical entry-point for individuals seeking to learn and better understand APIs and their associated behavior. For a library emphasizing numerical and scientific computing, a well-featured REPL becomes an essential tool allowing users to easily visualize and work with data in an interactive environment. The stdlib REPL is a command-line based interactive interpreter environment for Node.js equipped with namespaces and tools for statistical computing and data exploration enabling easy prototyping, testing, debugging, and programming.

This project aimed to implement a suite of enhancements to the stdlib REPL to achieve feature parity with similar environments for scientific computing such as IPython and Julia. These enhancements include:

Fuzzy auto-completion
Syntax highlighting
Visualization tools for tabular data
Multi-line editing
Paged outputs
Bracketed-paste
and more...

Project recap

My work on the REPL started before the official coding period began. Before that, I had contributed to some good first issues (implementing some easier packages, C implementations, and refactorings) to get a feel for the project conventions and contribution flow. Since then, we have landed an array of improvements to the REPL. Let's go through each of them from the beginning:

Completed work

Auto-closing brackets/quotations

My first work on the REPL was implementing auto-closing brackets/quotations, a common feature in IDEs and code editors.
- #1680 - feat: add support for auto-closing brackets/quotations in the REPL
Steps:
- The approach is to walk the abstract syntax tree (generated by acorn) for the current line and detect an auto-closing candidate. If one is found, we write the corresponding closing symbol to the output.
- There are also cases where the user instinctively types the closing symbol themselves, as if this feature never existed. We respect that and let the user type through the auto-appended symbol.
- What about auto-deleting the appended symbol? When an opening symbol is deleted, we check whether the corresponding closing symbol immediately follows it and, if so, delete it as well. We skip this behavior when the user is editing a string literal, again using acorn to check whether the nodes around the cursor are strings.
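
The three behaviors above (auto-close, type-through, and auto-delete) can be sketched as pure functions over the current line and cursor position. This is a simplified illustration, not the code from the PR: it skips the acorn-based AST checks entirely, so it would also auto-close inside string literals, and the names `PAIRS`, `insertChar`, and `backspace` are hypothetical.

```javascript
// Hypothetical map of opening symbols to their closing counterparts.
const PAIRS = { '(': ')', '[': ']', '{': '}', '\'': '\'', '"': '"', '`': '`' };
const CLOSERS = new Set( Object.values( PAIRS ) );

// Insert `ch` at `cursor`, auto-closing or typing through where appropriate.
function insertChar( line, cursor, ch ) {
	const before = line.slice( 0, cursor );
	const after = line.slice( cursor );

	// Type-through: the typed closing symbol is already right after the cursor.
	if ( CLOSERS.has( ch ) && after[ 0 ] === ch ) {
		return { line: line, cursor: cursor + 1 };
	}
	// Auto-close: write the opening symbol followed by its closing symbol.
	if ( PAIRS[ ch ] !== undefined ) {
		return { line: before + ch + PAIRS[ ch ] + after, cursor: cursor + 1 };
	}
	// Plain insertion.
	return { line: before + ch + after, cursor: cursor + 1 };
}

// Delete the character before `cursor`; if it is an opening symbol directly
// followed by its closing symbol, delete both.
function backspace( line, cursor ) {
	if ( cursor === 0 ) {
		return { line: line, cursor: cursor };
	}
	const removed = line[ cursor - 1 ];
	const paired = PAIRS[ removed ] !== undefined && line[ cursor ] === PAIRS[ removed ];
	const end = paired ? cursor + 1 : cursor;
	return { line: line.slice( 0, cursor - 1 ) + line.slice( end ), cursor: cursor - 1 };
}
```

For example, `insertChar( 'foo', 3, '(' )` yields the line `foo()` with the cursor between the brackets, typing `)` at that point only advances the cursor, and `backspace( 'foo()', 4 )` removes both brackets.
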
This was my first PR on the REPL, and it wasn't the safest landing, with @kgryte doing most of the heavy lifting. We did get it across the finish line after a month of coding and review cycles, and by that time I had a good grasp of the REPL codebase.

Pager

Earlier, when an output was longer than the terminal height, it would scroll all the way to the end of the output. This meant the user had to scroll all the way back up to start reading the output. The pager aims to capture long outputs and display them in a scrollable way.
- #2162 - feat: add pager to allow scrolling of long outputs in the REPL
In general, pagers are implemented by printing output up to the terminal height and then waiting for user input before printing further. But I wanted to do it differently. With our UI, we page in place, meaning the pager appears like a screen and the parent command stays visible at the top. The only downside might be some jitter in the output, as we rely on re-rendering the page on every scroll.

Steps:
1. Detecting a pageable output. We do this by checking if the number of rows in the output is greater than the height of the terminal stream (including space for the input command).
2. Write the page UI, and maintain the page indexing. During paging mode, the entire REPL is frozen, and is only receptive to pager controls and SIGINT interrupts.
3. As we receive the page up/down controls, update the indices and re-render the page UI.
4. Listen to SIGWINCH events to make it receptive to terminal resizes.
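
The index bookkeeping behind these steps can be sketched with three small functions. This is an illustrative sketch only: the function names and the `reservedRows` parameter (rows kept for the input command) are assumptions, and the actual terminal writes, keypress handling, and SIGWINCH wiring are left out.

```javascript
// Step 1: an output is pageable if it has more rows than the terminal can
// show after reserving some rows for the input command.
function isPageable( output, terminalRows, reservedRows ) {
	return output.split( '\n' ).length > terminalRows - reservedRows;
}

// Steps 2 and 3: the visible page is a window of `pageHeight` lines
// starting at the current scroll index.
function renderPage( lines, index, pageHeight ) {
	return lines.slice( index, index + pageHeight ).join( '\n' );
}

// Step 3: move the scroll index by `delta` rows, clamped so that the window
// never runs past the first or last line.
function scroll( index, delta, totalLines, pageHeight ) {
	const max = Math.max( 0, totalLines - pageHeight );
	return Math.min( max, Math.max( 0, index + delta ) );
}
```

On a resize (step 4), one option is to recompute `pageHeight` from the new terminal size and call `scroll( index, 0, totalLines, pageHeight )` to re-clamp the index before re-rendering.
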
Maintenance work:
- #2205 - fix: remove SIGWINCH listener upon closing the REPL
- #2293 - fix: resolve incorrect constraints for scrollable height in the REPL's pager

New REPL, New Art
- #2178 - feat: add a stdlib ASCII art in REPL's default welcome message
Time for a REPL makeover with some new ASCII art.
[Figure: the REPL welcome message, before and after the new ASCII art]

Syntax highlighting

One of the most requested and crucial additions to the REPL was syntax highlighting.
- #2254 - feat: add syntax highlighting in the REPL
  
  This PR adds the core modules for syntax highlighting, namely the tokenizer, and highlighter.
  
  Steps:
  1. With every keypress, capture the updated line.
  2. Check whether the line has actually changed. This is a simple caching mechanism to avoid a performance hit during events like moving left/right in the REPL.
  3. Tokenization. To support various token types, a good tokenizer is crucial. We use acorn to parse the line into an abstract syntax tree. During parsing, we keep a record of basic tokens like comments, operators, punctuation, strings, numbers, and regexps. To resolve declarations, we traverse the AST and collect all declarations (functions, classes, variables, etc.) in the local scope, that is, those not yet added to the global context. To resolve identifiers, we search the scopes in the order local > command > global. To resolve member expressions, we recursively traverse (and compute where needed) the global context to tokenize each member.
  4. Highlight. Each token type is then colored accordingly using ANSI escape sequences, and the line is re-rendered in its highlighted form.
- #2291 - feat: add APIs, commands and tests for syntax-highlighting in the REPL
  
  A follow-up PR adding REPL prototype methods, in-REPL commands, and test suites for the syntax highlighter. This adds various APIs for theming in the REPL, allowing users to configure it with their own set of themes. Another small thing I took care of was disabling highlighting in non-TTY environments.
- #2341 - feat: add combined styles and inbuilt syntax highlighting themes in the REPL
  
  This PR adds support for combining ANSI colors and styles to make hybrid colors. So something like italic red bgBrightGreen is supported. This will allow for more expressive theming. It also adds more in-built themes.
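
To make the highlighting step and the combined styles more concrete, here is a minimal sketch. It assumes the tokens have already been produced (the acorn-based tokenizer is not shown), and the `ANSI` table, the token shape (`type`, `start`, `end`), and the function names are all hypothetical; only the SGR codes themselves are standard.

```javascript
// A small subset of standard SGR codes, keyed by illustrative style names.
const ANSI = {
	bold: 1, italic: 3, underline: 4,
	red: 31, green: 32, yellow: 33, blue: 34, cyan: 36,
	bgBrightGreen: 102
};

// Convert a space-separated style string (e.g., 'italic red bgBrightGreen')
// into a single escape sequence combining all of its codes.
function styleToEscape( style ) {
	const codes = style.split( ' ' ).filter( Boolean ).map( ( name ) => {
		if ( !Object.prototype.hasOwnProperty.call( ANSI, name ) ) {
			throw new Error( 'unknown style: ' + name );
		}
		return ANSI[ name ];
	});
	return '\u001b[' + codes.join( ';' ) + 'm';
}

// Wrap each token in the style its type maps to in `theme`. Tokens are
// assumed to be sorted by `start` and non-overlapping.
function highlight( line, tokens, theme ) {
	let out = '';
	let pos = 0;
	for ( const token of tokens ) {
		const style = theme[ token.type ];
		const text = line.slice( token.start, token.end );
		out += line.slice( pos, token.start ); // untouched text between tokens
		out += ( style ) ? styleToEscape( style ) + text + '\u001b[0m' : text;
		pos = token.end;
	}
	return out + line.slice( pos );
}
```

With this table, `styleToEscape( 'italic red bgBrightGreen' )` produces `\u001b[3;31;102m`, which is how several colors and styles can be applied to one token with a single sequence.
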
Maintenance work:
- #2284 - fix: resolve bug with unrecognized keywords in the REPL's syntax-highlighter
- #2290 - fix: resolve clashes between syntax-highlighter & auto-closer in the REPL
- #2412 - fix: prevent property access if properties couldn't be resolved when syntax highlighting in the REPL

Multi-line editing

Prior to this, the REPL did support multi-line inputs via incomplete statements, but offered no way to edit them. Adding multi-line editing meant supporting manually added lines and the ability to move up and edit earlier lines, as in a normal editor.
- #2347 - feat: add multiline editing in the REPL
Implementing this is not as easy as it seems. Initially, I thought that tracking each up/down keypress event, updating the _rli instance, and writing the updated lines with escape sequences would do the trick. But internally, readline refreshes the stream after operations like left/right/delete. This meant that if we were at line 2 and the stream was refreshed, everything below that line was gone. So, to actually implement this, we had to do our own rendering on each keypress event.

Steps:
1. If the input is a multi-line input, track keypress events like up/down/right/left, backspace (for continuous deletion), and CTRL+O (for manually adding a new line).
2. Maintain line and cursor indices, and highlighted line buffers to store rendering data.
3. After every keypress event, visually render the remaining lines below the current line.
4. Maintain the final _cmd buffer for final execution.
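
A rough sketch of the state these steps maintain is shown below. It is not the REPL's implementation: the state shape (`lines`, `row`, `col`) and the function names are hypothetical, and the actual terminal rendering with escape sequences is omitted.

```javascript
// Hypothetical editor state: the lines of the input plus a cursor position.
function createState( text ) {
	return { lines: text.split( '\n' ), row: 0, col: 0 };
}

// Up/down: move the line index, clamping the cursor to the new line's length.
function moveVertical( state, delta ) {
	const row = Math.min( state.lines.length - 1, Math.max( 0, state.row + delta ) );
	const col = Math.min( state.col, state.lines[ row ].length );
	return { lines: state.lines, row: row, col: col };
}

// Manual new line: split the current line at the cursor.
function insertNewline( state ) {
	const current = state.lines[ state.row ];
	const lines = state.lines.slice();
	lines.splice( state.row, 1, current.slice( 0, state.col ), current.slice( state.col ) );
	return { lines: lines, row: state.row + 1, col: 0 };
}

// The lines that need to be redrawn below the current line after a keypress.
function linesBelow( state ) {
	return state.lines.slice( state.row + 1 );
}

// The final command buffer handed over for execution.
function toCommand( state ) {
	return state.lines.join( '\n' );
}
```

Keeping the state immutable like this is just a choice for the sketch; it makes each keypress a function from one state to the next, which is easy to test in isolation.
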

Unicode table plotter

A plot API for visualizing tabular data can be leveraged for downstream tasks like TTY rendering in the REPL or even in Jupyter environments, allowing users to easily work with tabular data when doing data analysis in the REPL (or elsewhere).
- #2407 - feat: add plot/table/unicode
The plot API supports data types like Array<Array>, MatrixLike (2D <ndarray>), Array<Object>, and Object. The API is highly configurable, giving users full control over what the render looks like instead of limiting them to a set of pre-defined presets. This is what the default render looks like:
```
┌───────┬──────┬───────┐
│  col1 │ col2 │  col3 │
├───────┼──────┼───────┤
│    45 │   33 │ hello │
│ 32.54 │ true │  null │
└───────┴──────┴───────┘
```
The plotter also supports one-of-a-kind wrapping tables, which break the table into segmented sub-tables when given appropriate maxOutputWidth prop values.

Implementing this API has been tedious (as is evident from the PR footprint), mainly because of the number of properties and signatures it needs in order to parse various data types and support this level of configurability.
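
For a sense of what the renderer has to do, here is a minimal sketch that reproduces the default look shown above for an Array<Array> input, plus one possible way to split columns into segments for a wrapped table. This is not the plot/table/unicode API: `renderTable` and `segmentColumns` are hypothetical names, and the sketch hard-codes right alignment and one space of padding.

```javascript
// Render headers and rows (arrays of values) as a box-drawing table with
// right-aligned cells and one space of padding on each side.
function renderTable( headers, rows ) {
	const all = [ headers ].concat( rows ).map( ( row ) => row.map( String ) );
	const widths = headers.map( ( _, j ) => {
		return Math.max( ...all.map( ( row ) => row[ j ].length ) );
	});
	const rule = ( left, mid, right ) => {
		return left + widths.map( ( w ) => '─'.repeat( w + 2 ) ).join( mid ) + right;
	};
	const line = ( row ) => {
		const cells = row.map( ( cell, j ) => ' ' + cell.padStart( widths[ j ] ) + ' ' );
		return '│' + cells.join( '│' ) + '│';
	};
	return [
		rule( '┌', '┬', '┐' ),
		line( all[ 0 ] ),
		rule( '├', '┼', '┤' ),
		...all.slice( 1 ).map( line ),
		rule( '└', '┴', '┘' )
	].join( '\n' );
}

// Group column indices into segments whose rendered width fits `maxWidth`
// (one assumed strategy for wrapping; each segment becomes a sub-table).
function segmentColumns( widths, maxWidth ) {
	const segments = [];
	let current = [];
	let used = 1; // left border
	widths.forEach( ( w, j ) => {
		const cost = w + 3; // content + two padding spaces + one border
		if ( current.length > 0 && used + cost > maxWidth ) {
			segments.push( current );
			current = [];
			used = 1;
		}
		current.push( j );
		used += cost;
	});
	if ( current.length > 0 ) {
		segments.push( current );
	}
	return segments;
}
```

Calling `renderTable( [ 'col1', 'col2', 'col3' ], [ [ 45, 33, 'hello' ], [ 32.54, true, null ] ] )` gives the same table as the default render above. With column widths `[ 5, 4, 5 ]`, `segmentColumns` keeps all three columns together when the limit is 24 or more and splits them into `[ [ 0, 1 ], [ 2 ] ]` when it is 16.
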

Fuzzy completions

The initial scope was to just implement the fuzzy completions extension, but upon further discussion (and me having a lot of free time), we ended up writing an entirely new & improved completer engine from scratch.
- #2463 - feat: add UX to cycle through completions in the REPL
The new engine supports highlighted completions, a persistent drawer, and the ability to cycle through the completions using arrow keys.

Implementation details:
1. A renderer for the drawer view creates the grid-like view from available completions, with a maintained current completion index highlighted using ANSI escape sequences.
2. Keypress events like up/down/left/right are tracked for navigation, and with each movement, the drawer is re-rendered and the selected completion is inserted into the current line.
3. The engine is also receptive to SIGWINCH events, meaning the terminal can be resized without any distortion.
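
The drawer logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the engine's code: `renderDrawer` and `navigate` are hypothetical names, the two-space column gap is arbitrary, and inverse video (SGR code 7) stands in for whatever highlight style the REPL actually uses.

```javascript
// Lay out completions in a grid that fits the terminal width and highlight
// the currently selected completion.
function renderDrawer( completions, selected, terminalColumns ) {
	if ( completions.length === 0 ) {
		return { text: '', perRow: 1 };
	}
	// Column width: longest completion plus a two-space gap.
	const width = Math.max( ...completions.map( ( c ) => c.length ) ) + 2;
	const perRow = Math.max( 1, Math.floor( terminalColumns / width ) );
	const rows = [];
	for ( let i = 0; i < completions.length; i += perRow ) {
		const cells = completions.slice( i, i + perRow ).map( ( c, j ) => {
			const pad = ' '.repeat( width - c.length );
			return ( i + j === selected ) ? '\u001b[7m' + c + '\u001b[0m' + pad : c + pad;
		});
		rows.push( cells.join( '' ) );
	}
	return { text: rows.join( '\n' ), perRow: perRow };
}

// Map an arrow key to a new selection index; moves that would leave the
// grid keep the current selection.
function navigate( selected, key, total, perRow ) {
	let delta = 0;
	switch ( key ) {
	case 'left': delta = -1; break;
	case 'right': delta = 1; break;
	case 'up': delta = -perRow; break;
	case 'down': delta = perRow; break;
	default: break;
	}
	const next = selected + delta;
	return ( next < 0 || next >= total ) ? selected : next;
}
```

Because the layout is derived from `terminalColumns` on every call, handling a resize amounts to calling `renderDrawer` again with the new column count.
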
- #2493 - feat: add fuzzy completions extension in the REPL
Fuzzy matching involves finding approximate completions that the user might be trying to type. It differs from fuzzy search in that the length of the final completion shouldn't affect its relevancy.

I wrote a fuzzy matching algorithm taking the following variables into account:
1. Casing mismatch - Low penalty
2. The first letter doesn't match - Medium penalty
3. Gap between matching characters in the completion - Medium penalty
4. Gap before the input was first encountered in the completion - Low penalty (as it already incurred the penalty from the 2nd clause)
5. Missing character from the input in the completion - High penalty. This is generally not taken into account in most fuzzy completion algorithms, but it can help detect spelling mistakes. The only downside is that it increases the time complexity of the algorithm, as we still have to traverse the rest of the completion string even after a character is found to be missing.
The fuzzy matching algorithm is based on a penalty-based scoring mechanism that negatively scores the completions for unfavorable characteristics in the completion string (mentioned above) that would make it a less ideal match for the input.
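
A penalty-based scorer along these lines might look like the sketch below. It is an assumed, simplified reconstruction and not the algorithm from the PR: the numeric weights are made up (the text only describes them as low, medium, and high), each penalty is applied once per occurrence, and characters are matched greedily from left to right.

```javascript
// Hypothetical penalty weights for the five clauses described above.
const PENALTY = { caseMismatch: 1, firstLetter: 3, gap: 3, leadingGap: 1, missing: 10 };

// Score `completion` against `input`: 0 is the best possible score and more
// negative scores are worse. Trailing characters in the completion are never
// penalized, so its length alone does not affect relevancy.
function fuzzyScore( input, completion ) {
	const lower = completion.toLowerCase();
	let score = 0;
	let pos = 0; // where to resume searching in the completion
	let last = -1; // index of the previously matched character

	if ( input.length > 0 && lower.length > 0 && input[ 0 ].toLowerCase() !== lower[ 0 ] ) {
		score -= PENALTY.firstLetter;
	}
	for ( let k = 0; k < input.length; k++ ) {
		const found = lower.indexOf( input[ k ].toLowerCase(), pos );
		if ( found === -1 ) {
			// Missing character: penalize and keep matching the rest.
			score -= PENALTY.missing;
			continue;
		}
		if ( completion[ found ] !== input[ k ] ) {
			score -= PENALTY.caseMismatch;
		}
		if ( found - last - 1 > 0 ) {
			// A gap before the first match is cheaper than a gap between matches.
			score -= ( last === -1 ) ? PENALTY.leadingGap : PENALTY.gap;
		}
		last = found;
		pos = found + 1;
	}
	return score;
}

// Rank completions from best to worst score.
function fuzzyMatch( input, completions ) {
	return completions
		.map( ( c ) => ( { completion: c, score: fuzzyScore( input, c ) } ) )
		.sort( ( a, b ) => b.score - a.score )
		.map( ( item ) => item.completion );
}
```

With these weights, `fuzzyMatch( 'sn', [ 'cos', 'sin', 'asin' ] )` ranks `sin` first (one gap), then `asin` (first-letter mismatch, leading gap, and one gap), and `cos` last (it is missing the `n`).
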

General Bug Fixes