Expanding the Syntax Extractor #313

@daomcgill

Purpose

The Syntax Extractor in Kaiaulu uses srcML to extract structured information from source code. The purpose of this task is to extend it with new functions that use XPath queries to extract file-level and class-level documentation. The extracted data will feed later stages, where the goal is to combine it with NLP to create semantic representations of the code.

Process

  1. Understand the current syntax extractor functions, and how annotations are represented in srcML XML files.
  2. Understand XPath queries, and create queries for the new functions (file and class level documentation).
  3. Implement new functions with the custom XPath queries.
  4. Test on various examples.
  5. Consider other syntactic elements that could be useful for the bigger picture. Example: Functions in git.R that retrieve commit messages or issue discussions tied to a file.
  6. Create a notebook for Syntax Extraction, and keep it up to date with both existing and new functionality.

Existing Functions

New Functions

  • query_src_text_file_documentation(): Extract file-level documentation (e.g. comments in the file header).
  • query_src_text_class_documentation(): Extract class-level documentation (e.g. comments before class declaration).
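
To make the two kinds of documentation concrete, here is a rough sketch of how they might be located in srcML output. It is written in Python with the standard library only, purely for illustration; the actual functions would live in Kaiaulu's R code. The sample XML is hand-written to resemble srcML output for a small Java file and is an assumption, as are the helper names file_documentation and class_documentation. The XPath expressions in the comments are suggestions of what full XPath 1.0 queries could look like, not the queries Kaiaulu uses.

```python
import xml.etree.ElementTree as ET

# srcML namespace URI (assumed to match the srcML output being queried).
SRC = "http://www.srcML.org/srcML/src"
NS = {"src": SRC}
COMMENT_TAG = f"{{{SRC}}}comment"
CLASS_TAG = f"{{{SRC}}}class"

# Hand-written stand-in for srcML output of a small Java file.
# This is an assumption for illustration, not real tool output.
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src" language="Java" filename="Greeter.java"><comment type="block">/* File header: greeting utilities. */</comment>
<import>import <name>java</name>;</import>
<comment type="block" format="javadoc">/** Greets users by name. */</comment>
<class><specifier>public</specifier> class <name>Greeter</name> <block>{ }</block></class>
</unit>"""


def file_documentation(unit):
    """Return the comments at the top of a unit, before any other element.

    A possible full XPath 1.0 equivalent (not supported by ElementTree):
    /src:unit/src:comment[not(preceding-sibling::*[not(self::src:comment)])]
    """
    docs = []
    for child in unit:
        if child.tag != COMMENT_TAG:
            break  # first non-comment element ends the file header
        docs.append(child.text)
    return docs


def class_documentation(unit):
    """Map each top-level class name to the comment directly preceding it.

    A possible full XPath 1.0 equivalent (not supported by ElementTree):
    //src:class/preceding-sibling::*[1][self::src:comment]
    """
    docs = {}
    previous = None
    for child in unit:
        if (child.tag == CLASS_TAG and previous is not None
                and previous.tag == COMMENT_TAG):
            name = child.find("src:name", NS)
            if name is not None:
                docs[name.text] = previous.text
        previous = child
    return docs


unit = ET.fromstring(SAMPLE)
print(file_documentation(unit))
print(class_documentation(unit))
```

One design question this sketch leaves open: when a file starts with a comment immediately followed by a class, the same comment qualifies as both file-level and class-level documentation, so the real functions would need a rule for that case.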

Task List

  • Install and set up srcML. srcML takes source code and adds XML annotations (e.g. for classes, methods, and variables). Include the path to srcML in tools.yml.
  • Use the existing functions to generate XML files. Inspect the files to understand the output.
  • Gain an understanding of how XPath queries are formed, and figure out how to create queries for the new functions.
  • Write the new XPath queries.
  • Write the new functions.
  • Verify that the functions work correctly.
  • Consider what other functions could be useful, and implement those.
  • Maintain a notebook for Syntax Extraction.
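
For the first two tasks, it can help to run srcML by hand before going through Kaiaulu's wrappers. The sketch below, again in Python for illustration only, builds the command lines for annotating a source file and for running an XPath query against the result. The helper names and example paths are hypothetical; the srcML path would be whatever is recorded in tools.yml.

```python
import subprocess


def annotate_command(srcml_path, source_path, output_path):
    """Build the srcML command that annotates source_path into an XML file."""
    return [srcml_path, source_path, "-o", output_path]


def xpath_command(srcml_path, xml_path, xpath):
    """Build the srcML command that runs an XPath query on an annotated file."""
    return [srcml_path, "--xpath", xpath, xml_path]


def run(command):
    """Run a command and return its standard output as text."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Example usage (hypothetical paths; requires srcML to be installed):
# run(annotate_command("/usr/local/bin/srcml", "Greeter.java", "Greeter.xml"))
# print(run(xpath_command("/usr/local/bin/srcml", "Greeter.xml", "//src:comment")))
```

Inspecting the output of a broad query such as //src:comment is a quick way to see how comments are represented before writing the narrower file-level and class-level queries.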
