Purpose
The Syntax Extractor in Kaiaulu uses srcML to extract meaningful information from source code. The purpose of this task is to extend it with new functions that use XPath queries to extract file-level and class-level documentation. The extracted data will feed later stages, where the goal is to combine it with NLP to create semantic representations of the code.
Process
- Understand the current syntax extractor functions, and how annotations are represented in srcML XML files.
- Understand XPath queries, and create queries for the new functions (file and class level documentation).
- Implement new functions with the custom XPath queries.
- Test on various examples.
- Consider other syntactic elements that could be useful for the bigger picture, for example functions in git.R that retrieve commit messages or issue discussions tied to a file.
- Create a notebook for Syntax Extraction, and keep it up to date with existing and new functionality.
Existing Functions
- annotate_src_text(): Runs srcML on a folder of source code and outputs the annotated XML.
- query_src_text(): Runs an XPath query on the annotated XML generated from annotate_src_text().
- query_src_text_class_names(): Extracts class names from the annotated source code.
- query_src_text_namespace(): Extracts the namespace (file path) from the annotated source code.
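To make the kind of query behind these functions concrete, here is a minimal Python sketch (not Kaiaulu's R implementation) that reads class names and the file path from a srcML-style document. The `SAMPLE` XML is hand-written for illustration and only approximates real srcML output, and the helper names `class_names` and `file_path` are hypothetical. The namespace URI is the one srcML uses for its `src` elements.

```python
import xml.etree.ElementTree as ET

# srcML puts its elements in this namespace; queries must reference it.
SRC_NS = {"src": "http://www.srcML.org/srcML/src"}

# Hand-written approximation of srcML output for one Java file (assumption).
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src" language="Java" filename="src/Shape.java">
<class><specifier>public</specifier> class <name>Shape</name> <block>{ }</block></class>
<class>class <name>Circle</name> <super_list><extends>extends <super><name>Shape</name></super></extends></super_list> <block>{ }</block></class>
</unit>"""


def class_names(xml_text):
    """Return the declared class names in document order."""
    root = ET.fromstring(xml_text)
    # Only <name> elements that are direct children of <class> are selected,
    # so the superclass name nested under <super_list> is not returned.
    names = root.findall(".//src:class/src:name", SRC_NS)
    return ["".join(n.itertext()) for n in names]


def file_path(xml_text):
    """Return the filename attribute recorded on the <unit> element."""
    return ET.fromstring(xml_text).get("filename")


print(class_names(SAMPLE))
print(file_path(SAMPLE))
```

When srcML is run on a folder, the output may nest one `unit` per file inside a root `unit`, so a real query would likely need to iterate over those inner units. Inspecting the generated XML is the reliable way to confirm the layout.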
New Functions
- query_src_text_file_documentation(): Extracts file-level documentation (e.g. comments in the file header).
- query_src_text_class_documentation(): Extracts class-level documentation (e.g. comments before a class declaration).
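One possible way to think about the two new queries, sketched in Python rather than in Kaiaulu's R code: treat the comments that open the file as file-level documentation, and treat the comments immediately preceding a `class` element as that class's documentation. The sketch assumes that srcML emits such comments as siblings of `class` under `unit`, which should be checked against real output. The `SAMPLE` XML and the function names are hypothetical. In full XPath 1.0, a candidate for the class-level query might be `//src:class/preceding-sibling::*[1][self::src:comment]`; Python's standard library does not support that axis, so the sketch walks the children instead.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.srcML.org/srcML/src}"

# Hand-written approximation of srcML output for one Java file (assumption).
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src" language="Java" filename="src/Shape.java"><comment type="block">/* Copyright header for Shape.java */</comment>
<package>package <name>geometry</name>;</package>
<comment type="block" format="javadoc">/** A two-dimensional shape. */</comment>
<class><specifier>public</specifier> class <name>Shape</name> <block>{ }</block></class>
<class>class <name>Helper</name> <block>{ }</block></class>
</unit>"""


def file_documentation(unit):
    """Collect the comments that appear before any other element in the file."""
    docs = []
    for child in unit:
        if child.tag != NS + "comment":
            break  # first non-comment element ends the file header
        docs.append("".join(child.itertext()))
    return docs


def class_documentation(unit):
    """Map each top-level class name to the comments directly preceding it."""
    result = {}
    pending = []  # comments seen since the last non-comment element
    for child in unit:
        if child.tag == NS + "comment":
            pending.append("".join(child.itertext()))
            continue
        if child.tag == NS + "class":
            name = child.find(NS + "name")
            key = "".join(name.itertext()) if name is not None else ""
            result[key] = pending
        pending = []  # any other element separates comments from later classes
    return result


unit = ET.fromstring(SAMPLE)
print(file_documentation(unit))
print(class_documentation(unit))
```

Two design questions remain open in this sketch. A header comment that directly precedes a class (no package or import in between) is reported as both file-level and class-level documentation, and nested classes are not visited because only direct children of `unit` are scanned.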
Task List
- Install and set up srcML, which takes source code and adds XML annotations (e.g. for classes, methods, and variables). Include the path to srcML in tools.yml.
- Use the existing functions to generate XML files. Inspect the files to understand the output.
- Gain an understanding of how XPath queries are formed, and figure out how to create queries for the new functions.
- Write the new XPath queries.
- Write the new functions.
- Verify that the functions work correctly.
- Consider what other functions could be useful, and implement those.
- Maintain a notebook for Syntax Extraction.
References
- Syntax Extractor in Kaiaulu
- API for Syntax Extraction
- Example XPath Query
- Issue #206 Add Source Code Text Parser Module
- Text GoF Patterns Notebook (see what srcML does for this)
- srcML slides