-
Notifications
You must be signed in to change notification settings - Fork 44
Add basic IDL pretty-printer with AST indentation (Option 2 from #650) #772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add basic IDL pretty-printer with AST indentation (Option 2 from #650) #772
Conversation
Hi @dhower-qc, @kbroch-rivosinc, and @ThinkOpenly, Approach Blocker Question Once that’s clear, I’ll wire up the formatting tool fully. |
From the pre-commit edits it looks like you are only formatting
Is there any consideration to format the IDL in the yaml files like this: https://github.com/riscv-software-src/riscv-unified-db/blob/main/arch/inst/I/ecall.yaml#L32 |
That was my goal in opening #650. I'm a bit outside my comfort zone, but how would we expect it to work? Would there be a new pre-commit hook, defined similarly to the hook defined here, but operating on YAML files? It would load the YAML files, look for fields that contain YAML? (Maybe it needs to know which fields to look for, because they are just defined as being "string" type.) Then, it formats the string as IDL, and... reports if there are differences between the original string and the formatted string? (It can't magically fix the YAML, can it?) |
That's how I was thinking it would work. The potentially tricky part is that there's effectively two formatters running on the IDL code in the yaml file and if they don't agree problems occur. I think prettier will leave the IDL string alone as long as:
|
Definitely on the right track for the vision. For the parser: The grammar is defined in https://github.com/riscv-software-src/riscv-unified-db/blob/main/lib/idl/idl.treetop. The AST gets constructed from that. For example: riscv-unified-db/lib/idl/idl.treetop Lines 49 to 53 in 808077a
When parsing, a match for rule riscv-unified-db/lib/idl/ast.rb Lines 1292 to 1300 in 808077a
Which exists just to define a You can invoke this all using the functions in https://github.com/riscv-software-src/riscv-unified-db/blob/main/lib/idl.rb. For example, to get the AST for an entire ast = Idl::Compiler.compile_file(path) For the following, you can use a cfg_arch = cfg_arch_for('_') # for pretty printing, you probably want the 'unconfigured' config '_' To get the AST for an instruction ast = cfg_arch.instruction('ld').operation_ast # internally, calls Idl::Compiler.compile_inst_operation For a CSR ast = cfg_arch.csr('mstatus').sw_read_ast For a CSR field ast = cfg_arch.csr('mstatus').field('SD').type_ast Similar for |
|
Thanks @dhower-qc That clears things up. I’ll update my formatter to use Idl::Compiler.compile_file(path) for parsing and will explore YAML support via ConfiguredArchitecture next. Appreciate the help! |
@dhower-qc, |
Great. Now we need to figure out how to retain comments and possibly intentional whitespace (like blank lines between statements). Currently, comments and whitespace are both parsed but then thrown away (don't make it into the AST). Warning, this isn't going to be simple. The first step is to create AST nodes for comments and whitespace. As we add comments/intentional whitespace in the AST, we have to make decisions about what they are attached to in the tree. Take statements as an example. There are at least four scenarios to account for:
Intuitively, we probably want the pre-comment and post-comment attached to the StatementAst node. But where do the free-floating comments go? When we see statements, they are usually (always?) part of a list of statements in the AST (e.g., an IfBody points to a list of statements). Now that list probably becomes a combined list of statements and comments (and intentional whitespace). That's just an example; we have deal with this anywhere there is whitespace. Consider:
All of this is going to require changing the grammar structure and the Ast node classes. Take-away: formatters aren't easy, even for 'simple' languages :) There may be a better way to approach this that I'm not thinking about. I'm also not opposed to changing the language a bit if we think of a way to make things easier (random example; maybe it's easier to attach a post-comment if they all have to start with |
This PR follows Option 2 from issue #650 to implement a simple scope-based pretty-printer for IDL using the existing AST.
pretty_idl(indent = 0)
methods to AST classes inlib/idl/ast.rb
tools/format_idl.rb
to format.idl
files via ASTissue #650