Skip to content

Add basic IDL pretty-printer with AST indentation (Option 2 from #650) #772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

syedowaisalishah
Copy link
Contributor

This PR follows Option 2 from issue #650 to implement a simple scope-based pretty-printer for IDL using the existing AST.

  • Added pretty_idl(indent = 0) methods to AST classes in lib/idl/ast.rb
  • Created CLI tool at tools/format_idl.rb to format .idl files via AST

issue #650

@syedowaisalishah
Copy link
Contributor Author

Hi @dhower-qc, @kbroch-rivosinc, and @ThinkOpenly,
I’ve implemented a basic pretty-printer based on the AST, following Option 2 from issue #650. The goal is to ensure proper indentation of .idl files using the existing AST structure.

Approach
I added pretty_idl(indent = 0) methods to all relevant AST classes and created a CLI tool (tools/format_idl.rb) that formats .idl files using the AST. This avoids tracking comments/whitespace, staying consistent with Option 2’s simpler scope-based formatting path.

Blocker
I'm currently stuck because I couldn't find how .idl files are parsed into AST nodes — no parser.rb, IDL.parse, or similar entry point is present in the repo.

Question
Could you confirm:
Is my approach consistent with what you envisioned for Option 2?
Is there an existing parser function or method I can use to parse .idl files into AST nodes?
If not, what would be the recommended next step?

Once that’s clear, I’ll wire up the formatting tool fully.
Thanks!

@kbroch-rivosinc
Copy link
Collaborator

From the pre-commit edits it looks like you are only formatting *.idl files of which there are only:

❯ fd \.idl$
arch/isa/fetch.idl
arch/isa/interrupts.idl
arch/isa/util.idl
arch/isa/fp.idl
arch/isa/builtin_functions.idl

Is there any consideration to format the IDL in the yaml files like this: https://github.com/riscv-software-src/riscv-unified-db/blob/main/arch/inst/I/ecall.yaml#L32

@ThinkOpenly
Copy link
Collaborator

it looks like you are only formatting *.idl files
Is there any consideration to format the IDL in the yaml files

That was my goal in opening #650.

I'm a bit outside my comfort zone, but how would we expect it to work? Would there be a new pre-commit hook, defined similarly to the hook defined here, but operating on YAML files? It would load the YAML files, look for fields that contain YAML? (Maybe it needs to know which fields to look for, because they are just defined as being "string" type.) Then, it formats the string as IDL, and... reports if there are differences between the original string and the formatted string? (It can't magically fix the YAML, can it?)

@kbroch-rivosinc
Copy link
Collaborator

I'm a bit outside my comfort zone, but how would we expect it to work? Would there be a new pre-commit hook, defined similarly to the hook defined here, but operating on YAML files? It would load the YAML files, look for fields that contain YAML? (Maybe it needs to know which fields to look for, because they are just defined as being "string" type.) Then, it formats the string as IDL, and... reports if there are differences between the original string and the formatted string? (It can't magically fix the YAML, can it?)

That's how I was thinking it would work. The potentially tricky part is that there's effectively two formatters running on the IDL code in the yaml file and if they don't agree problems occur. I think prettier will leave the IDL string alone as long as:

  • whitespace before first non-whitespace char is maintained as prettier wants it
  • line length isn't exceeded

@dhower-qc
Copy link
Collaborator

Definitely on the right track for the vision.

For the parser:

The grammar is defined in https://github.com/riscv-software-src/riscv-unified-db/blob/main/lib/idl/idl.treetop. The AST gets constructed from that. For example:

rule bitfield_definition
'bitfield' space* '(' space* int space* ')' space* user_type_name space* '{' space*
e:(field_name space+ range:(int lsb:(space* '-' space* int)?) space+)+ space*
'}' <Idl::BitfieldDefinitionSyntaxNode>
end

When parsing, a match for rule bitfield_definition will construct an Idl::BitfieldDefinitionSyntaxNode:

class BitfieldDefinitionSyntaxNode < Treetop::Runtime::SyntaxNode
def to_ast
fields = []
e.elements.each do |f|
fields << BitfieldFieldDefinitionAst.new(f.input, f.interval, f.field_name.text_value, f.range.int.to_ast, f.range.lsb.empty? ? nil : f.range.lsb.int.to_ast)
end
BitfieldDefinitionAst.new(input, interval, user_type_name.to_ast, int.to_ast, fields)
end
end

Which exists just to define a to_ast method that returns BitfieldDefinitionAst.

You can invoke this all using the functions in https://github.com/riscv-software-src/riscv-unified-db/blob/main/lib/idl.rb.

For example, to get the AST for an entire .idl file:

ast = Idl::Compiler.compile_file(path)

For the following, you can use a ConfiguredArchitecture object to get the AST of YAML-defined IDL:

cfg_arch = cfg_arch_for('_')  # for pretty printing, you probably want the 'unconfigured' config '_'

To get the AST for an instruction operation():

ast = cfg_arch.instruction('ld').operation_ast  # internally, calls Idl::Compiler.compile_inst_operation

For a CSR sw_read():

ast = cfg_arch.csr('mstatus').sw_read_ast

For a CSR field type():

ast = cfg_arch.csr('mstatus').field('SD').type_ast

Similar for reset_value_ast, sw_write_ast.

@dhower-qc
Copy link
Collaborator

From the pre-commit edits it looks like you are only formatting *.idl files of which there are only:

❯ fd \.idl$
arch/isa/fetch.idl
arch/isa/interrupts.idl
arch/isa/util.idl
arch/isa/fp.idl
arch/isa/builtin_functions.idl

Is there any consideration to format the IDL in the yaml files like this: https://github.com/riscv-software-src/riscv-unified-db/blob/main/arch/inst/I/ecall.yaml#L32

arch/isa/globals.isa is also an IDL file. For better or worse, it has a different extension to mark that it is the special top-level file for compilation (you'll see the other files get included from it).

@syedowaisalishah
Copy link
Contributor Author

Thanks @dhower-qc That clears things up. I’ll update my formatter to use Idl::Compiler.compile_file(path) for parsing and will explore YAML support via ConfiguredArchitecture next. Appreciate the help!

@syedowaisalishah
Copy link
Contributor Author

@dhower-qc,
Thanks again for your guidance earlier!
I’ve completed implementing the AST-based pretty-printer using the simpler indentation-based approach (Option 2).
Could you please review the implementation to confirm if it aligns with the intended direction?
Also, what’s the best way to verify that the formatting logic is working correctly?
Thanks!

@dhower-qc
Copy link
Collaborator

Great. Now we need to figure out how to retain comments and possibly intentional whitespace (like blank lines between statements). Currently, comments and whitespace are both parsed but then thrown away (don't make it into the AST).

Warning, this isn't going to be simple.

The first step is to create AST nodes for comments and whitespace. As we add comments/intentional whitespace in the AST, we have to make decisions about what they are attached to in the tree.

Take statements as an example. There are at least four scenarios to account for:

# this is a pre-comment for the statement
X[xd] = X[xs1] + X[xs2];
# this is a free-floating comment

X[xd] = X[xs1] + X[xs2];
X[xd] = X[xs1] + X[xs2]; # this is a post-comment for the statement
X[xd] = X[xs1] + X[xs2];

# this is a free-floating comment

Intuitively, we probably want the pre-comment and post-comment attached to the StatementAst node. But where do the free-floating comments go? When we see statements, they are usually (always?) part of a list of statements in the AST (e.g., an IfBody points to a list of statements). Now that list probably becomes a combined list of statements and comments (and intentional whitespace).

That's just an example; we have deal with this anywhere there is whitespace. Consider:

if (A # comment for A
    || B# comment for B
    # random comment
 # another random comment
    && C) # comment for if?
{
}

All of this is going to require changing the grammar structure and the Ast node classes.

Take-away: formatters aren't easy, even for 'simple' languages :)

There may be a better way to approach this that I'm not thinking about. I'm also not opposed to changing the language a bit if we think of a way to make things easier (random example; maybe it's easier to attach a post-comment if they all have to start with #< instead of just #; or maybe we find there is a verilog formatter out there that would work with some small changes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants