-
Cool; this sounds about right! Seems like a fun project. Here are a couple of comments:
The other thing to answer here is: where will the benchmarks come from? Presumably, you'll need both the queries and the underlying JSON documents.
It's a good question re. whether LLVM is the right tool for the job. I don't think the statically vs. dynamically typed question actually matters for the LLVM question, however. If your goal is to generate native code, then at some point, you will need to generate a very "static" and low-level program representation—i.e., something close to assembly. At this point, your choices are: (1) generate actual x86 assembly directly, (2) generate LLVM IR and let it translate that to assembly/machine code, or (3) use some other code generator that plays the same role as LLVM would in option #2 (e.g., Cranelift, DynASM, QBE). All of these seem perfectly reasonable. The main trade-offs are engineering ones: generating x86 assembly will fundamentally involve doing more stuff (e.g., you have to implement register allocation yourself, whereas LLVM or any similar alternative does that for you), but LLVM is more complicated: it's a heavyweight dependency and has a lot of "surface area" one needs to learn to be productive.
I totally think you should just do the backend! There is a great opportunity here to reuse the existing bytecode inside the jq implementation… you can leave the parsing to them, and even some run-time support, and focus only on generating the code. You could implement your own parser too, but I would keep this as a backup plan if it turns out to be too complicated to use the "official" frontend.
-
Checkpoint 1: A Minimally Working "Compiler"
Current Pipeline
At the time of this writing, the above diagram depicts the flow of execution from reading command-line inputs to executing the program. At the beginning of the pipeline, both the JQ source program and the input JSON data are read into the compiler's entry point and immediately passed to the compiler frontend. Using the same frontend used in the JQ interpreter, the JQ source code is compiled to bytecode. Additionally, the JSON data is stored in a more convenient struct at this point. The bytecode, JSON data, and additional JQ flags are saved into a global struct called cjq_state before control returns to the compiler entry point. Next, a compiled LLVM IR function called jq_program is called. At this point, all jq_program does is call the C function cjq_execute, which executes the JQ program based on the data stored in cjq_state. NOTE: cjq_execute is simply a wrapper function that invokes the JQ interpreter's VM; it is also responsible for passing data from cjq_state to the VM.

Tools

The tools used to implement the CJQ compiler so far are the programming languages C and Python, as well as the Python library llvmlite. The bulk of the compiler functionality required for actual program execution is written in C, as I believe implementing the stack-based execution functionality and all requisite data structures in something as low-level as LLVM IR would be error-prone. Python is used to build an LLVM IR program that makes calls to these C functions. Specifically, the llvmlite library is used to build an LLVM IR program, which is then compiled to machine code using clang.

Next steps

As it stands, the CJQ "compiler" produces a compiled program that simply makes a call to a function that executes the JQ program in the same way the JQ interpreter does. While this alone accomplishes nothing performance-wise, the work done so far has opened the door for potential performance gains and increased reusability.
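For a rough sense of how llvmlite can emit a module of this shape (the function names mirror the ones in this post, but this is a sketch, not the actual CJQ build script):

```python
from llvmlite import ir

# Build a module whose only job is to call the C runtime entry point.
module = ir.Module(name="cjq")
void_fn = ir.FunctionType(ir.VoidType(), [])

# Declare the external C function (defined in the C runtime, linked later).
cjq_execute = ir.Function(module, void_fn, name="_cjq_execute")

# Define jq_program, which just calls _cjq_execute and returns.
jq_program = ir.Function(module, void_fn, name="jq_program")
builder = ir.IRBuilder(jq_program.append_basic_block(name="entry"))
builder.call(cjq_execute, [])
builder.ret_void()

print(module)  # LLVM IR text, ready to hand to clang
```

Printing the module yields IR that can be written to ir.ll and compiled alongside the C runtime with clang.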
To implement the improvements I have in mind, I first need to split the CJQ pipeline into two main stages.
Here is an example of the LLVM output I have in mind. Note that these "opcode functions" still need to be implemented in C:

; ModuleID = '<string>'
source_filename = "<string>"
target triple = "x86_64-unknown-linux-gnu"
define void @jq_program() {
entry:
call void @_opcode_TOP()
call void @_opcode_SUBEXP_BEGIN()
call void @_opcode_PUSHK_UNDER()
call void @_opcode_INDEX()
call void @_opcode_SUBEXP_END()
call void @_opcode_SUBEXP_BEGIN()
call void @_opcode_PUSHK_UNDER()
call void @_opcode_INDEX()
call void @_opcode_SUBEXP_END()
call void @_opcode_CALL_BUILTIN_plus()
call void @_opcode_RET()
call void @_opcode_BACKTRACK_RET()
call void @_cjq_execute()
ret void
}
declare void @_opcode_TOP()
declare void @_opcode_BACKTRACK_TOP()
declare void @_opcode_SUBEXP_BEGIN()
declare void @_opcode_BACKTRACK_SUBEXP_BEGIN()
declare void @_opcode_PUSHK_UNDER()
declare void @_opcode_BACKTRACK_PUSHK_UNDER()
declare void @_opcode_INDEX()
declare void @_opcode_BACKTRACK_INDEX()
declare void @_opcode_SUBEXP_END()
declare void @_opcode_BACKTRACK_SUBEXP_END()
declare void @_opcode_CALL_BUILTIN_plus()
declare void @_opcode_BACKTRACK_CALL_BUILTIN_plus()
declare void @_opcode_RET()
declare void @_opcode_BACKTRACK_RET()
declare void @_cjq_execute()
I'd like to add that I've borrowed a collection of JQ programs that were used for testing the JQ interpreter. I'm currently finding appropriate JSON inputs for the given tests and will use these tests to confirm that the CJQ compiler produces the same results as the JQ interpreter, and later to compare the performance of each implementation.

Questions

First, do you agree with my overall approach in the 'Next steps' section above? Second, I'm currently just producing an LLVM module that calls a C function called _cjq_execute, which executes the jq bytecode exactly how the VM currently does. This is what the current LLVM IR looks like:

; ModuleID = '<string>'
source_filename = "<string>"
target triple = "x86_64-unknown-linux-gnu"
define void @jq_program() {
entry:
call void @_cjq_execute()
ret void
}
declare void @_cjq_execute()

My next milestone is to produce a sequence of LLVM calls to C functions that perform the same actions the VM would perform while reading the sequence of bytecode instructions. Eventually, I'd like to somehow inline these C functions so the stack-based logic would truly be in LLVM. I'm not sure how important this truly is. It feels like it would open the door for more optimization opportunities, but if clang already performs all these optimizations at compile time (i.e., when I compile the ir.ll file with the rest of the C code required for reading in JSON data and pass it to the main function in the LLVM IR), then I don't want to waste my time. So my question is: is trying to inline functions worth it?
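For context, one way the inlining could be arranged at the LLVM level (a sketch; it assumes the opcode bodies have been compiled to IR, e.g. with clang -emit-llvm, and linked into the same module as ir.ll via llvm-link or -flto) is to mark the opcode functions with the alwaysinline attribute:

```llvm
define void @_opcode_TOP() #0 {
entry:
  ; ...body compiled from the C implementation...
  ret void
}

attributes #0 = { alwaysinline }
```

The key constraint is that LLVM can only inline across ir.ll and the C runtime if both bodies are visible to the optimizer in one module; compiling the C separately to a native object file and linking normally gives the inliner nothing to work with.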
-
UPDATE: Currently implementing opcode functions in the granular branch. Some refactoring was required to ensure the flow of execution matched that of the JQ interpreter. Per the basic functionality test script, CJQ is able to match JQ output for all opcodes that have been implemented in CJQ so far. My current plan is to implement all examples from the JQ manual as test cases, which I believe should cover most, if not all, of the standard JQ functionality. I think following the manual will allow me to use more of a test-driven development approach to implementing the opcode functions. Once the opcode functions are implemented, I plan to move on to measuring CJQ performance and comparing it to the standard JQ implementation.

EDIT: I've also been looking into LLVM's inlining capabilities and believe I can accomplish this by simply passing the correct flags to clang when I compile. Will investigate this further shortly.
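A minimal differential-testing harness for the manual examples could look something like this (a sketch: the ./cjq binary path and its invocation are assumptions, not the project's actual CLI):

```python
import subprocess

def run(cmd, stdin_text):
    """Run a command, feeding stdin_text, and return its stdout."""
    result = subprocess.run(list(cmd), input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout

def outputs_match(program, json_input,
                  jq_cmd=("jq",), cjq_cmd=("./cjq",)):
    """Compare reference jq output against CJQ output for one test case."""
    expected = run([*jq_cmd, program], json_input)
    actual = run([*cjq_cmd, program], json_input)
    return expected == actual
```

Each example from the manual then becomes one outputs_match(program, json_input) assertion, which fits the test-driven approach described above.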
-
Sounds great! I'd be interested to see how those tests turn out. 😃
-
All tests are passing! That is, the outputs produced by CJQ match the outputs produced by the standard JQ implementation for all examples from the JQ manual. The last few things I wanted to get to before wrapping up are (i) serialization/deserialization (ii) ensuring that the generated LLVM IR inlines the bodies of the opcode functions being called and (iii) comparing the performance of CJQ to the standard JQ implementation. I'm having some difficulties with part (i) and I'm hoping to get your take on the issue I'm running into. For background, I'm trying to serialize and deserialize a list of compiled_jq_state structures. I'm trying to do this because 1. I'm currently compiling source JQ code to bytecode in both the tracing stage and the execution stage (without going into too much detail, I'm compiling the jq source to bytecode in the execution stage to set up the stack properly before the sequence of opcode functions are called). I'd like to serialize so I don't compile a second time, and 2. CJQ currently only works for one input (e.g. one JSON file or one JSON text string from stdin). I'd like to extend this to support more than one input to match the standard implementation's flexibility. Here's what a compiled_jq_state struct looks like
One of the members of a compiled_jq_state struct is a jq_state struct - here's what a jq_state struct looks like:
The issue I'm having is a well-known C issue: serializing a struct that contains pointers is tricky. There's no point in serializing the pointers themselves, since the addresses they point to won't hold the same data when the pointers are deserialized. To make things more confusing, jq_state has a function pointer member, which is a whole other can of worms when it comes to deserialization. One option I have is to come up with my own data layout, serialize all the data pointed to by the pointers in compiled_jq_state, and ensure that when I deserialize, I assign each data field in my layout to the correct pointer. This will be tricky but (I think) doable. Also, I'm leaning towards ignoring function pointers for now. These are used for debugging purposes anyway. They'll be nice to have in the future, but I don't think they're necessary at this point in time. What are your thoughts? Please feel free to let me know if you see a better approach 😆
-
Wow, that's awesome that it's working so far! Interesting problem w/r/t serializing this state. I guess one big question I have is: do you really need the entire state struct? Anyway, if not, you are probably right that inventing a custom data layout is the way to go, although it sounds somewhat painful. You could also imagine trying to use an off-the-shelf serialization format just to simplify things, at the cost of some performance… for example, even JSON itself would be a possible candidate.
-
I think you're right! I believe the bytecode struct member of jq_state is the only data structure that needs to be serialized. I think this is the case because, by definition, the bytecode struct is the structure initialized after the jq source code is compiled down to bytecode - so this data is obviously needed prior to the first and all subsequent executions. If there are multiple inputs, the stack should look the same before each execution since no instructions will have been executed at that point. I'm going to get started on serializing the bytecode struct. I'll keep you updated 😀
-
Update

Hi @sampsyo - so after a lot of hacking together, testing, and debugging, serialization and deserialization work! And I'm very happy to say that you were right: the bytecode struct was the only data structure that needed to be serialized in order to maintain the necessary information to run jq programs without recompiling 😄 To refresh you (and myself) on the current state of the compilation pipeline, here's a brief summary:
For example,
Each of the above subsequence functions calls one or more opcode functions. This helped reduce compile time because, originally, I was trying to inline all calls to opcode functions, which made the LLVM IR far too big and made the optimization passes either take hours or crash altogether. Now, the
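The subsequence idea can be sketched in IR roughly as follows (the names here are made up for illustration, not the actual generated code): jq_program calls a small number of subsequence functions, and inlining happens at that granularity instead of per opcode:

```llvm
define void @jq_program() {
entry:
  ; Each subsequence groups several opcode calls; inlining happens
  ; at this granularity, keeping the optimized IR manageable.
  call void @_subseq_0()
  call void @_subseq_1()
  ret void
}

define void @_subseq_0() {
entry:
  call void @_opcode_TOP()
  call void @_opcode_SUBEXP_BEGIN()
  call void @_opcode_PUSHK_UNDER()
  call void @_opcode_INDEX()
  ret void
}
```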
Looking ahead

At this point, I'm not planning to add any additional functionality to the implementation until I've finished the thesis. I would like to get CMake set up so that CJQ will work on machines other than my own. Aside from this, I've been spending most of my time working on my thesis. Here is my current outline - feel free to let me know if there's anything you'd like me to add, remove, or move around.

Thesis Outline

Abstract
Background

JQ Background
LLVM Background
Other Bytecode to Machine Code Compilers
Implementation
Performance Evaluation
Conclusion and Future Thoughts
- Discuss ways to improve CJQ moving forward, such as tracing without the VM also executing the stream of opcodes.

Thanks for reading this very long update 😆. One quick note: I'm getting married on 6/26, so I won't be working on this between 6/24 and 6/29. I'm hoping to get a first draft to you before 6/24, but if not, then very soon after!
-
Proposal for Compiled JQ Implementation
What will you do?
Implement a compiler for the JQ programming language.
How will you do it?
1. Familiarization
2. Design
a. Frontend
i. Which language will you use to write the compiler?
- I'm considering either C or C++… most likely C++, since I plan to use LLVM tools throughout this project.
ii. How will source programs be parsed?
- Bison will be used for parsing.
b. IR (Intermediate Representation)
i. What will the IR look like?
- LLVM IR if feasible.
- Block/Bytecode representation if LLVM is not suitable.
ii. Will you make any optimizations?
- I will implement as many of the "typical" compiler optimizations as possible. By this I mean: dead code elimination, constant folding, constant propagation, copy propagation, loop unrolling, etc. Essentially, as many of the optimizations covered in 6120 as I can.
c. Backend
i. What will the target be?
- The target will be x86 assembly, since I'm using a Windows machine with an AMD x86-64 processor.
ii. Lowering Process
- The input to the backend will either be Bytecode or LLVM IR - in either case, the plan is to target abstract assembly followed by true assembly after the register allocation pass.
3. Implementation
4. Testing
How will you empirically measure success?
Questions
What do you think of using LLVM? I'm unsure if this would be possible since, from what I understand, LLVM is typically used for statically-typed languages whereas JQ is dynamically-typed.
Do you recommend that I use the existing frontend and just build out the backend?