-
-
Notifications
You must be signed in to change notification settings - Fork 120
fix(grainc): Correct parsetree caching behaviour #2280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(grainc): Correct parsetree caching behaviour #2280
Conversation
Option.fold(~none=None, ~some=Hashtbl.find_opt(cached_parsetrees), name) | ||
) { | ||
| Some((cached_source, cached_program)) | ||
when cached_source == Hashtbl.hash(source) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashing the entire parsetree is super expensive. I would use file_older
or some other mechanism to verify that the file hasn't changed and that the cache is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashing the entire parsetree is super expensive. I would use
file_older
or some other mechanism to verify that the file hasn't changed and that the cache is valid.
I'm only hashing the source which is just hashing a string right? I am not against using file_older
though but I don't think that would work here because of .compile_string
caching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not as bad; I thought it was hashing the AST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope I am essentially just using the source hash to validate if the contents of the name
has changed and we should be using a new parsetree rather than the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful to test this with a large file (maybe like 25k-50k lines) and make sure the hashing isn't really noticeable compared to the parsing. Also, we should make an issue to only keep a number of cached parsetrees, because this is basically a memory leak in its current form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if we have any precedent anywhere for timing based tests?
Just as some mental logic hashing itself in ocaml looks to be o(n)
Hashtbl.hash caml_hash and I imagine menhir is o(n)
as well (though I havent checked).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not. They're both O(n), but remember that runtime complexity just describes how performance of the algorithm grows with more input, not actual running. You can have two O(n) algorithms, but one might take 20x longer than the other one. Both being O(n) just means that with more data, that one will still only take 20x longer than the other one.
Strings are easy to hash and have quick, optimized algorithms, whereas hashing a big data structure requires chasing a bunch of pointers, offset calculations, etc. ASTs are also a lot more data than the source strings.
This corrects the behaviour of parsetree caching, we noticed the other day that the lsp wasn't updating prompting #2267. This corrects the root source of that bug by validating the source hashes match when we are caching.
I think with this change we would be safe to remove the compiler reset allowing the lsp to run a little faster however I don't want todo that in this pr as I think we still need to validate the behaviour of the other things we don't reset
DependencyGraph
andEnv
,Ctype levels
,Warnings
,FS_Access
, testing locally everything was working again though.Note: After I made this pr I realized we use
file_older
in the dependencytree to check if something is dirty I'm wondering if instead of my source hashing semantics it would be better to use that api but I would like a bit of feedback before I go and do that.