Labels: algorithm, experiment (This is an experiment and might not be released 🧪)
Description
Running the Plutokiller dataset (1000+ files) takes 45 seconds:
- 25 seconds are spent tokenizing
- 20 seconds are spent calculating all pairs
During pair calculation, we create all
Disabling the LCS calculation cuts the time spent calculating the pairs from 20 seconds to 50 ms. Because the initial visualizations do not use the LCS, we could easily skip this step.
However, we will need the LCS when the user navigates to the pairs page (or a similar page), so a worker could start calculating all the LCSs as soon as the page is opened in the browser.
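A minimal sketch of the "skip now, compute on demand" idea, assuming a simplified, hypothetical `Pair` that holds two token arrays. The property keeps the name `longest`, so callers are unaffected, but the work moves from the constructor to the first access and is then memoized. A classic dynamic-programming longest common substring over tokens stands in for the real LCS step; `dolos-core` computes its longest fragment from its own matching data, which this sketch does not model.

```typescript
// Sketch only: Pair, Match and longestCommonSubstring are hypothetical names,
// not the dolos-core API.

interface Match {
  leftStart: number; // index of the first matching token in the left file
  rightStart: number; // index of the first matching token in the right file
  length: number; // number of consecutive matching tokens
}

// Classic O(n * m) longest common substring over token arrays,
// used here as a stand-in for the expensive LCS calculation.
function longestCommonSubstring(a: string[], b: string[]): Match | null {
  let best: Match | null = null;
  // prev[j] = length of the common run ending at a[i - 2] and b[j - 1]
  let prev: number[] = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (best === null || curr[j] > best.length) {
          best = { leftStart: i - curr[j], rightStart: j - curr[j], length: curr[j] };
        }
      }
    }
    prev = curr;
  }
  return best;
}

class Pair {
  // `undefined` means "not computed yet"; `null` means "no common tokens".
  private cachedLongest: Match | null | undefined = undefined;
  public lcsComputations = 0; // only here to demonstrate the laziness

  constructor(
    public readonly leftTokens: string[],
    public readonly rightTokens: string[],
  ) {
    // No LCS work happens while constructing the pair.
  }

  // Same property name as before, but computed on first access and memoized.
  get longest(): Match | null {
    if (this.cachedLongest === undefined) {
      this.lcsComputations++;
      this.cachedLongest = longestCommonSubstring(this.leftTokens, this.rightTokens);
    }
    return this.cachedLongest;
  }
}

const pair = new Pair(["let", "x", "=", "1", ";"], ["const", "x", "=", "1", ";"]);
console.log(pair.lcsComputations); // 0
console.log(pair.longest); // { leftStart: 1, rightStart: 1, length: 4 }
console.log(pair.lcsComputations); // 1
void pair.longest;
console.log(pair.lcsComputations); // still 1: the result is cached
```

Memoizing `null` separately from `undefined` matters: a pair without any common run would otherwise be recomputed on every access.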
Changes needed
- Lib: Do not calculate the LCS when constructing a new pair. This could easily be implemented without API changes by making `longest` a lazily calculated getter.
- CLI: Stop writing the longest fragment to the output CSV. As this changes the CLI API, we likely want to make this an option so it can be enabled again.
- Web: change the `Pair` model and store to reflect this change.
  - We still want to support CSVs in which the longest fragment has already been calculated. We already take a similar lazy approach for `fragment`s, which are only calculated when they are required.
  - We cannot postpone the LCS calculation until it is needed, because we need all of them at once and calculating them can take 20 seconds. Once the data is loaded, we can start a background worker that calculates the LCS (using the methods already provided by `dolos-core`).
  - Implement loading states for components that use the longest fragment, as the user can still navigate to these pages while the calculation is running.
  - Optionally: cache the LCS data in localStorage to avoid recalculating it every time the page is loaded again.
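A rough sketch of how the Web side could fit together, assuming a hypothetical `LcsStore` that keys pairs by a numeric `id` and stores the longest fragment as a single number (its length). Instead of a real Web Worker it uses a simpler, swapped-in technique: missing values are computed on the main thread in small chunks, yielding to the event loop with `setTimeout` between chunks. Values already present in the CSV are reused, `longestFor` returning `undefined` is the cue for a loading state, and the optional cache accepts anything with `getItem`/`setItem`, such as `window.localStorage`.

```typescript
// Sketch only: PairRow, LcsStore and KeyValueCache are hypothetical names.
// A real implementation would run the loop in a Web Worker using dolos-core.

interface PairRow {
  id: number;
  leftFile: string;
  rightFile: string;
  longest?: number; // present when the CSV already contained the longest fragment
}

type LcsStatus = "idle" | "loading" | "done";

// Minimal key-value interface; window.localStorage satisfies it in a browser.
interface KeyValueCache {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Resolves on a later event-loop turn, letting pending UI work run first.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

class LcsStore {
  status: LcsStatus = "idle";
  private readonly longest = new Map<number, number>();

  constructor(
    private readonly pairs: PairRow[],
    private readonly computeLongest: (pair: PairRow) => number,
    private readonly cache?: KeyValueCache,
  ) {
    for (const pair of pairs) {
      // Prefer the value from the CSV, then a cached value, if any.
      const cached = cache?.getItem(`lcs:${pair.id}`);
      if (pair.longest !== undefined) this.longest.set(pair.id, pair.longest);
      else if (cached != null) this.longest.set(pair.id, Number(cached));
    }
  }

  // `undefined` means "not calculated yet": components show a loading state.
  longestFor(id: number): number | undefined {
    return this.longest.get(id);
  }

  // Calculates every missing value, `chunkSize` pairs per event-loop turn.
  async start(chunkSize = 50): Promise<void> {
    this.status = "loading";
    const missing = this.pairs.filter((p) => !this.longest.has(p.id));
    for (let i = 0; i < missing.length; i += chunkSize) {
      for (const pair of missing.slice(i, i + chunkSize)) {
        const value = this.computeLongest(pair);
        this.longest.set(pair.id, value);
        this.cache?.setItem(`lcs:${pair.id}`, String(value));
      }
      await yieldToEventLoop();
    }
    this.status = "done";
  }
}

const pairs: PairRow[] = [
  { id: 0, leftFile: "a.js", rightFile: "b.js", longest: 12 }, // from the CSV
  { id: 1, leftFile: "a.js", rightFile: "c.js" },
  { id: 2, leftFile: "b.js", rightFile: "c.js" },
];
// Dummy stand-in for the real LCS calculation.
const store = new LcsStore(pairs, (p) => p.id * 10);
console.log(store.longestFor(0)); // 12 (taken from the CSV)
console.log(store.longestFor(1)); // undefined: show a loading state
void store.start(1).then(() => console.log(store.status, store.longestFor(1))); // done 10
```

Chunking on the main thread only keeps the page responsive between chunks; moving the same loop into a Web Worker would keep the main thread free for the whole 20 seconds.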