OCR integration #13313

Draft
wants to merge 45 commits into main

Kaan0029
Contributor

@Kaan0029 Kaan0029 commented Jun 12, 2025

Closes #13267.

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • [.] Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • [.] Tests created for changes (if applicable)
  • [.] Manually tested changed features in running JabRef (always required)
  • [.] Screenshots added in PR description (if change is visible to the user)
  • [.] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • [.] Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@calixtus calixtus changed the title from "Initial implementation using tess4j" to "OCR integration" Jun 12, 2025
@subhramit
Member

Your pull request conflicts with the target branch.

Please merge upstream/main with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.

Tip for the future: always pull a fresh copy of upstream/main before starting work on a branch, especially if a significant amount of time has passed.

@koppor koppor added this to the 6.0 milestone Jul 4, 2025
@jabref-machine
Collaborator

Your code currently does not meet JabRef's code guidelines. We use Checkstyle to identify issues. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Checkstyle (pull_request)" and click on it.

In case of issues with the import order, double-check that you activated Auto Import. You can fix the imports by pressing Ctrl+Alt+O, which triggers Optimize Imports.

Please carefully follow the setup guide for the code style. Afterwards, please run Checkstyle locally, fix the issues, commit, and push.

@Kaan0029 Kaan0029 force-pushed the gsoc-ocr-tess4j-initial-implementation branch from 9f91505 to db1f577 Compare July 10, 2025 10:49
@jabref-machine
Collaborator

Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them exactly in the format of [x] (done), [ ] (not done yet) or [/] (not applicable).

@jabref-machine
Collaborator

JUnit tests of jablib are failing. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Unit tests (pull_request)" and click on it.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

@calixtus calixtus added the dev: no-bot-comments If set, there should be no comments from our bots label Jul 13, 2025
Comment on lines +121 to +122
// Create the OCR action
OcrAction ocrAction = new OcrAction(

Comment is trivial and does not add any new information beyond what is clearly visible in the code. It simply restates what the code is doing.

Comment on lines +131 to +132
// Set the action to execute when clicked
ocrItem.setOnAction(event -> ocrAction.execute());

Comment is redundant and merely describes what the code does without providing additional context or reasoning. The code is self-explanatory.

Comment on lines +134 to +135
// Disable if the action is not executable (file doesn't exist)
ocrItem.disableProperty().bind(ocrAction.executableProperty().not());

Comment restates what is evident from the code itself without providing additional insight or explanation about the underlying logic or design decision.

    configureTessdata();
    this.isAvailable = true;
    LOGGER.debug("Initialized TesseractOcrProvider successfully");
} catch (Exception e) {

Catching generic Exception is too broad and may mask specific issues. Should catch specific exceptions that can occur during initialization.
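
A minimal sketch of what narrower handling could look like during initialization. `OcrProviderSketch` and its `failureReason()` accessor are hypothetical stand-ins, not the classes in this PR, and the set of exceptions shown is an assumption about what directory validation can throw; the real provider would also have to deal with whatever tess4j itself raises.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical stand-in for the provider under review; not JabRef's actual class.
class OcrProviderSketch {
    private boolean available;
    private String failureReason = "";

    OcrProviderSketch(String tessdataDir) {
        try {
            configureTessdata(tessdataDir);
            available = true;
        } catch (InvalidPathException e) {
            // The configured string is not a syntactically valid path.
            failureReason = "invalid path: " + e.getReason();
        } catch (IOException e) {
            // The path is valid but does not point to a usable directory.
            failureReason = "tessdata not usable: " + e.getMessage();
        } catch (SecurityException e) {
            // Access to the file system was denied.
            failureReason = "access denied: " + e.getMessage();
        }
    }

    private static void configureTessdata(String dir) throws IOException {
        Path path = Path.of(dir); // may throw InvalidPathException
        if (!Files.isDirectory(path)) {
            throw new IOException("not a directory: " + path);
        }
    }

    boolean isAvailable() {
        return available;
    }

    String failureReason() {
        return failureReason;
    }
}
```

Each branch records a distinct reason, so a failed initialization can be reported to the user or the log with more detail than a single catch-all would allow, and unexpected programming errors are no longer swallowed.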

            return true;
        }
    }
} catch (Exception e) {

Generic Exception catch block in setTessdataPath method should be replaced with specific exceptions like IOException or SecurityException.
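
One way to follow this suggestion is a multi-catch that names only the failures a directory check can raise. The sketch below is an assumption about what such a check might do (look for `*.traineddata` files); `TessdataPathSketch` and `hasTrainedData` are hypothetical names, not the PR's `setTessdataPath`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; illustrates the catch structure only.
class TessdataPathSketch {

    /** Returns true if the directory contains at least one *.traineddata file. */
    static boolean hasTrainedData(Path dir) {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.traineddata")) {
            return files.iterator().hasNext();
        } catch (IOException | SecurityException e) {
            // IOException covers a missing path or a path that is not a directory;
            // SecurityException covers denied read access.
            return false;
        }
    }
}
```

Anything outside these two types, such as a `NullPointerException` from a bug, still propagates instead of being silently turned into `false`.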


private void configureTessdata() {
    // Priority 1: Check user preferences (from settings)
    if (filePreferences != null) {

Null check on constructor-injected filePreferences indicates potential null usage. Should use Optional or enforce non-null in constructor.
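
A small sketch of both options the comment mentions: rejecting `null` in the constructor, and modelling an unset preference as `Optional`. `FilePreferencesStub`, `PreferenceBackedProviderSketch`, and the `tessdataPath()` accessor are hypothetical; JabRef's real `FilePreferences` API may look different.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for the injected preferences object.
class FilePreferencesStub {
    private final String tessdataPath;

    FilePreferencesStub(String tessdataPath) {
        this.tessdataPath = tessdataPath;
    }

    /** An unset or blank preference is reported as Optional.empty(). */
    Optional<String> tessdataPath() {
        return Optional.ofNullable(tessdataPath).filter(path -> !path.isBlank());
    }
}

// Hypothetical provider; shows the non-null contract only.
class PreferenceBackedProviderSketch {
    private final FilePreferencesStub filePreferences;

    PreferenceBackedProviderSketch(FilePreferencesStub filePreferences) {
        // Fail fast at construction time, so later code needs no null checks.
        this.filePreferences = Objects.requireNonNull(filePreferences, "filePreferences");
    }

    String resolveTessdata(String fallback) {
        return filePreferences.tessdataPath().orElse(fallback);
    }
}
```

With the contract enforced once in the constructor, the `if (filePreferences != null)` branch becomes unnecessary and the priority chain reads as a plain `orElse` fallback.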

 */
record Failure(String errorMessage) implements OcrResult {
    public Failure {
        // Provide default message instead of throwing exception

Comment is trivial and simply describes what the code does. The code is self-explanatory and doesn't need this comment.
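
For context, a sketch of how a compact record constructor can carry this intent without the comment, by naming the default. `OcrResultSketch` and `FailureSketch` are hypothetical, and the default wording is an assumption; the PR's actual message is not shown in the excerpt.

```java
// Hypothetical result type; the PR's OcrResult may be declared differently.
interface OcrResultSketch {
}

record FailureSketch(String errorMessage) implements OcrResultSketch {
    // Assumed wording; the constant name documents the fallback.
    static final String DEFAULT_MESSAGE = "Unknown OCR error";

    FailureSketch {
        // Reassigning the parameter in a compact constructor sets the field value.
        if (errorMessage == null || errorMessage.isBlank()) {
            errorMessage = DEFAULT_MESSAGE;
        }
    }
}
```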

trag-bot bot commented Jul 16, 2025

@trag-bot didn't find any issues in the code! ✅✨

Development

Successfully merging this pull request may close these issues.

GSOC meta issue: OCR Integration in JabRef
7 participants