Description
Is your suggestion for improvement related to a problem? Please describe.
Might be a possible GSoC project!
Many books have a special page with a lot of bibliographical and publishing information (typically the second or third page, after the blank ones, and sometimes one of the last pages).
What if JabRef could extract information from these pages? After all, this is the purpose of such pages - to contain bibliographical information.
Of course, these pages differ from book to book, and it's hard (perhaps impossible) to write a universal extraction algorithm. But this project would be very beneficial to the Ukrainian community (and others)!
In Ukraine, each book has a special page with TONS of bibliographical information, and it's highly standardized! Our single standard also puts a full citation on this page, typically followed by an abstract. I attached a screenshot in Additional context.
Describe the solution you'd like
One could improve PdfContentImporter to extract information from these pages, as they are rich in bibliographical data.
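To make the idea a bit more concrete, here is a minimal Java sketch of one small piece of such an extraction: finding a checksum-valid ISBN in the text of an imprint page. This is only an illustration under assumptions, not how PdfContentImporter works today. The class `ImprintPageSketch` and its methods are hypothetical names, the page text is assumed to be already extracted from the PDF as a plain string, and the ISBN is assumed to be printed after an "ISBN" label with hyphens (not spaces) as separators.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef: looks for an ISBN on an imprint page.
class ImprintPageSketch {

    // Assumption: the number follows an "ISBN" label and uses hyphens as separators.
    private static final Pattern ISBN_PATTERN =
            Pattern.compile("ISBN(?:-1[03])?:?\\s*([0-9][0-9\\-]{8,15}[0-9Xx])");

    // Returns the first candidate that passes the ISBN-13 or ISBN-10 checksum.
    static Optional<String> findIsbn(String pageText) {
        Matcher matcher = ISBN_PATTERN.matcher(pageText);
        while (matcher.find()) {
            String candidate = matcher.group(1).replace("-", "").toUpperCase();
            if (isValidIsbn13(candidate) || isValidIsbn10(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10.
    static boolean isValidIsbn13(String digits) {
        if (!digits.matches("\\d{13}")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 13; i++) {
            int d = digits.charAt(i) - '0';
            sum += (i % 2 == 0) ? d : 3 * d;
        }
        return sum % 10 == 0;
    }

    // ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11;
    // the last character may be 'X', which stands for 10.
    static boolean isValidIsbn10(String digits) {
        if (!digits.matches("\\d{9}[\\dX]")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = digits.charAt(i);
            int value = (c == 'X') ? 10 : c - '0';
            sum += (10 - i) * value;
        }
        return sum % 11 == 0;
    }
}
```

Calling `ImprintPageSketch.findIsbn(pageText)` on the text of such a page would give an `Optional` holding the normalized ISBN, which could then be used to fetch the rest of the metadata. Parsing the standardized citation and abstract would need a separate, format-specific parser that is not attempted here.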
Additional context
- Ukraine - "Collection of physics problems for 8th grade":
Yes, every book in Ukraine has this 😄. Well, IDK about fiction or modern literature, but scientific literature is like this.
(This book is a classic. My generation and several before and after it have solved problems from this book. The abstract is available online too.)
- Pearson - "Artificial Intelligence: A Modern Approach" 3rd ed.:
Not much information here.
- O'Reilly - "Natural Language Processing with Python":
Not much information here either.
- "The formal semantics of programming languages: an introduction":
One of the few where there is a citation. However, there are many foreign citation styles. In Ukraine, there is only a single one, so it's simpler to improve PdfContentImporter.