PyPDF4LLM: Problems when extracting pdf text #145

@JJS-L

Hello, I'm glad I found such a good PDF text extractor!
There is still a problem, though. When I use the following script to process a Wiley paper, the sentences of two paragraphs get mixed together:
tutorials/multi-columns/markdowntext.py
Paper: doi:10.1002/adom.202401391
Here is the text I extracted with this script:
'''
colors (red, green, and blue) or a combi

Highly efficient and stable single-stack hybrid white organic light-emitting nation of complementary colors (com- diode (WOLED) devices are developed using two emissive layers: one with monly orange and blue) to generate a amber-colored phosphorescent molecular-aggregate emission from the Pd(II) broad emission spectrum.[[8,9]] The compli

cations associated with such architectures

complex, Pd(II) 7-(3-(pyridine-2-yl-κN)phenoxy-κC)(benzo-κC)([c]benzo[4,5]

stem from the employment of multiple

imidazo-κN)[1,2-a][1,5]naphthyridine, Pd3O8-Py5, and the other with blue

emitters, which require intricate device

fluorescence emission. An optimized device structure achieves high color sta- engineering strategies to avoid voltage- bility under various current densities, an external quantum efficiency (EQE) of dependent changes in the electrolumi- 45.5%, a power efficiency of 97.4 Lm W[−][1], and an estimated LT95 (operational nescent (EL) spectrum, typically caused time to 95% of the initial luminance) of 50 744 h at an initial luminance of by a shift of the exciton recombination

'''
For comparison, this is how the abstract and the introduction read in the PDF:
'''
Abstract
Highly efficient and stable single-stack hybrid white organic light-emitting
diode (WOLED) devices are developed using two emissive layers: one with
amber-colored phosphorescent molecular-aggregate emission from the Pd(II)
complex, Pd(II) 7-(3-(pyridine-2-yl-κN)phenoxy-κC)(benzo-κC)([c]benzo[4,5]

Introduction
colors (red, green, and blue) or a combi
nation of complementary colors (com
monly orange and blue) to generate a
broad emission spectrum.[8,9] The compli
cations associated with such architectures
stem from the employment of multiple
'''
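
One possible reading of the output above, offered only as a guess, is that lines from the abstract and the introduction are being merged by vertical position instead of column by column. The sketch below is not the code in markdowntext.py; it is a minimal illustration with made-up coordinates and hypothetical helper names (`Line`, `naive_order`, `column_order`), using fragments from the text above to show how such interleaving can arise and how grouping by column first avoids it.

```python
from typing import NamedTuple


class Line(NamedTuple):
    x0: float  # left edge of the line (hypothetical page units)
    y0: float  # top edge of the line
    text: str


def naive_order(lines):
    """Sort by vertical position only, so lines from both columns interleave."""
    return [ln.text for ln in sorted(lines, key=lambda ln: (ln.y0, ln.x0))]


def column_order(lines, split_x):
    """Read the column left of split_x top to bottom, then the right column."""
    left = sorted((ln for ln in lines if ln.x0 < split_x), key=lambda ln: ln.y0)
    right = sorted((ln for ln in lines if ln.x0 >= split_x), key=lambda ln: ln.y0)
    return [ln.text for ln in left + right]


# Coordinates are invented for illustration; only the text comes from the paper.
lines = [
    Line(40, 100, "Highly efficient and stable single-stack hybrid white organic light-emitting"),
    Line(320, 100, "nation of complementary colors (com"),
    Line(40, 112, "diode (WOLED) devices are developed using two emissive layers: one with"),
    Line(320, 112, "monly orange and blue) to generate a"),
]

print("\n".join(naive_order(lines)))
print("---")
print("\n".join(column_order(lines, split_x=300)))
```

With these assumed coordinates, the first listing alternates between abstract and introduction lines, much like the extracted text, while the second keeps each column together.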
