page.get_links() is not returning all links #4536

1504168 · 2025-05-31T12:55:28Z

I am trying to extract links from the attached PDF file, but page.get_links() returns an empty list and I don't understand why.
On the 2nd page in particular, I can see multiple links.

Here is the sample code:

import pymupdf

doc = pymupdf.open('faster_rcnn.pdf')
page = doc[1]  # 2nd page (page numbers are 0-based)
print(page.get_links())  # prints an empty list for this file

faster_rcnn.pdf

Answered by JorjMcKie

JorjMcKie · 2025-05-31T13:39:02Z

Maintainer

The page indeed contains no links!
There is text that looks like a link, but it is just text.
PDF viewers usually still react to text in this format as if a link had been technically defined, which is what causes the confusion.

0 replies

1504168 · 2025-05-31T17:06:41Z

Author

Okay, it seems so. I think I will need to use a regex together with page.get_text() instead.
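
A minimal sketch of that regex approach, assuming the URLs appear as plain text on the page. The pattern `URL_RE` and the helpers `find_urls` and `urls_on_page` are hypothetical names for illustration, not part of PyMuPDF; the only library calls are `pymupdf.open`, page indexing, `page.get_text()` and `doc.close()`. The pattern is deliberately simple and will likely miss URLs that are broken across lines.

```python
import re

# Hypothetical, deliberately simple pattern: "http(s)://" or "www."
# followed by characters that are not whitespace, quotes or angle brackets.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+")


def find_urls(text):
    """Return URL-like substrings of plain text, in order of appearance."""
    urls = []
    for match in URL_RE.finditer(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        urls.append(match.group().rstrip(".,;:)]}"))
    return urls


def urls_on_page(path, page_number):
    """Extract URL-like text from one page of a PDF (0-based page number)."""
    import pymupdf  # imported here so find_urls() works without PyMuPDF

    doc = pymupdf.open(path)
    try:
        return find_urls(doc[page_number].get_text())
    finally:
        doc.close()
```

For the file in this thread, the call would be `urls_on_page('faster_rcnn.pdf', 1)`. Unlike page.get_links(), this returns only strings, with no rectangles or link targets attached.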

0 replies
