
Microsoft Word table of contents Link annotation error. #2346

@vokson

Description


I am trying to use PdfReader and PdfWriter to read and write annotations in a PDF file. The PDF file is produced by Microsoft Word (Save As -> PDF). The Word file has three simple pages with the headings Page 1, Page 2 and Page 3, plus an automatic table of contents generated from these headings.
The links in the table of contents become /Link annotations in the PDF file. One of these annotations looks like this:

{'/Subtype': '/Link', '/Rect': [82.8, 711.57, 554.55, 731.07], '/BS': {'/W': 0}, '/F': 4, '/Dest': [IndirectObject(3, 0, 1202232362752), '/XYZ', 82, 785, 0], '/StructParent': 3}

The problem is that the value of the '/Dest' key is a list (an explicit destination array), but the code in _writer.py always expects a dictionary. The program then tries to read tmp["target_page_index"] from that list and crashes with a TypeError.

Please help. This is the relevant part of add_annotation in _writer.py:

    if to_add.get("/Subtype") == "/Link" and "/Dest" in to_add:
        tmp = cast(Dict[Any, Any], to_add[NameObject("/Dest")])
        dest = Destination(
            NameObject("/LinkName"),
            tmp["target_page_index"],
            Fit(
                fit_type=tmp["fit"], fit_args=dict(tmp)["fit_args"]
            ),  # I have no clue why this dict-hack is necessary
        )
        to_add[NameObject("/Dest")] = dest.dest_array
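The crash comes from the branch above assuming a single /Dest shape. Below is a minimal sketch of a type check that accepts both shapes. It assumes plain Python dicts and lists as stand-ins for pypdf's DictionaryObject and ArrayObject (which subclass dict and list). The helper name `dest_to_array` and its `page_refs` parameter are hypothetical, and this is not pypdf's actual code or fix. An explicit array like the one Word writes is passed through unchanged, and only the dict form with `target_page_index`, `fit` and `fit_args` is converted.

```python
from typing import Any, Dict, List, Union

# Either Word's explicit array or the dict form the /Link branch expects.
DestValue = Union[Dict[str, Any], List[Any]]


def dest_to_array(dest: DestValue, page_refs: List[Any]) -> List[Any]:
    """Hypothetical helper: turn a /Link /Dest value into an explicit destination array.

    page_refs[i] stands in for the reference to page i of the output document.
    """
    if isinstance(dest, list):
        # Already explicit, e.g. [page_ref, "/XYZ", left, top, zoom]: keep it as-is.
        return list(dest)
    if isinstance(dest, dict):
        # Dict form: look up the page by index and append the fit arguments.
        page_ref = page_refs[dest["target_page_index"]]
        return [page_ref, dest["fit"], *dest["fit_args"]]
    raise TypeError(f"unsupported /Dest value: {type(dest).__name__}")


pages = ["page-0-ref", "page-1-ref", "page-2-ref"]
print(dest_to_array(["page-2-ref", "/XYZ", 82, 785, 0], pages))
# ['page-2-ref', '/XYZ', 82, 785, 0]
print(dest_to_array({"target_page_index": 1, "fit": "/XYZ", "fit_args": [82, 785, 0]}, pages))
# ['page-1-ref', '/XYZ', 82, 785, 0]
```

Checking for the list shape first keeps Word's array untouched, so no lookup by string key is ever attempted on it.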

Environment

$ python -m platform
Windows-10-10.0.19043-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '37.0.4'), PIL=9.4.0

Code + PDF

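    from io import BytesIO

    from pypdf import PdfReader, PdfWriter

    # Assumption: the attached Test.pdf; in the original code this list is passed in
    filenames = ["Test.pdf"]

    # Collect the annotations of every input file, grouped by page index
    annotations = {}
    writer = PdfWriter()
    in_memory_file = BytesIO()

    for filename in filenames:
        reader = PdfReader(filename, strict=False)
        for page_idx, page in enumerate(reader.pages):
            if "/Annots" in page:
                for annot in page["/Annots"]:
                    if not annotations.get(page_idx):
                        annotations[page_idx] = []

                    annotations[page_idx].append(annot.get_object())
        del reader

    # Copy the pages of the first file, then drop their link annotations
    reader = PdfReader(filenames[0])
    for page_idx, page in enumerate(reader.pages):
        writer.add_page(page)

    del reader
    writer.remove_links()

    # Re-add the collected annotations; this raises the TypeError below
    # for Word's /Link annotations, whose /Dest is a list
    for page_idx in annotations:
        for annot in annotations[page_idx]:
            writer.add_annotation(page_number=page_idx, annotation=annot)

    writer.write(in_memory_file)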

Test.docx
Test.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\entrypoints\..\logic\service_layer\message_bus.py", line 537, in handle_command
    result = handler(command, self._uow, self.handle)
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\entrypoints\..\logic\service_layer\command_handlers\command_service_handlers.py", line 929, in mix_pdf_files
    writer.add_annotation(page_number=page_idx, annotation=annot)
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\venv\lib\site-packages\pypdf\_writer.py", line 2803, in add_annotation
    tmp["target_page_index"],
TypeError: list indices must be integers or slices, not str
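The last frame of the traceback can be reproduced without pypdf: subscripting a Python list with a string key raises exactly this TypeError. A minimal sketch follows. It uses a string as a stand-in for the IndirectObject page reference, and the function name is illustrative only.

```python
def reproduce_dest_error() -> str:
    """Index a list-shaped /Dest with a string key, as the /Link branch does."""
    # Stand-in for Word's explicit destination: [page reference, fit type, left, top, zoom].
    # The real page reference is an IndirectObject; a string is used here instead.
    dest = ["page-ref", "/XYZ", 82, 785, 0]
    try:
        dest["target_page_index"]  # same lookup as tmp["target_page_index"]
    except TypeError as exc:
        return str(exc)
    return "no error"


print(reproduce_dest_error())
# list indices must be integers or slices, not str
```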
