Commit 5a9a420

DOC: Notes about form fields and annotations (#1945)
1 parent cfae3a6 commit 5a9a420

1 file changed

+26
-0
lines changed

docs/user/forms.md

Lines changed: 26 additions & 0 deletions
@@ -41,3 +41,29 @@ Generally speaking, you will always want to use `auto_regenerate=False`. The
parameter is `True` by default for legacy compatibility, but this flags the PDF
Viewer to recompute the field's rendering, and may trigger a "save changes"
dialog for users who open the generated PDF.

## A note about form fields and annotations

PDF forms store their fields as annotations with the subtype `/Widget`. This means that the following two blocks of code will give fairly similar results:

```python
50+
from pypdf import PdfReader
51+
reader = PdfReader("form.pdf")
52+
fields = reader.get_fields()
53+
```

```python
from pypdf import PdfReader
from pypdf.constants import AnnotationDictionaryAttributes
reader = PdfReader("form.pdf")
fields = []
for page in reader.pages:
    # page.annotations is None for pages without an /Annots entry
    for annot in page.annotations or []:
        annot = annot.get_object()
        if annot[AnnotationDictionaryAttributes.Subtype] == "/Widget":
            fields.append(annot)
```

However, while similar, the two blocks of code above differ in some very important ways. Most importantly, the first block returns a dictionary that maps field names to Field objects, whereas the second builds a list of more generic dictionary-like objects. The two collections will *mostly* reference the same objects in the underlying PDF, meaning you'll find that `obj_taken_from_first_block.indirect_reference == obj_taken_from_second_block.indirect_reference`. Field objects are generally more ergonomic, as the exposed data can be accessed via clearly named properties. However, the more generic dictionary-like objects contain data that the Field object does not expose, such as the Rect (the widget's position on the page). So, which to use will depend on your use case.
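
As a rough illustration of reading data that only the annotation dictionaries carry, the sketch below collects each widget's `/Rect` keyed by its partial field name (`/T`). The helper names `widget_rects` and `widget_rects_from_pdf` are hypothetical and not part of pypdf. The sketch assumes that `/Subtype`, `/T` and `/Rect` are stored directly on the widget and that the `/Rect` entries are plain numbers; real files may differ.

```python
from typing import Any, Dict, Iterable, List, Mapping


def widget_rects(annotations: Iterable[Mapping[str, Any]]) -> Dict[str, List[float]]:
    """Map each widget's partial field name (/T) to its /Rect as floats.

    Hypothetical helper: works on any dict-like annotation objects.
    """
    rects: Dict[str, List[float]] = {}
    for annot in annotations:
        if "/Subtype" not in annot or annot["/Subtype"] != "/Widget":
            continue  # not a form widget
        if "/T" not in annot or "/Rect" not in annot:
            continue  # e.g. radio-button children keep /T on their parent
        rects[str(annot["/T"])] = [float(v) for v in annot["/Rect"]]
    return rects


def widget_rects_from_pdf(path: str) -> Dict[str, List[float]]:
    """Apply widget_rects to every page of a PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so widget_rects has no dependency

    found: Dict[str, List[float]] = {}
    for page in PdfReader(path).pages:
        # page.annotations is None when the page has no /Annots entry
        annots = [a.get_object() for a in (page.annotations or [])]
        found.update(widget_rects(annots))
    return found
```

Calling `widget_rects_from_pdf("form.pdf")` would then give the position of each named widget, something the dictionary returned by `reader.get_fields()` does not provide.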

However, it's also important to note that the two do not *always* refer to the same underlying PDF objects. For example, if the form contains radio buttons, `reader.get_fields()` returns the parent object (the group of radio buttons), whereas `page.annotations` returns all the child objects (the individual radio buttons).
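
One way to reconcile the two views is to walk from each widget up to the dictionary that carries the field name. The sketch below is only an illustration. `owning_field` and `group_widgets_by_field` are hypothetical helpers, and they assume the PDF convention that a child widget links to its field through `/Parent` while the field's partial name lives in `/T`.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional


def owning_field(widget: Mapping[str, Any], max_depth: int = 32) -> Mapping[str, Any]:
    """Follow /Parent links until a dictionary with a partial name (/T) is found.

    Hypothetical helper; max_depth guards against malformed, cyclic files.
    """
    node = widget
    for _ in range(max_depth):
        if "/T" in node or "/Parent" not in node:
            break
        node = node["/Parent"]
    return node


def group_widgets_by_field(
    widgets: Iterable[Mapping[str, Any]],
) -> Dict[Optional[str], List[Mapping[str, Any]]]:
    """Group widgets under the partial name of the field that owns them."""
    groups: Dict[Optional[str], List[Mapping[str, Any]]] = {}
    for widget in widgets:
        field = owning_field(widget)
        name = str(field["/T"]) if "/T" in field else None
        groups.setdefault(name, []).append(widget)
    return groups
```

Passing in the widgets collected from `page.annotations` should place the individual radio buttons of a group under a single name, which can then be compared with the parent entry from `reader.get_fields()`. Note that this uses only the partial name; nested fields would need the names of all ancestors joined together.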
