I need help understanding the extracted table data #4627
-
Hi, I am trying to extract the data from a table in a PDF. The table in question is shown in the attached PDF page. This is the produced data:
In [25]: import pymupdf
In [26]: doc = pymupdf.open("tmc2209_datasheet_rev1.09.pdf")
In [27]: page = doc[23]
In [28]: tabs = page.find_tables()
In [29]: print(f"{len(tabs.tables)} found on {page}")
1 found on page 23 of tmc2209_datasheet_rev1.09.pdf
In [30]: c = tabs[0].extract()
In [31]: for l in c:
    ...:     print(len(l))
    ...:
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
In [32]: for l in c:
    ...:     print(l)
    ...:
['', 'GENERAL CONFIGURATION REGISTERS (0X00…0X0F)', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '']
['', 'R/W', '', '', 'Addr', '', '', 'n', '', '', 'Register', '', '', 'Description / bit names', None, None, None, '']
['R+\nWC', None, None, '0x01', None, None, '3', None, None, 'GSTAT', None, None, 'Bit', 'Bit', None, '', 'GSTAT – Global status flags', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '(Re-Write with ‘1’ bit to clear respective flags)', None]
[None, None, None, None, None, None, None, None, None, None, None, None, '0', None, None, 'reset\n1: Indicates that the IC has been reset since the last\nread access to GSTAT. All registers have been\ncleared to reset values.', None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, '1', None, None, 'drv_err\n1: Indicates, that the driver has been shut down due\nto overtemperature or short circuit detection since\nthe last read access. Read DRV_STATUS for details.\nThe flag can only be cleared when all error\nconditions are cleared.', None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, '2', None, None, 'uv_cp\n1: Indicates an undervoltage on the charge pump.\nThe driver is disabled in this case. This flag is not\nlatched and thus does not need to be cleared.', None, None]
['R', None, None, '0x02', None, None, '8', None, None, 'IFCNT', None, None, '', None, None, 'Interface transmission counter. This register becomes\nincremented with each successful UART interface write\naccess. Read out to check the serial transmission for lost\ndata. Read accesses do not change the content. The\ncounter wraps around from 255 to 0.', None, None]
['W', None, None, '0x03', None, None, '4', None, None, 'NODECONF', None, None, '', 'Bit', '', '', 'NODECONF', '']
[None, None, None, None, None, None, None, None, None, None, None, None, '11..8', None, None, 'SENDDELAY for read access (time until reply is sent):\n0, 1: 8 bit times (Attention: Don’t use in multi-node)\n2, 3: 3*8 bit times\n4, 5: 5*8 bit times\n6, 7: 7*8 bit times\n8, 9: 9*8 bit times\n10, 11: 11*8 bit times\n12, 13: 13*8 bit times\n14, 15: 15*8 bit times', None, None]
['W', None, None, '0x04', None, None, '16', None, None, 'OTP_PROG', None, None, 'Bit', None, None, '', 'OTP_PROGRAM – OTP programming', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'Write access programs OTP memory (one bit at a time),', None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'Read access refreshes read data from OTP after a write', None]
[None, None, None, None, None, None, None, None, None, None, None, None, '2..0', None, None, '', 'OTPBIT', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'Selection of OTP bit to be programmed to the selected', None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'byte location (n=0..7: programs bit n to a logic 1)', None]
[None, None, None, None, None, None, None, None, None, None, None, None, '', '5..4', '', '', 'OTPBYTE', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, '', None, None, 'Selection of OTP programming location (0, 1 or 2)', None]
[None, None, None, None, None, None, None, None, None, None, None, None, '15..8', '15..8', None, '', 'OTPMAGIC', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'Set to 0xbd to enable programming. A programming', None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'time of minimum 10ms per bit is recommended (check', None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'by reading OTP_READ).', None]
['R', None, None, '0x05', None, None, '24', None, None, 'OTP_READ', None, None, 'Bit', None, None, '', 'OTP_READ (Access to OTP memory result and update)', '']
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'See separate table!', None]
[None, None, None, None, None, None, None, None, None, None, None, None, '', '7..0', '', '', 'OTP0 byte 0 read data', '']
[None, None, None, None, None, None, None, None, None, None, None, None, '', '15..8', '', '', 'OTP1 byte 1 read data', '']
[None, None, None, None, None, None, None, None, None, None, None, None, '', '23..1', '', 'OTP2 byte 2 read data', 'OTP2 byte 2 read data', None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, '6', None, None, None, None]
['R', None, None, '0x06', None, None, '10\n+\n8', None, None, 'IOIN', None, None, '', 'Bit', '', '', 'INPUT (Reads the state of all input pins available)', '']
[None, None, None, None, None, None, None, None, None, None, None, None, '0', None, None, 'ENN', None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, '1', None, None, '0', None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, '2', None, None, 'MS1', None, None]
The extraction does produce data, but I am having trouble understanding what the fields mean. For example, in some rows the fields are empty strings, and in others they are None. Where can I find information on where the fields in the extracted array come from and what they mean?
Replies: 2 comments 4 replies
-
The finder will always deliver rectangular m x n cells. So you will never see row-spans or column-spans. Where it determines missing cell boundary boxes in this process, they are generated to be
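To make the rectangular-grid point concrete, here is a minimal sketch in plain Python that works on the list of rows returned by `extract()`. It assumes (this is my reading of the output in the question, not something confirmed in this thread) that `None` marks a grid position with no cell of its own, for example inside a span, while `''` marks a real cell that contains no text. The helper names `describe_row` and `compact_rows` are hypothetical.

```python
def describe_row(row):
    """Label each grid position of one extracted row."""
    labels = []
    for cell in row:
        if cell is None:
            labels.append("none")   # assumed: no cell of its own at this position
        elif cell == "":
            labels.append("empty")  # assumed: a real cell without text
        else:
            labels.append("text")
    return labels


def compact_rows(rows):
    """Drop placeholder cells (None or '') and keep only cells with text."""
    return [[cell for cell in row if cell] for row in rows]


# Shortened rows in the shape shown in the question.
rows = [
    ["R", None, None, "0x02", None, None, "8", None, None, "IFCNT", "", None],
    [None, None, None, None, None, None, None, None, None, None, "0", "ENN"],
]

for row in compact_rows(rows):
    print(row)
```

Compacting like this loses the column position of each value, so it is mainly useful for a first look at the logical content of a row. If the column index matters, `describe_row` shows which positions are placeholders.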
-
Here is the page with all cells marked: [image not included]
Thanks for the hint. This makes it much more usable. Related question: is there a way to combine tables from multiple pages?
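One possible way to approach the multi-page question is to extract each page's table separately and join the row lists afterwards. The sketch below is plain Python and only assumes that every page yields a list of rows, as `tabs[0].extract()` does in the session above. The function `merge_page_tables` and its `header_rows` parameter are hypothetical names for illustration, not part of PyMuPDF.

```python
def merge_page_tables(page_tables, header_rows=1):
    """Concatenate per-page tables, keeping a repeated header only once.

    page_tables: list of tables, each a list of rows (lists of cells).
    header_rows: number of leading rows assumed to repeat on later pages.
    """
    merged = []
    for index, rows in enumerate(page_tables):
        if not rows:
            continue  # nothing extracted for this page
        if merged and len(rows[0]) != len(merged[0]):
            raise ValueError(f"table {index} has a different column count")
        start = 0
        # Skip the leading rows only if they repeat the first table's header.
        if merged and rows[:header_rows] == merged[:header_rows]:
            start = header_rows
        merged.extend(rows[start:])
    return merged


page1 = [["Addr", "Register"], ["0x01", "GSTAT"]]
page2 = [["Addr", "Register"], ["0x02", "IFCNT"]]
page3 = [["0x03", "NODECONF"]]  # continuation page without a header

for row in merge_page_tables([page1, page2, page3]):
    print(row)
```

The column-count check is deliberately strict: if a continuation page is detected with a different grid, joining the rows blindly would misalign the columns, so the sketch raises an error instead.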