Commit 3e66c64

Documentation update and minor refactoring.
1 parent 88ef1fd commit 3e66c64

9 files changed: +220 -51 lines changed

docs/src/index.md

Lines changed: 13 additions & 0 deletions
@@ -67,6 +67,19 @@ pdPageIsEmpty
6767
pdPageGetCosObject
6868
pdPageGetContentObjects
6969
```
70+
## PDF Page objects
71+
```@docs
72+
PDPageObject
73+
PDPageElement
74+
PDPageObjectGroup
75+
PDPageTextObject
76+
PDPageTextRun
77+
PDPageMarkedContent
78+
PDPageInlineImage
79+
PDPage_BeginGroup
80+
PDPage_EndGroup
81+
```
82+
7083
# Cos
7184
```@docs
7285
CosDoc

src/CDObject.jl

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ string types without having any encoding associated for semantic representation.
1010
Determination of encoding is carried out mostly by associated fonts and character maps in
1111
the content stream. There are also strings used in descriptions and other attributes of a
1212
PDF file where no font or mapping information is provided. This represents the string type
13-
in such situations. Typically, strings in PDFs are 3 types.
13+
in such situations. Typically, strings in PDFs are of 3 types.
1414
1515
1. Text string
1616
a. PDDocEncoded string - Similar to ISO_8859-1

src/CosDoc.jl

Lines changed: 8 additions & 7 deletions
@@ -19,7 +19,7 @@ physical file structure of the PDF document. To be used for accessing PDF intern
1919
from document structure when no direct API is available.
2020
2121
One can access any aspect of PDF using the COS level APIs alone. However, they may require
22-
you to know the PDF specification in details and not the most intuititive.
22+
you to know the PDF specification in detail and they are not the most intuitive.
2323
"""
2424
abstract type CosDoc end
2525

@@ -50,7 +50,8 @@ end
5050
```
5151
show(io::IO, doc::CosDoc)
5252
```
53-
Prints the CosDoc. The intent is to print lesser information from the structure.
53+
Prints the CosDoc. The intent is to print less information from the structure, as the default
54+
output can be overwhelming and flood the REPL.
5455
"""
5556
function show(io::IO, doc::CosDoc)
5657
print(io, "\nCosDoc ==>\n")
@@ -71,7 +72,7 @@ end
7172
```
7273
Reclaims all system resources consumed by the `CosDoc`. The `CosDoc` should not be used
7374
after this method is called. `cosDocClose` only needs to be explicitly called if you have
74-
opened the document by'cosDocOpen'. Documents opened with `pdDocOpen` do not need to use
75+
opened the document by `cosDocOpen`. Documents opened with `pdDocOpen` do not need to use
7576
this method.
7677
"""
7778
function cosDocClose(doc::CosDocImpl)
@@ -105,8 +106,8 @@ end
105106
cosDocGetRoot(doc::CosDoc) -> CosDoc
106107
```
107108
The structural starting point of a PDF document. Also known as document root dictionary.
108-
This provides details object locations and document access methodology. This should not be
109-
confused with the `catalog` object of the PDF document.
109+
This provides details of object locations and document access methodology. This should not
110+
be confused with the `catalog` object of the PDF document.
110111
"""
111112
cosDocGetRoot(doc::CosDoc) = CosNull
112113

@@ -121,7 +122,7 @@ access to the direct object after searching for the object in the document struc
121122
indirect object reference is passed as an `obj` parameter the complete `indirect object`
122123
(reference as well as all content of the object) are returned. A `direct object` passed to
123124
the method is returned as is without any translation. This ensures the user does not have
124-
to go through type check of the objects before accessing the contents.
125+
to check the type of the objects before accessing the contents.
125126
"""
126127
cosDocGetObject(doc::CosDoc, obj::CosObject) = CosNull
127128

@@ -451,7 +452,7 @@ cosDocGetPageNumbers(doc::CosDoc, catalog::CosObject, label::AbstractString) ->
451452
```
452453
PDF utilizes two pagination schemes. An internal global page number that is maintained
453454
serially as an integer and `PageLabel` that is shown by the viewers. Given a `label` this
454-
method returns a `range` of valid page numbers for the given label.
455+
method returns a `range` of valid page numbers.
455456
"""
456457
function cosDocGetPageNumbers(doc::CosDoc, catalog::CosObject, label::AbstractString)
457458
ref = get(catalog, cn"PageLabels")

src/CosObject.jl

Lines changed: 59 additions & 16 deletions
@@ -41,6 +41,8 @@ abstract type CosObject end
4141
```
4242
CosString
4343
```
44+
Abstract type that represents a PDF string. In PDF, string objects are mere byte representations.
45+
They translate to actual text strings by application of fonts and associated encodings.
4446
"""
4547
abstract type CosString <: CosObject end
4648

@@ -49,13 +51,16 @@ abstract type CosString <: CosObject end
4951
```
5052
CosNumeric
5153
```
54+
Abstract type for numeric objects. The objects can be an integer [`CosInt`](@ref) or float
55+
[`CosFloat`](@ref).
5256
"""
5357
abstract type CosNumeric <: CosObject end
5458

5559
"""
5660
```
5761
CosBoolean
5862
```
63+
A boolean object in PDF which is either a `CosTrue` or `CosFalse`.
5964
"""
6065
struct CosBoolean <: CosObject
6166
val::Bool
@@ -70,22 +75,25 @@ struct CosNullType <: CosObject end
7075
```
7176
CosNull
7277
```
78+
PDF representation of a `null` object. Can be applied to [`CosObject`](@ref) of any type.
7379
"""
7480
const CosNull=CosNullType()
7581

7682
"""
7783
```
7884
CosFloat
7985
```
86+
A numeric float data type.
8087
"""
8188
struct CosFloat <: CosNumeric
8289
val::Float64
8390
end
8491

8592
"""
8693
```
87-
CosFloat
94+
CosInt
8895
```
96+
An integer in a PDF document.
8997
"""
9098
struct CosInt <: CosNumeric
9199
val::Int
@@ -116,6 +124,7 @@ get(o::CosIndirectObject) = get(o.obj)
116124
```
117125
CosName
118126
```
127+
Name objects are symbols used in PDF documents.
119128
"""
120129
struct CosName <: CosObject
121130
val::Symbol
@@ -124,8 +133,9 @@ end
124133

125134
"""
126135
```
127-
@cn_str
136+
@cn_str(str) -> CosName
128137
```
138+
A string macro for easier instantiation of a [`CosName`](@ref), e.g. `cn"PageLabels"`.
129139
"""
130140
macro cn_str(str)
131141
return CosName(str)
@@ -135,6 +145,8 @@ end
135145
```
136146
CosXString
137147
```
148+
Concrete representation of a [`CosString`](@ref) object. The underlying data is represented
149+
as hexadecimal characters in ASCII.
138150
"""
139151
struct CosXString <: CosString
140152
val::Vector{UInt8}
@@ -145,6 +157,8 @@ end
145157
```
146158
CosLiteralString
147159
```
160+
Concrete representation of a [`CosString`](@ref) object. The underlying data is represented
161+
by byte representations without any encoding.
148162
"""
149163
struct CosLiteralString <: CosString
150164
val::Vector{UInt8}
@@ -157,55 +171,84 @@ CosLiteralString(str::AbstractString)=CosLiteralString(transcode(UInt8,str))
157171
```
158172
CosArray
159173
```
174+
An array in a PDF file. The objects can be any combination of [`CosObject`](@ref).
160175
"""
161176
mutable struct CosArray <: CosObject
162-
val::Array{CosObject,1}
163-
function CosArray(arr::Array{T,1} where {T<:CosObject})
164-
val = Array{CosObject,1}()
177+
val::Vector{CosObject}
178+
function CosArray(arr::Vector{T} where {T<:CosObject})
179+
val = Vector{CosObject}()
165180
for v in arr
166181
push!(val,v)
167182
end
168183
new(val)
169184
end
170-
CosArray()=new(Array{CosObject,1}())
185+
CosArray()=new(Vector{CosObject}())
171186
end
172187

173-
get(o::CosArray, isNative=false)=isNative ? map((x)->get(x),o.val) : o.val
188+
"""
189+
```
190+
get(o::CosArray, isNative=false) -> Vector{CosObject}
191+
```
192+
Returns the elements of the `CosArray` as a vector of [`CosObject`](@ref).
193+
194+
With `isNative = true`, returns the underlying native values instead, by invoking `get`
195+
on each element of the `CosArray`.
196+
"""
197+
get(o::CosArray, isNative=false) = isNative ? map((x)->get(x),o.val) : o.val
198+
"""
199+
```
200+
length(o::CosArray) -> Int
201+
```
202+
Length of the `CosArray`.
203+
"""
174204
length(o::CosArray)=length(o.val)
175205

176206
"""
177207
```
178208
CosDict
179209
```
210+
Name-value pairs of PDF objects. The object is very similar to the `Dict` object. The `key`
211+
has to be of the [`CosName`](@ref) type.
180212
"""
181213
mutable struct CosDict <: CosObject
182214
val::Dict{CosName,CosObject}
183215
CosDict()=new(Dict{CosName,CosObject}())
184216
end
185217

186-
get(dict::CosDict, name::CosName)=get(dict.val,name,CosNull)
218+
"""
219+
```
220+
get(dict::CosDict, name::CosName) -> CosObject
221+
```
222+
Returns the value for the key `name` as a [`CosObject`](@ref), or `CosNull` if the key is absent.
223+
"""
224+
get(dict::CosDict, name::CosName) = get(dict.val,name,CosNull)
187225

188226
get(o::CosIndirectObject{CosDict}, name::CosName) = get(o.obj, name)
189227

190228
"""
191-
Set the value to object. If the object is CosNull the key is deleted.
229+
```
230+
set!(dict::CosDict, name::CosName, obj::CosObject) -> CosObject
231+
```
232+
Sets the value on a dictionary object. Setting a `CosNull` object deletes the object from
233+
the dictionary.
192234
"""
193235
function set!(dict::CosDict, name::CosName, obj::CosObject)
194-
if (obj === CosNull)
195-
return delete!(dict.val,name)
196-
else
197-
dict.val[name] = obj
236+
if (obj === CosNull)
237+
delete!(dict.val,name)
238+
else
239+
dict.val[name] = obj
240+
end
198241
return dict
199-
end
200242
end
201243

202-
set!(o::CosIndirectObject{CosDict}, name::CosName, obj::CosObject) =
203-
set!(o.obj, name, obj)
244+
set!(o::CosIndirectObject{CosDict}, name::CosName, obj::CosObject) = set!(o.obj, name, obj)
204245

205246
"""
206247
```
207248
CosStream
208249
```
250+
A stream object in a PDF. Stream objects have an `extent` dictionary, followed by binary
251+
data.
209252
"""
210253
mutable struct CosStream <: CosObject
211254
extent::CosDict
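
The `get` and `set!` behaviour documented for `CosDict` in this file (a missing key reads back as `CosNull`, and storing `CosNull` deletes the key) can be sketched in isolation. The types and function names below (`Obj`, `NullObj`, `IntObj`, `Name`, `DictObj`, `lookup`, `assign!`) are simplified stand-ins invented for this illustration; they are assumptions, not PDFIO's definitions.

```julia
# Simplified stand-ins for illustration only; these are NOT the PDFIO types.
abstract type Obj end

struct NullObj <: Obj end
const NULL = NullObj()            # plays the role of CosNull

struct IntObj <: Obj
    val::Int
end

struct Name <: Obj
    val::Symbol                   # plays the role of CosName
end

struct DictObj <: Obj
    val::Dict{Name,Obj}
    DictObj() = new(Dict{Name,Obj}())
end

# A missing key is reported as NULL rather than raising a KeyError.
lookup(d::DictObj, n::Name) = get(d.val, n, NULL)

# Storing NULL removes the key; any other object is stored under the key.
# The dictionary itself is returned in both cases.
function assign!(d::DictObj, n::Name, o::Obj)
    if o === NULL
        delete!(d.val, n)
    else
        d.val[n] = o
    end
    return d
end

d = DictObj()
assign!(d, Name(:Count), IntObj(3))   # key is now present
assign!(d, Name(:Count), NULL)        # key is removed again
```

Returning the dictionary from both branches, as the refactored `set!` in this commit does, keeps the return type the same whether a key was stored or deleted.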

src/CosObjectHelpers.jl

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ function convert(::Type{CDTextString}, xstr::CosXString)
99
const FEFF = [LATIN_UPPER_F, LATIN_UPPER_E, LATIN_UPPER_F, LATIN_UPPER_F]
1010
prefix = xstr.val[1:4]
1111
hasPrefix = (prefix == feff || prefix == FEFF)
12-
isUTF16 = hasPrefix || prefix[1:2] == UInt8[0x00, 0x00]
12+
isUTF16 = hasPrefix || prefix[1:2] == UInt8[0x30, 0x30]
1313
data = xstr.val
1414
buffer = data |> String |> hex2bytes
1515
if isUTF16
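
The one-line change in this file appears to rest on the fact that `xstr.val` holds the ASCII hexadecimal characters of the string, not the decoded bytes, so a leading zero byte shows up as the two characters `'0' '0'` (bytes `0x30 0x30`). A minimal stand-alone sketch of that check follows; `looks_like_utf16` and `decode_hex` are hypothetical helper names introduced here, not part of PDFIO.

```julia
# Hypothetical helpers for illustration only; not part of PDFIO.
# `hexchars` holds the ASCII characters of a PDF hex string, e.g. "FEFF0041".
function looks_like_utf16(hexchars::Vector{UInt8})
    length(hexchars) >= 4 || return false
    prefix = hexchars[1:4]
    # A byte-order mark is written as the four characters "FEFF" (or "feff").
    has_bom = prefix == Vector{UInt8}("FEFF") || prefix == Vector{UInt8}("feff")
    # The character '0' is byte 0x30, so a decoded 0x00 byte appears as 0x30 0x30.
    return has_bom || prefix[1:2] == UInt8[0x30, 0x30]
end

# Decode the hex characters into raw bytes; the copy leaves the input intact,
# because String(::Vector{UInt8}) takes ownership of the vector it is given.
decode_hex(hexchars::Vector{UInt8}) = hex2bytes(String(copy(hexchars)))

sample = Vector{UInt8}("00410042")
is_wide = looks_like_utf16(sample)    # prefix starts with the characters "00"
raw = decode_hex(sample)              # four decoded bytes
```

Comparing the prefix against `UInt8[0x00, 0x00]` would test raw byte values against hex characters, which is the mismatch this sketch is meant to make visible.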

src/PDDoc.jl

Lines changed: 12 additions & 8 deletions
@@ -15,7 +15,7 @@ using ..Common
1515
```
1616
PDDoc
1717
```
18-
A in memory representation of a PDF document. Once created this type has to be used to
18+
An in-memory representation of a PDF document. Once created this type has to be used to
1919
access a PDF document.
2020
"""
2121
abstract type PDDoc end
@@ -61,7 +61,7 @@ end
6161
pdDocGetCatalog(doc::PDDoc) -> CosObject
6262
```
6363
`Catalog` is considered the topmost level object in PDF document that is subsequently
64-
used to traverse and extract information on a PDF document. To be used for accessing PDF
64+
used to traverse and extract information from a PDF document. To be used for accessing PDF
6565
internal objects from document structure when no direct API is available.
6666
"""
6767
function pdDocGetCatalog(doc::PDDoc)
@@ -78,15 +78,15 @@ physical file structure of the PDF document. To be used for accessing PDF intern
7878
from document structure when no direct API is available.
7979
8080
One can access any aspect of PDF using the COS level APIs alone. However, they may require
81-
you to know the PDF specification in details and not the most intuititive.
81+
you to know the PDF specification in detail and they are not the most intuitive.
8282
"""
8383
pdDocGetCosDoc(doc::PDDoc)= doc.cosDoc
8484

8585
"""
8686
```
8787
pdDocGetPage(doc::PDDoc, num::Int) -> PDPage
8888
```
89-
Given a document absolute page number provides the associated page.
89+
Given a document absolute page number, provides the associated page object.
9090
"""
9191
function pdDocGetPage(doc::PDDoc, num::Int)
9292
cosobj = find_page_from_treenode(doc.pages, num)
@@ -118,9 +118,11 @@ end
118118
pdDocGetInfo(doc::PDDoc) -> Dict
119119
```
120120
Given a PDF document provides the document information available in the `DocumentInfo`
121-
disctionary. The information typically includes _creation date, modification date, author,
122-
creator_ used etc. However, information content are not all mandatory and all information
123-
may not be available in a document. Please refer to the PDF specification for details.
121+
dictionary. The information typically includes the *creation date, modification date, author,
122+
creator*, etc. However, not all of this information is mandatory. Hence, some of it
123+
may not be available in a document.
124+
125+
Please refer to the PDF specification for further details.
124126
"""
125127
function pdDocGetInfo(doc::PDDoc)
126128
ref = get(doc.cosDoc.trailer[1], CosName("Info"))
@@ -140,8 +142,10 @@ end
140142
```
141143
Some information in PDF is stored as name and value pairs not essentially a dictionary.
142144
They are all aggregated and can be accessed from one `names` dictionary object in the
143-
document catalog. This method provides access to such dictionary in a PDF file. Not all PDF
145+
document catalog. This method provides access to such values in a PDF file. Not all PDF
144146
documents may have a names dictionary. In such cases, a `CosNull` object may be returned.
147+
148+
Please refer to the PDF specification for further details.
145149
"""
146150
function pdDocGetNamesDict(doc::PDDoc)
147151
catalog = pdDocGetCatalog(doc)

src/PDFIO.jl

Lines changed: 9 additions & 1 deletion
@@ -30,7 +30,15 @@ export PDDoc,
3030
pdPageIsEmpty,
3131
pdPageGetCosObject,
3232
pdPageGetContentObjects,
33-
pdPageExtractText
33+
pdPageExtractText,
34+
PDPageObject,
35+
PDPageObjectGroup,
36+
PDPage_BeginGroup, PDPage_EndGroup,
37+
PDPageTextObject,
38+
PDPageMarkedContent,
39+
PDPageElement,
40+
PDPageTextRun,
41+
PDPageInlineImage
3442

3543
using .Cos
3644
export CosDoc,

src/PDPage.jl

Lines changed: 4 additions & 3 deletions
@@ -23,6 +23,8 @@ pdPageGetCosObject(page::PDPage) = page.cospage
2323
```
2424
Page rendering objects are normally stored in a `CosStream` object in a PDF file. This
2525
method provides access to the stream object.
26+
27+
Please refer to the PDF specification for further details.
2628
"""
2729
function pdPageGetContents(page::PDPage)
2830
if (page.contents === CosNull)
@@ -46,9 +48,8 @@ end
4648
```
4749
pdPageGetContentObjects(page::PDPage) -> CosObject
4850
```
49-
A page object can have multiple associated content objects. This method will return a
50-
`CosArray` if there are multiple of content objects associated. Otherwise an indirect
51-
object of `CosStream` type will be returned.
51+
Page rendering objects are normally stored in a `CosStream` object in a PDF file. This
52+
method provides access to the stream object.
5253
"""
5354
function pdPageGetContentObjects(page::PDPage)
5455
if (isnull(page.content_objects))
