Update to the JOSS paper based on the comments.

sambitdash · sambitdash · commit be4f9b31a123 · 2019-10-17T11:24:21.000+05:30
diff --git a/paper/paper.md b/paper/paper.md
@@ -1,5 +1,5 @@
 ---
-title: 'PDFIO: PDF Reader Library for native Julia'
+title: 'PDFIO: PDF Reader Library for native Julia'
 tags:
   - Julia
   - PDF
@@ -22,54 +22,81 @@ bibliography: paper.bib
 # Summary
 
 Portable Document Format (PDF) is the most ubiquitous file format for
-text, scientific research, legal documentation and many other domains
-for information dissemination and presentation. Being a final form
+text, scientific research, legal documentation and many other fields
+for information presentation and dissemination. Being a final form
 format of choice, a large body of text is currently archived in this
 format. Julia is an upcoming programming language in the field of data
-sciences with focus on text analysis. Extracting archived content to
-text is highly beneficial to the language usage and adoption.
+sciences. Extracting archived content and understanding document
+metadata are beneficial to the usage of the language.
 
-``PDFIO`` is an API developed purely in Julia. Almost, all the
+`PDFIO` is an API developed purely in Julia. Almost all the
 functionalities of PDF understanding is entirely written from scratch
-in Julia with only exception of certain (de)compression codecs and
+in Julia with the only exception of certain (de)compression codecs and
 cryptography, where standard open source libraries are being used.
 
 The following are some of the benefits of utilizing this approach:
 
 1. PDF files are in existence for over three decades. Implementations
-   of the PDF writers are not always accurate to the specification or
-   they may even vary significantly from vendor to vendor. Every time,
-   someone gets a new PDF file there is a possibility that it may not
-   work to the best interpretation of the specification. A script
-   based language makes it easier for the consumers to quickly modify
-   the code and enhance to their specific needs.
+   of the PDF writers are not always accurate to the specification.
+   They may even vary significantly from vendor to vendor. Every time 
+   someone gets a new PDF file, there is a possibility that the file
+   may not be interpreted as per the specification. A script-based
+   language makes it easier for the consumers to quickly modify and
+   enhance the code to their specific needs.
    
 2. When a higher level scripting language implements a C/C++ PDF
-   library API, the scope is confined to achieving certain high level
-   application tasks like, graphics or text extraction; annotation or
-   signature content extraction or page merging or
-   extraction. However, this API represents the PDF specification as a
-   model (in Model, View and Controller parlance). Every object in PDF
-   specification can be represented in some form through these
-   APIs. Hence, objects can be utilized effectively to understand
-   document structure or correlate documents in more meaningful ways.
+   library API, the scope is kept limited to achieving certain high 
+   level tasks such as graphics or text extraction; annotation or
+   signature content extraction; or page extraction or merging. 
+   
+   However, `PDFIO` represents the PDF specification as a model in the 
+   Model, View and Controller parlance. A PDF file can be represented 
+   as a collection of interconnected Julia structures. Those 
+   structures can be utilized in granular tasks or can simply be used
+   to understand the structure of the PDF document. 
+
+   As per the PDF specification, text can be presented as part of the
+   page content stream or inside PDF page annotations. An API like 
+   `PDFIO` can create two categories of object types: one representing
+   the text object inside the content stream and the other for the
+   text inside an annotation object. This provides flexibility to
+   the API user.
     
-3. Potential to be extended as a PDF generator. Since, the API is
-   written as an object model of PDF documents, it's easier to extend
-   with additional PDF write or update capabilities.
+3. Since the API is written as an object model of PDF documents, it is
+   easier to extend with additional PDF write or update capabilities.
+   Although the current implementation does not provide PDF
+   writing capabilities, the foundation has been laid for future
+   extension.
    
 There are also certain downsides to this approach:
 
 1. Any API that represents an object model of a document, tends to
-   carry the complexity of introducing abstract objects, often opaque
-   objects (handles) that are merely representational for an API
-   user. They may not have any functional meaning. The methods tend to
-   be granular than a method that can complete a user level task.
-2. The user may need to refer to the PDF specification
-   (PDF-32000-1:2008)[@Adobe:2008] for having a complete semantic
-   understanding.
-3. The amount of code needed to carry out certain tasks can be
-   substantially higher.
+   carry the complexity of introducing abstract objects. They can be
+   opaque objects (handles) that are representational and specific to
+   the API. They may not have any functional meaning. The methods are
+   granular and may not complete a user level task. The amount of code
+   needed to complete a user level task can be substantially higher.
+   
+   A comparative presentation of this approach can be seen in the
+   illustration given below. A text extraction task that can be just
+   one simple method invocation in a competing library like `Taro`
+   can involve more steps in `PDFIO`. For example, in `PDFIO`
+   the following steps have to be carried out:
+   a. Open the PDF document and obtain the document handle.  
+   b. Query the document handle for all the pages in the document.  
+   c. Iterate over the pages and obtain the page object handle for
+      each page.  
+   d. Extract the text from the page objects and write to a file IO.  
+   e. Close the document, ensuring all the document resources are
+      reclaimed.
+2. The API user may need to refer to the PDF specification
+   (PDF-32000-1:2008)[@Adobe:2008] for semantic understanding of PDF 
+   files in accomplishing some of the tasks. For example, the workflow
+   of PDF text extraction above is a natural extension of how text is
+   represented in a PDF file as per the specification. A PDF file is
+   composed of pages, and text is represented inside each page content
+   object. The object model of `PDFIO` is a Julia language 
+   representation of the PDF specification. 
    
 ## Illustration
 
@@ -105,7 +132,8 @@ end
 
 ```   
 While `PDFIO` requires a larger number of lines of code, it definitely
-provides a more granular set of APIs.
+provides a more granular set of APIs to understand the PDF document
+structure. 
 
 # Functionality
 
@@ -132,7 +160,7 @@ provides a more granular set of APIs.
    also known as the `COS` layer APIs.
 
 
-# Acknowledgements
+# Acknowledgments
 
 We acknowledge contributions of all the community developers who have
 contributed to this effort. Their contribution can be viewed at: