Modified the README.md based on JOSS comments.

sambitdash · sambitdash · commit da9fb768d68b · 2019-11-03T16:28:29.000+05:30
diff --git a/README.md b/README.md
@@ -22,65 +22,56 @@ The following are some of the benefits of utilizing this approach:
    of the specification. A script based language makes it easier for the
    consumers to quickly modify the code and enhance to their specific needs. 
    
-2. When a higher level scripting language implements a C/C++ PDF library API, the
-   scope is confined to achieving certain high level application tasks like, 
-   graphics or text extraction; annotation or signature content extraction or 
-   page merging or extraction. However, this API represents the PDF 
-   specification as a model (in MVC parlance). Every object in PDF 
-   specification can be represented in some form through these APIs. Hence, 
-   objects can be utilized effectively to understand document structure or 
-   correlate documents in more meaningful ways. 
-   
-3. Potential to be extended as a PDF generator. Since, the API is written as an 
-   object model of PDF documents, it's easier to extend with additional PDF 
-   write or update capabilities.
+2. When a higher level scripting language wraps a C/C++ PDF library
+   API, the scope is usually limited to a few high level tasks, such
+   as graphics or text extraction, annotation or signature content
+   extraction, and page extraction or merging.
    
+   However, `PDFIO` represents the PDF specification as a model, in
+   Model-View-Controller (MVC) parlance. A PDF file is represented
+   as a collection of interconnected Julia structures. Those
+   structures can be used for granular tasks, or simply to
+   understand the structure of the PDF document.
+
+   As per the PDF specification, text can be presented as part of the
+   page content stream or inside PDF page annotations. An API like
+   `PDFIO` can provide two categories of object types: one representing
+   the text object inside the content stream, and the other the text
+   inside an annotation object. This separation provides flexibility
+   to the API user.
+    
+3. Since the API is written as an object model of PDF documents, it
+   is easier to extend with additional PDF write or update
+   capabilities. Although the current implementation does not provide
+   PDF writing capabilities, the foundation has been laid for future
+   extension.
 
 There are also certain downsides to this approach:
 
-1. Any API that represents an object model of a document, tends to carry the 
-   complexity of introducing abstract objects, often opaque objects (handles) 
-   that are merely representational for an API user. They may not have any 
-   functional meaning. The methods tend to be granular than a method that can 
-   complete a user level task. 
-2. The user may need to refer to the PDF specification for having a complete 
-   semantic understanding.
-3. The amount of code needed to carry out certain tasks can be substantially 
-   higher. 
+1. Any API that represents an object model of a document tends to
+   carry the complexity of introducing abstract objects. They can be
+   opaque objects (handles) that are representations specific to the
+   API and may not have any functional meaning. The methods are
+   granular and may not complete a user level task on their own. The
+   amount of code needed for such a task can be substantially higher.
    
-### Illustration
-
-A popular package `Taro.jl` that utilizes Java based [Apache
-Tika](http://tika.apache.org/), [Apache POI](http://poi.apache.org/) and [Apache
-FOP](https://xmlgraphics.apache.org/fop/) libraries for reading PDF and other
-file types may need the following code to extract text and other metadata from
-the document.
+   In `PDFIO`, text extraction requires the following steps:  
+   a. Open the PDF document and obtain the document handle.  
+   b. Query the document handle for all the pages in the document.  
+   c. Iterate over the pages and obtain the page object handle for
+      each page.  
+   d. Extract the text from the page objects and write it to a file IO.  
+   e. Close the document, ensuring all the document resources are
+      reclaimed.
+2. The API user may need to refer to the PDF specification
+   (PDF-32000-1:2008)[@Adobe:2008] for a semantic understanding of PDF
+   files when accomplishing some tasks. For example, the PDF text
+   extraction workflow above follows naturally from how text is
+   represented in a PDF file as per the specification. A PDF file is
+   composed of pages, and text is represented inside each page content
+   object. The object model of `PDFIO` is a Julia language
+   representation of the PDF specification.
 
-```julia
-using Taro
-Taro.init()
-meta, txtdata = Taro.extract("sample.pdf");
-
-```
-
-While the same with `PDFIO` may look like below:
-
-```julia
-function getPDFText(src, out)
-    doc = pdDocOpen(src)
-    docinfo = pdDocGetInfo(doc)
-    open(out, "w") do io
-		npage = pdDocGetPageCount(doc)
-        for i=1:npage
-            page = pdDocGetPage(doc, i)
-            pdPageExtractText(io, page)
-        end
-    end
-    pdDocClose(doc)
-    return docinfo
-end
-
-```
 
 ## Installation
 
@@ -106,34 +97,30 @@ The above mentioned code takes a PDF file `src` as input and writes the text dat
 return - A dictionary containing metadata of the document
 """
 function getPDFText(src, out)
+    # `doc` is a handle used for subsequent operations on the document.
     doc = pdDocOpen(src)
-```
-Provides `doc` handle that can be used for subsequence operations on the document.
-```julia
+
+    # Metadata extracted from the PDF document.
+    # This value is retained and returned from the function.
     docinfo = pdDocGetInfo(doc) 
-```
-Metadata extracted from the PDF document. This value is retained and returned as the return from the function. 
-```julia
     open(out, "w") do io
+
+        # Get the number of pages in the document.
 		npage = pdDocGetPageCount(doc)
-```
-Returns number of pages in the document
-```julia
+
         for i=1:npage
+
+            # Get a handle to the page at the given page number.
             page = pdDocGetPage(doc, i)
-```
-Returns a `page` handle to the specific page given the number number index. 
-```julia
+
+            # Extract text from the page and write it to the output file.
             pdPageExtractText(io, page)
-```
-Extract text from the page and write it to the output file.
-```julia
-        end
+
+        end
     end
+    # Close the document handle.
+    # The `doc` handle should not be used after this call.
     pdDocClose(doc)
-```
-Close the document handle. The `doc` handle should not be used after this call
-```julia
     return docinfo
 end
 ```