Book Content Processing


gleneivey edited this page Sep 3, 2011 · 1 revision


A site using Draft Outloud is configured to serve a particular book through values in its local database (the book's git URL and root file name) and by placing the correct SSH credentials for accessing the book's repository onto the Draft Outloud server.

Book Content Processing Commands

Eventually it will be possible to update a Draft Outloud site with new content from its book's repository through the site itself, but for the moment the update is performed from the command line of the server running Draft Outloud.

To update to a new version of the content, log in to your Draft Outloud site's server and cd to the directory containing the active Draft Outloud instance. For example, if you deploy Draft Outloud using Capistrano (and why wouldn't you), this would be the "current" directory below your deployment root. Then invoke the process_book script included with Draft Outloud:

cd somewhere-on-your-server/current
bundle exec tools/process_book [your git repo path] [target to checkout]

Note that before you can execute process_book, the XML tools and the standard DocBook XSLT style sheets must be installed on your Draft Outloud server; see [[Installing Draft Outloud]].

"your git repo path" is the full URI for the repository that contains the content of your book. If your book is hosted on GitHub, then the path would look like git://github.com/user-name/book-name.git.

"target to checkout" can be any string or symbol that git can accept as an argument to git checkout for your repository. It could be master or another branch name, a tag you have created, or the SHA of a particular commit.

process_book also uses the values of several optional environment variables to override its own defaults. For example, because process_book manipulates the database that is shared with the rest of Draft Outloud, it invokes Ruby on Rails and needs to know which "environment" your server is operating in so that it can use the correct database login credentials. Assuming that you are using the standard Rails environment name "production" for your Draft Outloud instance, and that your hosting provider isn't configured to set the RAILS_ENV environment variable automatically, you would run process_book like this (for example):

RAILS_ENV=production bundle exec tools/process_book git://github.com/user-name/book-name.git master
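As a rough illustration of the environment-variable override described above, a Ruby script could resolve its Rails environment along these lines. This is only a sketch: the helper name resolve_rails_env and the fallback value "development" are assumptions, not taken from process_book itself.

```ruby
# Hypothetical helper: pick the Rails environment from a hash of environment
# variables, falling back to an assumed default when RAILS_ENV is unset or empty.
def resolve_rails_env(env = ENV, default = "development")
  value = env["RAILS_ENV"]
  value.nil? || value.empty? ? default : value
end
```

Called with no arguments, the helper reads the real process environment, so prefixing the command with RAILS_ENV=production would select "production".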

While process_book is executing...

Book Content Processing Flow

While individual pages served by Draft Outloud are processed individually, much of the work of preparing the book to be served is performed in a batch when Draft Outloud is instructed to update to a new version of the book from the repository. The sequence of this processing is described here. (Note that this is the overall as-intended flow. Not all steps and site features are implemented yet.)

Updating the site to a new version of the book's content is triggered by an access to a special administrative URL with a parameter that is an acceptable argument to "git checkout" (a commit SHA or tag, for example). This launches a background job within Draft Outloud. The background job writes its latest status to a static HTML file stored in 'public'. The file is written (always including the target checkout and a time stamp) at the end of each major step, and in the event of any error that halts processing.
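The status file described above could be produced by something like the following sketch. The helper names, the page layout, and the choice of an ISO 8601 UTC time stamp are all assumptions; the source does not specify the file's name or format.

```ruby
require "time"

# Minimal HTML escaping so that targets or error text cannot break the page.
def escape_html(text)
  text.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
end

# Hypothetical helper: build the status page. Every status includes the
# checkout target and a time stamp, as the text describes.
def status_html(target, message, time = Time.now)
  <<~HTML
    <html>
      <body>
        <p>Target: #{escape_html(target)}</p>
        <p>Status: #{escape_html(message)}</p>
        <p>Updated: #{time.getutc.iso8601}</p>
      </body>
    </html>
  HTML
end

# Hypothetical helper: write the page to a path below 'public'.
# The caller chooses the path; the source does not name the file.
def write_status(path, target, message, time = Time.now)
  File.write(path, status_html(target, message, time))
end
```

A background job would call write_status after each major step and again from its error handler, so the page always shows the most recent outcome.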
The first step is to check out the version of the book matching the parameter given. If the local repository isn't already set up, it is first cloned locally, and then the specified revision is checked out.
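The clone-if-needed logic of this step can be sketched as follows. The helper name is hypothetical, and the fetch command is an added assumption (an existing clone would otherwise not know about new commits); the source only mentions cloning and checking out.

```ruby
# Hypothetical helper: list the git commands needed to bring a local working
# copy to the requested revision. Cloning happens only when no repository
# exists yet. Commands are returned as argument arrays rather than executed.
def checkout_commands(repo_url, work_dir, target)
  commands = []
  unless File.directory?(File.join(work_dir, ".git"))
    commands << ["git", "clone", repo_url, work_dir]
  end
  # Assumption: refresh remote refs before checking out the target.
  commands << ["git", "-C", work_dir, "fetch", "origin"]
  commands << ["git", "-C", work_dir, "checkout", target]
  commands
end
```

Each array can be passed to system(*command), which avoids shell quoting problems with unusual branch or tag names.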
Draft Outloud provides a download link for a monolithic PDF version of the book. At this point, the XML tool chain to generate PDF from the DocBook source is run. If errors occur, they are written to the status file and processing ends. The PDF version of the book is not made available for download yet, but held in the local working directory.
Draft Outloud also builds a monolithic, downloadable HTML version of the book. This HTML is served when someone clicks the "download" link on the home page, and it is also used as the intermediate data format for subsequent processing steps. At this point, the XML tool chain to generate the HTML version of the book is executed. If errors occur, they are written to the status file and processing ends. The HTML version of the book is not made available for download yet, but held in the local working directory.
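Both tool-chain steps follow the same pattern: run an external command and, on failure, record the error and stop. A minimal sketch of that pattern, with a hypothetical run_step helper and a block standing in for "write to the status file", might look like this. The actual commands Draft Outloud runs are not given in the source and are not guessed at here.

```ruby
require "open3"

# Hypothetical helper: run one external tool-chain command. On failure the
# captured error output is passed to the block and false is returned so the
# caller can end processing. A missing executable is treated the same way.
def run_step(command)
  _stdout, stderr, status = Open3.capture3(*command)
  return true if status.success?
  yield stderr if block_given?
  false
rescue Errno::ENOENT => e
  yield e.message if block_given?
  false
end
```

A caller could chain steps with something like `run_step(pdf_command) { |err| report(err) } or return`, where pdf_command and report are placeholders for the real command and status writer.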
The generated HTML version of the book is read into Ruby and parsed into a document tree. The following processes are then run on the tree:
1. It is scanned for all HTML elements that represent entries in the outline of the book (chapter headers, section/subsection headers, etc.). Database records reflecting the book's outline are updated, including "what changed between versions" info. The central content (not header or footer) of the site's TOC (home page) is generated and cached.
2. The content for each "section" of the book is extracted from the whole-book document tree, and new HTML is generated (including the markup necessary for the site's interactive features) for the body of each section page that Draft Outloud will serve for the current version of the book.
3. Each section and group of sections of the book are also downloadable as stand-alone HTML files. These are now generated from the whole-book document tree.
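The outline scan in step 1 can be illustrated with Ruby's REXML library. This sketch is not Draft Outloud's parser: it assumes well-formed XHTML input, assumes that outline entries appear as h1 to h6 elements with plain-text titles, and uses a hypothetical helper name.

```ruby
require "rexml/document"

# Hypothetical helper: walk the parsed document tree in document order and
# collect heading elements as outline entries with a level and a title.
def outline_entries(xhtml)
  doc = REXML::Document.new(xhtml)
  entries = []
  doc.root.each_recursive do |element|
    match = /\Ah([1-6])\z/.match(element.name)
    next unless match
    entries << { level: match[1].to_i, title: element.text.to_s.strip }
  end
  entries
end
```

The resulting list of levels and titles is the kind of data that could be compared against stored outline records to work out what changed between versions.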
The book version being served by the site is changed. The generated downloadable files are moved to the directory from which they will be served. The pre-generated HTML for section pages is moved to the cache directory. The "current version" information used to select TOC information from the database is updated to the new target version.
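The file moves in this final step might be sketched as below. The helper name and the idea of a single staging directory are assumptions made for illustration; updating the "current version" record in the database is not shown.

```ruby
require "fileutils"

# Hypothetical helper: move every generated file from a staging directory into
# the directory it will be served from, creating that directory if needed.
# Returns the names of the moved entries in sorted order.
def publish_files(staging_dir, serving_dir)
  FileUtils.mkdir_p(serving_dir)
  Dir.children(staging_dir).sort.each do |name|
    FileUtils.mv(File.join(staging_dir, name), File.join(serving_dir, name))
  end
end
```

Doing the moves only after every earlier step has succeeded matches the flow above, in which generated files are held in the working directory until the version switch.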