
LapDevelopment_Local

StephanOepen edited this page Jan 22, 2016 · 20 revisions

Background

Most LAP development happens on locally installed instances, e.g. on a ‘private’ laptop or desktop. The instructions on this page are intended to help developers create their own instance. Over time, we hope the same recipe will also serve the ‘deployment’ (on a new system) requirement that is part of certification as a CLARIN ‘A Service’.

Environment

Most Galaxy development appears to be done on RedHat Enterprise Linux (RHEL) installations, its ‘clones’ such as CentOS, and its community look-alike Fedora. In principle, there should be no major obstacles to getting everything working on other Linux distributions, for example ArchLinux, Debian, or Ubuntu. At times, however, we have encountered distribution-related obstacles, for example in the ‘mix’ of versions that results from Galaxy automatically downloading Python eggs. Whenever possible, we therefore recommend an environment compatible with RHEL6 as the path of least resistance. As of early 2016, LAP is only available for the 64-bit x86 architecture.

To isolate LAP development as much as possible from other activities, we recommend creating a separate account; the notes below assume the user laportal with its home directory in /home/laportal/. When installing into a different target directory, some of the path names given below, and those hard-coded into the LAP-customized Galaxy code, will need to be adapted. Alternatively, it might be possible to ‘mimic’ the directory structure below through a set of symbolic links in /home/laportal/.
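For the symbolic-link alternative, a minimal Python sketch might look as follows. The function name mimic_layout and the example mapping are hypothetical: which directories actually need to appear under /home/laportal/ depends on the paths hard-coded in the LAP-customized Galaxy code, and creating entries there requires write access to that directory.

```python
from pathlib import Path


def mimic_layout(base, targets):
    """Create base/<name> -> target symlinks; return the links actually created.

    Hypothetical helper: 'targets' maps link names (e.g. "tree") to the real
    locations of the checkouts. Existing entries are left untouched, so the
    function can be re-run safely.
    """
    base = Path(base)
    base.mkdir(parents=True, exist_ok=True)
    created = []
    for name, target in targets.items():
        link = base / name
        if link.exists() or link.is_symlink():
            continue  # never clobber an existing file, directory, or link
        # Absolute targets keep the link valid regardless of the current directory.
        link.symlink_to(Path(target).resolve(), target_is_directory=True)
        created.append(link)
    return created
```

For example, mimic_layout("/home/laportal", {"tree": "/data/lap/tree", "library": "/data/lap/library"}) would expose checkouts living under a (hypothetical) /data/lap/ at the paths used on this page.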

MongoDB

As a prerequisite to LAP installation, a MongoDB database must be available for access by the LAP user. Some notes on how MongoDB is configured on the LAP servers are available on the LapDevelopment/MongoDB page; for local installations:

  • Install MongoDB via your package manager, e.g.

      yum -y install mongodb-server
    
  • Confirm that the database directory (/var/lib/mongodb/ by default) is available;

  • Optionally, review MongoDB settings (e.g. in /etc/mongodb.conf and /etc/sysconfig/mongod);

  • Start the server and optionally enable automatic start-up, e.g. on systemd-based systems:

      systemctl start mongod
      systemctl enable mongod
    

By default, MongoDB preallocates database space quite generously (around three gigabytes, it appears). If disk space is at a premium (as can be the case on a laptop :-), consider adding the --smallfiles option to the start-up options of the MongoDB server.
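Once the server is running, a quick way to confirm that something is listening on MongoDB's default port (27017, the port used in LAPSTORE below) is a plain TCP connection test. The following is a minimal standard-library sketch; the function name mongod_reachable is our own, and since it does not speak the MongoDB wire protocol, a real client such as the mongo shell remains the authoritative check.

```python
import socket


def mongod_reachable(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that some process accepts connections on the port;
    it does not authenticate or verify that the process is mongod.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print("mongod reachable:", mongod_reachable())
```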

Download LAP

On the (non-Galaxy) LAP side, the following components are required: the LAP Tree, Library, tool descriptions, and (optionally) operational scripts.

  cd /home/laportal
  svn co http://svn.emmtee.net/lap/trunk/tree
  svn co http://svn.emmtee.net/lap/trunk/library
  svn co http://svn.emmtee.net/lap/trunk/tools
  svn co http://svn.emmtee.net/lap/trunk/operation
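When setting up several instances, the same checkouts can be scripted. This is a minimal sketch that reuses the repository URL and component names above; the function names are hypothetical, and it defaults to a dry run that only prints the svn commands.

```python
import subprocess

# Repository location and components as listed above; "operation" is optional.
LAP_SVN = "http://svn.emmtee.net/lap/trunk"
COMPONENTS = ["tree", "library", "tools", "operation"]


def checkout_commands(base=LAP_SVN, components=COMPONENTS):
    """Build one 'svn co <url>' command line per LAP component."""
    return [["svn", "co", f"{base}/{name}"] for name in components]


def checkout_all(dest="/home/laportal", dry_run=True):
    """Run the checkouts inside dest, or just print them when dry_run is set."""
    for cmd in checkout_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # svn creates a directory named after the last URL component.
            subprocess.run(cmd, cwd=dest, check=True)


if __name__ == "__main__":
    checkout_all()
```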

Shell Set-Up

Add the following to your .bashrc and re-source it (or start a fresh shell):

export LAPTREE=/home/laportal/tree
export LAPLIBRARY=/home/laportal/library

### Uncomment the next line to run Galaxy in lappython only
#. $LAPTREE/etc/dot.bashrc

### Point LAP at the MongoDB server on the local host
export LAPSTORE=mongodb://127.0.0.1:27017/lapstore
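To check that a fresh shell picked up these settings, a small script can report unset variables and split LAPSTORE into its parts. This is a minimal sketch with hypothetical function names; it handles only single-host URIs of the form shown above (mongodb://host:port/database) and falls back to MongoDB's default port 27017 when none is given.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("LAPTREE", "LAPLIBRARY", "LAPSTORE")


def missing_variables(env=os.environ):
    """Return the names of required LAP variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


def parse_lapstore(uri):
    """Split a single-host mongodb://host:port/database URI into (host, port, db)."""
    parts = urlparse(uri)
    if parts.scheme != "mongodb":
        raise ValueError(f"not a mongodb:// URI: {uri!r}")
    return parts.hostname, parts.port or 27017, parts.path.lstrip("/")


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("LAPSTORE ->", parse_lapstore(os.environ["LAPSTORE"]))
```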

Galaxy

We also need a clean Galaxy installation, as the production instance contains changes that make it work well with Abel and other parts of the production setup.

These steps assume that you install Galaxy side-by-side with the production instance (that is, in the root of the SVN checkout). For any other location, the file manipulation commands will have to be adapted accordingly.

  • Check out the appropriate revision of Galaxy: hg clone -r 5c789ab4144a http://bitbucket.org/galaxy/galaxy-dist

  • Copy the tool config from the production instance to your checkout: cp trunk/development/galaxy/tool_conf.xml* galaxy-dist/

  • Remove the default tools: rm -r galaxy-dist/tools

  • Symlink in the LAP tools, using an absolute target (a relative target would be resolved from inside galaxy-dist/): ln -s "$PWD/trunk/tools" galaxy-dist/tools

  • In the galaxy-dist directory, run the start-up script: sh run.sh
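The copy, remove, and symlink steps above can also be replayed from a script, which makes it easy to redo them after re-cloning Galaxy. This is a minimal sketch with a hypothetical function name; it assumes the layout described above (trunk/ and galaxy-dist/ side by side in the root of the SVN checkout) and does not perform the clone or start Galaxy.

```python
import glob
import os
import shutil


def prepare_galaxy(root="."):
    """Replay the tool_conf copy, tools removal, and tools symlink steps.

    Assumes root contains trunk/ and a fresh galaxy-dist/ clone.
    Safe to re-run: an existing tools symlink is replaced.
    """
    root = os.path.abspath(root)
    galaxy = os.path.join(root, "galaxy-dist")
    # Copy tool_conf.xml* from the production configuration.
    pattern = os.path.join(root, "trunk", "development", "galaxy", "tool_conf.xml*")
    for path in glob.glob(pattern):
        shutil.copy(path, galaxy)
    # Remove Galaxy's default tools (or a link left by an earlier run).
    tools = os.path.join(galaxy, "tools")
    if os.path.islink(tools):
        os.unlink(tools)
    elif os.path.isdir(tools):
        shutil.rmtree(tools)
    # Link in the LAP tools with an absolute target, so the link does not
    # get resolved relative to galaxy-dist/.
    os.symlink(os.path.join(root, "trunk", "tools"), tools)
```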

Alternatively, Galaxy release 15.03 can be obtained from the Galaxy repository on GitHub:

  git clone https://github.com/galaxyproject/galaxy/
  cd galaxy
  git checkout release_15.03

Troubleshooting

On Debian Sid the first run fails with the following message:

WebError 0.8a couldn't be downloaded automatically.  You can try
building it by hand with:
  python scripts/scramble.py -e WebError
Fetch failed.
  • Run the indicated command python scripts/scramble.py -e WebError

  • Run run.sh again
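If several eggs fail in this way, the hints can be collected from a saved log of the first run (e.g. captured with sh run.sh 2>&1 | tee run.log; the log file name is our own choice). The following is a minimal sketch with hypothetical function names; it assumes the hint always has the form shown in the message above, and that "python" is the interpreter Galaxy runs under.

```python
import re
import subprocess

# The hint run.sh prints when an egg cannot be fetched, e.g.
#   python scripts/scramble.py -e WebError
SCRAMBLE_HINT = re.compile(r"python scripts/scramble\.py -e (\S+)")


def eggs_to_scramble(log_text):
    """Return the egg names run.sh asked us to build by hand, in order, without duplicates."""
    return list(dict.fromkeys(SCRAMBLE_HINT.findall(log_text)))


def scramble(eggs, galaxy_dir="galaxy-dist", python="python"):
    """Run scripts/scramble.py -e <egg> for each egg inside the Galaxy checkout."""
    for egg in eggs:
        subprocess.run([python, "scripts/scramble.py", "-e", egg], cwd=galaxy_dir, check=True)
```

After scramble(eggs_to_scramble(open("run.log").read())), run run.sh again as above.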

On Ubuntu 14.04, the first run of run.sh fails while downloading eggs. This appears to be caused by a version conflict between a library in the system Python and the version Galaxy expects. It can be fixed by doing the first invocation inside a virtualenv:

  • Make sure virtualenv is installed: sudo apt-get install python-virtualenv

  • Set up a virtualenv: virtualenv --no-site-packages galaxy_env

  • Activate it: . galaxy_env/bin/activate

  • Run run.sh again

The server should now start, and subsequent runs should not require the virtualenv.

ToDo: and what about our custom data types (oe; 14-jan-16)?

ToDo: tool_conf and tool_path in config/galaxy.ini; pick up datatypes
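Until this is documented, a small script can at least report which tool and datatype settings a given config/galaxy.ini contains. This is a minimal sketch: the option names tool_config_file, tool_path, and datatypes_config_file in the [app:main] section are assumed from Galaxy's sample configuration and should be checked against the release in use; the function name is hypothetical.

```python
import configparser

# Option names assumed from Galaxy's galaxy.ini sample; verify for your release.
OPTIONS = ("tool_config_file", "tool_path", "datatypes_config_file")


def tool_settings(ini_text, section="app:main"):
    """Return the tool/datatype settings from galaxy.ini text, None where unset."""
    # interpolation=None: treat any '%' in values literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {opt: parser.get(section, opt, fallback=None) for opt in OPTIONS}


if __name__ == "__main__":
    with open("config/galaxy.ini") as f:
        print(tool_settings(f.read()))
```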

Test Suite

Relevant parts of the repository:

trunk/library/python/lap/test.py
trunk/tree/tests/function/{eng.t,eng.txt,...}

Before committing changes, developers must make sure that all tests pass. To run all tests, run the following from the top-level trunk directory:

make

Each test in trunk/tree/tests/function/ runs a workflow. To create a new test, run the following from the trunk directory:

touch tree/tests/function/{example.t,example.txt}

First we need to populate example.txt with some text to process (in the appropriate language). Then we can write the actual test in example.t.

Say that we have just implemented a new POS tagger, hunpos, and want to make sure that it plays nicely with the rest of the tools in LAP. A good test workflow first runs all the preprocessing tools the POS tagger needs, then the tagger itself, then a tool that depends on it, and finally an export tool, so that we can check that the output is sane.

The file example.t will look like this:

from lap.test import TestContext
from lap.utils import laptree

# The parameter of TestContext() must equal the number of checks:
# here 6, for one check_python() and five check_tool() calls.
with TestContext(6) as ctx:
    # each check returns a LAP receipt, which is then
    # used as input for the next processing step
    upload = ctx.check_python('import/lap/text.py', 
                              [laptree('tests/function/example.txt'), None])
    segmented = ctx.check_tool('nltk', 
                               upload, 
                               __process__='punkt')
    repp = ctx.check_tool('repp', 
                          segmented, 
                          segmenter="nltk_punkt", 
                          style="ptb")
    tagged = ctx.check_tool('hunpos', 
                            repp, 
                            model='eng_wsj.model',
                            segmenter='nltk_punkt', 
                            tokenizer='repp')
    parsed = ctx.check_tool('maltparser', 
                            tagged, 
                            segmenter="nltk_punkt",
                            tokenizer="repp", 
                            pos="hunpos", 
                            model="bm_sp_opt.mco")
    ctx.check_tool('export', 
                   parsed, 
                   __process__='tsv', 
                   sentence='any', 
                   token='any', 
                   format='CoNLL-X')

Notice how the parameter of the TestContext() object equals the number of checks: 6, for one check_python() and five check_tool() calls. Also note that these calls return LAP receipts, which are then used as input for downstream tools.
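To illustrate why the declared count matters, here is a hypothetical stand-in for a test context with a declared plan. It is not lap.test.TestContext and makes no claim about how that class reports results; it simply prints a plan line loosely modelled on TAP ("1..N") and fails on exit when the number of checks performed differs from the number declared.

```python
class PlannedChecks:
    """Hypothetical context manager that enforces a declared number of checks."""

    def __init__(self, planned):
        self.planned = planned
        self.run = 0

    def __enter__(self):
        print(f"1..{self.planned}")  # TAP-style plan line
        return self

    def check(self, ok, name=""):
        """Record one check and print an 'ok'/'not ok' line for it."""
        self.run += 1
        print(f"{'ok' if ok else 'not ok'} {self.run} {name}".rstrip())
        return ok

    def __exit__(self, exc_type, exc, tb):
        # Only complain about the count if the body itself did not fail.
        if exc_type is None and self.run != self.planned:
            raise AssertionError(f"planned {self.planned} checks, ran {self.run}")
        return False  # never swallow exceptions from the body
```

Declaring TestContext(6) but adding or removing a step without updating the number is the kind of mismatch such a plan is meant to catch.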

We can now run make from the trunk directory, and the test will run together with the rest of the tests in trunk/tree/tests/function/. When debugging, however, we should run the verbose version of the tests, which prints all output (stdout, stderr, receipts, and exported files) to stdout.

Running the verbose version of example.t from trunk/:

LAP_TESTS_VERBOSE=1 tree/python/lap/python tree/tests/function/example.t
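To run the verbose command above for every function test in turn, a small wrapper can help. This is a minimal sketch with hypothetical function names; it reproduces only the per-file command shown above (make remains the official entry point) and assumes that a failing test exits with a non-zero status.

```python
import os
import subprocess
from pathlib import Path


def function_tests(trunk="."):
    """Return the workflow tests (*.t) in tree/tests/function/, sorted by name."""
    return sorted(Path(trunk).resolve().joinpath("tree", "tests", "function").glob("*.t"))


def run_verbose(test, trunk="."):
    """Run one test like the command above, with LAP_TESTS_VERBOSE=1; return its exit status."""
    trunk = Path(trunk).resolve()
    env = dict(os.environ, LAP_TESTS_VERBOSE="1")
    python = trunk / "tree" / "python" / "lap" / "python"
    return subprocess.run([str(python), str(test)], cwd=trunk, env=env).returncode


if __name__ == "__main__":
    failed = [t.name for t in function_tests() if run_verbose(t) != 0]
    print("failed:", ", ".join(failed) if failed else "none")
```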