TitanTadm
This page documents a sub-task of the HPC adaptation project at UiO; please see the TitanTop page for background.
To eliminate the biggest current bottleneck in the DELPH-IN toolchain, turn-around times in machine learning experiments need to be reduced substantially. The Toolkit for Advanced Discriminative Modeling ([http://tadm.sf.net TADM]) is the main machine learning component used in DELPH-IN research to date. TADM estimates the parameters of so-called discriminative, log-linear (or exponential) statistical models; the resulting models can then be used to rank competing hypotheses probabilistically, for example directing the parser towards the most probable analysis.
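Schematically (the notation here is ours, not drawn from TADM documentation), such a conditional log-linear model assigns a probability to each candidate analysis y for an input x in terms of feature functions f_i(x, y) and weights lambda_i; in LaTeX notation:

p(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i \lambda_i f_i(x, y')\right)}

Estimation then amounts to choosing the weights \lambda_i; the denominator normalizes over all competing hypotheses y' for the same input.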
TADM is implemented in C++, built on top of the [http://www.mcs.anl.gov/petsc/petsc-2/ PETSc] and [http://www.mcs.anl.gov/research/projects/tao/ TAO] libraries, and was originally developed by [http://www-rohan.sdsu.edu/~malouf/ Rob Malouf] (then at the University of Groningen, The Netherlands). A group of active TADM users, collaborating with Rob, hosted the project at SourceForge around 2004 and consolidated existing patches (including some from UiO). Otherwise, there has been no active TADM development in recent years, and available documentation is sparse.
TADM is applied to training data (typically in the form of millions or billions of integer-coded 'features') prepared using the itsdb software (see the TitanItsdb page), and a single estimation run can take several CPU hours. In searching for the best-performing model parameters, dozens or hundreds of distinct configurations need to be tested, typically each by means of ten-fold cross-validation. Hence, in current development, TADM throughput is the primary bottleneck.
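To give a rough sense of scale (the figures here are illustrative assumptions, not measurements), a grid search over one hundred configurations, each cross-validated ten-fold at, say, two CPU hours per estimation run, already amounts to

100 \times 10 \times 2 \;\text{CPU hours} = 2000 \;\text{CPU hours}

which is why both per-run speed-ups and parallel throughput matter.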
Reportedly, a parallel version of TADM was available locally at Groningen in the late 1990s (customized for MPICH and Myrinet), and the project will resurrect MPI support in TADM, adapting it as needed for use on TITAN. It will also be necessary to profile some of the core routines, experiment with different versions of low-level libraries (notably BLAS and LAPACK), and try the Intel compiler suite (rather than the vanilla GNU Compiler Collection), to further improve the CPU utilization of TADM.
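Assuming MPI support can be resurrected and that the command-line interface stays unchanged in parallel mode (both assumptions at this point), a parallel run would presumably follow the standard MPI launch pattern, here using the sample file introduced below:

# hypothetical: eight-way parallel estimation, options as in the serial case
mpirun -np 8 tadm -monitor -events_in small.events.gz -params_out /dev/null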
This work package will be predominantly implemented by VD staff, (re-)enabling the incomplete and currently dormant MPI support in the TADM code base. Once the software modifications are complete, a joint series of experiments of increasing complexity will serve to determine the scalability of the TADM core (numeric optimization, processing huge sparse matrices). The extended TADM software will be integrated with the LOGON tree and contributed to the TADM project repository at SourceForge.
The LOGON tree includes pre-compiled TADM binaries, plus a (comparatively small) sample input file. To invoke the parameter estimation procedure, assuming a functional LOGON installation, the following should work:
cd $LOGONROOT/uio/titan
tadm -monitor -events_in small.events.gz -params_out /dev/null
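On TITAN itself, such a run would normally go through the batch system; the following minimal job script is a sketch only, assuming SLURM, with all directives and resource figures as illustrative guesses rather than tested settings:

#!/bin/bash
#SBATCH --job-name=tadm-sample   # illustrative name
#SBATCH --time=01:00:00          # assumed ample for the small sample file
#SBATCH --mem-per-cpu=4G         # assumption; memory needs scale with the event file
cd $LOGONROOT/uio/titan
tadm -monitor -events_in small.events.gz -params_out /dev/null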
Even though this sample file is comparatively small, the run takes on the order of ten minutes of wall-clock time at present. In a nutshell, TADM estimates model parameters for a Maximum Entropy (aka log-linear) statistical model; the model is expressed in terms of large numbers of features (about 1.3 million in our sample), and model estimation means assigning a numerical weight to each feature. TADM goes through a series of iterations, re-estimating model parameters each time, aiming to maximize the probability of the training data (i.e. to minimize the KL divergence between current model predictions and the training distribution). Estimation terminates when 'convergence' is observed, and various measures are available to avoid over-fitting the training distribution.
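In standard maximum entropy terms (again our notation, not TADM's), each iteration adjusts the weight vector \lambda so as to increase the conditional log-likelihood of the N training pairs (x_k, y_k):

L(\lambda) = \sum_{k=1}^{N} \log p_{\lambda}(y_k \mid x_k)

Maximizing L(\lambda) is equivalent to minimizing the KL divergence between the empirical distribution of the training data and the model distribution p_{\lambda}, which is the sense in which the iterations 'minimize KL divergence' above.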