Weekly Progress Reports

keilexandra edited this page Aug 6, 2012 · 1 revision

Bold denotes current week.

Goals for the future:

Solve the problems documented in Issues Running on the Altocumulus Cloud's Hadoop Cluster
Demonstrate distributed annotation of a batch job containing thousands of documents
Integrate with new Curator version
Make the Curator's cached annotations available via a web interface (as described in the document Curator Web Annotation Service)

Goals for the (final) week ending 10 August:

Finish testing suite
Complete documentation
- Create UML diagrams for all custom classes, as well as their use by Hadoop
Clean up (refactor) code, especially in CuratorReducer
- Place classes in more appropriate package structure
Write final project report
Ensure that future work will be able to smoothly pick up where we left off

During the week ending 3 August:

Prepared final poster/presentation
Integrated self-killing annotator additions into the Curator
Tested the interface on a real Hadoop cluster (attempted a workaround using 32 instances on a network disk, which was unsuccessful)
- Documented Issues Running on the Altocumulus Cloud's Hadoop Cluster
Composed the README as a starting page for the documentation

During the week ending 27 July:

Continued code modifications in light of testing
Finished implementing dependency handling
- If the prior annotations of the input documents are inconsistent, allow the user to override the dependency checks and run anyway
Finalized user interface
Began installation of prerequisite software on a real Hadoop cluster
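
As an illustration of the dependency handling mentioned above, the sketch below orders annotation tools so that each one runs only after the tools it depends on. It is not the project's actual code: the class name `DependencyOrder`, the tool names (`TOKEN`, `POS`, `CHUNK`), and the dependency table are all hypothetical, chosen only to show the general idea of a depth-first ordering that skips annotations already present in the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyOrder {

    // Hypothetical dependency table: each tool maps to the tools it requires.
    public static Map<String, List<String>> exampleDependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("TOKEN", Collections.<String>emptyList());
        deps.put("POS", Arrays.asList("TOKEN"));
        deps.put("CHUNK", Arrays.asList("TOKEN", "POS"));
        return deps;
    }

    // Returns the tools to run, in order, so that `target` can be produced.
    // Tools listed in `alreadyAnnotated` are assumed present and are skipped.
    public static List<String> plan(String target,
                                    Map<String, List<String>> deps,
                                    Set<String> alreadyAnnotated) {
        List<String> order = new ArrayList<>();
        visit(target, deps, alreadyAnnotated, new HashSet<String>(), order);
        return order;
    }

    // Depth-first traversal: schedule every dependency before the tool itself.
    private static void visit(String tool,
                              Map<String, List<String>> deps,
                              Set<String> done,
                              Set<String> inProgress,
                              List<String> order) {
        if (done.contains(tool) || order.contains(tool)) {
            return; // already available or already scheduled
        }
        if (!inProgress.add(tool)) {
            throw new IllegalStateException("Cyclic dependency at " + tool);
        }
        List<String> required = deps.get(tool);
        if (required != null) {
            for (String dep : required) {
                visit(dep, deps, done, inProgress, order);
            }
        }
        inProgress.remove(tool);
        order.add(tool);
    }

    public static void main(String[] args) {
        Set<String> present = new HashSet<>(Arrays.asList("TOKEN"));
        System.out.println(plan("CHUNK", exampleDependencies(), present));
    }
}
```

With `TOKEN` already present in the input, the plan for `CHUNK` is `POS` followed by `CHUNK`. A user override of the kind described above could be modelled by passing a different `alreadyAnnotated` set, though how the real interface exposes that option is not shown here.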

During the week ending 20 July:

Continued testing annotation tools within Hadoop
Resolved uncertainties in tool behavior
Began modifications to scripts/clients to automatically handle dependencies
- Chained intermediate MapReduce jobs to avoid unnecessary copying of files between runs of different tools
Finished overarching shell scripts
Prepared report on design of a web interface for requesting annotations from the Curator
Prepared presentations for Wednesday meetings

During the week ending 13 July:

Overhauled existing Hadoop interface code to work properly with Curator's Thrift interface
- Fixed serialization and deserialization of Record
Finished both external and Hadoop-side CuratorClients
- Ready for testing
Began writing unit tests
Began testing tools together (prepping input in the Curator, sending it to Hadoop, launching the Hadoop job, and copying the output back)

During the week ending 6 July:

(Nearly) finished construction of the CuratorReducer (for Hadoop), which interfaces with the Hadoop-side CuratorClient
- Ready for testing
Created both local and Hadoop-side CuratorClient
- Ready for testing
Began modifications to Curator/Thrift interface
Created a means of serializing Curator's Records
- Ready for testing
Prepared presentations for Wednesday meetings

29 June and earlier:

Completed construction of the Hadoop input data structures
Completed testing of Hadoop data structures
Constructed most of the Hadoop-side classes
Completed research comparing Hadoop with other distributed frameworks