This repository was archived by the owner on Feb 1, 2022. It is now read-only.
Weekly Progress Reports
keilexandra edited this page Aug 6, 2012
Bold denotes current week.
Goals for the future:
- Resolve issues with running on the Altocumulus cloud's Hadoop cluster
- Demonstrate distributed annotation of a batch job containing thousands of documents
- Integrate with the new Curator version
- Make the Curator's cached annotations available via a web interface (as described in the document Curator Web Annotation Service)
Goals for the (final) week ending 10 August:
- Finish the testing suite
- Complete the documentation
- Create UML diagrams for all custom classes and for how Hadoop uses them
- Clean up (refactor) the code, especially in CuratorReducer
- Move classes into a more appropriate package structure
- Write the final project report
- Ensure that future work can pick up smoothly where we left off
During the week ending 3 August:
- Prepared the final poster and presentation
- Integrated self-killing annotator additions into the Curator
- Tested the interface on a real Hadoop cluster (built a workaround with 32 instances on a network disk, which was unsuccessful)
- Wrote the README as the starting page of the documentation
During the week ending 27 July:
- Continued modifying the code based on test results
- Finished implementing dependency handling
  - If the prior document annotations in the input are inconsistent, allow the user to override the dependencies to run
- Finalized the user interface
- Began installation of prerequisite software on a real Hadoop cluster
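The dependency handling completed in the week ending 27 July can be pictured as a dependency-first ordering of annotators, with a check against annotations the input already carries. The Java sketch below is only an illustration under stated assumptions: the annotator names and the `DEPS` map are hypothetical (not the Curator's actual configuration), and the `override` flag is one plausible reading of the user override described above, in which an inconsistently present dependency is re-run for all documents instead of the job being rejected.

```java
import java.util.*;

public class DependencyPlanner {
    // Hypothetical annotators and their dependencies, for illustration only.
    // Assumed acyclic: this sketch does no cycle detection.
    static final Map<String, List<String>> DEPS = Map.of(
            "tokens", List.of(),
            "pos", List.of("tokens"),
            "chunk", List.of("tokens", "pos"),
            "ner", List.of("tokens"));

    // Returns the annotators to run for `target`, each listed after its dependencies.
    // `present` holds, per input document, the annotations it already carries.
    static List<String> plan(String target, List<Set<String>> present, boolean override) {
        List<String> order = new ArrayList<>();
        visit(target, true, present, override, order, new HashSet<>());
        return order;
    }

    private static void visit(String tool, boolean isTarget, List<Set<String>> present,
                              boolean override, List<String> order, Set<String> seen) {
        if (!seen.add(tool)) return; // already scheduled or already skipped
        if (!isTarget) {
            long have = present.stream().filter(doc -> doc.contains(tool)).count();
            if (have == present.size()) return; // every document already has it: skip
            if (have > 0 && !override) {
                // Some documents have this annotation and some do not.
                throw new IllegalStateException(
                        "Inconsistent prior annotations for '" + tool + "'; set override to re-run it");
            }
        }
        for (String dep : DEPS.getOrDefault(tool, List.of())) {
            visit(dep, false, present, override, order, seen);
        }
        order.add(tool); // added only after its dependencies
    }

    public static void main(String[] args) {
        // Both documents are tokenized, but only the second has POS tags.
        List<Set<String>> docs = List.of(Set.of("tokens"), Set.of("tokens", "pos"));
        System.out.println(plan("chunk", docs, true));
        try {
            plan("chunk", docs, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `[pos, chunk]`, then the inconsistency message. The sketch skips a dependency only when every document already has it. That choice keeps a single MapReduce pass uniform across the batch, at the cost of re-annotating some documents when the override is used.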
During the week ending 20 July:
- Continued testing annotation tools within Hadoop
- Resolved uncertainties in tool behavior
- Began modifying scripts and clients to handle dependencies automatically
- Chained intermediate MapReduce jobs to avoid unnecessary copying of files between runs of different tools
- Finished overarching shell scripts
- Prepared a report on the design of a web interface for requesting annotations from the Curator
- Prepared presentations for Wednesday meetings
During the week ending 13 July:
- Overhauled the existing Hadoop interface code to work properly with the Curator's Thrift interface
- Fixed serialization and deserialization of Record
- Finished both the external and Hadoop-side CuratorClients
  - Ready for testing
- Began writing unit tests
- Began testing tools together (prepping input in the Curator, sending it to Hadoop, launching the Hadoop job, and copying the output back)
During the week ending 6 July:
- (Nearly) finished construction of the CuratorReducer (for Hadoop), which interfaces with the Hadoop-side CuratorClient
  - Ready for testing
- Created both the local and Hadoop-side CuratorClients
  - Ready for testing
- Began modifications to the Curator/Thrift interface
- Created a means of serializing the Curator's Records
  - Ready for testing
- Prepared presentations for Wednesday meetings
29 June and earlier:
- Completed construction of the Hadoop input data structures
- Completed testing of Hadoop data structures
- Constructed most of the Hadoop-side classes
- Completed research comparing Hadoop with other distributed frameworks