DAML User Guide

SteveLiang edited this page Apr 16, 2021 · 8 revisions

Data Annotator for Machine Learning (DAML) is designed to enable an end-to-end data annotation process for common data types. Here we provide a high-level user guide to the key features in DAML:

Create a new data annotation project:

  • From the Projects tab, click Create New Annotation Project and choose the project type.
    • Supported project types are:
      • text classification
      • tabular
      • named entity recognition (NER)
      • log classification
      • image classification
  • Depending on the annotation project type, you will be asked different project setup questions. In general, you provide a project name, upload data, define label values, configure active learning, and assign annotators by email. Here, we show the setup for an NER project:
    • INSERT_NER_SETUP_IMAGE
  • Click Create to complete the project setup.
  • You will receive an email notification confirming the project creation, and the project will appear in the Projects tab.
    • Annotators will receive an email link to join the project and start annotating

Annotate a Project:

DAML's annotation interface is designed to keep you focused on a single task so that you can annotate quickly. To start annotating, you must be assigned to the project.
  • From the Annotate tab, click START on the project of your choice. This example will use an NER project:
    • INSERT_NER_PICTURE
  • On the left hand side menu (which can be toggled to hide), you have the following:
    • Projects selector to switch between projects
    • Project info including annotation instructions from the Project Owner
    • Your Progress on the current annotation project
    • A history of your labels in this session
  • On the right-hand side, you are presented with the Original Ticket, which is one entry from the overall project
    • The flag icon next to this entry lets the annotator send the entry to the Project Owner to review for fit (e.g., the entry might not fit the current set of labels or might be bad data)
    • In an NER project, the annotator can select the entity (one of the buttons) and then click the text from the entry to highlight
      • Note: a single click annotates the clicked word; alternatively, select a span of text to annotate the whole span as this entity
      • INSERT_IMAGE_NER_ANNOTATION
    • At any time, you may skip the current entry or return to a previous entry
  • Click EXIT at any time to stop annotating. Your progress is saved automatically so that you can resume later

Tracking the Project Progress

DAML aggregates all annotation actions in real time so you can see the progress of all annotators.

In the Projects tab, click the name of the project to view the overall progress:

  • On the top you will see overall project details in addition to two charts:
    • # Annotations Per User
    • # Annotations Per Category
  • Underneath the charts, you will see two tabs:
    • Annotations tab which presents all currently annotated examples in a table format for your review
    • Flag tab which presents all examples flagged by users for review. For a flagged ticket, you have two options:
      • Delete the example from the project. This will permanently remove this example from the dataset.
      • Silence the flag. This returns the example to the pool, so it will be shown to annotators again.
    • For projects with Active Learning support, you will see an additional Active Learning tab showing the computed accuracy over time of models which are used to query annotators

Manage a Project

Project owners can modify existing projects (add or remove data, edit project owners and annotators, and more), export and share data with service users, view overall annotation progress, and resolve data conflicts.
Update project details
  • On the Projects tab, under the actions column, choose "Edit Project" to see the following options:
  • Project Name
  • Project Owners
    • Add using comma separated emails
    • Delete by clicking "X" next to a user's email
  • Annotators
    • Add using comma separated emails
    • Delete by clicking "X" next to a user's email
  • Labels
    • Add new individual labels
    • Delete by clicking "X" next to a label
      • Note: if a label is already in use for any entries in your project, it cannot be deleted
  • Assignment Logic: choose from Random or Sequential
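The guide does not describe how Random and Sequential assignment are implemented. As a rough illustration of what the two names usually mean, the sketch below picks the next unlabeled entry either in upload order or uniformly at random. The function names and the in-memory data structures are hypothetical and are not part of DAML.

```python
import random
from typing import Optional, Sequence, Set


def next_entry_sequential(entries: Sequence[str], labeled: Set[str]) -> Optional[str]:
    """Return the first entry, in upload order, that has not been labeled yet."""
    for entry in entries:
        if entry not in labeled:
            return entry
    return None  # nothing left to annotate


def next_entry_random(
    entries: Sequence[str], labeled: Set[str], rng: random.Random
) -> Optional[str]:
    """Return a uniformly random entry among those not labeled yet."""
    remaining = [entry for entry in entries if entry not in labeled]
    return rng.choice(remaining) if remaining else None
```

Under this reading, Sequential keeps every annotator moving through the data in the original order, while Random spreads annotators across the dataset.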
Add Data
  • On the Projects tab, under the actions column, click the Append New Entries icon to see the following options:
  • Quick Add:
    • Add individual entries matching the headers of your data
      • Note: for Logs or computer vision, you may add individual files matching the required format
  • Bulk Add:
    • Upload a CSV or a zip file depending on your project setup
      • Note: for CSVs, you must match the same column headers or the file will be rejected
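Because a CSV with mismatched headers is rejected, it can help to compare headers locally before uploading. The following is a minimal sketch using Python's standard csv module. It assumes an exact, order-sensitive comparison after trimming whitespace; the guide does not say exactly how DAML compares headers, and the helper names are hypothetical.

```python
import csv
import io
from typing import List


def read_header(csv_text: str) -> List[str]:
    """Return the first row of a CSV document, or an empty list if it has none."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def headers_match(project_header: List[str], new_header: List[str]) -> bool:
    """Compare two header rows.

    Assumption: names must match exactly and in the same order,
    ignoring leading and trailing whitespace.
    """
    return [h.strip() for h in project_header] == [h.strip() for h in new_header]
```

For example, compare `read_header` of the original project file with `read_header` of the file you plan to append, and upload only if `headers_match` returns True.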
Export labeled data
  • On the Projects tab, under the actions column, click the "Download Project" icon to see the following options:
  • Choose an export format and select the checkbox if you want all unlabeled entries removed:
    • Standard: a DAML format suitable for ML tasks
      • Note: for NER, Log annotation, and image classification projects, Standard is the only available option
    • Top: adds a "top" column to the Standard data export based on the number of labels for a specific entry
    • Probabilistic: adds the ratio of each label to the overall label count for a specific entry
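To make the Top and Probabilistic descriptions concrete, here is a small sketch of how such values could be derived from the labels several annotators gave to one entry. It reads "top" as the most frequently assigned label, which is an interpretation of the description above rather than DAML's documented behavior. The function names are hypothetical, and the real export column layout is not shown here.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_label(labels: List[str]) -> Optional[str]:
    """Return the most frequently assigned label, or None if there are no labels.

    Assumption: how DAML breaks ties is not documented in this guide.
    """
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


def label_ratios(labels: List[str]) -> Dict[str, float]:
    """Return each label's share of the total number of labels on the entry."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}
```

With the labels `["pos", "neg", "pos", "pos"]`, `top_label` gives `"pos"` and `label_ratios` gives `{"pos": 0.75, "neg": 0.25}`.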
Share Dataset
You have the option to share any annotated dataset within the service.
  • On the Projects tab, under the actions column, click the "Share Dataset" icon
  • Provide a description of the dataset and click OK. The dataset will then be available in the Community Dataset tab
Delete Project
You have the option to delete any unused projects.
  • On the Projects tab, under the actions column, click the Delete icon
  • WARNING: Deletes are permanent and you must confirm before a deletion takes place.
Data Management: Input and export formats follow best practices to enable seamless integration with ML frameworks. Data can be shared with annotators, project owners, and service users. Data is uploaded in its original format, extracted for a particular project, and retained for N days. At any time, you can append new datasets to your annotation projects.

Active Learning: Active learning works alongside annotators. An ML model is continuously retrained on the most recently annotated data and is then used to query annotators for labels on the entries that matter most. This reduces the amount of labeled data needed to reach a given accuracy.
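This guide does not state which query strategy DAML uses. One common strategy is uncertainty sampling, where the entries whose predicted label distribution has the highest entropy are sent to annotators first. The sketch below illustrates that idea only; the function names and the `predictions` mapping (entry id to predicted class probabilities) are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(predictions: Dict[str, List[float]], batch_size: int) -> List[str]:
    """Return the ids of the entries the model is least certain about.

    Entries are ranked by the entropy of their predicted distribution,
    highest first, and the first `batch_size` ids are returned.
    """
    ranked = sorted(
        predictions,
        key=lambda entry_id: entropy(predictions[entry_id]),
        reverse=True,
    )
    return ranked[:batch_size]
```

In a loop, the model would be retrained after each batch of new annotations and `select_queries` called again on the remaining unlabeled entries.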

Using the DAML API: DAML provides a set of common APIs to manage your data annotation projects. A Swagger UI is available at /api-docs/ for exploring them interactively. You can also use the APIs to plug in your own ML models as annotators.
