Annotate a Dataset for NER Training #67

@grayJiaaoLi

Description

User story

  1. As a data engineer
  2. I want/need to prepare an annotated dataset for NER training
  3. So that the NER model can be trained on accurately tagged data

Acceptance criteria

  • Select a suitable number of Q&A pairs from HuggingFace

    • Start with 50-100 Q&A pairs
  • Optional: Use tools like Doccano to tag entities according to the defined

  • Store the NER training dataset

    • Upload the NER-annotated data to a separate directory on HuggingFace
    • Ensure the annotated dataset can be used to train the NER model
  • The list should contain objects such as:

    • Entity types: Project_Name, Technology_Name, (Organization_Name), ...
    • Entities: Kubernetes, Docker, gRPC, ...
    • Relationships: Depends_On, Complements, (Conflicts_with), ...
  • Here is an example:

    • "Example Text"
      • Project_Name: Kubernetes, ...
      • Technology_Name: Docker, gRPC, ...
      • (Organization_Name: Google, Red Hat, ...)
      • Relationship: ...
  • Store the list in a format that can be used for the NER model training

  • This part of the work does not have to be automated, but it can be
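
One possible way to store an annotated example so that it can feed NER training is sketched below. The record layout (the field names "text", "entities", "relations", and character offsets for spans) and the sample sentence are assumptions made for illustration, not a format defined in this issue or the export schema of any particular tool. The BIO tagging step is likewise an assumed choice, since the issue does not say which tagging scheme the NER model expects. Only the entity type names come from the list above.

```python
import json
import re

# Entity types taken from the issue; everything else here is illustrative.
ENTITY_TYPES = {"Project_Name", "Technology_Name", "Organization_Name"}


def make_span(text, surface, label):
    """Locate the first occurrence of `surface` and return a character span."""
    start = text.index(surface)  # raises ValueError if the surface is absent
    return {"start": start, "end": start + len(surface), "label": label}


def validate(record):
    """Raise ValueError if a span is out of range or uses an unknown label."""
    for ent in record["entities"]:
        if ent["label"] not in ENTITY_TYPES:
            raise ValueError(f"unknown label: {ent['label']}")
        if not 0 <= ent["start"] < ent["end"] <= len(record["text"]):
            raise ValueError(f"span out of range: {ent}")


def to_bio(text, entities):
    """Split text into word and punctuation tokens and assign BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for ent in entities:
            # A token is tagged only if it lies fully inside an entity span.
            if ent["start"] <= match.start() and match.end() <= ent["end"]:
                prefix = "B" if match.start() == ent["start"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical sample sentence, written only for this sketch.
text = "Red Hat contributes to Kubernetes and uses gRPC."
record = {
    "text": text,
    "entities": [
        make_span(text, "Red Hat", "Organization_Name"),
        make_span(text, "Kubernetes", "Project_Name"),
        make_span(text, "gRPC", "Technology_Name"),
    ],
    # Illustrative relation entry; the relation layout is an assumption.
    "relations": [{"head": "Kubernetes", "type": "Complements", "tail": "gRPC"}],
}

validate(record)
jsonl_line = json.dumps(record)  # one record per line gives a JSONL file
tokens, tags = to_bio(record["text"], record["entities"])

if __name__ == "__main__":
    print(jsonl_line)
    for token, tag in zip(tokens, tags):
        print(f"{token}\t{tag}")
```

Keeping character offsets in the stored record and deriving token tags on demand means the same file can be re-tokenized later if the model's tokenizer changes. The simple regex tokenizer here is a stand-in and would need to be replaced by the tokenizer of whichever NER model is chosen.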

Definition of done (DoD)

  • Bill of Materials in the planning document has been updated
  • All feature branches have been merged and closed
  • New feature code has been documented
  • Potential new licenses have been checked
  • All GitHub Actions are passing
  • The requirements.txt has been updated

DoD general criteria

  • Feature has been fully implemented
  • Feature has been merged into the mainline
  • All acceptance criteria were met
  • Product owner has approved the feature
  • All tests are passing
  • Developers have agreed to release

Metadata

  • Assignees: No one assigned
  • Labels: User Story (label for user stories)
  • Type: No type
  • Project status: Product Backlog
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests