Description
TL;DR: Enable "editing" for markdown text - allow annotators to edit content (detoxify, refine, curate). Critical for LLM training data curation workflows.
Is your feature request related to a problem? Please describe.
I'm working on LLM fine-tuning data curation where domain experts need to edit and label markdown-formatted documents in the same pass. Label Studio currently supports only view-only annotation, but our workflow also requires editing the content itself.
Pain points:
- No content editing capability (can only annotate existing text)
- Users must copy-paste to external editors (VSCode), edit, paste back - breaking the workflow
- No modification tracking
 
This is critical for data curation tasks where experts need to:
- Edit: Remove toxic/outdated content, adapt for legal compliance
- Label: Classify content (license types, political sensitivity, domain categories)
 
Describe the solution you'd like
Integrate a markdown editor (Monaco Editor/CodeMirror) with split-pane live preview to enable editing alongside labeling.
Key features:
- Split-pane interface: Raw markdown editor (left) + live rendered preview (right) with synchronized scrolling
- Change tracking: Track modifications as annotations, view diff, export both original and edited versions
- Export: Both edited markdown content and annotations/labels
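
As a rough illustration of the change-tracking idea, here is a minimal Python sketch using the standard library's `difflib`. The function name `track_changes` and the shape of the returned record are assumptions made for this example, not an existing Label Studio API.

```python
import difflib


def track_changes(original: str, edited: str) -> dict:
    """Return both versions plus a line-based unified diff.

    The record layout is hypothetical, chosen only for illustration.
    """
    diff = difflib.unified_diff(
        original.splitlines(),
        edited.splitlines(),
        fromfile="original",
        tofile="edited",
        lineterm="",
    )
    return {
        "original": original,
        "edited": edited,
        "diff": list(diff),
        "changed": original != edited,
    }


original = "# Heading\n\nParagraph with **bold** text and an outdated claim."
edited = "# Heading\n\nParagraph with **bold** text."
record = track_changes(original, edited)
print("\n".join(record["diff"]))
```

Keeping the original text next to the edited text means the diff can always be recomputed later, so the stored diff is a convenience rather than the source of truth.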
 
Typical workflow:
1. Import JSON documents (markdown text field)
2. Annotator reviews in split-pane view
3. Edit content (detoxify, remove outdated info, refine)
4. Add labels/tags (e.g., "license-type", "political-content")
5. Export as JSON dataset with refined content + annotations
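
The import-to-export path above could look roughly like the following Python sketch. The `curation` key, the `curate` helper, and the exported layout are assumptions for illustration; the actual export schema would be up to the Label Studio maintainers.

```python
import json

# Step 1: import a task whose "text" field holds markdown.
task_json = (
    '{"data": {"text": "# Heading\\n\\nOld paragraph.", '
    '"metadata": {"source": "book-v1", "document_id": "doc-123"}}}'
)


def curate(task: dict, edited_text: str, labels: list[str]) -> dict:
    """Combine the untouched task data with the annotator's edit and labels."""
    return {
        "data": task["data"],  # original content is preserved as imported
        "curation": {  # hypothetical key, not part of any existing schema
            "edited_text": edited_text,
            "labels": sorted(set(labels)),
        },
    }


task = json.loads(task_json)

# Steps 2-4: the annotator's edit and labels, hard-coded here for the example.
result = curate(
    task,
    edited_text="# Heading\n\nRefined paragraph.",
    labels=["political-content", "license-type"],
)

# Step 5: export as JSON.
exported = json.dumps(result, ensure_ascii=False, indent=2)
print(exported)
```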
Describe alternatives you've considered
- External editors (current workaround): Copy to VSCode → edit → paste back
  → Problem: Breaks workflow, no tracking, error-prone
- Pre-render markdown to HTML: Import pre-rendered HTML
  → Problem: No editing, only annotation
Additional context
Use cases:
- Content detoxification (remove toxic/outdated content)
- License classification per paragraph
- Political content identification
- Domain-specific refinement for LLM training data
 
Technical details:
- Monaco Editor (VSCode's editor) recommended - mature, excellent markdown support
- Document size: ~1000 chars per block (browser-friendly)
- Should integrate with Label Studio's existing labeling XML configuration
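
To make the ~1000-character block size concrete, here is one possible way to pre-split a long markdown document before import. This is only a sketch: `split_into_blocks` is a hypothetical helper, and splitting on blank lines is a simplifying assumption that would break up fenced code blocks containing blank lines.

```python
def split_into_blocks(markdown: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into blocks.

    Each block is at most max_chars long, except that a single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    blocks: list[str] = []
    current = ""
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            blocks.append(current)
            current = paragraph
    if current:
        blocks.append(current)
    return blocks


doc = "\n\n".join(["x" * 400] * 5)
blocks = split_into_blocks(doc)
print([len(block) for block in blocks])
```

Each resulting block could then become one task in the import file, with a shared `document_id` in the metadata so the pieces can be reassembled after export.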
 
Sample data format:
{
  "data": {
    "text": "# Heading\n\nParagraph with **bold** text...",
    "metadata": {"source": "book-v1", "document_id": "doc-123"}
  }
}
Why this matters:
- Extends Label Studio's paradigm: From "annotate existing content" to "curate and annotate"
- Markdown is the de facto format for LLM training data
- Data curation is a critical bottleneck in LLM fine-tuning
- Unifies editing and annotation workflows in one tool
 
Visual mockup:
┌──────────────────────────┬────────────────────────────┐
│ Editor (Raw Markdown)    │ Preview (Rendered)         │
├──────────────────────────┼────────────────────────────┤
│ # Title                  │ Title                      │
│ ## Section 1             │ ══════                     │
│ This is **important**    │ Section 1                  │
│ - Item 1                 │ This is important          │
│                          │ • Item 1                   │
├──────────────────────────┴────────────────────────────┤
│ Labels: [Political] [Legal-Review] [License: CC-BY]   │
└───────────────────────────────────────────────────────┘
I'm happy to contribute or provide more details about this workflow!