
🎯 [Feature Request]: Markdown Editor with Live Preview - "Editing" #8676

@Zenobia2222


TL;DR: Enable "editing" for markdown text - allow annotators to edit content (detoxify, refine, curate). Critical for LLM training data curation workflows.


❌ Is your feature request related to a problem? Please describe.

I'm working on LLM fine-tuning data curation, where domain experts need to edit and label markdown-formatted documents in the same pass. Label Studio currently only supports annotating read-only content, but our workflow requires "editing".

Pain points:

  • 🚫 No content editing capability (can only annotate existing text)
  • 🔄 Users must copy-paste to external editors (VSCode), edit, paste back - breaking the workflow
  • 📝 No modification tracking

This is critical for data curation tasks where experts need to:

  • ✏️ Edit: Remove toxic/outdated content, adapt for legal compliance
  • 🏷️ Label: Classify content (license types, political sensitivity, domain categories)

✅ Describe the solution you'd like

Integrate a markdown editor (Monaco Editor/CodeMirror) with split-pane live preview to enable "editing".

Key features:

  1. 📱 Split-pane interface: Raw markdown editor (left) + live rendered preview (right) with synchronized scrolling
  2. 🔍 Change tracking: Track modifications as annotations, view diff, export both original and edited versions
  3. 📤 Export: Both edited markdown content and annotations/labels
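
To make the change-tracking idea concrete, here is a minimal sketch in Python using the standard-library `difflib` module. It is not how Label Studio stores annotations; the function name `track_changes` and the record fields are assumptions made for illustration only.

```python
import difflib


def track_changes(original: str, edited: str) -> dict:
    """Return both versions plus a unified diff (hypothetical record shape)."""
    diff_lines = difflib.unified_diff(
        original.splitlines(),
        edited.splitlines(),
        fromfile="original.md",
        tofile="edited.md",
        lineterm="",  # inputs have no trailing newlines, so emit none in headers
    )
    return {
        "original": original,
        "edited": edited,
        "diff": "\n".join(diff_lines),
        "changed": original != edited,
    }


record = track_changes(
    "# Heading\n\nParagraph with **bold** text and an outdated claim.",
    "# Heading\n\nParagraph with **bold** text.",
)
print(record["diff"])
```

Keeping the original text, the edited text, and the diff together means an export can include both versions without recomputing anything.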

Typical workflow:

1. 📥 Import JSON documents (markdown text field)
2. 👀 Annotator reviews in split-pane view
3. ✏️ Edit content (detoxify, remove outdated info, refine)
4. 🏷️ Add labels/tags (e.g., "license-type", "political-content")
5. 📤 Export as JSON dataset with refined content + annotations
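
The data flow of this workflow could look roughly like the Python sketch below. The annotator's edit and labels are hard-coded stand-ins for what the editor UI would produce, and the output field names (`original_text`, `edited_text`, `labels`) are hypothetical, not Label Studio's export schema.

```python
import json


def build_export_record(task: dict, edited_text: str, labels: list[str]) -> dict:
    """Combine the imported task, the annotator's edit, and the labels.

    The output field names are assumptions made for this sketch.
    """
    data = task["data"]
    return {
        "metadata": data.get("metadata", {}),
        "original_text": data["text"],
        "edited_text": edited_text,
        "labels": sorted(set(labels)),  # de-duplicate and keep a stable order
    }


# Step 1: import a JSON document with a markdown text field.
task = json.loads(
    '{"data": {"text": "# Heading\\n\\nParagraph with **bold** text...",'
    ' "metadata": {"source": "book-v1", "document_id": "doc-123"}}}'
)

# Steps 3-4: the annotator's edit and labels (hard-coded here for illustration).
edited = "# Heading\n\nParagraph with **bold** text."
labels = ["license-type", "political-content"]

# Step 5: export refined content plus annotations as JSON.
print(json.dumps(build_export_record(task, edited, labels), indent=2))
```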

🤔 Describe alternatives you've considered

  1. External editors (current workaround): Copy to VSCode → edit → paste back
    ❌ Problem: Breaks workflow, no tracking, error-prone

  2. Pre-render markdown to HTML: Import pre-rendered HTML
    ❌ Problem: No editing, only annotation

⚠️ Neither supports the integrated "editing" workflow inside Label Studio needed for data curation.


📋 Additional context

Use cases:

  • 🧹 Content detoxification (remove toxic/outdated content)
  • ⚖️ License classification per paragraph
  • 🗳️ Political content identification
  • 🎯 Domain-specific refinement for LLM training data

Technical details:

  • 💻 Monaco Editor (VSCode's editor) recommended - mature, excellent markdown support
  • 📏 Document size: ~1000 chars per block (browser-friendly)
  • 🔧 Should integrate with Label Studio's existing labeling XML configuration

Sample data format:

{
  "data": {
    "text": "# Heading\n\nParagraph with **bold** text...",
    "metadata": {"source": "book-v1", "document_id": "doc-123"}
  }
}
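
As a rough illustration of how tasks in this format could be checked before import, here is a small Python sketch. The helper name `check_task`, the hard 1000-character limit (taken from the "~1000 chars per block" guideline above), and the requirement that `document_id` be present are all assumptions for this example, not existing Label Studio behavior.

```python
MAX_BLOCK_CHARS = 1000  # assumed limit, based on the "~1000 chars per block" guideline


def check_task(task: dict) -> list[str]:
    """Return a list of problems found in one import task (empty if none)."""
    data = task.get("data")
    if not isinstance(data, dict):
        return ["missing 'data' object"]

    problems = []
    text = data.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("'data.text' must be a non-empty markdown string")
    elif len(text) > MAX_BLOCK_CHARS:
        problems.append(f"'data.text' has {len(text)} chars, above {MAX_BLOCK_CHARS}")

    # Requiring document_id is an assumption made for this sketch.
    metadata = data.get("metadata")
    if not isinstance(metadata, dict) or "document_id" not in metadata:
        problems.append("'data.metadata.document_id' is missing")
    return problems


sample = {
    "data": {
        "text": "# Heading\n\nParagraph with **bold** text...",
        "metadata": {"source": "book-v1", "document_id": "doc-123"},
    }
}
print(check_task(sample))
```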

Why this matters:

  • 🚀 Extends Label Studio's paradigm: From "annotate existing content" to "curate and annotate"
  • 📝 Markdown is the de facto format for LLM training data
  • ⚡ Data curation is a critical bottleneck in LLM fine-tuning
  • 🔗 Unifies editing and annotation workflows in one tool

Visual mockup:

┌───────────────────────────────────────────────────────┐
│ Editor (Raw Markdown)    │ Preview (Rendered)         │
├──────────────────────────┼────────────────────────────┤
│ # Title                  │ Title                      │
│ ## Section 1             │ ══════                     │
│ This is **important**    │ Section 1                  │
│ - Item 1                 │ This is important          │
│                          │ • Item 1                   │
├──────────────────────────┴────────────────────────────┤
│ Labels: [Political] [Legal-Review] [License: CC-BY]   │
└───────────────────────────────────────────────────────┘

💬 I'm happy to contribute or provide more details about this workflow!
