NL2G/ScImageV2Dataset

ScImage V2 Dataset

Introduction

We introduce ScImage V2, a benchmark dataset for evaluating scientific image generation across four major domains: biology, mathematics, computer science, and physics. ScImage V2 features an expanded terminology set, diverse mathematical functions, and a broad range of chart types. It includes over 2,000 high-quality, template-based text-to-image pairs designed to evaluate fine-grained scientific image generation capabilities. The dataset supports the development and assessment of more capable and reliable multimodal LLMs for scientific applications.

Directory Overview

  • Chart_Types: JSON files specifying all chart types used
  • Filled_Templates: CSVs of all 10 filled template batches (prompt + output)
  • Groupings: Grouped terms used to fill templates
  • Human_Evals: Human annotations for template evaluation, correction, and filtering
  • Plots: PDFs of all chart visuals used in the accompanying paper
  • Python_Code_1000: Python scripts for the first 1000 template examples
  • Python_Images_291: Rendered Python-generated images (291 samples)
  • TikZ_Code_1000: TikZ code for the first 1000 template examples
  • TikZ_Images_291: Rendered TikZ-generated images (291 samples)
  • ScImage_V1: Prompts and templates from ScImage V1 for baseline comparison
  • Scripts: Code for extracting chart types, generating plots, and filling templates
  • Templates: Human-curated templates (domain terms, math functions, charts)
  • ScImage_V2_Presentation: ScImage V2 Dataset Presentation
  • ScImage_V2_Paper: ScImage V2 Dataset Paper

Filled Templates Explanations

  • Understanding_Reasoning_Types: The types of reasoning the template involves (Attribute, Spatial, Numerical, or any combination of these).
  • Reasoning: Indicates whether the template requires reasoning to be completed correctly.
  • Difficulty: An integer from 1 (easy) to 3 (hard), reflecting the complexity of the template.
  • Template_Type: The category of the template. Options include domain_term (terms from DaTikZ V3), math_function, or chart.
  • Group: High-level domain category of the term, such as Math, Biology, or Computer Science.
  • Subgroup: More specific classification within the selected group, such as Computational Geometry for Computer Science.
  • Template: The original template with placeholders, used for inserting selected terms.
  • Chosen_Terms: The specific terms selected by the LLM (GPT-4o) to fill into the template.
  • Filled_Template: The initial version of the template after term insertion, generated by GPT-4o.
  • Corrected_Template: A revised and improved version of the filled template, also generated by GPT-4o.
  • Evaluated_Template: Binary evaluation indicating whether the corrected template is acceptable (1 = good, 0 = still problematic).
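
As a rough illustration of the field descriptions above, the following sketch checks a few columns against their documented value ranges. The inline sample data, the `check_row` helper, and the assumption that the CSV headers match the field names exactly are all hypothetical; the real files under Filled_Templates may use different column order, quoting, or value encodings.

```python
import csv
import io

# Hypothetical two-row sample; real files live under Filled_Templates/ and may
# differ in column order, quoting, or value encoding.
SAMPLE_CSV = """Difficulty,Template_Type,Group,Filled_Template,Corrected_Template,Evaluated_Template
1,chart,Math,Plot a bar chart of X,Plot a bar chart of three values,1
4,math_function,Physics,Draw f(x),Draw f(x) = x^2,0
"""

# Template categories listed in the field description for Template_Type.
ALLOWED_TEMPLATE_TYPES = {"domain_term", "math_function", "chart"}


def check_row(row):
    """Return a list of problems found in one row (hypothetical helper)."""
    problems = []
    # Difficulty is documented as an integer from 1 (easy) to 3 (hard).
    if row["Difficulty"] not in {"1", "2", "3"}:
        problems.append("Difficulty outside 1-3")
    if row["Template_Type"] not in ALLOWED_TEMPLATE_TYPES:
        problems.append("unknown Template_Type")
    # Evaluated_Template is documented as binary (1 = good, 0 = problematic).
    if row["Evaluated_Template"] not in {"0", "1"}:
        problems.append("Evaluated_Template not binary")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = [check_row(r) for r in rows]
print(report)
```

The second sample row deliberately carries an out-of-range difficulty so that the check has something to flag.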

Recommendation for Use

For downstream use, prefer the Corrected_Template over the Filled_Template. Filtering on Evaluated_Template = 1 makes it more likely that the templates you work with are visualizable.
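
One possible way to apply this recommendation is sketched below. It assumes the CSV headers match the field names listed earlier; the sample rows and the `usable_prompts` helper are hypothetical, and falling back to Filled_Template when Corrected_Template is empty is an assumption of this sketch, not something the dataset prescribes.

```python
import csv
import io

# Hypothetical sample rows; real data lives in the Filled_Templates CSVs.
SAMPLE_CSV = """Filled_Template,Corrected_Template,Evaluated_Template
Draw a graph with N nodes,Draw a graph with 5 nodes,1
Plot the function,Plot the function sin(x),0
Draw a cell,,1
"""


def usable_prompts(rows):
    """Keep rows marked acceptable; prefer Corrected_Template, else Filled_Template."""
    prompts = []
    for row in rows:
        # Skip rows whose corrected template was judged still problematic.
        if row["Evaluated_Template"].strip() != "1":
            continue
        # Assumption: fall back to the filled template if no correction exists.
        prompt = row["Corrected_Template"].strip() or row["Filled_Template"].strip()
        prompts.append(prompt)
    return prompts


prompts = usable_prompts(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(prompts)
```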
