
🐞 Missing Filtering Fields in FLAN Query Generation Inputs #3

@Hyrin-mansoor


Summary

In several FLAN Stage 3 query generation examples, we did not include the fields used in the filtering clause (e.g., filters={'field': value}) in the input fields list. As a result, the model either makes incorrect assumptions about the filtering logic or omits it entirely, producing incomplete or incorrect queries.


Problem

Some examples include both retrieval and filter fields, but others include only the output fields and leave out critical filter fields such as disabled, supplier_group, and territory.

This inconsistency:

  • Weakens the training signal, since similar questions come with differently scoped field lists
  • Hurts generalization to filtering-type questions
  • Leads to incorrectly structured queries when multiple fields are involved
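
A minimal audit sketch for spotting affected samples, assuming each sample is a dict with hypothetical `input_fields` and `query` keys and that the target query is Python call syntax with a dict-literal `filters=` argument. The dataset's real schema is not described in this issue, and the `frappe.get_all(...)` strings below are illustrative only. The queries are parsed with the standard-library `ast` module, never executed.

```python
from __future__ import annotations

import ast
from typing import Any


def filter_fields(query: str) -> list[str]:
    """Return the field names used as keys of any `filters={...}` dict literal in `query`."""
    found: list[str] = []
    for node in ast.walk(ast.parse(query)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "filters" and isinstance(kw.value, ast.Dict):
                    for key in kw.value.keys:
                        # Only string-literal keys are field names; `**spread` entries have key None.
                        if isinstance(key, ast.Constant) and isinstance(key.value, str):
                            found.append(key.value)
    return found


def find_incomplete_samples(samples: list[dict[str, Any]]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for samples whose input fields omit a filter field."""
    report = []
    for i, sample in enumerate(samples):
        listed = set(sample["input_fields"])
        missing = [f for f in filter_fields(sample["query"]) if f not in listed]
        if missing:
            report.append((i, missing))
    return report


# Hypothetical samples: the first filters on `disabled` without listing it.
samples = [
    {
        "question": "List all active suppliers.",
        "input_fields": ["supplier_name"],
        "query": "frappe.get_all('Supplier', fields=['supplier_name'], filters={'disabled': 0})",
    },
    {
        "question": "List customers in the India territory.",
        "input_fields": ["customer_name", "territory"],
        "query": "frappe.get_all('Customer', fields=['customer_name'], filters={'territory': 'India'})",
    },
]
print(find_incomplete_samples(samples))  # [(0, ['disabled'])]
```

Only dict-style filters are recognized here; list-style filters or other query formats would need their own extraction rule.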

Solution

We need to:

  • Identify all filtering-type questions in the FLAN Stage 3 dataset
  • Ensure that filter fields are listed alongside retrieval fields in the input fields list
  • Reformat existing samples so they include every field the query requires
  • Review and validate the updated entries, especially those with multi-field and conditional logic
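
The identify, add, and validate steps could be automated roughly as follows. This is a sketch under the same assumptions as any audit of this dataset would need: hypothetical `input_fields` and `query` keys, and queries in Python call syntax with a dict-literal `filters=` argument (Frappe-style strings used only for illustration). Retrieval fields keep their original order and missing filter fields are appended after them without duplicates. Conditional filters such as `{'grand_total': ['>', 1000]}` are still keyed by field name, so they are covered. The manual review step remains necessary.

```python
from __future__ import annotations

import ast
import copy
from typing import Any


def _filter_fields(query: str) -> list[str]:
    """Field names used as string keys of any `filters={...}` dict literal in `query`."""
    names: list[str] = []
    for node in ast.walk(ast.parse(query)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "filters" and isinstance(kw.value, ast.Dict):
                    names += [k.value for k in kw.value.keys
                              if isinstance(k, ast.Constant) and isinstance(k.value, str)]
    return names


def repair_sample(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a copy whose input fields list retrieval fields first, then any missing filter fields."""
    fixed = copy.deepcopy(sample)
    fields = list(fixed["input_fields"])
    for name in _filter_fields(fixed["query"]):
        if name not in fields:  # preserve order and avoid duplicates
            fields.append(name)
    fixed["input_fields"] = fields
    return fixed


def validate(samples: list[dict[str, Any]]) -> list[int]:
    """Indices of samples that still omit at least one filter field."""
    return [i for i, s in enumerate(samples)
            if not set(_filter_fields(s["query"])) <= set(s["input_fields"])]


# Hypothetical multi-field filtering sample.
samples = [{
    "question": "Show active suppliers in the Raw Material group.",
    "input_fields": ["supplier_name"],
    "query": ("frappe.get_all('Supplier', fields=['supplier_name'], "
              "filters={'disabled': 0, 'supplier_group': 'Raw Material'})"),
}]
repaired = [repair_sample(s) for s in samples]
print(repaired[0]["input_fields"])  # ['supplier_name', 'disabled', 'supplier_group']
print(validate(samples), validate(repaired))  # [0] []
```

Returning a deep copy keeps the original samples untouched, so the before and after versions can be diffed during review.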

Labels

bug, data-quality, field-mapping, query-logic, high-priority
