Use generated_timestamp instead of updated_at for incremental export on Zendesk source #621

@lorransr

dlt version

1.11.0

Source name

zendesk

Describe the problem

The Zendesk source currently uses the updated_at field as the cursor for incremental loads. However, according to Zendesk's documentation, the incremental export API filters on the generated_timestamp field, not on updated_at.

Because of this, the current implementation can miss changes that advance only generated_timestamp, such as system updates. If updated_at is unchanged, these records are not captured, which leaves gaps in the loaded data.

Expected behavior

The incremental loading logic should use generated_timestamp as the cursor field instead of updated_at. This ensures that all updates, including system updates, are captured by the incremental extractor.

Per the Zendesk documentation:

The endpoint can return tickets with an updated_at time that's earlier than the start_time time. The reason is that the API compares the start_time with the ticket's generated_timestamp value, not its updated_at value. [...] The generated_timestamp value is updated for all ticket updates, including system updates.
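A minimal sketch of what tracking the cursor on generated_timestamp could look like, written in plain Python rather than against the dlt API. The helper names `epoch` and `select_since` are hypothetical, and the sketch assumes generated_timestamp arrives as Unix epoch seconds (the example further down writes it as an ISO string for readability).

```python
from datetime import datetime, timezone


def epoch(year, month, day, hour=0):
    """Unix epoch seconds for a UTC wall-clock time (hypothetical helper)."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


def select_since(records, last_cursor, cursor_field="generated_timestamp"):
    """Return the records at or after last_cursor and the advanced cursor.

    The comparison is inclusive, so a record sitting exactly on the boundary
    can be returned twice across runs and needs deduplication downstream.
    """
    selected = [r for r in records if r[cursor_field] >= last_cursor]
    new_cursor = max((r[cursor_field] for r in selected), default=last_cursor)
    return selected, new_cursor


# Hypothetical page of tickets, reduced to the fields that matter here.
page = [
    {"id": 1, "generated_timestamp": epoch(2025, 5, 1, 12)},
    {"id": 2, "generated_timestamp": epoch(2025, 5, 2, 9)},
]

selected, cursor = select_since(page, last_cursor=epoch(2025, 5, 2))
print([r["id"] for r in selected], cursor)
```

Because the cursor is taken from the same field the API filters on, the value stored for the next run matches what the endpoint will compare against start_time.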

Steps to reproduce

Suppose there's a ticket #12345 that was last updated by an agent on May 1st at 12:00 PM. At that point:

updated_at = 2025-05-01T12:00:00Z

generated_timestamp = 2025-05-01T12:00:00Z

On May 2nd at 09:00 AM, a system automation changes the ticket’s priority (e.g., due to an SLA policy or scheduled trigger). This update:

  • does not create a new ticket event

  • causes generated_timestamp to be updated to 2025-05-02T09:00:00Z

  • leaves updated_at unchanged at 2025-05-01T12:00:00Z

The dlt incremental pipeline is restarted with start_time = 2025-05-02T00:00:00Z, expecting to fetch any ticket updated on or after May 2nd.

The Zendesk API returns ticket #12345, because its generated_timestamp > start_time.

However, dlt compares the updated_at field (2025-05-01T12:00:00Z) with the start_time (2025-05-02T00:00:00Z) and skips the record, assuming it is out of range.

Result: Ticket #12345 is missed, even though it was modified.
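The scenario above can be reproduced without any API call. The following is an illustrative sketch only: `parse` and `passes_cursor_filter` are hypothetical helpers that mimic a client-side cursor comparison, not dlt's actual implementation, and the timestamps are the ones from the steps above.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp with a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def passes_cursor_filter(record: dict, cursor_field: str, start: datetime) -> bool:
    """Keep a record only if its cursor value is at or after the start time."""
    return parse(record[cursor_field]) >= start


# Ticket state after the system automation ran on May 2nd at 09:00.
ticket = {
    "id": 12345,
    "updated_at": "2025-05-01T12:00:00Z",
    "generated_timestamp": "2025-05-02T09:00:00Z",
}
start_time = parse("2025-05-02T00:00:00Z")

kept_by_updated_at = passes_cursor_filter(ticket, "updated_at", start_time)
kept_by_generated = passes_cursor_filter(ticket, "generated_timestamp", start_time)

print(kept_by_updated_at)  # False: the ticket is dropped
print(kept_by_generated)   # True: the ticket is kept
```

The API has already returned the ticket at this point; it is only the client-side comparison on updated_at that discards it.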

How are you using the source?

I run this source in production.

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt destination

duckdb

Additional information

No response

Status

Planned
