Description
dlt version
1.11.0
Source name
zendesk
Describe the problem
The Zendesk source currently uses the updated_at field as the cursor for incremental loads. However, according to Zendesk’s documentation, the API performs incremental filtering based on the generated_timestamp field — not updated_at.
Because of this, the current implementation may miss updates that are only reflected in generated_timestamp, such as system updates. These changes will not be captured if updated_at remains unchanged, leading to data gaps.
Expected behavior
The incremental loading logic should use generated_timestamp as the cursor field instead of updated_at. This ensures that all updates — including system updates — are captured by the incremental extractor.
Per the Zendesk documentation:
The endpoint can return tickets with an updated_at time that's earlier than the start_time time. The reason is that the API compares the start_time with the ticket's generated_timestamp value, not its updated_at value. [...] The generated_timestamp value is updated for all ticket updates, including system updates.
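The quoted behavior can be illustrated with a small plain-Python sketch. It does not call the Zendesk API or dlt: `api_returns` is a hypothetical stand-in for the server-side filter, ticket 12346 is made-up illustrative data, and the inclusive `>=` boundary is an assumption. Timestamps use the ISO format from this issue, and the real field's wire format may differ.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def api_returns(tickets, start_time):
    """Hypothetical model of the server-side filter: compare start_time
    with generated_timestamp, not updated_at (boundary is assumed inclusive)."""
    start = parse_ts(start_time)
    return [t for t in tickets if parse_ts(t["generated_timestamp"]) >= start]


start_time = "2025-05-02T00:00:00Z"
tickets = [
    # Touched by a system update after start_time; updated_at is older.
    {"id": 12345, "updated_at": "2025-05-01T12:00:00Z",
     "generated_timestamp": "2025-05-02T09:00:00Z"},
    # Illustrative ticket with no activity since before start_time.
    {"id": 12346, "updated_at": "2025-04-30T08:00:00Z",
     "generated_timestamp": "2025-04-30T08:00:00Z"},
]

returned = api_returns(tickets, start_time)
returned_ids = [t["id"] for t in returned]

# Returned tickets whose updated_at is earlier than start_time.
stale_updated_at = [
    t["id"] for t in returned if parse_ts(t["updated_at"]) < parse_ts(start_time)
]

print(returned_ids)      # [12345]
print(stale_updated_at)  # [12345]
```

Under these assumptions, the only returned ticket is one whose updated_at predates start_time, which is exactly the case the documentation warns about.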
Steps to reproduce
Suppose there's a ticket #12345 that was last updated by an agent on May 1st at 12:00 PM. At that point:
updated_at = 2025-05-01T12:00:00Z
generated_timestamp = 2025-05-01T12:00:00Z
On May 2nd at 09:00 AM, a system automation changes the ticket’s priority (e.g., due to an SLA policy or scheduled trigger). This update:
- does not create a new ticket event
- causes generated_timestamp to be updated to 2025-05-02T09:00:00Z
- updated_at remains 2025-05-01T12:00:00Z
The dlt incremental pipeline is restarted with start_time = 2025-05-02T00:00:00Z, expecting to fetch every ticket updated since the start of May 2nd.
The Zendesk API returns ticket #12345, because its generated_timestamp (2025-05-02T09:00:00Z) is later than start_time.
However, the dlt loader compares the record's updated_at (2025-05-01T12:00:00Z) with start_time (2025-05-02T00:00:00Z) and skips the record, assuming it is out of range.
Result: Ticket #12345 is missed, even though it was modified.
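The scenario above can be reproduced without network access. The following is a minimal sketch in plain Python, not the actual dlt or Zendesk source code: `keep_record` is a hypothetical helper standing in for a client-side cursor check, and the inclusive comparison is an assumption.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def keep_record(record: dict, cursor_field: str, start_time: str) -> bool:
    """Hypothetical client-side check: keep the record only if its cursor
    field is at or after start_time."""
    return parse_ts(record[cursor_field]) >= parse_ts(start_time)


# Ticket #12345 as returned by the API after the system update on May 2nd.
ticket = {
    "id": 12345,
    "updated_at": "2025-05-01T12:00:00Z",
    "generated_timestamp": "2025-05-02T09:00:00Z",
}
start_time = "2025-05-02T00:00:00Z"

kept_by_updated_at = keep_record(ticket, "updated_at", start_time)
kept_by_generated = keep_record(ticket, "generated_timestamp", start_time)

print(kept_by_updated_at)  # False: the record is skipped
print(kept_by_generated)   # True: the record is loaded
```

With updated_at as the cursor the record is dropped; with generated_timestamp it is kept, which matches what the API itself filtered on.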
How are you using the source?
I run this source in production.
Operating system
macOS
Runtime environment
Local
Python version
3.11
dlt destination
duckdb
Additional information
No response