fix(dnstap source): implement all TCP dnstap options to reduce error #23123

Open
wants to merge 7 commits into
base: master

esensar
Contributor

@esensar esensar commented May 30, 2025

Summary

The dnstap TCP source was originally based on the TCP socket source, but not all of its options were implemented correctly, which left the request limiter unused. This PR also changes the multithreaded approach to use async-aware waits, because the previous implementation called a blocking thread sleep in an async context.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Besides the included tests, this was also tested in real environments, where it has not yet produced errors like those described in #20744.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

References


Sponsored by Quad9

Related: vectordotdev#20744
@esensar esensar requested a review from a team as a code owner May 30, 2025 07:48
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label May 30, 2025
@johnhtodd

We've been running this patch in production on half a dozen large collectors (probably half a trillion records processed?) for several days, with no issues. This has fixed our problems with dnstap sockets falling over and not recovering. Very much in favor of inclusion.

else => break,
};

let timeout = tokio::time::sleep(Duration::from_millis(10));
Member

Let's extract this into a constant and also document why this value was selected.

Contributor Author

I copied this over from src/sources/util/tcp/mod, and I can see it was added in b8fb1e3. I'm not really sure why this value was selected. I guess the goal was to pick a fairly short time that is still long enough for the connection to take a request.

Member

I see, it's an existing code smell. From the PR:

A small read timeout was added, and if a connection does not receive any data during that time it will release its permit (and try to obtain a new one) allowing other connections to read.

We can use this as a comment here.
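
To make the suggestion above concrete, here is a minimal sketch of what a documented constant and the permit-release behavior could look like. It is not the code from this PR or from Vector: it models the idea with standard-library threads and channels instead of tokio, and `READ_TIMEOUT`, `Limiter`, and `read_connection` are hypothetical names chosen for illustration. The 10 ms value simply mirrors the literal under review.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// How long a connection may hold a permit without receiving any data.
/// If nothing arrives within this window, the connection releases its permit
/// (and tries to obtain a new one), allowing other connections to read.
/// Hypothetical name; the value mirrors the literal discussed in the review.
pub const READ_TIMEOUT: Duration = Duration::from_millis(10);

/// Hypothetical stand-in for a request limiter: a simple counting semaphore.
pub struct Limiter {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Limiter {
    pub fn new(permits: usize) -> Self {
        Limiter {
            permits: Mutex::new(permits),
            available: Condvar::new(),
        }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
    }

    /// Returns a permit and wakes one waiting connection.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }

    /// Number of permits currently free.
    pub fn free_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

/// Reads frames until the sender disconnects. A permit is held only while
/// waiting for one frame, and it is given back after every attempt, including
/// an idle timeout. Returns the frames and the number of idle timeouts.
pub fn read_connection(
    limiter: &Limiter,
    frames: &Receiver<Vec<u8>>,
) -> (Vec<Vec<u8>>, usize) {
    let mut received = Vec::new();
    let mut idle_releases = 0;
    loop {
        limiter.acquire();
        let outcome = frames.recv_timeout(READ_TIMEOUT);
        // Release before deciding what to do next, so an idle or closed
        // connection never keeps a permit away from the others.
        limiter.release();
        match outcome {
            Ok(frame) => received.push(frame),
            Err(RecvTimeoutError::Timeout) => idle_releases += 1,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    (received, idle_releases)
}
```

In an async source, the blocking `recv_timeout` would presumably be replaced by an async-aware wait such as the `tokio::time::sleep` call shown in the diff, so the executor thread is never put to sleep.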

@pront pront added the source: dnstap Anything `dnstap` source related label Jun 10, 2025
esensar and others added 3 commits June 11, 2025 10:18
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
@pront pront enabled auto-merge June 11, 2025 16:23
Member

@pront pront left a comment

Thanks!

auto-merge was automatically disabled June 12, 2025 07:19

Head branch was pushed to by a user without write access

@pront pront enabled auto-merge June 12, 2025 18:09
Labels
domain: sources Anything related to the Vector's sources source: dnstap Anything `dnstap` source related

3 participants