fix(dnstap source): implement all TCP dnstap options to reduce error #23123
The dnstap TCP source was initially built on the TCP socket source, but not all of the options were implemented correctly, which left the request limiter unused. This PR also changes the multithreaded approach to use async-aware waits, because the previous implementation used a thread sleep in an async context. Related: vectordotdev#20744
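To illustrate why a thread sleep is a problem in an async context, here is a toy sketch using only the standard library. It is not Vector's code and does not use tokio: `run_all`, `blocking_wait`, and `cooperative_wait` are hypothetical names, and the busy poll loop is a stand-in for a real async runtime. The point is only that a task which blocks the thread stalls every other task polled on it, while a task that yields until its deadline lets the others make progress.

```rust
use std::task::Poll;
use std::time::{Duration, Instant};

/// Hypothetical cooperative task: polled repeatedly until it returns `Ready`.
pub type Task = Box<dyn FnMut() -> Poll<()>>;

/// Polls every task round-robin on the current thread until all are done.
/// Returns the total wall-clock time taken.
pub fn run_all(mut tasks: Vec<Task>) -> Duration {
    let start = Instant::now();
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending.
        tasks.retain_mut(|task| (*task)().is_pending());
        std::thread::yield_now();
    }
    start.elapsed()
}

/// Waits by blocking the whole thread: no other task runs in the meantime.
pub fn blocking_wait(wait: Duration) -> Task {
    Box::new(move || {
        std::thread::sleep(wait);
        Poll::Ready(())
    })
}

/// Waits by returning `Pending` until a deadline, so other tasks get polled.
pub fn cooperative_wait(wait: Duration) -> Task {
    let mut deadline: Option<Instant> = None;
    Box::new(move || {
        // The deadline is fixed on the first poll.
        let due = *deadline.get_or_insert_with(|| Instant::now() + wait);
        if Instant::now() >= due {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    })
}
```

With four tasks that each wait 50 ms, the blocking variant takes at least 200 ms in total because the waits run back to back, while the cooperative variant finishes in roughly the time of a single wait because the waits overlap.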
We've been running this patch in production on half a dozen large collectors (probably half a trillion records processed?) for several days, with no issues. This has fixed our problems with dnstap sockets falling over and not recovering. Very much in favor of inclusion.
src/sources/util/framestream.rs
Outdated
else => break,
};

let timeout = tokio::time::sleep(Duration::from_millis(10));
Let's extract this into a constant and also document why this value was selected.
I copied this over from src/sources/util/tcp/mod, and I can see it was added in b8fb1e3. I'm not really sure why this value was selected. I guess the goal was to pick a fairly short time that is still long enough for the connection to take a request.
I see, it's an existing code smell. From the PR:
"A small read timeout was added, and if a connection does not receive any data during that time it will release its permit (and try to obtain a new one) allowing other connections to read."
We can use this as a comment here.
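The behaviour described in that quote (release the permit on a short idle timeout, then queue for a new one) can be sketched as follows. This is a simplified, thread-based stand-in built on the standard library, not the tokio-based code in framestream.rs: `Limiter`, `Permit`, `serve_connection`, and `READ_TIMEOUT` are hypothetical names, and a channel plays the role of the socket. Only the 10 ms value comes from the diff above.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Short read timeout: an idle connection holds its permit for at most this
/// long before giving other connections a chance to read.
pub const READ_TIMEOUT: Duration = Duration::from_millis(10);

/// Hypothetical request limiter: a counting semaphore.
pub struct Limiter {
    permits: Mutex<usize>,
    available: Condvar,
}

/// A held permit; dropping it returns the permit to the limiter.
pub struct Permit {
    limiter: Arc<Limiter>,
}

impl Limiter {
    pub fn new(permits: usize) -> Arc<Limiter> {
        Arc::new(Limiter { permits: Mutex::new(permits), available: Condvar::new() })
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(limiter: &Arc<Limiter>) -> Permit {
        let mut free = limiter.permits.lock().unwrap();
        while *free == 0 {
            free = limiter.available.wait(free).unwrap();
        }
        *free -= 1;
        Permit { limiter: Arc::clone(limiter) }
    }

    pub fn free_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.limiter.permits.lock().unwrap() += 1;
        self.limiter.available.notify_one();
    }
}

/// Handles frames until the sender disconnects.
/// Returns (frames handled, number of idle permit releases).
pub fn serve_connection(limiter: &Arc<Limiter>, frames: Receiver<Vec<u8>>) -> (usize, usize) {
    let mut handled = 0;
    let mut idle_releases = 0;
    let mut permit = Some(Limiter::acquire(limiter));
    loop {
        match frames.recv_timeout(READ_TIMEOUT) {
            Ok(_frame) => handled += 1,
            Err(RecvTimeoutError::Timeout) => {
                // No data within the timeout: release the permit so another
                // connection can read, then queue up for a new one.
                drop(permit.take());
                idle_releases += 1;
                permit = Some(Limiter::acquire(limiter));
            }
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    drop(permit);
    (handled, idle_releases)
}
```

A caller would create one limiter with `Limiter::new(n)` and pass it to `serve_connection` once per connection. In the async version under review, the wait would be a non-blocking `tokio::time::sleep` raced against the read rather than a blocking `recv_timeout`.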
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Thanks!
Head branch was pushed to by a user without write access
Change Type
Is this a breaking change?
How did you test this PR?
Besides the included tests, this was also tested in real environments, where it has not yet produced errors like those described in #20744.
Does this PR include user facing changes?
Notes
Use @vectordotdev/vector to reach out to us regarding this PR.
To run the checks locally with a pre-push hook, please see this template.
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace (alternatively, you can run cargo test --all)
./scripts/check_changelog_fragments.sh
Run git merge origin master and git push.
If Cargo.lock was modified, please run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.
References