Suggestion: empty user-agent should be identified as crawler #2798

Open
fekir opened this issue Mar 7, 2025 · 4 comments

Comments

@fekir

fekir commented Mar 7, 2025

I use the option --no-crawler and noticed that requests with an empty user-agent are not marked as crawlers.

<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /404.php HTTP/1.1" 200 11729 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp.php HTTP/1.1" 200 11728 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp-head.php HTTP/1.1" 200 11733 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /images/uploader.php HTTP/1.1" 200 11743 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /upload/upload.php HTTP/1.1" 200 11741 "-" "-"

I'm not aware of any browser that does not send a user-agent; the most probable cause is a crawler that does not bother to set one.
I think it would make sense to mark such requests as crawlers by default.
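
To gauge how many requests are affected, the lines with an empty user-agent can be pulled out before running goaccess. This is a minimal sketch, assuming the combined log format shown above, where the user-agent is the last quoted field. The function name `empty_ua_lines` is made up for illustration, and a request line containing a literal double quote would shift the fields and be misclassified.

```shell
# Hypothetical helper: print only log lines whose user-agent is "-" or empty.
# Assumes combined log format; splitting on '"' makes the user-agent field 6,
# and a well-formed line yields at least 7 fields.
empty_ua_lines() {
    awk -F'"' 'NF >= 7 && ($6 == "-" || $6 == "")'
}

# Usage: count such requests across all logs in the current directory.
# cat * | empty_ua_lines | wc -l
```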

@allinurl
Owner

allinurl commented Mar 8, 2025

That's a good point. Have you tried --unknowns-as-crawlers to see if that helps?

@fekir
Author

fekir commented Mar 8, 2025

> have you tried --unknowns-as-crawlers to see if that helps?

Just tested, it helps.

Is there any way to save the filtered logs to a file?

Something like

cat * | goaccess - --no-crawler --unknowns-as-crawlers --print-logs > file.txt

@allinurl
Owner

Are you referring to the actual report values? You can export them as JSON or CSV.

cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.json 

or

cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.csv

@fekir
Author

fekir commented Mar 11, 2025

> Are you referring to the actual report values? You can export them as JSON or CSV.

Yes, I was hoping I could tell goaccess to print them as is, without converting them to JSON or CSV, so that I could (ab)use the filter functionality of goaccess and then process the data with different tools, goaccess included.
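
One possible workaround for getting raw filtered lines is to filter them before they reach goaccess. The sketch below is not goaccess's own crawler detection: it assumes the combined log format and uses a deliberately small, made-up pattern (`bot`, `crawl`, `spider`) plus the empty user-agent check, so its results will differ from what `--unknowns-as-crawlers` produces. The function name `drop_crawlers` and the file name `filtered.log` are illustrative.

```shell
# Hypothetical pre-filter: keep only lines that do not look like crawler traffic.
# Assumes combined log format (user-agent is field 6 when splitting on '"').
# The bot|crawl|spider pattern is an illustrative assumption, not the goaccess list.
drop_crawlers() {
    awk -F'"' '
        NF < 7 { next }                            # skip malformed lines
        $6 == "-" || $6 == "" { next }             # drop empty user-agents
        tolower($6) ~ /bot|crawl|spider/ { next }  # drop naive crawler matches
        { print }                                  # keep everything else
    '
}

# Usage: save the filtered raw lines, then analyze them with any tool.
# cat * | drop_crawlers > filtered.log
# goaccess filtered.log --log-format=COMBINED -o report.json
```

Because the output is still in the original log format, it can be fed back into goaccess or into any other log tool without conversion.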
