Feature Request: Ignore Parameter Values in Crawler #1337

Rand0x · 2025-01-22T07:32:59Z

Rand0x
Jan 22, 2025

Please describe your feature request:

I would like to request a feature for the Katana crawler that allows users to ignore the values of URL parameters during the crawling process. Currently, Katana crawls all variations of a URL, including those with different parameter values, which can lead to excessive crawling of fundamentally similar pages. For instance, the URLs "http://example.com?param1=1&param2=2" and "http://example.com?param1=2&param2=1" may lead to nearly identical content, yet they are treated as completely distinct pages by the crawler.

Describe the use case of this feature:

The primary motivation for this feature is to optimize the crawling efficiency of Katana. By ignoring the specific values of parameters, users can reduce the number of redundant requests made during a crawl. This would not only improve the crawling speed but also minimize the load on the target server, helping to avoid potential rate limiting or being flagged for excessive requests.

In practice, this feature could be particularly beneficial for users who work with large websites that have numerous parameters appended to their URLs, enabling a more streamlined and effective crawling process. It would help ensure that Katana focuses on the structural aspects of the site rather than getting caught in unnecessary loops due to value variations in query strings.

Thank you for considering this feature request to enhance the capabilities of the Katana crawler.

Additionally: It may be beneficial to allow users to choose which parameters to ignore, potentially by passing them as a list.

GeorginaReeder · 2025-01-22T11:45:45Z

GeorginaReeder
Jan 22, 2025

Thanks so much for your feature request @Rand0x , we'll take a look into this!

0 replies

dogancanbakir · 2025-01-23T11:59:33Z

dogancanbakir
Jan 23, 2025
Collaborator

@Rand0x, thank you for your feature request. Have you explored the -ignore-query-params option?

0 replies

Rand0x · 2025-01-23T14:43:11Z

Rand0x
Jan 23, 2025
Author

@dogancanbakir, thank you for your answer. Yes, I already explored -iqp

As you can see in the image, I have a website that handles the CSRF token via the newtoken parameter. Therefore, it would be very helpful if I could say, "Ignore the newtoken parameter, even if it changes." With the -iqp option, only 3 pages are crawled, but there are many more. Katana also proves this. However, it gets caught in an endless loop because the parameter changes with each request.

0 replies

dogancanbakir · 2025-01-27T09:24:41Z

dogancanbakir
Jan 27, 2025
Collaborator

@Rand0x So, are you looking for an option to skip or ignore, for example, URLs that include the newtoken query parameter?

0 replies

ehsandeep · 2025-01-27T10:00:52Z

ehsandeep
Jan 27, 2025
Maintainer

@Rand0x -crawl-out-scope option supports regex to define out of scope items, you can use this to exclude the pattern you wanted to exclude in any part of URL including query parameters.

   -cos, -crawl-out-scope string[]  out of scope url regex to be excluded by crawler

0 replies

Rand0x · 2025-01-27T13:20:27Z

Rand0x
Jan 27, 2025
Author

@Rand0x So, are you looking for an option to skip or ignore, for example, URLs that include the newtoken query parameter?

Yes, correct.

@Rand0x -crawl-out-scope option supports regex to define out of scope items, you can use this to exclude the pattern you wanted to exclude in any part of URL including query parameters.
   -cos, -crawl-out-scope string[]  out of scope url regex to be excluded by crawler

I do not want to exclude links which contain the newtoken parameter. I want to exclude duplicates.

example.com?file=abc.pdf&newtoken=12312312

example.com?file=deaf.pdf&newtoken=44332145&user=1

2 different sites, but every site links the other site, with a new value of the parameter newtoken

0 replies

dogancanbakir · 2025-07-23T15:18:40Z

dogancanbakir
Jul 23, 2025
Collaborator

@Rand0x Thank you for providing all the details. However, I believe this is a very specific use case. To implement this, we need to develop a more generalized version. I'm currently unable to think of one. I’ll close this for now, but if you have any other ideas, please let us know.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature Request: Ignore Parameter Values in Crawler #1337

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 7 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Feature Request: Ignore Parameter Values in Crawler #1337

Uh oh!

Uh oh!

Rand0x Jan 22, 2025

Please describe your feature request:

Describe the use case of this feature:

Replies: 7 comments

Uh oh!

GeorginaReeder Jan 22, 2025

Uh oh!

dogancanbakir Jan 23, 2025 Collaborator

Uh oh!

Rand0x Jan 23, 2025 Author

Uh oh!

dogancanbakir Jan 27, 2025 Collaborator

Uh oh!

ehsandeep Jan 27, 2025 Maintainer

Uh oh!

Rand0x Jan 27, 2025 Author

Uh oh!

dogancanbakir Jul 23, 2025 Collaborator

Rand0x
Jan 22, 2025

GeorginaReeder
Jan 22, 2025

dogancanbakir
Jan 23, 2025
Collaborator

Rand0x
Jan 23, 2025
Author

dogancanbakir
Jan 27, 2025
Collaborator

ehsandeep
Jan 27, 2025
Maintainer

Rand0x
Jan 27, 2025
Author

dogancanbakir
Jul 23, 2025
Collaborator