README.md (14 additions & 10 deletions)
@@ -29,6 +29,8 @@ func main() {
 	options := octopus.GetDefaultCrawlOptions()
 	options.MaxCrawlDepth = 3
 	options.TimeToQuit = 10
+	options.CrawlRatePerSec = 5
+	options.CrawlBurstLimitPerSec = 8
 	options.OpAdapter = opAdapter
 
 	crawler := octopus.New(options)
@@ -43,21 +45,23 @@ Customizations can be made by supplying the crawler an instance of `CrawlOptions
 
 ```go
 type CrawlOptions struct {
-	MaxCrawlDepth      int64         // Max Depth of Crawl, 0 is the initial link.
-	MaxCrawledUrls     int64         // Max number of links to be crawled in total.
-	StayWithinBaseHost bool          // [Not-Implemented-Yet]
-	CrawlRate          int64         // Max Rate at which requests can be made (req/sec).
-	CrawlBurstLimit    int64         // Max Burst Capacity (should be atleast the crawl rate).
-	RespectRobots      bool          // [Not-Implemented-Yet]
-	IncludeBody        bool          // Include the Request Body (Contents of the web page) in the result of the crawl.
-	OpAdapter          OutputAdapter // A user defined crawl output handler (See next section for info).
-	ValidProtocols     []string      // Valid protocols to crawl (http, https, ftp, etc.)
-	TimeToQuit         int64         // Timeout (seconds) between two attempts or requests, before the crawler quits.
+	MaxCrawlDepth         int64         // Max Depth of Crawl, 0 is the initial link.
+	MaxCrawledUrls        int64         // Max number of links to be crawled in total.
+	StayWithinBaseHost    bool          // [Not-Implemented-Yet]
+	CrawlRatePerSec       int64         // Max Rate at which requests can be made (req/sec).
+	CrawlBurstLimitPerSec int64         // Max Burst Capacity (should be atleast the crawl rate).
+	RespectRobots         bool          // [Not-Implemented-Yet]
+	IncludeBody           bool          // Include the Request Body (Contents of the web page) in the result of the crawl.
+	OpAdapter             OutputAdapter // A user defined crawl output handler (See next section for info).
+	ValidProtocols        []string      // Valid protocols to crawl (http, https, ftp, etc.)
+	TimeToQuit            int64         // Timeout (seconds) between two attempts or requests, before the crawler quits.
 }
 ```
 
 A default instance of the `CrawlOptions` can be obtained by calling `octopus.GetDefaultCrawlOptions()`. This can be further customized by overriding individual properties.
 
+**NOTE:** If rate-limiting is not required, then just ignore(don't set value) both `CrawlRatePerSec` and `CrawlBurstLimitPerSec` in the `CrawlOptions`.
+
 ### Output Adapters
 
 An Output Adapter is the final destination of a crawler processed request. The output of the crawler is fed here, according to the customizations made before starting the crawler through the `CrawlOptions` attached to the crawler.