Commit f112601

Update README.md
1 parent 111f66b commit f112601

1 file changed: +14 -0 lines changed


engine/README.md

Lines changed: 14 additions & 0 deletions
@@ -48,6 +48,19 @@ motivation.
 python main.py --online
 ```

+---
+
+**Important:**
+
+- The online pipeline runs until you stop it manually or it reaches the maximum number of sites.
+- You can adapt the configuration in `main.py`. The crawler has a lot of options to configure.
+- The online pipeline starts a lot of threads, so it can be quite resource-intensive. You can limit the number of
+threads in `main.py`.
+- You need a lot of RAM (~20 GB) for the offline pipeline.
+- Have fun crawling the web!
+
+---
+
 ### Start the server:

 ```shell
@@ -75,6 +88,7 @@ You can see a list of all available routes by navigating to <http://localhost:80
 - The server will only work if you have crawled some pages before.
 - For the summarization you will need a strong CPU and a lot of RAM, as the summarization is done on the fly and can be
 quite resource-intensive.
+
 ---

 ## Known Issues
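
As a rough illustration of the notes added in this commit (a thread limit, and a maximum number of sites as the stopping condition), the sketch below shows a bounded crawl loop. It is not the project's code: `MAX_THREADS`, `MAX_SITES`, `fetch`, and `crawl` are hypothetical names, and the real options in `main.py` may be named and structured differently.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Hypothetical settings; the actual option names in main.py may differ.
MAX_THREADS = 4
MAX_SITES = 10


def fetch(url: str) -> list[str]:
    """Stand-in for a real page fetch: returns made-up outgoing links."""
    return [f"{url}/{i}" for i in range(3)]


def crawl(seeds: list[str], max_sites: int = MAX_SITES,
          max_threads: int = MAX_THREADS) -> list[str]:
    """Crawl until the frontier is empty or max_sites pages were visited."""
    visited: list[str] = []
    seen = set(seeds)
    frontier = list(seeds)
    # max_workers caps how many fetches run concurrently.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while frontier and len(visited) < max_sites:
            # Take only as many URLs as the site budget still allows.
            batch = frontier[: max_sites - len(visited)]
            frontier = frontier[len(batch):]
            # pool.map yields results in the same order as the batch.
            for url, links in zip(batch, pool.map(fetch, batch)):
                visited.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return visited


if __name__ == "__main__":
    pages = crawl(["https://example.org"], max_sites=5, max_threads=2)
    print(len(pages))
```

Lowering `max_threads` trades crawl speed for lower CPU and memory pressure, while `max_sites` gives the loop a stopping point other than a manual interrupt.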
