A clone of the Link Gopher browser extension, written in Python for educational purposes. For more info see:
Installation provides a CLI tool that fetches all links from web pages, removes duplicates, and can optionally filter the results.
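The core idea, collecting every link on a page exactly once, can be sketched with the standard library alone. This is an illustration rather than this project's code: it parses a static HTML string with `html.parser` instead of driving Firefox, and the names `LinkCollector` and `collect_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags in first-seen order, without duplicates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = {}  # dict keys act as an ordered set

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links[urljoin(self.base_url, value)] = None


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return list(parser.links)


page = (
    '<a href="/docs">Docs</a> '
    '<a href="https://example.com/docs">Docs again</a> '
    '<a href="/about">About</a>'
)
print(collect_links(page, "https://example.com"))
# ['https://example.com/docs', 'https://example.com/about']
```

The relative and absolute forms of the same address collapse into one entry because both are resolved to an absolute URL before being stored.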
To install the CLI tool you need:
- The latest version of Mozilla Firefox
- geckodriver for your OS
- Python 3.9.4 or later
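A quick way to check these prerequisites before installing is sketched below. It assumes the executables are named `firefox` and `geckodriver` and are on your `PATH`, which may not hold on every OS (Firefox in particular is often installed outside `PATH` on Windows and macOS). The function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9, 4)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        "python": sys.version_info[:3] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "firefox": shutil.which("firefox") is not None,
        "geckodriver": shutil.which("geckodriver") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```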
Once the prerequisites are installed, follow these steps:
- Clone this repository on your local machine
- Change directory to the cloned repository
- Run the command
pip install -e .
After installation you can use the CLI tool.
Basic usage is:
link-gopher run --browser=b --src=s --dst=d --in=in --out=out --filter-type=ftype --filter-values=fvalues
Command options have the following meanings:
- browser chooses the type of browser used for scraping. The only option currently supported is firefox, which is the default.
- src determines the type of source for the URLs to gopher. Two options are possible:
  - mem loads links from memory (default)
  - txt loads links from a txt file
- dst determines where to store the links fetched from the pages. Two options are possible:
  - mem writes links to stdout (default)
  - txt writes links to a txt file
- in is the input path and depends on the type of source chosen:
  - For the mem source it is a comma-separated list of values
  - For the txt source it is the path to a txt file
- out is the output path and depends on the type of destination chosen:
  - For mem nothing has to be specified
  - For txt it is the path to the destination file
- filter-type determines the type of filter. Currently only the basic filter is supported.
- filter-values determines the values for the filter, as a comma-separated list.
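How the src and in options combine into a list of URLs can be sketched as follows. This is an illustration, not the tool's actual code: the function name `load_urls` is hypothetical, and the sketch assumes that a txt source holds one URL per line, which the description above does not specify.

```python
from pathlib import Path


def load_urls(src, in_value):
    """Turn a src / in pair into a list of URLs."""
    if src == "mem":
        # Comma-separated list given directly on the command line.
        candidates = in_value.split(",")
    elif src == "txt":
        # Assumption: the txt file holds one URL per line.
        candidates = Path(in_value).read_text(encoding="utf-8").splitlines()
    else:
        raise ValueError(f"unsupported source type: {src!r}")
    # Drop surrounding whitespace and empty entries.
    return [url.strip() for url in candidates if url.strip()]


print(load_urls("mem", "https://python.org,https://github.com"))
# ['https://python.org', 'https://github.com']
```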
Gopher links from the Python and GitHub websites and write them to stdout:
link-gopher run --src=mem --dst=mem --in=https://python.org,https://github.com
Read URLs from a txt file and write the links to stdout:
link-gopher run --src=txt --dst=mem --in=test.txt
Read URLs from a txt file and write the links to a new txt file:
link-gopher run --src=txt --dst=txt --in=test.txt --out=test2.txt
Read URLs from a txt file, filter the results, and write them to a new txt file:
link-gopher run --src=txt --dst=txt --in=test.txt --out=test2.txt --filter-type=basic --filter-values=filter
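The matching rule of the basic filter is not described here. As a rough mental model only, the sketch below assumes that it keeps links containing at least one of the given values as a substring. The function name `basic_filter` and this matching rule are assumptions, not the tool's confirmed behavior.

```python
def basic_filter(links, filter_values):
    """Keep links containing at least one of the comma-separated values.

    Assumption: plain substring matching; the real filter may differ.
    """
    values = [v.strip() for v in filter_values.split(",") if v.strip()]
    if not values:
        # No usable filter values: return everything unchanged.
        return list(links)
    return [link for link in links if any(v in link for v in values)]


links = [
    "https://python.org/downloads",
    "https://github.com/python",
    "https://example.com/about",
]
print(basic_filter(links, "python,docs"))
# ['https://python.org/downloads', 'https://github.com/python']
```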