`gartenpflege`

Headless scraper using Puppeteer and TypeScript parsers to extract information from some big platforms.

For YouTube, we use our parser harke.
For TikTok, we use our parser schaufel.

It is mainly used to test the parsers and for monitoring, e.g. YouTube News playlists (on a server, without using YouTube's API). Our parsers take only an HTML string as input, so the HTML has to come from somewhere. gartenpflege fills that role in two ways: it saves HTML that we use to write tests for the parsers, and, once the parsers work, it fetches the HTML from which they extract data.
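Because a parser only needs an HTML string, a saved page can be replayed in a test without a browser. The following is a minimal sketch of that idea, not the actual API of harke or schaufel: the `Parser` type, the toy `parseTitle` parser, the `Fixture` shape, and `failingFixtures` are all hypothetical names introduced for illustration.

```typescript
// Hypothetical parser shape: an HTML string goes in, structured data comes out.
type Parser<T> = (html: string) => T;

// Toy stand-in for a real parser such as harke or schaufel (not their API).
const parseTitle: Parser<string | null> = (html) => {
  const match = /<title>([^<]*)<\/title>/i.exec(html);
  return match?.[1]?.trim() ?? null;
};

// A saved page plus the result we expect the parser to produce for it.
interface Fixture<T> {
  name: string;
  html: string;
  expected: T;
}

// Run a parser over saved fixtures and return the names of those that fail.
function failingFixtures<T>(parser: Parser<T>, fixtures: Fixture<T>[]): string[] {
  return fixtures
    .filter((f) => JSON.stringify(parser(f.html)) !== JSON.stringify(f.expected))
    .map((f) => f.name);
}

// In practice the html strings would be pages saved by the scraper.
const fixtures: Fixture<string | null>[] = [
  { name: 'playlist', html: '<html><head><title> News </title></head></html>', expected: 'News' },
  { name: 'empty', html: '<html></html>', expected: null },
];

const failures = failingFixtures(parseTitle, fixtures);
console.log(failures.length === 0 ? 'all fixtures pass' : `failing: ${failures.join(', ')}`);
```

Keeping the parser free of any browser dependency is what makes this kind of replay possible: when a platform changes its markup, a fresh page can be saved and added as a new fixture.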

Installation

git clone https://github.com/algorithmwatch/gartenpflege.git
cd gartenpflege
npm install

Usage: TikTok

Login

npm run cli -- --tiktok --login --credentials "xx@xx.xx:xx"

The credentials are passed in the form email:password.
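As an illustration of the email:password format, here is a minimal sketch of how such a string could be split. It is an assumption about the format only, not a copy of how gartenpflege parses the flag; `Credentials` and `parseCredentials` are hypothetical names. The sketch splits on the first colon, so a password that itself contains a colon is kept whole.

```typescript
interface Credentials {
  email: string;
  password: string;
}

// Split "email:password" on the first colon only.
function parseCredentials(raw: string): Credentials {
  const sep = raw.indexOf(':');
  // Reject input with no colon, an empty email, or an empty password.
  if (sep <= 0 || sep === raw.length - 1) {
    throw new Error('Expected credentials in the form email:password');
  }
  return { email: raw.slice(0, sep), password: raw.slice(sep + 1) };
}

const creds = parseCredentials('xx@xx.xx:pa:ss');
console.log(creds.email);
```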

Scrape

npm run cli -- --tiktok --feed --scroll 30

Usage: YouTube

The YouTube part is legacy code and should be improved if anybody works on YouTube again.

Documentation

Log in to Google (complicated)

We have to use some obfuscation to make the Google login work. We use the puppeteer-extra stealth plugin: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth

Firefox

Alternatively, we could use Puppeteer with Firefox. To set it up, specify this in .launch:

  product: 'firefox',

npm uninstall puppeteer
PUPPETEER_PRODUCT=firefox npm i puppeteer

Unfortunately, Puppeteer was buggy with Firefox, although Google did not block the login.

License

MIT
