A scraper that generates a Google Sheet with the top 10 posts of each month for a given subreddit.
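To make the "top 10 posts of each month" idea concrete, here is a minimal sketch of the grouping step. It is not taken from the project's source: the function name `topPostsByMonth` and the post shape (`title`, `score`, `createdUtc` in seconds) are assumptions made for illustration.

```javascript
// Hypothetical post shape: { title, score, createdUtc }, with createdUtc in seconds (UTC).
function topPostsByMonth(posts, limit = 10) {
  // Group posts under a "YYYY-MM" key derived from their creation date.
  const byMonth = new Map();
  for (const post of posts) {
    const date = new Date(post.createdUtc * 1000);
    const month = String(date.getUTCMonth() + 1).padStart(2, "0");
    const key = `${date.getUTCFullYear()}-${month}`;
    if (!byMonth.has(key)) byMonth.set(key, []);
    byMonth.get(key).push(post);
  }

  // Keep only the highest-scoring posts of each month.
  const result = {};
  for (const [key, monthPosts] of byMonth) {
    result[key] = [...monthPosts]
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
  return result;
}
```

Each key of the returned object would then correspond to one month in the output spreadsheet.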
- Install Node.js
- Clone the repository locally (or a fork of it):
git clone git@github.com:Trekiros/BestOfReddit.git
- In your local repository, install code dependencies:
npm i
- Create a copy of `conf-template.yml` named `conf.yml`, and modify `conf.yml` for your specific use case.
- Create a Google Sheet, and note its ID (from its URI) in `conf.yml`, under the `spreadsheetId` field.
- Enable the Google Sheets API for your Google account, and download a `credentials.json` file following the instructions found here. Make sure you use the same Google account that was used to create the spreadsheet.
- Copy the `client_id`, `project_id` and `client_secret` values from `credentials.json` into the matching fields in `conf.yml`. Never share or commit these credentials: they could be used to access and modify all of your Google Spreadsheets.
- Create a Reddit script app here.
- Use this new Reddit app to fill in the `appId` and `appSecret` fields of the reddit category in `conf.yml`. Never share or commit these credentials: they could be used to access your Reddit account.
- Fill in the `username` and `password` fields in the reddit category of `conf.yml`, using the credentials of the Reddit account that created the Reddit app. If you do not wish to use your personal Reddit account for this, the project can just as easily be run with a new Reddit account.
- Run the project for the first time: `npm start` or `node .`
- On the first run, the project will ask you to follow a link to grant it authority over your Google Sheets file. Make sure you use the same Google Account which was used for the creation of the spreadsheet.
- It will then create a file named `token.json`, which lets it skip the previous step on subsequent runs. Never share or commit this file: it could be used to access and modify all of your Google Spreadsheets.
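Since the steps above involve filling in several fields by hand, a quick check for forgotten values can save a failed first run. The sketch below is not part of the project: the helper name `findMissingFields` is hypothetical, and the layout it assumes (Google fields at the top level, Reddit fields under a `reddit` key) is a guess that should be adjusted to match `conf-template.yml`.

```javascript
// Assumed layout (not confirmed by the project): Google fields at the top
// level of the parsed conf.yml, Reddit fields nested under `reddit`.
const REQUIRED_FIELDS = [
  "spreadsheetId",
  "client_id",
  "project_id",
  "client_secret",
  "reddit.appId",
  "reddit.appSecret",
  "reddit.username",
  "reddit.password",
];

// Returns the dotted paths of required fields that are absent or blank.
function findMissingFields(conf) {
  return REQUIRED_FIELDS.filter((path) => {
    // Walk the dotted path one key at a time, stopping at non-objects.
    const value = path.split(".").reduce(
      (node, key) =>
        node != null && typeof node === "object" ? node[key] : undefined,
      conf
    );
    return typeof value !== "string" || value.trim() === "";
  });
}
```

Called on the object obtained by parsing `conf.yml`, an empty result would mean every field named in the steps above has a value.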
To contribute, fork this project, and make a pull request with your changes. I will then review the pull request, notably to ensure no changes are made which could compromise users' credentials.
This project could be deployed on any number of platforms. Heroku was chosen as the example because the project is designed to be run as a monthly cron job, and Heroku provides free options for this use case.
- Fork this project, and run it locally using the instructions above. This ensures you have a `conf.yml` and a `token.json` file, which are needed to configure Heroku.
- Create a Heroku account here
- Create a new app
- Create a production pipeline for your app
- In the `Resources` tab, add the `Heroku Scheduler` and `Logentries` add-ons to your pipeline
- Configure `Heroku Scheduler` to run the project periodically (command: `npm run start`). Ideally, this project would run monthly, but the longest interval Heroku Scheduler offers is daily. The project can be run more often without issue (it produces no duplicate months in the output spreadsheet, and no errors), but this wastes computing power.
- Start following logs in real time in `Logentries`, to ensure that things are working properly
- In the `Settings` tab, click `Reveal Config Vars`, and add the following environment variables:
  - `conf`: the contents of your local `conf.yml` file
  - `googleToken`: the contents of your local `token.json` file
- In the `Deployment Method` tab, link the pipeline to your fork of the project. This should launch the project for the first time and automatically detect that it is a Node.js project. You can then re-run it manually from this same tab, but the `Heroku Scheduler` add-on will run it periodically without your input.
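The `conf` and `googleToken` variables stand in for the two local files that Heroku does not have. As a rough illustration of that idea, and not a description of how the project actually reads its settings, a program could prefer the environment variable and fall back to the local file. The helper name `readSetting` is hypothetical.

```javascript
const fs = require("fs");

// Returns the raw text of a setting, preferring the environment variable
// and falling back to the local file when the variable is unset or blank.
function readSetting(envName, filePath, env = process.env) {
  const fromEnv = env[envName];
  if (typeof fromEnv === "string" && fromEnv.trim() !== "") {
    return fromEnv;
  }
  // Throws if the file does not exist, which surfaces a missing setup step.
  return fs.readFileSync(filePath, "utf8");
}
```

With this approach, `readSetting("conf", "conf.yml")` and `readSetting("googleToken", "token.json")` would behave the same locally and on Heroku, and the secrets never need to be committed to the fork.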
- Use the Reddit API directly rather than pushshift
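As a starting point for this item, the sketch below builds a request URL for Reddit's `top` listing. The function name `buildTopPostsUrl` is hypothetical, and this is only one possible approach. Note that `t=month` covers the month preceding the request rather than a calendar month, so the result would only approximate a calendar month if the job runs near the start of each month. The request itself would also need an OAuth bearer token and a `User-Agent` header, which are left out here.

```javascript
// Builds the URL of the "top posts of the past month" listing for a subreddit.
function buildTopPostsUrl(subreddit, limit = 10) {
  const url = new URL(
    `https://oauth.reddit.com/r/${encodeURIComponent(subreddit)}/top`
  );
  url.searchParams.set("t", "month"); // time window: the past month
  url.searchParams.set("limit", String(limit)); // number of posts to return
  return url.toString();
}
```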