An experiment towards integrating Google Documents with Awesome Indexes.
In this version, we attempt a much simpler task:
- We start with a (potentially private) Google Document that is used to collect meeting notes.
- We want to summarise what kinds of topics are covered in those meetings, without publishing anything sensitive.
- Can we do that by just extracting any hyperlinks and their anchor text?
The current version does this, as follows:
$ python3.11 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
$ python extract_links.py <DOCUMENT_ID> links.csv
The results are not suitable for publication without manual review! Some links may be sensitive. But they could be used to generate summaries from time to time, with manual oversight.
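To make the extraction step concrete, here is a minimal sketch of how links and their anchor text could be pulled out of a document and written as CSV. It is not necessarily how extract_links.py does it. It assumes the input is the dictionary returned by the Docs API `documents().get(documentId=...).execute()` call, where linked text appears as a `textRun` whose `textStyle.link.url` is set. The function names `iter_links` and `write_links_csv`, the CSV column names, and the sample document are all made up for illustration.

```python
import csv
import io


def iter_links(elements):
    """Yield (anchor_text, url) pairs from a list of Docs structural elements."""
    for element in elements:
        if "paragraph" in element:
            for part in element["paragraph"].get("elements", []):
                text_run = part.get("textRun")
                if not text_run:
                    continue  # e.g. inline images or page breaks
                # Links to bookmarks or headings carry no "url" and are skipped.
                url = text_run.get("textStyle", {}).get("link", {}).get("url")
                if url:
                    yield text_run.get("content", "").strip(), url
        elif "table" in element:
            # Table cells hold their own list of structural elements.
            for row in element["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    yield from iter_links(cell.get("content", []))


def write_links_csv(document, out):
    """Write one CSV row per link to the file-like object `out`; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["anchor_text", "url"])  # hypothetical column names
    count = 0
    for text, url in iter_links(document.get("body", {}).get("content", [])):
        writer.writerow([text, url])
        count += 1
    return count


# Hand-written stand-in for an API response (illustrative only).
sample_document = {
    "body": {
        "content": [
            {
                "paragraph": {
                    "elements": [
                        {"textRun": {"content": "See the ", "textStyle": {}}},
                        {
                            "textRun": {
                                "content": "agenda",
                                "textStyle": {
                                    "link": {"url": "https://example.com/agenda"}
                                },
                            }
                        },
                        {"textRun": {"content": "\n", "textStyle": {}}},
                    ]
                }
            }
        ]
    }
}

buffer = io.StringIO()
link_count = write_links_csv(sample_document, buffer)
```

Only text that is itself part of a link is kept, so the surrounding sentence ("See the ...") never reaches the CSV. The URLs and anchor text can still be sensitive, which is why the manual review matters.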
The script itself was initially created using Gemini 2.5 Pro.
The script is largely based on the Google Docs Python quickstart. Setting up the credentials took more time than the script itself. The basic flow is outlined in the quickstart document, but some additional steps were needed:
- The first time I ran it, it complained that the credentials didn't have access to the Google Docs API. The error message included a link to the page where that access could be enabled.
- I added my own email address as a test user account.
- After that, when first running the script, an instance of Chrome was started so I could log in as that test user and authorise the application.
- This created a token.json file that the script could use.
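The login flow described above can be sketched roughly as follows, along the lines of the quickstart. This is a sketch under assumptions rather than the exact code in extract_links.py: the function name `load_credentials`, the read-only scope, and the credentials.json filename for the downloaded OAuth client secrets are assumptions; only token.json is named above.

```python
import os.path

# Read-only access to Google Docs (assumed scope; the script may use another).
SCOPES = ["https://www.googleapis.com/auth/documents.readonly"]


def load_credentials(token_path="token.json", secrets_path="credentials.json"):
    """Return user credentials, opening a browser login only when needed."""
    # Imported here so this file can be loaded without the Google libraries installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        # Reuse the token saved by an earlier run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Refresh silently, with no browser window.
            creds.refresh(Request())
        else:
            # First run: start a local server and open the browser for consent.
            flow = InstalledAppFlow.from_client_secrets_file(secrets_path, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token so that later runs skip the browser step.
        with open(token_path, "w") as token_file:
            token_file.write(creds.to_json())
    return creds
```

Deleting token.json forces the browser login again, which is useful after changing the scope or switching test users.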
At this point, I can run this script on any document that I (the test user) have access to.