Text Extraction from Bills #3

Workflow file for this run

name: Text Extraction from Bills
on:
  schedule:
    - cron: "0 8 * * *" # Daily at 8 AM UTC (~3 AM ET, ~12 AM PT)
  workflow_dispatch:
jobs:
  extract-text:
    name: Text Extraction
    runs-on: ubuntu-latest
    timeout-minutes: 330 # 5.5 hours (recommended for large datasets)
    permissions:
      contents: write
    steps:
      - name: Checkout state repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run text extraction action
        uses: windy-civi/toolkit/actions/extract@refactor/v2-data-structure
        with:
          state: nh # New Hampshire
          github-token: ${{ secrets.GITHUB_TOKEN }}
          force-update: "false"
      - name: Display extraction summary
        if: always()
        shell: bash
        run: |
          echo "📊 Text Extraction Summary"
          echo "================================"
          echo "✅ Check country:us/state:*/sessions/*/bills/*/files/ for extracted text files"
          echo "📄 Look for *_extracted.txt files in the files/ directories"
          echo ""
          echo "ℹ️ Features:"
          echo " - Incremental processing (skips already-processed bills)"
          echo " - Auto-saves progress every 30 minutes"
          echo " - Can be safely restarted if timeout occurs"
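          # Optional sketch, not part of the original workflow: report how many
          # extracted text files exist. Assumes the extract action writes its
          # *_extracted.txt files somewhere under the checked-out working directory.
          echo ""
          count=$(find . -type f -name '*_extracted.txt' | wc -l)
          echo "Extracted text files found: ${count}"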