A comprehensive collection of Paul Graham's essays scraped and organized for easy AI-assisted analysis and inspiration.
Paul Graham is a legendary figure in the startup and technology world. He's an English-American computer scientist, writer, essayist, entrepreneur, and investor who co-founded Y Combinator, the most successful startup accelerator in history.
Key accomplishments:
- Co-founded Y Combinator (funded companies like Airbnb, Dropbox, Stripe, Reddit, and thousands more)
- Created Viaweb (sold to Yahoo for $49M, became Yahoo Store)
- Built Hacker News, the premier tech community platform
- Author of influential programming books including "On Lisp" and "Hackers & Painters"
- PhD in Computer Science from Harvard
Connect with PG:
- π¦ Twitter: @paulg
- π Famous Blog: paulgraham.com
His essays are considered essential reading for entrepreneurs, programmers, and anyone interested in startups, technology, and thoughtful commentary on society.
This repository contains:
- 200+ Paul Graham essays scraped from his website and converted to markdown
- Automated scraping and processing scripts for maintaining the collection
- Two AI system prompts for different interaction styles with the knowledge base
- Clean, organized structure for easy browsing and analysis
- Complete essay collection spanning decades of influential writing
Current Recommended Method: Raycast Chat Presets
The most effective way to interact with this knowledge base is through Raycast Chat Presets using:
- Model: Google Gemini 2.5 Flash (excellent for quick queries and reasoning)
- Thinking Setting: Maximum (enables deeper analytical reasoning)
- System Prompt: Load from
SYS_PROMPT/system_prompt.md
orSYS_PROMPT/system_prompt_G.md
- Knowledge Base: Import relevant essays or the complete knowledge base as context
Alternative: Direct AI IDE Integration
While you could set up RAG (Retrieval-Augmented Generation) for this dataset, direct context loading is more effective with modern large context windows.
If you prefer IDE integration:
- Open your favorite AI-powered IDE (mine is Zed π)
- Select Google Gemini 2.0 Flash or 2.5 Pro which has a massive 1M token context window
- Load all the markdown files directly into your prompt
- Ask anything - Who needs RAG when you have 1 million tokens?
- Large Context Windows: Google Gemini models can handle 1M+ tokens in a single context window
- Massive Capacity: 1M tokens β 50,000 lines of code or 8 novels worth of text
- AI-Native IDEs: Tools like Zed provide seamless AI integration with multiple providers
- Simplicity: No indexing delays, no vector databases, no complexity - just pure contextual understanding
- Real-time Analysis: Direct access to the complete knowledge base for instant insights
Startup & Business Advice:
"Based on PG's essays, what would he say about [your startup idea]?"
"What does PG recommend for early-stage startups struggling with product-market fit?"
"Analyze my business model using Paul Graham's principles"
Research & Analysis:
"Find all of PG's advice about hiring and summarize the key principles"
"What are the common patterns in PG's thoughts on successful founders?"
"Compare PG's views on venture capital from different time periods"
Writing & Communication:
"Help me write a PG-style essay about [your topic]"
"What would PG say about [current tech trend] based on his writing patterns?"
"Critique my startup pitch using Paul Graham's communication principles"
PG/
βββ README.md # Project documentation
βββ requirements.txt # Python dependencies
βββ .env.example # Environment variables template
βββ .gitignore # Git ignore rules
βββ scrape_pg.sh # Bash script for scraping essays
βββ pg_data.json # Essay URLs and metadata
βββ scrape_log.txt # Scraping activity log
βββ SYS_PROMPT/ # AI system prompts and knowledge base
β βββ system_prompt.md # Analytical advisor persona
β βββ system_prompt_G.md # Paul Graham persona
β βββ pg_knowledge_base.md # Complete knowledge base
β βββ KNOWLEDGE_INDEX.md # Structured essay index
βββ posts/ # Original essays (raw format)
β βββ Founder Mode.md
β βββ How to Do Great Work.md
β βββ How to Get Startup Ideas.md
β βββ ... (200+ more essays)
βββ posts_clean/ # Cleaned and formatted essays
β βββ ... (processed essays)
βββ concatenator.py # Combine essays into knowledge base
βββ indexmaker.py # Generate structured essay index
βββ pg_parser.py # Essay parsing and cleanup (v1)
βββ pg_parser_V2.py # Essay parsing and cleanup (v2)
Generated files (git ignored):
βββ .env # Your API keys (DO NOT COMMIT)
βββ pg_env/ # Python virtual environment
This repository includes two carefully crafted system prompts in the SYS_PROMPT/
directory:
This prompt creates a straightforward startup and technology advisor persona that:
- Provides direct, analytical insights grounded in PG's essays
- Requires verbatim quotes and specific essay references
- Focuses on practical, actionable advice
- Maintains intellectual rigor with systematic knowledge retrieval
This prompt creates a more immersive Paul Graham persona that:
- Embodies PG's distinctive voice, thinking patterns, and communication style
- Uses first-person perspective ("I" statements) as if PG himself is responding
- Incorporates his characteristic rhetorical patterns and analogical reasoning
- References personal experiences (Viaweb, Y Combinator) when relevant
Key Difference in User Experience:
- Analytical Advisor (
system_prompt.md
): Provides insights about Paul Graham's ideas with academic rigor and specific citations - Paul Graham Persona (
system_prompt_G.md
): Delivers insights from Paul Graham's perspective with his distinctive voice and thinking patterns
Choose based on whether you want analytical distance or immersive persona interaction.
# Create and activate virtual environment
python3 -m venv pg_env
source pg_env/bin/activate # On Windows: pg_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Or install manually:
# pip install python-dotenv requests
# Set up your API key
cp .env.example .env
# Edit .env and add your Google API key
# Generate knowledge index from essays
python indexmaker.py
# Parse and clean essay formats
python pg_parser.py
# or
python pg_parser_V2.py
# Combine essays into single knowledge base
python concatenator.py
- Robust error handling with colored output
- Rate limiting to be respectful of PG's servers
- File sanitization for clean, readable filenames
- Progress tracking with detailed logging
- Resume capability for interrupted scrapes
For scraping essays (optional):
# Install required tools
brew install jq # JSON parsing utility
# Install fabric utility (path configured in script)
For processing and analysis:
# Python 3.7+ required
python3 --version
# Google Gemini API key (for indexing)
# Get yours at: https://console.cloud.google.com/apis/credentials
./scrape_pg.sh
The script will:
- Read URLs from
pg_data.json
- Download each essay using the
fabric
utility - Convert to clean markdown format
- Save in the
posts/
directory - Log all activity to
scrape_log.txt
Paul Graham's essays represent decades of battle-tested wisdom across multiple domains:
- π Startups & Entrepreneurship: From initial idea validation to IPO strategies
- π» Programming & Technology: Insights from a master craftsman and language designer
- βοΈ Writing & Communication: Clear thinking made manifest through precise language
- π§ Philosophy & Life: Thoughtful commentary on society, human nature, and decision-making
Having this knowledge instantly accessible in your AI workflow enables you to:
- Get PG's perspective on critical business decisions and strategic challenges
- Identify patterns across decades of writing to inform your own thinking
- Generate insights using proven mental models and frameworks
- Access specific examples and case studies without manual research
- Develop intuition for startup thinking and technology trends
We welcome contributions and suggestions! Here are ways you can help:
- Refine the personas - Found better ways to capture PG's voice or analytical framework?
- Add new prompt variations - Different use cases might need different approaches
- Test and iterate - Try the prompts with various AI models and share results
- Missing essays - Found an essay that's not in the collection?
- Script enhancements - Improve the scraper, parser, or indexing tools
- Documentation - Help others understand and use the tools better
- New features - Ideas for better ways to interact with the knowledge base
- Bug reports - If something doesn't work as expected
- Use case examples - Share how you're using this in your workflow
Submit issues or PRs on GitHub!
This repository contains content scraped from paulgraham.com for educational and research purposes.
Important Notes:
- All essays remain the intellectual property of Paul Graham
- This collection is intended for personal learning and analysis only
- Commercial use requires permission from the original author
- The scraping and processing scripts are provided under MIT license
- Please respect the original author's work and link back to paulgraham.com when sharing insights
Attribution: Original essays by Paul Graham β’ Collection and processing tools by this repository's contributors
β Star this repo if you find it useful! β’ π΄ Fork it to customize for your needs β’ π§ Share your use cases and feedback
"The way to get startup ideas is not to try to think of startup ideas. It's to look for problems, preferably problems you have yourself." - Paul Graham
Now go build something people want! π