A Streamlit app that scrapes property listings from apartments.com and extracts property data using AI.
- Search for properties by location (city, neighborhood, zip code)
- View property details including price, beds, baths, and amenities
- Automatic detection of cloud environments for seamless deployment
- Demo mode with pre-scraped data
The app can be deployed both locally and on Streamlit Cloud.
For local deployment, you'll need to install the required dependencies:
pip install -r requirements.txt
Then run the app:
streamlit run app.py
The app automatically detects when it's running on Streamlit Cloud and uses pre-scraped demo data instead of attempting live scraping, which would require system dependencies.
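The detection step described above might look roughly like the sketch below. The environment variable names are hypothetical placeholders, not a documented Streamlit Cloud contract, and the helper names `is_cloud_environment` and `choose_data_source` are illustrative only; the app's actual check may differ.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: these marker names are made up for illustration. Replace them
# with whatever your hosting environment actually exposes.
CLOUD_ENV_MARKERS = ("IS_STREAMLIT_CLOUD", "APP_RUNNING_IN_CLOUD")


def is_cloud_environment(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any assumed cloud marker is set to a non-empty value."""
    env = os.environ if env is None else env
    return any(bool(env.get(name)) for name in CLOUD_ENV_MARKERS)


def choose_data_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'demo' (pre-scraped data) on cloud hosts and 'live' elsewhere."""
    return "demo" if is_cloud_environment(env) else "live"
```

Passing the environment as an argument keeps the check easy to exercise locally, for example `choose_data_source({"IS_STREAMLIT_CLOUD": "1"})`.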
- Built with Streamlit, Playwright, and Groq LLM
- Implements cloud environment detection
- Falls back to pre-scraped demo data when live scraping is unavailable (for example, on Streamlit Cloud)
- Handles various edge cases and errors gracefully
- Clone this repository
- Install the required packages:
pip install -r requirements.txt
- Install Playwright browsers:
playwright install
- Create a .env file in the root directory with your Groq API key:
GROQ_API_KEY=your_api_key_here
- Run the Streamlit app:
streamlit run app.py
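Once the .env file exists, the key can be read at startup. The following is a minimal sketch, assuming python-dotenv's `load_dotenv()` is used to populate the environment; `get_groq_api_key` is a hypothetical helper, not a function from this repository.

```python
import os

try:
    # python-dotenv is listed in requirements.txt; the guard only keeps this
    # sketch importable where the package is not installed.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_groq_api_key(env=None) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # looks for a .env file and loads it into os.environ
        env = os.environ
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later by the API client.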
- Enter a location (city, neighborhood, or zip code)
- Choose whether to run in headless mode or not
- Click "Start Scraping" and watch the results in real-time
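The headless option in the steps above typically maps to Playwright's `headless` launch flag. Here is a hedged sketch of rendering one page with the synchronous Playwright API; `render_page` and its parameters are illustrative names, and the project's own rendering code in `utils/render.py` may be organized differently.

```python
def render_page(url: str, headless: bool = True, timeout_ms: int = 30_000) -> str:
    """Load `url` in Chromium and return the rendered HTML."""
    # Imported lazily so this module can be imported without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # headless=False opens a visible browser window, useful for debugging.
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Running with `headless=False` needs a display, which is one reason a hosted environment may not support live scraping.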
The application uses:
- Playwright for rendering and interacting with web pages
- Groq API (with Llama 3) for extracting property information
- Streamlit for the web interface
- Selectolax for HTML parsing
As properties are found, they are immediately displayed in the interface, allowing you to see results as they come in rather than waiting for the entire scraping process to finish.
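One way to get this incremental behavior is a generator that yields each listing as soon as its page has been processed. The sketch below is an assumption about the general pattern, not the project's actual code: `extract_listings`, `run`, `extract`, and `display` are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, Iterator

Listing = Dict[str, object]


def extract_listings(
    pages: Iterable[str],
    extract: Callable[[str], Iterable[Listing]],
) -> Iterator[Listing]:
    """Yield each listing as soon as the page containing it is processed."""
    for html in pages:
        for listing in extract(html):
            yield listing


def run(pages, extract, display) -> int:
    """Send every listing to `display` immediately; return how many were shown."""
    count = 0
    for listing in extract_listings(pages, extract):
        display(listing)  # in a Streamlit app this could be e.g. st.write
        count += 1
    return count
```

Because nothing is buffered, the first result can appear while later pages are still being fetched.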
- Python 3.7+
- Groq API key
- Internet connection
AiEstateScraper/
├── config/ # Configuration files
│ ├── config.json # Stores scraper settings
│ └── tools.py # Helper functions for configuration handling
│
├── outputs/ # Directory for storing scraped data
│ └── outputs.json # JSON file containing extracted property listings
│
├── utils/ # Utility scripts
│ ├── extract.py # Main scraper logic using LLM and Selectolax
│ └── render.py # Handles rendering and data processing using Playwright
│
├── .env # Environment variables (e.g., API keys, LLM credentials)
├── main.py # Entry point for the scraper
└── requirements.txt # Project dependencies
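As an illustration of how `outputs/outputs.json` could be written and read back, here is a small sketch. The record fields follow the features listed earlier (price, beds, baths, amenities), but the exact schema is an assumption, and `save_listings` and `load_listings` are hypothetical helpers.

```python
import json
from pathlib import Path


def save_listings(listings, path="outputs/outputs.json") -> Path:
    """Write listings as pretty-printed JSON, creating the folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(listings, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out


def load_listings(path="outputs/outputs.json"):
    """Return the stored listings, or an empty list if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))
```

For example, `save_listings([{"price": "$1,500", "beds": 2, "baths": 1, "amenities": ["Pool"]}])` would create the file under `outputs/` relative to the current working directory.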
groq
playwright
selectolax
python-dotenv
This project is licensed under the MIT License.