AI Estate Scraper

A Streamlit app that scrapes property listings from apartments.com and extracts property data using AI.

Features

Search for properties by location (city, neighborhood, zip code)
View property details including price, beds, baths, and amenities
Automatic detection of cloud environments for seamless deployment
Demo mode with pre-scraped data

Deployment

The app can be deployed both locally and on Streamlit Cloud.

Local Deployment

For local deployment, you'll need to install the required dependencies:

pip install -r requirements.txt

Then run the app:

streamlit run app.py

Streamlit Cloud Deployment

The app automatically detects when it is running on Streamlit Cloud and serves pre-scraped demo data instead of attempting live scraping, which depends on Playwright browsers and system-level dependencies.
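The README does not say how the app recognizes Streamlit Cloud, so the following is only a minimal sketch of one possible approach. The `FORCE_DEMO_MODE` variable, the `/mount/src` marker path, and both function names are assumptions made for illustration, not the project's actual logic.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical override variable; not defined by Streamlit or by this project.
FORCE_DEMO_VAR = "FORCE_DEMO_MODE"

# Assumed marker: a directory expected to exist only on the hosted platform.
DEFAULT_MARKER_PATHS = ("/mount/src",)


def is_cloud_environment(
    env: Optional[Mapping[str, str]] = None,
    marker_paths: Iterable[str] = DEFAULT_MARKER_PATHS,
) -> bool:
    """Return True when the app should treat itself as cloud-hosted."""
    env = os.environ if env is None else env
    # An explicit opt-in wins, which also makes demo mode easy to try locally.
    if env.get(FORCE_DEMO_VAR, "").strip().lower() in {"1", "true", "yes"}:
        return True
    # Otherwise fall back to a simple filesystem heuristic.
    return any(os.path.isdir(path) for path in marker_paths)


def choose_data_source(cloud: bool) -> str:
    """Pick pre-scraped demo data on the cloud and live scraping elsewhere."""
    return "demo" if cloud else "live"
```

Passing the environment and marker paths as arguments keeps the check testable without touching the real process environment.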

Technical Details

Built with Streamlit, Playwright, and Groq LLM
Implements cloud environment detection
Falls back to pre-scraped demo data when live scraping is unavailable
Handles various edge cases and errors gracefully

Setup

Clone this repository
Install the required packages:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install
```
Create a .env file in the root directory with your Groq API key:
```
GROQ_API_KEY=your_api_key_here
```
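As a rough illustration of how the key from the .env file might be read at startup, the sketch below uses `load_dotenv()` from python-dotenv (listed under Dependencies) when it is installed. The `get_groq_api_key` helper and its error message are hypothetical and not taken from the project.

```python
import os
from typing import Mapping, Optional

try:
    # load_dotenv() copies variables from a .env file into os.environ
    # when such a file is found; it does nothing otherwise.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # Without python-dotenv, rely on variables already set in the shell.

# Placeholder value shown in the README; treated here as "not configured".
PLACEHOLDER = "your_api_key_here"


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Groq API key or raise a clear error if it is not set."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GROQ_API_KEY is missing; add it to the .env file.")
    return key
```

Failing early with a descriptive message is usually easier to debug than an authentication error raised later by the API client.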

Usage

Run the Streamlit app:
```
streamlit run app.py
```
Enter a location (city, neighborhood, or zip code)
Choose whether to run in headless mode or not
Click "Start Scraping" and watch the results appear in real time

How It Works

The application uses:

Playwright for rendering and interacting with web pages
Groq API (with Llama 3) for extracting property information
Streamlit for the web interface
Selectolax for HTML parsing

As properties are found, they are immediately displayed in the interface, allowing you to see results as they come in rather than waiting for the entire scraping process to finish.
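The incremental display described above can be sketched with a Python generator that yields each listing as soon as it is extracted. This is an illustration only: `stream_listings` and `fake_extract` are hypothetical names, and `fake_extract` stands in for the real LLM extraction step so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Listing = Dict[str, str]


def stream_listings(
    cards: Iterable[str],
    extract: Callable[[str], Optional[Listing]],
) -> Iterator[Listing]:
    """Yield each listing as soon as it is extracted, skipping failures."""
    for card in cards:
        listing = extract(card)
        if listing is not None:
            yield listing


def fake_extract(card: str) -> Optional[Listing]:
    """Stand-in for the LLM call: parse 'name | price' or give up."""
    name, _, price = card.partition("|")
    if not price.strip():
        return None  # Nothing usable in this card.
    return {"name": name.strip(), "price": price.strip()}


shown = []
for listing in stream_listings(["Oak Flats | $1,200", "garbled", "Pine Lofts | $950"], fake_extract):
    # In the app, this is where each result would be rendered in the interface.
    shown.append(listing)
```

Because the generator is lazy, the consumer can render the first result before the remaining cards have been processed.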

Requirements

Python 3.7+
Groq API key
Internet connection

Project Structure

AiEstateScraper/
├── config/                 # Configuration files
│    ├── config.json         # Stores scraper settings 
│    └── tools.py            # Helper functions for configuration handling
│
├── outputs/                # Directory for storing scraped data
│    └── outputs.json        # JSON file containing extracted property listings
│
├── utils/                  # Utility scripts
│    ├── extract.py          # Main scraper logic using LLM and Selectolax
│    └── render.py           # Handles rendering and data processing using Playwright
│
├── .env                    # Environment variables (e.g., API keys, LLM credentials)
├── app.py                  # Streamlit app (started with `streamlit run app.py`)
├── main.py                 # Entry point for the scraper
└── requirements.txt        # Project dependencies
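For readers who want to inspect the scraped data outside the app, here is a small sketch for reading outputs/outputs.json. The file's schema is not documented here, so the assumption that it holds a JSON array of objects, along with the `parse_listings` and `load_listings` helper names, is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def parse_listings(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text assumed to be an array of listing objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of listings")
    # Keep only object entries; anything else is ignored.
    return [item for item in data if isinstance(item, dict)]


def load_listings(path: Union[str, Path] = "outputs/outputs.json") -> List[Dict[str, Any]]:
    """Read listings from disk, returning an empty list if the file is absent."""
    file = Path(path)
    if not file.exists():
        return []
    return parse_listings(file.read_text(encoding="utf-8"))
```

Returning an empty list for a missing file lets callers treat "nothing scraped yet" the same as "no results".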

Dependencies

streamlit
groq
playwright
selectolax
python-dotenv

License

This project is licensed under the MIT License.
