What Makes Art Valuable: Data Scraping and Analysis of Auction Data

Project Overview

This project explores factors influencing artwork prices at major auction houses (Christie's and Sotheby's). Using custom web scrapers built with Python and Selenium, auction data was collected, including artwork details, estimates, and final sale prices. The data was then cleaned and processed using Pandas to analyze trends, particularly the relationship between auction house estimates, artist popularity (approximated via Yahoo search results), and final sale prices.

Key Features

Multi-Stage Web Scraping: Python scripts utilizing Selenium to navigate dynamic auction sites, collect auction/artwork URLs, and extract specific artwork features (price, artist, estimates, dimensions, etc.).
Data Cleaning & Processing: Jupyter Notebooks demonstrating data cleaning techniques with Pandas, including:
- Handling inconsistencies in scraped data.
- Parsing and separating estimate ranges (low/high).
- Standardizing and converting currencies (GBP, EUR, HKD, etc.) to USD.
- Filtering out non-painting/print lots.
Feature Engineering:
- Calculation of artist age and determination of living status.
- Creation of binary 'Sold' status based on price data.
- Calculation of estimate accuracy (whether the final price fell below, within, or above the estimate range).
- Integration of artist popularity metric derived from scraping Yahoo search result counts using Requests and BeautifulSoup.
Exploratory Data Analysis (EDA): Initial visualizations exploring the relationship between sale prices, estimates, and artist popularity. The results point to an underestimation bias and a possible anchoring effect in the estimates.
Image Classification (experimental): An included notebook explores image classification using Keras/TensorFlow (VGG16); this feature was not integrated into the final analysis.
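
The estimate parsing, currency conversion, and feature engineering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the column names (`estimate`, `price`), the estimate string format, the helper names, and the fixed exchange rates are all assumptions made for the example, and the sale price is assumed to be in the same currency as the estimate.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical fixed exchange rates to USD, for illustration only.
# A real analysis would look up rates for each sale date.
RATES_TO_USD = {"USD": 1.0, "GBP": 1.30, "EUR": 1.10, "HKD": 0.13}


def parse_estimate(text):
    """Split a string like 'GBP 10,000 - 15,000' into (currency, low, high)."""
    if not isinstance(text, str):
        return (None, np.nan, np.nan)
    match = re.search(r"([A-Z]{3})\s*([\d,]+)\s*-\s*(?:[A-Z]{3}\s*)?([\d,]+)", text)
    if match is None:
        return (None, np.nan, np.nan)
    currency, low, high = match.groups()
    return (currency, float(low.replace(",", "")), float(high.replace(",", "")))


def add_features(df):
    """Add USD estimates, a binary 'sold' flag, and an estimate-accuracy label."""
    out = df.copy()
    parsed = pd.DataFrame(
        [parse_estimate(t) for t in out["estimate"]],
        index=out.index,
        columns=["currency", "estimate_low", "estimate_high"],
    )
    # Unknown or missing currencies map to NaN, so their USD values stay NaN.
    rate = parsed["currency"].map(RATES_TO_USD)
    out["estimate_low_usd"] = parsed["estimate_low"] * rate
    out["estimate_high_usd"] = parsed["estimate_high"] * rate
    out["price_usd"] = out["price"] * rate

    # A lot counts as sold when a final price was recorded.
    out["sold"] = out["price"].notna().astype(int)

    # Comparisons involving NaN are False, so those rows fall through to "unknown".
    conditions = [
        out["price_usd"] < out["estimate_low_usd"],
        out["price_usd"] > out["estimate_high_usd"],
        out["price_usd"].between(out["estimate_low_usd"], out["estimate_high_usd"]),
    ]
    out["estimate_accuracy"] = np.select(
        conditions, ["below", "above", "within"], default="unknown"
    )
    return out


# Toy rows for illustration; these are not values from the scraped data.
lots = pd.DataFrame(
    {
        "estimate": [
            "GBP 10,000 - 15,000",
            "USD 2,000 - 3,000",
            "EUR 500 - 700",
            "Estimate on request",
        ],
        "price": [20000.0, 2500.0, None, 900.0],
    }
)
features = add_features(lots)
print(features[["estimate", "price_usd", "sold", "estimate_accuracy"]])
```

Keeping unparseable estimates as NaN, rather than dropping the rows, leaves the decision about how to handle them to the analysis step.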

Key Technologies

Python: Core programming language.
Selenium: Web browser automation and scraping dynamic websites.
Pandas: Data manipulation, cleaning, and analysis.
NumPy: Numerical operations.
Requests & BeautifulSoup: Scraping static content (used for Yahoo search results).
Jupyter Notebook: Development environment for scraping, cleaning, and analysis.
Matplotlib & Seaborn: Data visualization.
Keras / TensorFlow (experimental): Image classification with a VGG16 model.

Key Findings

A strong correlation was observed between auction house estimates (both low and high) and the final sale price, suggesting a potential anchoring bias effect.
The analysis indicated a tendency for the auction house (Christie's data was primarily used for this part) to underestimate artwork values, with roughly 54% of artworks selling above the high estimate.
Artist popularity, as approximated by Yahoo search result counts, did not show a strong correlation with final sale prices within this dataset.
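
The two estimate-related findings above come down to a correlation and a proportion. The sketch below shows one way to compute both with Pandas. The column names and the `summarize_estimates` helper are hypothetical, the rows are toy values rather than project data, and Pearson correlation (the Pandas default) is an assumption, since the correlation measure used in the notebooks is not stated here.

```python
import pandas as pd


def summarize_estimates(df):
    """Return the correlation of each estimate with price and the share above the high estimate."""
    sold = df.dropna(subset=["price_usd"])  # only lots with a recorded price
    corr = sold[["estimate_low_usd", "estimate_high_usd", "price_usd"]].corr()
    return {
        "corr_low": corr.loc["estimate_low_usd", "price_usd"],
        "corr_high": corr.loc["estimate_high_usd", "price_usd"],
        "share_above_high": (sold["price_usd"] > sold["estimate_high_usd"]).mean(),
    }


# Toy rows for illustration only; not values from the project's dataset.
lots = pd.DataFrame(
    {
        "estimate_low_usd": [1000, 5000, 20000, 80000, 150000],
        "estimate_high_usd": [1500, 7000, 30000, 120000, 200000],
        "price_usd": [1800, 6000, 45000, 100000, 260000],
    }
)
summary = summarize_estimates(lots)
print(summary)
```

A high correlation on its own does not separate anchoring from estimates that are simply well informed, which is why the finding is phrased as a suggestion rather than a conclusion.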

Challenges

Data Collection: Navigating the complexities of Selenium for dynamic websites and handling inconsistencies across different auction/artwork page layouts.
Data Cleaning: Significant effort was required to standardize formats, currencies, and filter out irrelevant lots (e.g., furniture).
Scope: Paintings and prints could not always be reliably separated from other lots, and medium was not included as a feature. Both may introduce noise into the analysis (e.g., comparing a Picasso print to a Picasso painting).

Usage Note

The web scrapers were developed based on the website structures of Christie's and Sotheby's at the time of the project's creation. Websites change frequently, so these scrapers will likely require significant updates to function correctly now. The cleaned data files (.csv) are provided for direct analysis.
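
To start from the provided data rather than the scrapers, a cleaned file can be loaded directly with Pandas. The helper below is a small sketch: `load_lots` is a hypothetical name, and the path in the commented example is an assumption that should be adjusted to wherever the file sits in your copy of the repository.

```python
import pandas as pd


def load_lots(path):
    """Read one of the cleaned CSV files and print a quick overview of it."""
    df = pd.read_csv(path)
    print(f"{len(df)} rows, {df.shape[1]} columns")
    print(df.dtypes)
    return df


# Example (path is an assumption; adjust as needed):
# lots = load_lots("Christies_Art_Objects_Clean.csv")
```

Checking the column types first is worthwhile because scraped price and date fields are sometimes read back as plain strings.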

Project Files

Data Collection and Cleaning

Stage1_Christies_Scraper.ipynb: Data collection and initial scraping.
Stage2-Christies_Scraper.ipynb: Further data collection and scraping.
Christies_Art_Objects_Clean.csv: Cleaned data from Christie's.
Christies_data with popularity.csv: Data from Christie's with artist popularity measures.
SothebysData_clean.csv: Cleaned data from Sotheby's.

Data Processing and Visualization

Art_Object_Info2.csv: Processed data with added features.
SothebysData.csv: Data from Sotheby's.
Sotheby's Scraper .ipynb: Scraping data from Sotheby's.
Art_Object_URL.csv: URLs for art objects.
Christies Data Visualization.ipynb: Visualizations of Christie's data.
Christies_Art Data Cleaner.ipynb: Data cleaning for Christie's data.
Christies_Art Data Cleaner_With Day, Month, Year.ipynb: Detailed data cleaning for Christie's data.
Data Cleaner_Christies.ipynb: Data cleaning and processing for Christie's data.
Data Visualization_Christies.ipynb: Further data visualization for Christie's data.
ImageClassifier.ipynb: Image classification experiments.
Sothebys_Data_Cleaner.ipynb: Data cleaning for Sotheby's data.
What Makes Art Valuable_ Data Scraping and Exploratory Data Visualizations.pdf: The blog post about the project.
relative popularity.ipynb: Notebook for analyzing artist popularity.
