This repository is dedicated to collecting and scraping KBO (Korea Baseball Organization) data. It includes scripts and processes for gathering player statistics, team data, game results, and other related information.
- Scrape KBO data including game results, schedules, and player statistics
- Supports various output formats: Parquet, JSON, CSV
- Flexible command-line interface with multiple scraping commands
- Filter by year, specific date, and series ID (league/stage type)
- Python 3.12+ is required.
- Clone the repository:
    git clone https://github.com/kbo-data-portal/collector.git
    cd collector
- Install dependencies:
    pip install -r requirements.txt
This project provides a command-line tool for scraping KBO data. You can specify the target data and output format using commands.
python run.py <command> [options]
- <command>: The target data type (game, schedule, player)
- [options]: Additional filters and configurations
Option | Description |
---|---|
-y, --year | Specify the year (e.g., 2014) |
-d, --date | Specific date in YYYYMMDD format |
-f, --format | Output format: parquet, json, csv |
-s, --series | Series ID to indicate league/stage type (see the SR_ID table) |
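To make the option semantics concrete, here is a minimal sketch of how a parser for these flags could be written with Python's argparse. It is not the repository's actual run.py: the function names `build_parser` and `yyyymmdd` are hypothetical, and modelling the command as a single positional argument is an assumption. The flag names and the allowed values come from the table above.

```python
import argparse
from datetime import datetime


def yyyymmdd(value: str) -> str:
    """Accept only real calendar dates written as exactly eight digits."""
    # strptime alone is lenient about digit counts, so check the shape first.
    if len(value) != 8 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {value!r}")
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid date: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("command", choices=["game", "schedule", "player"])
    parser.add_argument("-y", "--year", type=int, help="season year, e.g. 2014")
    parser.add_argument("-d", "--date", type=yyyymmdd, help="date as YYYYMMDD")
    parser.add_argument("-f", "--format", choices=["parquet", "json", "csv"])
    parser.add_argument("-s", "--series", help="series ID (league/stage type)")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["game", "-d", "20141111", "-f", "json"])
print(args.command, args.date, args.format)
```

Validating the date at parse time means a malformed value such as 2014111 is rejected before any scraping starts.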
game
Scrape game-related data.
python run.py game -y 2014 -f csv # Season data
python run.py game -d 20141111 -f json # Specific date data
schedule
Scrape the game schedule.
python run.py schedule -y 2014 -f parquet
player
Scrape player statistics.
python run.py player -y 2014 -f csv
For detailed command usage, run:
python run.py <command> --help
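Since each invocation shown above targets a single year, collecting several seasons means running the tool repeatedly. The sketch below builds one command line per season using only the documented -y and -f flags. The helper name `build_commands` is hypothetical, and the example assumes it is run from the repository root, where run.py lives.

```python
import sys


def build_commands(command, years, fmt="csv"):
    """Return one argv list per season for the documented CLI."""
    return [
        [sys.executable, "run.py", command, "-y", str(year), "-f", fmt]
        for year in years
    ]


for argv in build_commands("game", range(2014, 2017)):
    # Print the command; to execute it, import subprocess and call
    # subprocess.run(argv, check=True) from the repository root.
    print(" ".join(argv[1:]))
```

Passing the arguments as a list rather than a single shell string avoids quoting issues if the commands are later handed to subprocess.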
Each game record includes an SR_ID field representing the league/stage type:
SR_ID | Description |
---|---|
0 | Regular Season |
1 | Preseason Game |
3 | Semi-Playoffs |
4 | Wild Card Round |
5 | Playoffs |
7 | Korean Series |
8 | International Competitions |
9 | All-Star Game |
You can use this field to filter games based on the competition stage.
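As an illustration of filtering on this field, the sketch below keeps only postseason games from a list of records. The SR_ID values and labels come from the table above. The sample records and the GAME key are made up for the example, and the real output may contain different columns.

```python
# SR_ID values and labels as listed in the table above.
SERIES_NAMES = {
    0: "Regular Season",
    1: "Preseason Game",
    3: "Semi-Playoffs",
    4: "Wild Card Round",
    5: "Playoffs",
    7: "Korean Series",
    8: "International Competitions",
    9: "All-Star Game",
}

POSTSEASON_IDS = {3, 4, 5, 7}


def filter_by_series(records, series_ids):
    """Keep records whose SR_ID is in series_ids.

    int() is applied because values read back from CSV arrive as strings.
    """
    wanted = set(series_ids)
    return [r for r in records if int(r["SR_ID"]) in wanted]


# Hypothetical records; only the SR_ID field is documented.
games = [
    {"GAME": "A", "SR_ID": 0},
    {"GAME": "B", "SR_ID": "7"},
    {"GAME": "C", "SR_ID": 5},
]

for game in filter_by_series(games, POSTSEASON_IDS):
    print(game["GAME"], SERIES_NAMES[int(game["SR_ID"])])
```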
This project is licensed under the MIT License. See the LICENSE file for details.