Scraping Korea Baseball Game information


License
MIT
Install
pip install kbodata==0.1.0

Documentation

What is kbo-data

kbo-data is a Python package that scrapes Korean professional baseball (KBO) game information.


Requirements

This package requires ChromeDriver. You can download it from the ChromeDriver download page.

How to Use

Installing the package

First, install the kbodata package.

pip install kbodata

Getting data (kbodata.get module)

Download the KBO match schedule for the day, month, or year you want.

    import kbodata

    # Replace 'chromedriver_path' with the path to your ChromeDriver executable.

    # Get the KBO match schedule for April 20, 2021.
    day = kbodata.get_daily_schedule(2021, 4, 20, 'chromedriver_path')

    # Get the KBO match schedule for April 2021.
    month = kbodata.get_monthly_schedule(2021, 4, 'chromedriver_path')

    # Get the KBO match schedule for 2021.
    year = kbodata.get_yearly_schedule(2021, 'chromedriver_path')

Then fetch the match information for that schedule. The result is returned in JSON format.

    # Get the KBO match information for April 20, 2021.
    day_data = kbodata.get_game_data(day, 'chromedriver_path')

    # Get the KBO match information for April 2021.
    month_data = kbodata.get_game_data(month, 'chromedriver_path')

    # Get the KBO match information for 2021.
    year_data = kbodata.get_game_data(year, 'chromedriver_path')
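
Because every call drives a browser, fetching a whole month or year can take a while, so it may be worth saving the result to disk. The following is a minimal sketch, not part of kbodata: `load_or_fetch` is a hypothetical helper, and it assumes the value returned by `get_game_data` can be serialized with the standard `json` module.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], Any]) -> Any:
    """Return cached JSON data if the file exists; otherwise fetch and cache it."""
    if cache_path.exists():
        with cache_path.open(encoding="utf-8") as f:
            return json.load(f)

    # No cache yet: run the (slow) fetch once and store its result.
    data = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII text readable in the file.
        json.dump(data, f, ensure_ascii=False)
    return data
```

With this helper, a call such as `load_or_fetch(Path("cache/2021-04-20.json"), lambda: kbodata.get_game_data(day, 'chromedriver_path'))` would scrape only the first time and read from the file afterwards.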

The JSON format is as follows.

    {
      id: date_gameid,
      contents: {
        'scoreboard': [],
        'ETC_info': {},
        'away_batter': [],
        'home_batter': [],
        'away_pitcher': [],
        'home_pitcher': []
      }
    }
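
To make the layout concrete, here is a small sketch that builds one placeholder record in this shape and checks that all six sections are present. The `id` value (including the underscore separator) and the empty containers are assumptions for illustration, not real scraped output, and `missing_sections` is a hypothetical helper rather than a kbodata function.

```python
# Hypothetical record shaped like the documented layout; the id value and
# the empty containers are placeholders, not real scraped output.
record = {
    "id": "20210420_LTHH0",
    "contents": {
        "scoreboard": [],
        "ETC_info": {},
        "away_batter": [],
        "home_batter": [],
        "away_pitcher": [],
        "home_pitcher": [],
    },
}

# Section names taken from the documented layout.
EXPECTED_KEYS = {
    "scoreboard", "ETC_info",
    "away_batter", "home_batter",
    "away_pitcher", "home_pitcher",
}


def missing_sections(rec: dict) -> set:
    """Return the documented section names that rec['contents'] lacks."""
    return EXPECTED_KEYS - set(rec.get("contents", {}))


print(missing_sections(record))  # prints set()
```

A check like this can be useful before converting data, since a record with a missing section is easier to diagnose here than inside a later conversion step.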

Transforming data (kbodata.load module)

This module converts the fetched data into a specific data type. The supported types are as follows.

  • DataFrame (pandas)
  • Dict

    # Convert only the team game information into a DataFrame.
    scoreboard = kbodata.scoreboard_to_DataFrame(day_data)
    # Convert only the batter information into a DataFrame.
    batter = kbodata.batter_to_DataFrame(day_data)
    # Convert only the pitcher information into a DataFrame.
    pitcher = kbodata.pitcher_to_DataFrame(day_data)

    # Convert only the team game information into a Dict.
    scoreboard = kbodata.scoreboard_to_Dict(day_data)
    # Convert only the batter information into a Dict.
    batter = kbodata.batter_to_Dict(day_data)
    # Convert only the pitcher information into a Dict.
    pitcher = kbodata.pitcher_to_Dict(day_data)

You can find information about the converted data at the link below.

Issues

Data that is not available on the official KBO website is not provided. The games and dates for which no data is available are listed below.

By game

  • 2008-03-30 LTHH0
  • 2009-04-04 WOLT0
  • 2010-03-20 OBLT0
  • 2010-03-20 WOSS0
  • 2015-07-08 HTWO0
  • 2018-08-01 WOSK0

By date

  • 2013-03-09
  • 2013-03-10
  • 2013-03-11
  • 2013-03-12
  • 2013-03-13
  • 2013-03-14
  • 2013-03-15
  • 2013-03-16
  • 2013-03-17
  • 2013-03-18
  • 2013-03-19
  • 2013-03-20
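
If you fetch data in bulk, you may want to skip these games up front. The sketch below encodes the two lists above; `is_unavailable` is a hypothetical helper, not part of kbodata, and it assumes you already have each game's date and game code in hand (how to extract them from a schedule is not covered here).

```python
from datetime import date

# Games listed above as unavailable, keyed by (date, game code).
MISSING_GAMES = {
    (date(2008, 3, 30), "LTHH0"),
    (date(2009, 4, 4), "WOLT0"),
    (date(2010, 3, 20), "OBLT0"),
    (date(2010, 3, 20), "WOSS0"),
    (date(2015, 7, 8), "HTWO0"),
    (date(2018, 8, 1), "WOSK0"),
}

# The dates listed above form one contiguous range.
MISSING_START = date(2013, 3, 9)
MISSING_END = date(2013, 3, 20)


def is_unavailable(day: date, game_code: str) -> bool:
    """Return True if the game falls in either list of unavailable data."""
    if MISSING_START <= day <= MISSING_END:
        return True
    return (day, game_code) in MISSING_GAMES
```

For example, `is_unavailable(date(2013, 3, 12), "LTHH0")` is true because the whole day is missing, while `is_unavailable(date(2010, 3, 20), "LTHH0")` is false because only the two listed games on that day are affected.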