korean-news-crawler

Python Library for Crawling Top 10 Korean News and Providing Synonym Dictionary


Keywords
korean, newspaper, newspaper-crawler, scraping-python, scraping-websites, webcrawler, webcrawling

Install
pip install korean-news-crawler==1.0.5

Documentation

Korean_News_Crawler

This is a Python library for crawling articles from the top 10 Korean daily newspaper sites and providing a synonym dictionary. It is a beta version that has not yet been officially registered on PyPI.
The copyright of the articles belongs to the original media companies. We take no legal responsibility for how they are used, and we assume that you have agreed to this.
This is an open source project, and contributors and collaborators are always welcome. Please get in touch if you would like to join.

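Supported News Sites

Chosun Ilbo, Dong-a Ilbo, Hankook Ilbo, Hankyoreh, Joongang Ilbo, Kukmin Ilbo, Kyunghyang Shinmun, Munhwa Ilbo, Naeil News, Segye Ilbo, Seoul Shinmun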

Contributors

Indigo_Coder

Installation

pip install korean_news_crawler

BeautifulSoup, Selenium, and Requests are required.

Quick Usage

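from korean_news_crawler import Chosun  # assumes the class is exported at package level

chosun = Chosun()
# Crawl a single article URL.
print(chosun.dynamic_crawl("https://www.chosun.com/..."))

# Crawl every URL in a list of Chosun Ilbo article URLs.
chosun_url_list = []
print(chosun.dynamic_crawl(chosun_url_list))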

API

  1. Chosun()
  2. Donga()
  3. Hankook()
  4. Hankyoreh()
  5. Joongang()
  6. Kukmin()
  7. Kyunghyang()
  8. Munhwa()
  9. Naeil()
  10. Segye()
  11. Seoul()

Chosun()

Crawls articles from Chosun Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |
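
To make the two `delay_time` modes concrete, here is a minimal sketch of how such a parameter could be turned into a pause between requests. It is not the library's implementation: `resolve_delay` is a hypothetical helper, reading the tuple as `(min, max)` bounds is an assumption (the description only says the delay is random), and the unit is assumed to be seconds.

```python
import random


def resolve_delay(delay_time=None):
    """Return how long to pause before the next request (hypothetical helper)."""
    if delay_time is None:
        return 0.0  # no delay configured
    if isinstance(delay_time, tuple):
        low, high = delay_time  # assumption: the tuple holds (min, max) bounds
        return random.uniform(low, high)
    return float(delay_time)  # fixed delay


fixed = resolve_delay(1.5)        # always the same pause
jittered = resolve_delay((1, 3))  # a different pause on each call
```

A crawler built this way would pass the result to `time.sleep` before each request; a random pause spreads requests out less predictably than a fixed one.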

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |
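
Both methods accept either one URL or a list of URLs and return a list. The input handling this implies can be sketched as follows, assuming a single str is treated as a one-element list. The helper `crawl_each` and the stand-in `fake_crawl` are hypothetical and are not part of the library.

```python
def crawl_each(url, crawl_one):
    """Apply `crawl_one` to a single URL or to every URL in a list.

    Both names are hypothetical; `crawl_one` stands in for whatever
    fetches and parses one article.
    """
    urls = [url] if isinstance(url, str) else list(url)
    return [crawl_one(u) for u in urls]


def fake_crawl(u):
    """Stand-in for a real fetch: echo the URL back inside a fake article."""
    return f"article from {u}"


single = crawl_each("https://example.com/a", fake_crawl)
many = crawl_each(["https://example.com/a", "https://example.com/b"], fake_crawl)
```

Checking for `str` first matters because a string is itself iterable, so looping over it directly would visit one character at a time.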

Donga()

Crawls articles from Dong-a Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |
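
The `saving_html` behaviour amounts to "fetch once, then reuse". A minimal sketch of that idea follows, assuming an in-memory dictionary keyed by URL; the library may store the HTML differently (for example on disk). `HtmlCache` and `fake_fetch` are hypothetical names used only for illustration.

```python
class HtmlCache:
    """Hypothetical sketch of the `saving_html` idea: fetch once, then reuse."""

    def __init__(self, fetch, saving_html=False):
        self.fetch = fetch              # callable that returns HTML for a URL
        self.saving_html = saving_html
        self._saved = {}                # url -> HTML saved on the first request

    def get(self, url):
        if self.saving_html and url in self._saved:
            return self._saved[url]     # reuse the saved copy, no new request
        html = self.fetch(url)
        if self.saving_html:
            self._saved[url] = html
        return html


calls = []


def fake_fetch(url):
    """Stand-in for a network request that records every call."""
    calls.append(url)
    return f"<html>{url}</html>"


cache = HtmlCache(fake_fetch, saving_html=True)
cache.get("https://example.com/a")
cache.get("https://example.com/a")
request_count = len(calls)  # the second call is served from the saved copy
```

With `saving_html=False` the same two calls would each reach `fake_fetch`, which is the extra server load the option is meant to avoid.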

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Hankook()

Crawls articles from Hankook Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |
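
Since every class exposes both `static_crawl` and `dynamic_crawl`, one possible usage pattern is to try the BeautifulSoup path first and fall back to Selenium only when nothing comes back. The sketch below is an illustration under assumptions, not documented behaviour: `crawl_with_fallback` and `StubCrawler` are hypothetical, the stub stands in for a real class such as `Hankook` so the example runs without network access, and it is assumed that a failed static crawl yields an empty list or empty strings.

```python
def crawl_with_fallback(crawler, url):
    """Try static_crawl first; use dynamic_crawl if it yields no text."""
    articles = crawler.static_crawl(url)
    if any(articles):  # at least one non-empty article came back
        return articles
    return crawler.dynamic_crawl(url)


class StubCrawler:
    """Stand-in exposing the two documented methods (not the real class)."""

    def static_crawl(self, url):
        return []  # pretend the page needs JavaScript to render

    def dynamic_crawl(self, url):
        return ["rendered article text"]


result = crawl_with_fallback(StubCrawler(), "https://example.com/article")
```

Driving a browser through Selenium is generally slower than parsing a plain HTTP response, so trying the static path first keeps the heavier path for pages that need it.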

Hankyoreh()

Crawls articles from Hankyoreh.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Joongang()

Crawls articles from Joongang Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Kukmin()

Crawls articles from Kukmin Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Kyunghyang()

Crawls articles from Kyunghyang Shinmun.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Munhwa()

Crawls articles from Munhwa Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Naeil()

Crawls articles from Naeil News.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Segye()

Crawls articles from Segye Ilbo.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

Seoul()

Crawls articles from Seoul Shinmun.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| delay_time | float or tuple | Optional, defaults to None. With a float, sites are crawled with that delay. With a tuple, sites are crawled with a random delay. |
| saving_html | bool | Optional, defaults to False. When False, the URL is requested on every function call. When True, the requested HTML is saved the first time and the saved copy is reused afterwards, which helps reduce server load. |

Attributes

| Attribute | Type |
| --- | --- |
| delay_time | float or tuple |
| saving_html | bool |

Methods

| Method | Description |
| --- | --- |
| dynamic_crawl(url) | Returns article text using Selenium. |
| static_crawl(url) | Returns article text using BeautifulSoup. |

dynamic_crawl(url)
  • Returns article text using Selenium.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |

static_crawl(url)
  • Returns article text using BeautifulSoup.

| Parameter | Type | Description |
| --- | --- | --- |
| url | str or list | With a str, only the given URL is crawled. With a list, each URL in the list is crawled in turn. |

| Returns Type | Description |
| --- | --- |
| list | The list of articles. |
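
Because all of the classes share the same two methods, articles from several newspapers can be collected in one pass. The following is a minimal sketch of that pattern under stated assumptions: `crawl_many` is a hypothetical helper, and the `Segye` and `Seoul` classes defined here are local stand-ins that only borrow the names of the real classes so the example runs without the library or network access.

```python
def crawl_many(jobs):
    """Run each (crawler, urls) pair and group results by crawler class name."""
    results = {}
    for crawler, urls in jobs:
        name = type(crawler).__name__
        results.setdefault(name, []).extend(crawler.static_crawl(urls))
    return results


class StubCrawler:
    """Stand-in with the documented static_crawl(url) signature."""

    def static_crawl(self, url):
        urls = [url] if isinstance(url, str) else url
        return [f"text of {u}" for u in urls]


class Segye(StubCrawler):  # local stand-in, not the library's Segye
    pass


class Seoul(StubCrawler):  # local stand-in, not the library's Seoul
    pass


articles = crawl_many([
    (Segye(), ["https://example.com/1", "https://example.com/2"]),
    (Seoul(), "https://example.com/3"),
])
```

With the real library installed, the stand-ins would be replaced by instances of the corresponding classes, and `articles` would map each class name to its list of article texts.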