A Python library for crawling articles from the sites of Korea's top 10 daily newspapers and for providing a synonym dictionary. This is a beta version that has not yet been officially registered on PyPI.
The copyright of each article belongs to the original media company. We take no legal responsibility for any use of the articles, and we assume that you have agreed to this.
This is an open source project, and we are always looking for contributors and collaborators. Please get in touch if you would like to join.
- 조선일보 (Chosun Ilbo)
- 동아일보 (Dong-a Ilbo)
- 한국일보 (Hankook Ilbo)
- 한겨레 (Hankyoreh)
- 중앙일보 (JoongAng Ilbo)
- 국민일보 (Kukmin Ilbo)
- 경향신문 (Kyunghyang Shinmun)
- 문화일보 (Munhwa Ilbo)
- 내일신문 (Naeil News)
- 세계일보 (Segye Ilbo)
- 서울신문 (Seoul Shinmun)
Indigo_Coder |
pip install korean_news_crawler
BeautifulSoup, Selenium, and Requests are required.
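# Assumes the Chosun class is importable from the package top level.
from korean_news_crawler import Chosun
chosun = Chosun()
# Crawl a single article URL.
print(chosun.dynamic_crawl("https://www.chosun.com/..."))
# Crawl every URL in a list of Chosun Ilbo article URLs.
chosun_url_list = []
print(chosun.dynamic_crawl(chosun_url_list))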
Chosun()
Donga()
Hankook()
Hankyoreh()
Joongang()
Kukmin()
Kyunghyang()
Munhwa()
Naeil()
Segye()
Seoul()
Crawls articles from Chosun Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
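The two forms of `delay_time` can be sketched as follows. This is only an illustration, not the library's implementation: the helper names `resolve_delay` and `wait`, the reading of the tuple as a `(min, max)` range, and seconds as the unit are all assumptions.

```python
import random
import time


def resolve_delay(delay_time=None):
    """Return how long to wait before the next request (hypothetical helper).

    Assumptions: None means no delay, a number is a fixed delay in seconds,
    and a tuple is a (min, max) range from which a random delay is drawn.
    """
    if delay_time is None:
        return 0.0
    if isinstance(delay_time, tuple):
        low, high = delay_time
        return random.uniform(low, high)
    return float(delay_time)


def wait(delay_time=None):
    """Sleep for the resolved delay and return the number of seconds slept."""
    seconds = resolve_delay(delay_time)
    time.sleep(seconds)
    return seconds
```

Under these assumptions, `resolve_delay(1.5)` always gives the same pause, while `resolve_delay((0.5, 2.0))` gives a different pause on each call, which makes the request pattern less regular.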
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
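Both crawl methods accept either a single URL or a list of URLs and return a list. A minimal sketch of that input handling is shown here; `crawl_urls`, `crawl_one`, and `fake_crawl` are hypothetical names used for illustration and are not part of the library.

```python
def crawl_urls(url, crawl_one):
    """Crawl one URL or a list of URLs and always return a list.

    `crawl_one` is a hypothetical callable that takes one URL string
    and returns the article text for it.
    """
    # Wrap a single string so both input forms share one code path.
    urls = [url] if isinstance(url, str) else list(url)
    return [crawl_one(u) for u in urls]


def fake_crawl(u):
    """Stand-in for a real page fetch, so the sketch runs offline."""
    return f"article text from {u}"


single = crawl_urls("https://example.com/a", fake_crawl)
many = crawl_urls(["https://example.com/a", "https://example.com/b"], fake_crawl)
print(len(single), len(many))
```

Checking for `str` first matters because a string is itself iterable; without the check, a single URL would be crawled one character at a time.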
Crawls articles from Dong-a Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
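The effect of `saving_html` can be pictured with a small cache. This is a sketch under assumptions, not the library's code: the `HtmlCache` class, the `fetch` callable, and the choice of an in-memory dict for storage are all hypothetical.

```python
class HtmlCache:
    """Hypothetical cache: fetch each URL once, then reuse the saved HTML."""

    def __init__(self, fetch, saving_html=False):
        self.fetch = fetch              # callable: url -> HTML string
        self.saving_html = saving_html
        self._saved = {}                # url -> HTML, filled on first request

    def get(self, url):
        if not self.saving_html:
            return self.fetch(url)      # request the URL on every call
        if url not in self._saved:
            self._saved[url] = self.fetch(url)
        return self._saved[url]


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP request; records each call."""
    calls.append(url)
    return f"<html>{url}</html>"


cached = HtmlCache(fake_fetch, saving_html=True)
cached.get("https://example.com/a")
cached.get("https://example.com/a")
print(len(calls))  # 1: the second call reused the saved HTML
```

With `saving_html=False` the same two calls would reach `fake_fetch` twice, which is the behavior the table describes as requesting the URL on every call.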
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Hankook Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Hankyoreh.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from JoongAng Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Kukmin Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Kyunghyang Shinmun.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Munhwa Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Naeil News.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Segye Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
Crawls articles from Seoul Shinmun.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional; defaults to None. - When 'delay_time' is a float, sites are crawled with that fixed delay. - When 'delay_time' is a tuple, sites are crawled with a random delay. |
saving_html | bool | - Optional; defaults to False. - When 'saving_html=False', the URL is requested on every function call. - When 'saving_html=True', the requested HTML is saved on the first call only, and later calls reuse the saved HTML. This helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | |
saving_html | bool | |
Methods | Description |
---|---|
dynamic_crawl(url) | Returns article text using Selenium. |
static_crawl(url) | Returns article text using BeautifulSoup. |
- dynamic_crawl(url): returns article text using Selenium.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |
- static_crawl(url): returns article text using BeautifulSoup.
Parameters | Type | Description |
---|---|---|
url | str or list | - When 'url' is a str, only the given URL is crawled. - When 'url' is a list, each URL in the list is crawled in turn. |
Return type | Description |
---|---|
list | List of articles. |