navernews

A simple Python module to crawl Naver News. See https://github.com/q1c/navernews for more information.


Keywords
naver, news, scraping
License
Other
Install
pip install navernews==0.0.1

Documentation

Naver News Library

A simple Python library for scraping Naver News, with multi-threaded downloading.

Dependencies

  • requests
  • lxml

Usage

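from datetime import datetime

import navernews

# Collected (article_id, article) pairs
l_article = []

# Naver News section ID ('101' is the economy section)
str_sid1 = '101'

# Crawl backward in time, from dt_org toward dt_end
dt_org = datetime(2016, 4, 15)
dt_end = datetime(2016, 4, 14)

def mongo_callback(article, article_id):
    # Called once per downloaded article; this example just keeps it in memory
    l_article.append((article_id, article))

navernews.download_naver_news_date_range(str_sid1, dt_org, dt_end, mongo_callback)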

Output:

2016-04-14
324/324 100.00%
2016-04-13
247/247 100.00%
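Because downloading is multi-threaded, the callback may be invoked from several worker threads at once. This is an assumption: the source does not say which thread calls it. A minimal sketch of a callback that guards its shared storage with a `threading.Lock`; `ArticleCollector` is a hypothetical name, and the thread pool below only simulates the library calling the callback with fake articles.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ArticleCollector:
    """Hypothetical thread-safe sink for (article_id, article) pairs.

    Assumes the library may call the callback concurrently from worker threads.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.articles = {}

    def __call__(self, article, article_id):
        # Same argument order as mongo_callback(article, article_id)
        with self._lock:
            # Keying by article_id means a re-delivered article overwrites, not duplicates
            self.articles[article_id] = article


# Simulate concurrent callbacks with fake articles (no network access)
collector = ArticleCollector()
fake = [({"textv1": "text %d" % i}, "id-%d" % i) for i in range(100)]
with ThreadPoolExecutor(max_workers=8) as pool:
    for article, article_id in fake:
        pool.submit(collector, article, article_id)
print(len(collector.articles))  # 100
```

With the real library, an instance would be passed where `mongo_callback` is used: `navernews.download_naver_news_date_range(str_sid1, dt_org, dt_end, collector)`. Storing articles in a dict keyed by `article_id` is a design choice that also drops duplicate deliveries.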

article_id, article = l_article[0]
print(article['textv1'])

Output:

20๋Œ€ ๊ตญํšŒ์˜์› ์„ ๊ฑฐ ๊ฒฐ๊ณผ ์œ ๋ ฅ ์ •์น˜์ธ๋“ค์˜ ํฌ๋น„๊ฐ€ ์—‡๊ฐˆ๋ฆฌ๋ฉด์„œ 14์ผ ๊ด€๋ จ ํ…Œ๋งˆ์ฃผ๋„ ์š”๋™์„ ์ณค๋‹ค. ์˜ˆ์ƒ์„ ๋›ฐ์–ด๋„˜๋Š” ์„ฑ๊ณผ๋ฅผ ๊ฑฐ๋‘” ๋”๋ถˆ์–ด๋ฏผ์ฃผ๋‹น๊ณผ ๊ตญ๋ฏผ์˜๋‹น ๊ด€๋ จ์ฃผ๋Š” ๊ธ‰๋“ฑํ–ˆ๊ณ , ์ฐธํŒจํ•œ ์ƒˆ๋ˆ„๋ฆฌ๋‹น ๊ด€๋ จ์ฃผ๋Š” ๊ธ‰๋ฝํ–ˆ๋‹ค.
์ด๋‚  ๊ฐ€์žฅ ๋ˆˆ์— ๋ˆ ์ข…๋ชฉ์€ ์•ˆ์ฒ ์ˆ˜ ํ…Œ๋งˆ์ฃผ์˜€๋‹ค. ๊ตญ๋ฏผ์˜๋‹น ์•ˆ์ฒ ์ˆ˜ ๊ณต๋™๋Œ€ํ‘œ๊ฐ€ ์„ค๋ฆฝํ•œ ์•ˆ๋žฉ์˜ ์ฃผ๊ฐ€๋Š” ์žฅ์ด ์‹œ์ž‘ํ•˜์ž๋งˆ์ž 21% ์ด์ƒ ์น˜์†Ÿ์•˜๋‹ค. ์ดํ›„ ์ฐจ์ต ๋ฌผ๋Ÿ‰์ด ์ƒ์Šน๋ถ„์„ ๋ฐ˜๋‚ฉํ•ด ์ „๋‚ ๋ณด๋‹ค 1.71%๋งŒ ์˜ค๋ฅธ ์ฑ„ ๋งˆ๊ฐํ–ˆ๋‹ค. ์—ญ์‹œ ์•ˆ์ฒ ์ˆ˜ ํ…Œ๋งˆ์ฃผ๋กœ ๊ผฝํžˆ๋Š” ์จ๋‹ˆ์ „์ž์™€ ๋‹ค๋ฏˆ๋ฉ€ํ‹ฐ๋ฏธ๋””์–ด๋„ ์žฅ ์ดˆ๋ฐ˜ ๊ฐ๊ฐ 17%, 15% ์˜ฌ๋ž๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ฐˆ์ˆ˜๋ก ์ฃผ๊ฐ€๊ฐ€ ๋น ์ ธ ๊ฐ๊ฐ 0.74%, -6.18%์˜ ๋“ฑ๋ฝ๋ฅ ๋กœ ์žฅ์„ ๋งˆ์ณค๋‹ค.
๋”๋ฏผ์ฃผ์˜ โ€˜๋ฌธ์žฌ์ธ ํ…Œ๋งˆ์ฃผโ€™๋Š” ๋Œ€๋ถ€๋ถ„ ํฐ ํญ์œผ๋กœ ์ƒ์Šนํ–ˆ๋‹ค. ์šฐ๋ฆฌ๋“คํœด๋ธŒ๋ ˆ์ธ ์ฃผ๊ฐ€๊ฐ€ 15%๋‚˜ ์˜ฌ๋ž๊ณ  ์šฐ๋ฆฌ๋“ค์ œ์•ฝ, ์—์ด์—”ํ”ผ ๋“ฑ๋„ 2โˆผ5% ์ƒ์Šนํ–ˆ๋‹ค.
๋ฐ˜๋ฉด ์ƒˆ๋ˆ„๋ฆฌ๋‹น ๊น€๋ฌด์„ฑ ๋Œ€ํ‘œ์˜ ๋ถ€์นœ์ด ์„ค๋ฆฝํ•œ ์ „๋ฐฉ์˜ ์ฃผ๊ฐ€๋Š” 18.65%๋‚˜ ๋น ์กŒ๋‹ค. ์—”์ผ€์ด(-20.4%), ๋””์ง€ํ‹€์กฐ์„ (-18.59%), ์กฐ์ผ์•Œ๋ฏธ๋Š„(-17.09%) ๋“ฑ ๋‹ค๋ฅธ ๊น€๋ฌด์„ฑ ํ…Œ๋งˆ์ฃผ๋“ค๋„ ๊ธ‰๋ฝ์„ธ๋ฅผ ๋นš์—ˆ๋‹ค.
ํ•œํŽธ ์ด๋‚  ์ฝ”์Šคํ”ผ๋Š” ์ค‘๊ตญ๋ฐœ ํ›ˆํ’์— ๊ธ‰๋ฐ˜๋“ฑํ•ด 2010์„ ์„ ๋ŒํŒŒํ–ˆ๋‹ค. ์ฝ”์Šคํ”ผ๋Š” ์ „๋‚ ๋ณด๋‹ค 34.61ํฌ์ธํŠธ(1.75%) ์˜ค๋ฅธ 2015.93์œผ๋กœ ์žฅ์„ ๋งˆ์ณค๋‹ค. ์—ฐ์ค‘ ์ตœ๊ณ ์น˜์ด์ž ์ง€๋‚œํ•ด 12์›”1์ผ(2023.93) ์ดํ›„ ๊ฐ€์žฅ ๋†’์€ ์ˆ˜์น˜๋‹ค. ๊น€์ •ํ˜„ IBKํˆฌ์ž์ฆ๊ถŒ ์—ฐ๊ตฌ์›์€ โ€œ์ค‘๊ตญ ์ˆ˜์ถœ ์ง€ํ‘œ์˜ ํ˜ธ์กฐ์„ธ, ์œ ๊ฐ€ ๋ฐ˜๋“ฑ์„ธ ๋“ฑ์œผ๋กœ ์œ„ํ—˜์ž์‚ฐ ์„ ํ˜ธ ์‹ฌ๋ฆฌ๊ฐ€ ๊ฐ•ํ™”๋๊ณ , ์™ธ๊ตญ์ธ ๋งค์ˆ˜์„ธ๊ฐ€ ์ง€์ˆ˜๋ฅผ ๋Œ์–ด์˜ฌ๋ ธ๋‹คโ€๊ณ  ๋ถ„์„ํ–ˆ๋‹ค.
์ด์ง„๊ฒฝ ๊ธฐ์ž ljin@segye.com
โ“’ ์„ธ์ƒ์„ ๋ณด๋Š” ๋ˆˆ, ๊ธ€๋กœ๋ฒŒ ๋ฏธ๋””์–ด
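Since the scraped text is Korean, it helps to write it out with an explicit UTF-8 encoding; reading bytes back with a different codec is how Hangul turns into unreadable characters. A minimal sketch that saves the `(article_id, article)` pairs collected in `l_article` as JSON Lines and loads them back. It assumes each `article` is a JSON-serializable `dict` (the source only shows that it has a `'textv1'` key); `save_articles`, `load_articles`, and the file name are hypothetical.

```python
import json


def save_articles(pairs, path):
    """Write (article_id, article) pairs as JSON Lines, one article per line.

    Hypothetical helper; assumes each article is a JSON-serializable dict.
    """
    with open(path, "w", encoding="utf-8") as f:
        for article_id, article in pairs:
            record = {"article_id": article_id, "article": article}
            # ensure_ascii=False keeps Hangul readable instead of \uXXXX escapes
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_articles(path):
    """Read a file written by save_articles back into (article_id, article) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.append((record["article_id"], record["article"]))
    return pairs
```

After the crawl above finishes, `save_articles(l_article, "articles.jsonl")` would persist the results; one record per line keeps the file appendable and easy to stream.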

Installation

Run the following pip command to install this library:

pip install navernews

Manual Installation

To install from a source checkout, run the following command in the repository root:

sudo python setup.py install