navernewscrawler

Tool for crawling news on Naver


License
MIT
Install
pip install navernewscrawler==0.0.3

Documentation

Naver News Crawler

Copyright (c) 2018-2019 Eunhou Esther Song

Python Package Index

https://pypi.org/project/navernewscrawler/

Introduction

This script scrapes news posted on the Naver portal for a given search keyword, start date, and end date. News posted on the Naver portal has a URL beginning with http://news.naver.com; news outside this URL is not scraped.

Motivation

Demand for scraping news for a variety of purposes has grown. Existing scripts, however, cannot scrape news from a specific period. This script improves on and supplements them so that the dates and the range of result pages can be specified. For example, when searching for news about '독도' (Dokdo) on 2018-12-26, you can set the first and last result pages to scrape, and you can also set a date range (e.g. from 2018-12-26 to 2018-12-30).

The output consists of four fields: the news title, the news company, the date, and the text.

Installation

This script is supported only on Python 3.7.

(sudo) pip3 install navernewscrawler

Alternatively, copy setup.py from the repository and install it directly.

python3.7 setup.py install

Commands

  • -h or --help: Show the help message.
  • -bd or --begindate: Set the first date to scrape. Separate the year, month, and day with '-'. e.g. 2018-12-26, 2018-06-19
  • -ed or --enddate: Set the last date to scrape.
  • -p or --page: Set the first result page to scrape. The default is page 1.
  • -max_page or --max_page: Set the last result page to scrape. The default is page 5. Each page shows 10 news results, so by default 50 news items are scraped per day.
  • -c or --csv: Save the scraped results to a CSV file.
  • -d or --dump: Print the scraped results to the console.
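
The page options bound how much is collected: with 10 results per page, pages 1 to 5 give at most 50 items per day. The arithmetic can be sketched as below. The function name max_articles is hypothetical and not part of navernewscrawler, and the sketch assumes both the date range and the page range are inclusive.

```python
from datetime import date

RESULTS_PER_PAGE = 10  # per the README: one result page lists 10 news items


def max_articles(begin: date, end: date, page: int = 1, max_page: int = 5) -> int:
    """Upper bound on items scraped for an inclusive date and page range."""
    if end < begin or max_page < page:
        return 0
    days = (end - begin).days + 1        # both endpoints are included
    pages_per_day = max_page - page + 1  # pages page..max_page inclusive
    return days * pages_per_day * RESULTS_PER_PAGE


print(max_articles(date(2018, 12, 26), date(2018, 12, 26)))        # 50
print(max_articles(date(2018, 12, 26), date(2018, 12, 27), 1, 3))  # 60
```

This is only an upper bound: a day with fewer results than requested yields fewer items.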

Example

  • By default, running the command saves the results to a file named (start date)_news_scrape_(end date).json. For example, scraping news from 2018-12-26 to 2019-12-26 saves the results as 20181226_news_scrape_20191226.json.

  • Scrape news related to 독도 (Dokdo) for the two days 2018-12-26 and 2018-12-27, from result page 1 to page 3.

navernewscrawler 독도 -bd 2018-12-26 -ed 2018-12-27 -p 1 -max_page 3

  • Scrape news related to 독도 (Dokdo) for the single day 2018-12-26, from result page 1 to page 3, and save the results to a CSV file.

navernewscrawler 독도 -bd 2018-12-26 -ed 2018-12-26 -p 1 -max_page 3 -c
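
The default file name pattern described above can be reproduced with a small helper, which is handy for locating the output of a given run. This is a minimal sketch: output_filename is a hypothetical name, not a function exported by the package, and it covers only the JSON name documented here.

```python
from datetime import date


def output_filename(begin: date, end: date) -> str:
    """Build '<begin>_news_scrape_<end>.json' with both dates as YYYYMMDD."""
    return f"{begin:%Y%m%d}_news_scrape_{end:%Y%m%d}.json"


print(output_filename(date(2018, 12, 26), date(2019, 12, 26)))
# 20181226_news_scrape_20191226.json
```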

Output

[{'title': '์šฐ๋ฆฌ๋‚˜๋ผ ๊ตญํšŒ์˜์›๋“ค, ์ผ๋ณธ์—์„œ ๋ณด๋‚ธ ๋…๋„ ๋ฐฉ๋ฌธ ํ•ญ์˜ ์„œํ•œ ๋ฐ˜์†กํ•ด',
  'date': '2018-12-26 ',
  'company': 'YTN',
  'text': '์ง€๋‚œ 10์›” 22์ผ ๋…๋„๋ฅผ ๋ฐฉ๋ฌธํ•œ ๊ตญํšŒ ๊ต์œก์œ„์›ํšŒ ์†Œ์† ์˜์›๋“ค์ด ์ผ๋ณธ ์ž๋ฏผ๋‹น ์†Œ์† ์ค‘์˜์› ๋“ฑ์ด ๋ณด๋‚ธ ํ•ญ์˜ ์„œํ•œ์„ ๋˜๋Œ๋ ค๋ณด๋‚ธ ์‚ฌ์‹ค์ด ๋’ค๋Šฆ๊ฒŒ ์•Œ๋ ค์กŒ๋‹ค.์ผ๋ณธ ์–ธ๋ก ์— ๋”ฐ๋ฅด๋ฉด, ์ผ๋ณธ ์ž๋ฏผ๋‹น ์†Œ์† ์ค‘์˜์›์ด์ž \'์ผ๋ณธ ์˜ํ† ๋ฅผ ์ง€ํ‚ค๊ธฐ ์œ„ํ•ด ํ–‰๋™ํ•˜๋Š” ์˜์› ์—ฐ๋งน\' ์†Œ์† ์‹ ๋„ ์š”์‹œํƒ€์นด ์˜์›์€ 25์ผ ๊ธฐ์žํšŒ๊ฒฌ์„ ์—ด์–ด ํ•œ๊ตญ ๊ตญํšŒ์˜์›๋“ค์ด ๋ฐ˜์†กํ•œ ์„œํ•œ์„ ๊ณต๊ฐœํ–ˆ๋‹ค.\'์ผ๋ณธ ์˜ํ† ๋ฅผ ์ง€ํ‚ค๊ธฐ ์œ„ํ•œ ์˜์› ์—ฐ๋งน\'์€ "๋…๋„๊ฐ€ ํ•œ๊ตญ ๋•…์ธ ๊ทผ๊ฑฐ๋ฅผ ๋Œ€๋ผ"๋Š” ๋‚ด์šฉ์„ ๋‹ด์€ ํ•ญ์˜์„œํ•œ 13ํ†ต ์ค‘ 10ํ†ต์€ ๋œฏ์–ด์ง„ ์ฑ„๋กœ, ๋‚˜๋จธ์ง€๋Š” ๋ด‰ํˆฌ ์—†์ด, ๋‹ค๋ฅธ ํ•œ ํ†ต์€ ๋ฐ˜์†ก๋˜์ง€ ์•Š์•˜๋‹ค๊ณ  ๋ฐํ˜”๋‹ค.์ง€๋‚œ 10์›”, ๋…๋„๋ฅผ ๋ฐฉ๋ฌธํ–ˆ๋˜ ๊ตญํšŒ ๊ต์œก์˜์›์ด์—ˆ๋˜ ์ด์ฐฌ์—ด ๋ฐ”๋ฅธ๋ฏธ๋ž˜๋‹น ์˜์›์€ ๋…๋„ ๋ฐฉ๋ฌธ ํ›„์— ์ด๋ฏธ ํ•ญ์˜ ์„œํ•œ์„ ๋ฐ›์ง€ ์•Š๊ฒ ๋‹ค๋Š” ์˜์ง€๋ฅผ ํ‘œ๋ช…ํ•œ ๋ฐ” ์žˆ๋‹ค. ์ด์ฐฌ์—ด ์˜์›์€ ๋…๋„๊ฐ€ ์šฐ๋ฆฌ ๋•…์ธ ๊ทผ๊ฑฐ๋ฅผ ๋Œ€๋ผ๋Š” ์งˆ๋ฌธ์— ๋Œ€ํ•ด "๋‹ต๋ณ€ํ•  ์ด์œ ๊ฐ€ ์—†๋‹ค"๊ณ  ์ž˜๋ผ ๋งํ•˜๊ธฐ๋„ ํ–ˆ๋‹ค.์ด์ฐฌ์—ด ์˜์›์€ CBS์™€์˜ ์ธํ„ฐ๋ทฐ์—์„œ "(๋ฐ˜๋Œ€๋กœ) ๋‹น์‹ ๋“ค์ด ๋…๋„๊ฐ€ ์ผ๋ณธ ๋•…์ด๋ผ๊ณ  ์ฃผ์žฅํ•˜๋Š” ๊ทผ๊ฑฐ๋ฅผ ๋Œ€๋ด๋ผ"๋ผ๋ฉฐ "์ผ๋ณธ์ด ๊ตฐ๊ตญ์ฃผ์˜์˜ ์•ผ์‹ฌ๋งŒ ๋“œ๋Ÿฌ๋‚ด๊ณ  ์žˆ๋‹ค"๊ณ  ๋งํ•œ ๋ฐ” ์žˆ๋‹ค.[์‚ฌ์ง„ = ์ผ๋ณธ ์˜ํ† ๋ฅผ ์ง€ํ‚ค๊ธฐ ์œ„ํ•ด ํ–‰๋™ํ•˜๋Š” ์˜์› ์—ฐ๋งน, ์ด์ฐฌ์—ด ์˜์› ํŠธ์œ„ํ„ฐ]YTN PLUS ์ตœ๊ฐ€์˜ ๊ธฐ์ž (weeping07@ytnplus.co.kr)  โ–ถ 24์‹œ๊ฐ„ ์‹ค์‹œ๊ฐ„ ๋‰ด์Šค ์ƒ๋ฐฉ์†ก ๋ณด๊ธฐ  โ–ถ ๋„ค์ด๋ฒ„ ๋ฉ”์ธ์—์„œ YTN์„ ๊ตฌ๋…ํ•ด์ฃผ์„ธ์š”! [์ €์ž‘๊ถŒ์ž(c) YTN & YTN PLUS ๋ฌด๋‹จ์ „์žฌ ๋ฐ ์žฌ๋ฐฐํฌ ๊ธˆ์ง€]'},
 {'title': "ๆ—ฅ์˜์›์ด ้Ÿ“์˜์›์— ๋ณด๋‚ธ '๋…๋„ ์˜์œ ๊ถŒ' ์งˆ๋ฌธ์„œ ๋ฐ˜์†ก",
  'date': '2018-12-26 ',
  'company': '์—ฐํ•ฉ๋‰ด์Šค',
  'text': '(๋„์ฟ„=์—ฐํ•ฉ๋‰ด์Šค) ๊น€์ •์„  ํŠนํŒŒ์› = ์ผ๋ณธ ์—ฌ์•ผ ์˜์›๋“ค๋กœ ๊ตฌ์„ฑ๋œ ๋ชจ์ž„์ด ์ง€๋‚œ 10์›” ๋…๋„๋ฅผ ๋ฐฉ๋ฌธํ•œ ์šฐ๋ฆฌ๋‚˜๋ผ ๊ตญํšŒ์˜์›๋“ค์—๊ฒŒ ํ•œ๊ตญ์˜ ๋…๋„ ์˜์œ ๊ถŒ ์ฃผ์žฅ ๊ทผ๊ฑฐ๋ฅผ ์ œ์‹œํ•˜๋ผ๋ฉฐ ๋ณด๋ƒˆ๋˜ ๊ณต๊ฐœ์งˆ๋ฌธ์„œ๊ฐ€ ๋ฐ˜์†ก๋œ ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ๋‹ค.     26์ผ NHK ๋“ฑ์— ๋”ฐ๋ฅด๋ฉด \'์ผ๋ณธ ์˜ํ† ๋ฅผ ์ง€ํ‚ค๊ธฐ ์œ„ํ•ด ํ–‰๋™ํ•˜๋Š” ์˜์›์—ฐ๋งน\'(์ดํ•˜ ์˜์›์—ฐ๋งน)์˜ ์‹ ๋„ ์š”์‹œํƒ€์นด(ๆ–ฐ่—ค็พฉๅญยท์ž๋ฏผ๋‹น) ํšŒ์žฅ์€ ์ „๋‚  ๊ธฐ์žํšŒ๊ฒฌ์—์„œ ์ง€๋‚œ๋‹ฌ ๋ฐœ์†กํ•œ ์งˆ๋ฌธ์„œ๊ฐ€ ๊ทธ๋Œ€๋กœ ๋ฐ˜์†ก๋๋‹ค๊ณ  ๋ฐํ˜”๋‹ค.     ์˜์›์—ฐ๋งน์€ ์ง€๋‚œ 10์›” 22์ผ ํ•œ๊ตญ์˜ ๊ตญํšŒ ๊ต์œก์œ„์›ํšŒ ์†Œ์† ์˜์›๋“ค์ด ๋…๋„๋ฅผ ๋ฐฉ๋ฌธํ•˜์ž ๋‹ค์Œ ๋‹ฌ ์ด๋ฅผ ์šฉ๋‚ฉํ•  ์ˆ˜ ์—†๋‹ค๋ฉฐ ํ•œ๊ตญ ์ธก์˜ ์˜์œ ๊ถŒ ๊ทผ๊ฑฐ ๋“ฑ์„ ์ œ์‹œํ•˜๋ผ๋Š” ์งˆ๋ฌธ์„œ๋ฅผ ๋ณด๋ƒˆ๋‹ค. \'๋…๋„๋Š” ์šฐ๋ฆฌ๋•…\'(์„œ์šธ=์—ฐํ•ฉ๋‰ด์Šค) ๊น€์ฃผ์„ฑ ๊ธฐ์ž = ์ผ๋ณธ ์‹œ๋งˆ๋„ค(ๅณถๆ น)ํ˜„์ด \'๋‹ค์ผ€์‹œ๋งˆ(็ซนๅณถยท์ผ๋ณธ์ด ์ฃผ์žฅํ•˜๋Š” ๋…๋„ ๋ช…์นญ)์˜ ๋‚ \' ํ–‰์‚ฌ๋ฅผ ์ฃผ์ตœํ•œ 2017๋…„ 2์›” 22์ผ ์˜คํ›„ ์„œ์šธ ์ข…๋กœ๊ตฌ ์ฃผํ•œ์ผ๋ณธ๋Œ€์‚ฌ๊ด€ ์˜›ํ„ฐ ์•ž์—์„œ ๋‚˜๋ผ์‚ด๋ฆฌ๊ธฐ๊ตญ๋ฏผ์šด๋™๋ณธ๋ถ€ ์ฐธ๊ฐ€ ํ•™์ƒ๋“ค์ด ์ผ๋ณธ์˜ ๋…๋„ ์นจํƒˆ ์•ผ์š•์„ ๊ทœํƒ„ํ•œ ๋’ค ๋งŒ์„ธ์‚ผ์ฐฝ์„ ํ•˜๊ณ  ์žˆ๋‹ค. 2017.2.22 utzza@yna.co.kr    ์˜์›์—ฐ๋งน์€ ํ•œ๊ตญ ๊ตญํšŒ์˜์› 13๋ช…์—๊ฒŒ ์งˆ๋ฌธ์„œ๋ฅผ ๋ณด๋ƒˆ์ง€๋งŒ 12ํ†ต์ด ๋ฐ˜์†ก๋๋‹ค๊ณ  ์‚ฐ์ผ€์ด์‹ ๋ฌธ์€ ์ „ํ–ˆ๋‹ค.     ์‹ ๋„ ํšŒ์žฅ์€ ๊ธฐ์žํšŒ๊ฒฌ์—์„œ ์งˆ๋ฌธ์„œ๊ฐ€ ๋ฐ˜์†ก๋œ ๊ฒƒ์— ๋Œ€ํ•ด "๋งค์šฐ ์œ ๊ฐ"์ด๋ผ๋ฉฐ "๋…์„ ์  ํ–‰๋™๋ฐ–์— ํ•˜์ง€ ์•Š๋Š” ๊ตญ๊ฐ€์˜ ๋ฏธ๋ž˜๋Š” ๋งค์šฐ ๊ฑฑ์ •์Šค๋Ÿฝ๋‹ค"๊ณ  ์ฃผ์žฅํ–ˆ๋‹ค.    ์‹ ๋„ ์˜์›์€ "ํ•œ์ผ๊ด€๊ณ„๋Š” ๋‹ค์ผ€์‹œ๋งˆ(็ซนๅณถยท์ผ๋ณธ์ด ์ฃผ์žฅํ•˜๋Š” ๋…๋„์˜ ๋ช…์นญ) ๋ฌธ์ œ๊ฐ€ ๊ทผ์›์— ๋ฐ•ํ˜€ ์žˆ์–ด ์ด๊ฒƒ์ด ๋น ์ง€์ง€ ์•Š๋Š” ํ•œ ์ง„์ •ํ•œ ์‹ ๋ขฐ๋กœ๋Š” ์ด์–ด์ง€์ง€ ์•Š์„ ๊ฒƒ"์ด๋ผ๊ณ  ๋งํ–ˆ๋‹ค๊ณ  ๋ฐฉ์†ก์€ ๋ง๋ถ™์˜€๋‹ค.     jsk@yna.co.krโ–ถ๋ญ ํ•˜๊ณ  ๋†€๊นŒ? #ํฅ  โ–ถ์‡ผ๋ฏธ๋”๋‰ด์Šค! ์˜ค๋Š˜ ๋งŽ์ด ๋ณธ ๋‰ด์Šค์˜์ƒ โ–ถ๋„ค์ด๋ฒ„ ํ™ˆ์—์„œ [์—ฐํ•ฉ๋‰ด์Šค] ์ฑ„๋„ ๊ตฌ๋…ํ•˜๊ธฐ'}]

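Reading the JSON file

import json

# Replace 'filename.json' with the saved file, e.g. 20181226_news_scrape_20191226.json
with open('filename.json', 'r', encoding='utf-8') as f:
    # json.load no longer accepts an encoding argument; the encoding is set in open()
    news = json.load(f)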
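
Once loaded, the records are plain dictionaries with the title, date, company, and text keys shown in the output. The sketch below works on two shortened, made-up records in that shape: it strips the trailing space seen in the date field before parsing and counts items per news company. The helper name parse_date is hypothetical.

```python
from collections import Counter
from datetime import date, datetime

# Placeholder records in the documented shape; real titles and text are longer.
news = [
    {"title": "example title A", "date": "2018-12-26 ", "company": "YTN", "text": "example text A"},
    {"title": "example title B", "date": "2018-12-26 ", "company": "연합뉴스", "text": "example text B"},
]


def parse_date(raw: str) -> date:
    """Parse a 'YYYY-MM-DD' date field, ignoring surrounding whitespace."""
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


by_company = Counter(item["company"] for item in news)
dates = sorted({parse_date(item["date"]) for item in news})

print(len(news), len(by_company), [d.isoformat() for d in dates])
```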

Introduction

This script scrapes Naver news results for one or more query words. The results include only news published on the Naver news portal, whose URLs begin with http://news.naver.com. Results whose URLs do not begin with this prefix are not scraped.

The scraped results include the title, text, date, and media source.

Motivation

There has been a rise in demand for scraping online news, yet there has been no proper tool for scraping Korean news. This tool lets users scrape news published on Naver, one of the largest web portals in South Korea. Existing tools only crawl the results of a single query and cannot collect new results over time. This tool collects news published on Naver over a period of time and also lets the user limit the results scraped per date. For instance, a single day may return more than 40,000 result pages, but the user can narrow the scope by setting the starting page and the ending page with command-line options.

Installation

This script only runs on Python 3.7.

(sudo) pip3 install navernewscrawler

Or you can download setup.py and directly install the file.

python3.7 setup.py install

Commands

  • -h or --help: Show the help message.
  • -bd or --begindate: Set the begin date in 'Y-M-D' format, e.g. 2018-12-26, 2018-06-19. The default is 2018-12-26.
  • -ed or --enddate: Set the end date. The default is 2018-12-26.
  • -p or --page: Set the first result page to scrape. The default is 1.
  • -max_page or --max_page: Set the last result page to scrape. The default is 5.
  • -c or --csv: Save the scraped results to a CSV file.
  • -d or --dump: Print the scraped results to the console.
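
To see how these options fit together, here is a minimal argparse sketch that mirrors the flags and defaults listed above. It is not the package's actual source: the positional name query, the function name build_parser, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="navernewscrawler")
    parser.add_argument("query", help="search keyword (assumed positional name)")
    parser.add_argument("-bd", "--begindate", default="2018-12-26", help="begin date, Y-M-D")
    parser.add_argument("-ed", "--enddate", default="2018-12-26", help="end date, Y-M-D")
    parser.add_argument("-p", "--page", type=int, default=1, help="first result page")
    parser.add_argument("-max_page", "--max_page", type=int, default=5, help="last result page")
    parser.add_argument("-c", "--csv", action="store_true", help="save results to CSV")
    parser.add_argument("-d", "--dump", action="store_true", help="print results to console")
    return parser


args = build_parser().parse_args(
    ["독도", "-bd", "2018-12-26", "-ed", "2018-12-27", "-p", "1", "-max_page", "3"]
)
print(args.begindate, args.enddate, args.page, args.max_page, args.csv, args.dump)
```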

Example

  • By default, the output is stored in JSON format. The file is named 'start date_news_scrape_end date', e.g. 20181226_news_scrape_20191226.json

  • The command below scrapes the news results for the query '독도' (Dokdo Island) for two days, 2018-12-26 and 2018-12-27:

navernewscrawler 독도 -bd 2018-12-26 -ed 2018-12-27 -p 1 -max_page 3

  • The command below scrapes the news results for the query '독도' (Dokdo Island) for one day, 2018-12-26, and stores the results in a CSV file:

navernewscrawler 독도 -bd 2018-12-26 -ed 2018-12-26 -p 1 -max_page 3 -c
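
A begin date and an end date cover every day in between, both ends included, which is why the first example spans two days and the second spans one. The sketch below enumerates such a range. The function name days_between is hypothetical, and date.fromisoformat requires Python 3.7 or later.

```python
from datetime import date, timedelta
from typing import Iterator


def days_between(begin: str, end: str) -> Iterator[date]:
    """Yield every day from begin to end inclusive; both are YYYY-MM-DD strings."""
    first = date.fromisoformat(begin)
    last = date.fromisoformat(end)
    # An end date earlier than the begin date yields nothing.
    for offset in range((last - first).days + 1):
        yield first + timedelta(days=offset)


print([d.isoformat() for d in days_between("2018-12-26", "2018-12-27")])
# ['2018-12-26', '2018-12-27']
```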