resoup

Various convenient features related to requests.


Keywords
requests, bs4, BeautifulSoup, async, caching, cache
License
MIT
Install
pip install resoup==0.5.2

Documentation

Caution

This library is unmaintained and has been replaced by hxsoup.

resoup

Various convenient features related to requests and BeautifulSoup. (requests + BeautifulSoup)

  1. merges the requests library and BeautifulSoup so that several lines of code collapse into one,
  2. lets you use async and caching with almost no extra effort,
  3. ships convenient defaults for web scraping,
  4. and provides assorted small features such as no_empty_result, attempts, and avoid_sslerror.

A small but useful library that trims a few lines off every piece of request code you write.

Getting started

  1. Install Python.

  2. Run the following command in a terminal:

    pip install -U resoup

requests and bs4 are installed along with it, but lxml and html5lib, BeautifulSoup's optional parsers, are not installed by default.

You therefore have to install lxml, html5lib, etc. yourself to avoid errors; if you use one of those parsers without installing it, a NoParserError is raised.

Usage

Note: the examples mostly use GET requests, but everything works identically for all the other methods (options/head/post/put/patch/delete).

resoup.requests module

The resoup.requests module can be imported like this:

from resoup import requests  # drop-in compatible with `import requests`

This library is 99% compatible with the requests library (even the type hints work just as well!) and layers convenient features on top. In other words, replacing your existing `import requests` with the line above integrates cleanly without breaking your existing code.

requests' Session can be used in much the same way.

from resoup import requests

with requests.Session() as session:
    ...  # every feature, such as cget and attempts, is available

Defaults

Each default is set to a sensible value.

The defaults are listed below and apply to requests.get/options/head/post/put/patch/delete:

timeout default: 120
headers default: {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "ko-KR,ko;q=0.9",
    "Sec-Ch-Ua": '"Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
}
attempts default: 1
avoid_sslerror default: False
>>> from resoup import requests
>>> res = requests.get("https://httpbin.org/headers")
>>> res.json()['headers']
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
 'Accept-Encoding': 'gzip, deflate, br',
 'Accept-Language': 'ko-KR,ko;q=0.9',
 'Host': 'httpbin.org',
 'Sec-Ch-Ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"',
 'Sec-Ch-Ua-Mobile': '?0',
 'Sec-Ch-Ua-Platform': '"Windows"',
 'Sec-Fetch-Dest': 'document',
 'Sec-Fetch-Mode': 'navigate',
 'Sec-Fetch-Site': 'none',
 'Sec-Fetch-User': '?1',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
 'X-Amzn-Trace-Id': ...}

Response

The get/options/head/post/put/patch/delete functions of the resoup.requests module all return a ResponseProxy.

ResponseProxy is a subclass of Response that is 100% compatible with the original. See the ResponseProxy section for details.

If you are not sure how the extra features work, just keep using the result the way you used a Response and everything will work fine.

attempts

attempts is a parameter that sets how many times the same request is attempted when a ConnectionError occurs for whatever reason.

If every attempt fails (10 in the example below), the reason for the most recent failed connection is shown.

>>> from resoup import requests
>>>
>>> requests.get('https://some-not-working-website.com', attempts=10)
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
WARNING:root:Retring...
Traceback (most recent call last):
...
socket.gaierror: [Errno 11001] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at ...>: Failed to resolve 'some-not-working-website.com' ([Errno 11001] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='some-not-working-website.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at ...>: Failed to resolve 'some-not-working-website.com' ([Errno 11001] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='some-not-working-website.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at ...>: Failed to resolve 'some-not-working-website.com' ([Errno 11001] getaddrinfo failed)"))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
ConnectionError: Trying 10 times but failed to get data.
URL: https://some-not-working-website.com
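Conceptually, attempts is just a retry loop around the request. A minimal stand-alone sketch of that behaviour (illustrative only, not resoup's actual code; `request_with_attempts` and `fetch` are hypothetical names):

```python
import logging

def request_with_attempts(fetch, attempts: int):
    """Retry `fetch` up to `attempts` times, re-raising after the last failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as error:
            logging.warning("Retrying...")
            last_error = error
    raise ConnectionError(
        f"Trying {attempts} times but failed to get data."
    ) from last_error

# A fetch that fails twice, then succeeds:
state = {"calls": 0}

def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection reset")
    return "ok"

print(request_with_attempts(flaky_fetch, attempts=5))  # ok
```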

avoid_sslerror

avoid_sslerror can be used on sites that fail with UNSAFE_LEGACY_RENEGOTIATION_DISABLED.

For example, without avoid_sslerror the following site raises this error:

>>> from resoup import requests
>>> requests.get('https://bufftoon.plaync.com')
---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
...
SSLError: HTTPSConnectionPool(host='bufftoon.plaync.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))

Setting avoid_sslerror to True avoids the error:

>>> requests.get('https://bufftoon.plaync.com', avoid_sslerror=True)
<Response [200]>

Plain request functions

You can use the plain requests.get/options/head/post/put/patch/delete exactly the way you would with requests.

Below are examples with requests.get and post. They work just like the requests module.

>>> from resoup import requests
>>>
>>> requests.get('https://jsonplaceholder.typicode.com/todos/1').json()  # a test API; run this only if you trust it
{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}
>>> requests.post('https://jsonplaceholder.typicode.com/todos', json={
...     'title': 'foo',
...     'body': 'bar',
...     'userId': 1,
... }).json()
{'title': 'foo', 'body': 'bar', 'userId': 1, 'id': 201}  # same as the original requests library

์บ์‹œ๋œ ์š”์ฒญ ํ•จ์ˆ˜

์ผ๋ฐ˜ requests.get/../delete ์š”์ฒญ๊ณผ ๋™์ผํ•˜์ง€๋งŒ ์บ์‹œ๋ฉ๋‹ˆ๋‹ค. ์ด๋•Œ ์บ์‹œ๋Š” ํ›„์ˆ ํ•  ๋น„๋™๊ธฐ์ ์ด๋ฉฐ ์บ์‹œ๋œ ์š”์ฒญ ํ•จ์ˆ˜์™€ ๊ณต์œ ๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ฐ ๋ฉ”์†Œ๋“œ๋“ค๋ผ๋ฆฌ ๊ณต์œ ๋˜์ง€๋Š” ์•Š์Šต๋‹ˆ๋‹ค. ์•ž์— c๋ฅผ ๋ถ™์—ฌ requests.cget/coptions/chead/cpost/cput/cpatch/cdelete๋กœ ํ•จ์ˆ˜๋ฅผ ์ž‘์„ฑํ•ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐ™์€ URL์„ ๋ณด๋‚ด๋„ ๋‹ค๋ฅธ ๊ฒฐ๊ณผ๋ฅผ ์‘๋‹ตํ•  ์ˆ˜ ์žˆ๋Š” ๋™์ ์ธ ์„œ๋น„์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜(์‹œ๊ฐ„์— ๋”ฐ๋ฅธ ์‘๋‹ต์˜ ๋ณ€ํ™”๋ฅผ ๋ฐ˜์˜ํ•˜์ง€ ์•Š์Œ) ์‘๋‹ต์˜ ํฌ๊ธฐ๊ฐ€ ํด ๊ฒฝ์šฐ(๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋‚ญ๋น„๋  ์ˆ˜ ์žˆ์Œ) ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

>>> # results vary with hardware and connection quality
>>> import timeit
>>>
>>> timeit.timeit('requests.get("https://python.org")', number=10, setup='from resoup import requests')
1.1833231999917189 # all 10 calls send a real request
>>> timeit.timeit('requests.cget("https://python.org")', number=10, setup='from resoup import requests')
0.10267569999268744 # only the first call sends a request; the rest come from the cache
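In spirit, the c-prefixed functions behave like memoised versions of the plain ones, with a separate cache per method. A rough stand-alone sketch (not resoup's actual implementation; `plain_get` stands in for a real network call):

```python
from functools import cache

calls = []

def plain_get(url):
    """Stand-in for a real network request."""
    calls.append(url)
    return f"<Response for {url}>"

cget = cache(plain_get)  # the cached variant; each method would get its own cache

cget("https://python.org")
cget("https://python.org")
cget("https://python.org")
print(len(calls))  # 1: only the first call hit the network
```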

๋น„๋™๊ธฐ์ ์ธ ์š”์ฒญ ํ•จ์ˆ˜

๋น„๋™๊ธฐ์ ์ธ ์š”์ฒญ์„ ๋ณด๋ƒ…๋‹ˆ๋‹ค. ์•ž์— a๋ฅผ ๋ถ™์—ฌ requests.aget/aoptions/ahead/apost/aput/apatch/adelete๋กœ ํ•จ์ˆ˜๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

run_in_executer๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ์ผœ์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์•„๋ž˜์˜ run_in_executer ์‚ฌ์šฉ์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

>>> import asyncio
>>> 
>>> from resoup import requests
>>>
>>> res = asyncio.run(requests.aget('https://python.org'))
>>> res
<Response [200]>

๋น„๋™๊ธฐ์ ์ด๋ฉฐ ์บ์‹œ๋œ ์š”์ฒญ ํ•จ์ˆ˜

๋น„๋™๊ธฐ์ ์ด๋ฉฐ ์บ์‹œ๋˜๋Š” ์š”์ฒญ์ž…๋‹ˆ๋‹ค. ์ด๋•Œ ์บ์‹œ๋Š” ๊ฐ™์€ ๋ฉ”์†Œ๋“œ๋ผ๋ฉด ์บ์‹œ๋œ ์š”์ฒญ ํ•จ์ˆ˜์™€ ๊ณต์œ ๋ฉ๋‹ˆ๋‹ค. ์•ž์— ac๋ฅผ ๋ถ™์—ฌ requests.acget/acoptions/achead/acpost/acput/acpatch/acdelete๋กœ ํ•จ์ˆ˜๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

๊ฐ™์€ URL์„ ๋ณด๋‚ด๋„ ๋‹ค๋ฅธ ๊ฒฐ๊ณผ๋ฅผ ์‘๋‹ตํ•  ์ˆ˜ ์žˆ๋Š” ๋™์ ์ธ ์„œ๋น„์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜(์‹œ๊ฐ„์— ๋”ฐ๋ฅธ ์‘๋‹ต์˜ ๋ณ€ํ™”๋ฅผ ๋ฐ˜์˜ํ•˜์ง€ ์•Š์Œ) ์‘๋‹ต์˜ ํฌ๊ธฐ๊ฐ€ ํด ๊ฒฝ์šฐ(๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋‚ญ๋น„๋  ์ˆ˜ ์žˆ์Œ) ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

run_in_executer๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ์ผœ์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์•„๋ž˜์˜ run_in_executer ์‚ฌ์šฉ์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

>>> import asyncio
>>> import timeit
>>>
>>> timeit.timeit('asyncio.run(requests.aget("https://python.org"))', number=10, setup='from resoup import requests; import asyncio')
0.8676127000362612 # varies by machine and connection: all 10 calls send a real request
>>> timeit.timeit('asyncio.run(requests.acget("https://python.org"))', number=10, setup='from resoup import requests; import asyncio')
0.11984489997848868 # only the first call sends a request; the rest come from the cache

Using run_in_executor

The async request functions (the a-prefixed methods such as aget and acget) accept a run_in_executor parameter, which makes the function run in a separate thread. It makes little difference when requests run one after another, but you can expect a large speedup when they run in parallel.

Combining it with asyncio.gather, as below, shows a large performance improvement.

import asyncio
import time

from resoup import requests

async def measure_coroutine_time(coroutine):
    start = time.perf_counter()
    await coroutine
    end = time.perf_counter()

    print(end - start)

async def main():
    # sending a single request (little difference)

    req = requests.aget('https://python.org', run_in_executor=False)
    await measure_coroutine_time(req)  # 0.07465070000034757

    req = requests.aget('https://python.org')
    await measure_coroutine_time(req)  # 0.05844969999452587

    # sending many requests (large speedup)

    reqs = (requests.aget(f'https://python.org/{i}', run_in_executor=False) for i in range(10))  # dummy URLs
    await measure_coroutine_time(asyncio.gather(*reqs))  # without run_in_executor: slow (3.7874760999984574)

    reqs = (requests.aget(f'https://python.org/{i}') for i in range(10))  # dummy URLs
    await measure_coroutine_time(asyncio.gather(*reqs))  # with run_in_executor (the default): fast (0.11582900000212248)

if __name__ == '__main__':
    asyncio.run(main())
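What run_in_executor does can be illustrated with asyncio's own executor support: the blocking call is handed to a thread pool so that several requests can overlap. A simplified sketch (`blocking_fetch` and `aget_sketch` are hypothetical stand-ins, not resoup's API):

```python
import asyncio

def blocking_fetch(url):
    # Stand-in for a blocking requests.get call.
    return f"<Response for {url}>"

async def aget_sketch(url, run_in_executor=True):
    if run_in_executor:
        # Hand the blocking call to the default thread pool so other
        # coroutines can make progress in the meantime.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, blocking_fetch, url)
    return blocking_fetch(url)  # runs on (and blocks) the event loop

async def main():
    return await asyncio.gather(
        *(aget_sketch(f"https://python.org/{i}") for i in range(3))
    )

print(asyncio.run(main()))
```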

Incompatibilities with the requests module

This module is compatible with the requests library in almost every respect, but a few things are not.

dunder methods (__dunder__)

Some dunder methods are not re-exported, or are not compatible, either for technical reasons or because they could cause subtle bugs.

Dunder methods that are unusable or do not match the requests library: __builtins__, __cached__, __doc__, __file__, __loader__, __name__, __package__, __spec__

Dunder methods that are usable and match the requests library: __author__, __author_email__, __build__, __cake__, __copyright__, __description__, __license__, __title__, __url__, __version__

>>> import requests
>>> requests.__name__
'requests'
>>> requests.__path__
['some path']
>>> requests.__cake__
'✨ 🍰 ✨'
>>>
>>> from resoup import requests
>>> requests.__name__  # incompatible dunder method
'resoup.requests_proxy'  # value differs from requests
>>> requests.__path__  # unusable and incompatible dunder method
AttributeError: module 'resoup.requests_' has no attribute '__path__'
>>> requests.__cake__  # compatible dunder method
'✨ 🍰 ✨'

import

resoup.requests maintains import compatibility in almost all cases, but there are a few rules about imports.

resoup.requests can only be used in the form from resoup import requests.

# In each pair below, the first line imports requests and the second imports `resoup.requests`.

# importing the requests module
import requests
from resoup import requests  # OK

Because of this, the following imports are not possible with resoup.requests:

# importing a submodule of requests
import requests.models  # OK
import resoup.requests.models  # impossible!

# importing a submodule of requests (w/ from .. import ...)
from requests import models  # OK
from resoup.requests import models  # impossible!

# importing a member of a submodule of requests
from requests.models import Response  # OK
from resoup.requests.models import Response  # impossible!

In such cases, importing the module instead solves the problem.

For example, suppose you have the following code:

from requests.models import Response  # imports a member of a submodule

def is_response(instance):
    return isinstance(instance, Response)

This code can be fixed in either of the following ways:

# Option 1: switch to requests.models.Response.
# Advantage: clean and less error-prone.
from resoup import requests  # module import
def is_response(instance):
    return isinstance(instance, requests.models.Response)  # changed to requests.models.Response

# Option 2: define Response yourself.
# Advantage: the existing code does not need to change.
from resoup import requests
Response = requests.models.Response

def is_response(instance):
    return isinstance(instance, Response)

๊ฐœ์ธ์˜ ์„ ํ˜ธ์— ๋”ฐ๋ผ ์›ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์‚ฌ์šฉํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

ResponseProxy

ResponseProxy is the value returned by requests.get/options/head/post/put/patch/delete in this library. It is 100% compatible with the original Response while providing six additional functions.

Compatibility

In this part, the explanations are in the comments.

>>> # Both modules are used at once, so rename them on import.
>>> import requests as original_requests
>>> from resoup import requests as resoup_requests
>>>
>>> # The requests module returns a Response.
>>> response1 = original_requests.get("https://peps.python.org/pep-0020/")  # a static website
>>> print(response1)
<Response [200]>
>>> print(type(response1))  # a Response object
<class 'requests.models.Response'>
>>> # The resoup.requests module returns a ResponseProxy.
>>> response2 = resoup_requests.get("https://peps.python.org/pep-0020/")
>>> print(response2)
<Response [200]>
>>> print(type(response2))  # a ResponseProxy object
<class 'resoup.response_proxy.ResponseProxy'>
>>>
>>> # All of the following checks pass.
>>> assert response1.text == response2.text
>>> assert response1.status_code == response2.status_code
>>> assert response1.url == response2.url
>>> assert response1.content == response2.content
>>>
>>> # But ResponseProxy has these additional features.
>>> print(response2.soup())
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
...
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>
>>> print(response2.soup_select('title'))
[<title>PEP 20 – The Zen of Python | peps.python.org</title>, <title>Following system colour scheme</title>, <title>Selected dark colour scheme</title>, <title>Selected light colour scheme</title>]
>>> print(response2.soup_select_one('p', no_empty_result=True).text)
Long time Pythoneer Tim Peters succinctly channels the BDFL’s guiding
principles for Python’s design into 20 aphorisms, only 19 of which
have been written down.
>>>
>>> from requests.models import Response
>>> # ResponseProxy is a subclass of Response,
>>> # so it passes isinstance checks.
>>> isinstance(response2, Response)
True
>>> # Being a proper subclass, an exact type comparison does not pass.
>>> type(response1) == type(response2)
False

Basic structure

ResponseProxy provides several functions, which fall into two groups:

  • soup functions: .soup(), .soup_select(), .soup_select_one(). The basic functions.
  • xml functions: .xml(), .xml_select(), .xml_select_one(). The soup functions with the parser set to 'xml'.

Each group contains three functions, which work as follows:

  • .soup()/.xml(): returns the document parsed by BeautifulSoup.
  • .soup_select()/.xml_select(): similar to .soup().select().
  • .soup_select_one()/.xml_select_one(): similar to .soup().select_one().

See below for details.

.soup()

.soup() takes text or a response and feeds it to BeautifulSoup.

Both a response and response.text are accepted as the argument, but passing the response is recommended: you get more detailed error messages that way.

>>> from resoup import requests
>>>
>>> response = requests.get("https://python.org")
>>> response.soup()  # accepts every parameter that BeautifulSoup accepts
<!DOCTYPE html>
...
</body>
</html>

This function is effectively just running the content through BeautifulSoup. The code below is almost the same as the code above.

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> response = requests.get("https://python.org")
>>> BeautifulSoup(response.text)
<!DOCTYPE html>
...
</body>
</html>

When no parser is available, BeautifulSoup raises FeatureNotFound, whereas .soup() raises NoParserError.

.soup_select()

.soup_select() takes text or a response and returns BeautifulSoup Tags. The selector parameter takes a CSS selector.

>>> from resoup import requests
>>>
>>> response = requests.get("https://python.org")
>>> response.soup_select("p")
[<p><strong>Notice:</strong> While JavaScript is not essential for this website
...]

The code below behaves similarly to the code above.

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> response = requests.get('https://python.org')
>>> soup = BeautifulSoup(response.text).select('p')
>>> soup
[<p><strong>Notice:</strong> While JavaScript is not essential for this website
...]

A distinctive feature of this function is the no_empty_result parameter. When it is True, an EmptyResultError is raised if the result of .select() is an empty list.

>>> from resoup import requests
>>>
>>> response = requests.get("https://python.org")
>>> response.soup_select("data-some-complex-and-error-prone-selector")
[]
>>>
>>> response = requests.get("https://python.org")
>>> response.soup_select(
...     "data-some-complex-and-error-prone-selector",
...     no_empty_result=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...souptools.py", line 148, in soup_select
    raise EmptyResultError(
resoup.exceptions.EmptyResultError: Result of select is empty list("[]"). This error happens probably because of invalid selector or URL. Check if both selector and URL are valid. Set to False `no_empty_result` if empty list is intended. It may also because of selector is not matched with URL.
selector: data-some-complex-and-error-prone-selector, URL: https://www.python.org/

By default this function returns a BroadcastList rather than a plain list. To learn more about BroadcastList, see the BroadcastList section below.

.soup_select_one()

.soup_select_one() takes text or a response and returns a BeautifulSoup Tag. The selector parameter takes a CSS selector.

>>> from resoup import requests
>>>
>>> response = requests.get('https://python.org')
>>> response.soup_select_one('p strong', no_empty_result=True)
<strong>Notice:</strong>

The code below behaves similarly to the code above.

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> response = requests.get('https://python.org')
>>> soup = BeautifulSoup(response.text, 'html.parser').select_one('p strong')
>>> if soup is None:  # the manual check that no_empty_result replaces
...     raise Exception
...
>>> soup
<strong>Notice:</strong>

When the no_empty_result parameter is True, an EmptyResultError is raised if the result of .select_one() is None.

This feature is also useful for type hints and helps make errors clearer.

The original BeautifulSoup annotates the return value of .select_one() as Tag | None, so code such as .select_one().text makes static type checkers report an error.

Worse, when .select_one() actually returns None, you get a 'NoneType' object has no attribute 'text' message that makes it hard to see at a glance where the error came from.

no_empty_result solves both problems. Setting it to True silences the type checkers, and whenever the result would have been None it produces a far more detailed error message that includes suggestions for fixing it.

>>> from resoup import requests
>>>
>>> response = requests.get("https://python.org")
>>> print(response.soup_select_one("data-some-complex-and-error-prone-selector"))
None  # shown by print(); without it the call would just quietly return None
>>>
>>> response = requests.get("https://python.org")
>>> response.soup_select_one(
...     "data-some-complex-and-error-prone-selector",
...     no_empty_result=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...souptools.py", line 220, in soup_select_one
    raise EmptyResultError(
resoup.exceptions.EmptyResultError: Result of select_one is None. This error happens probably because of invalid selector or URL. Check if both selector and URL are valid. Set to False `no_empty_result` if empty list is intended. It may also because of selector is not matched with URL.  
selector: data-some-complex-and-error-prone-selector, URL: https://www.python.org/
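The pattern behind no_empty_result is the common "turn None into a descriptive error" helper, which also narrows the type from `Tag | None` to `Tag` for static type checkers. A generic sketch (the `no_empty` helper is hypothetical, not part of resoup):

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def no_empty(value: Optional[T], selector: str) -> T:
    """Raise a descriptive error instead of silently returning None."""
    if value is None:
        raise ValueError(
            f"Result of select_one is None. Check that the selector "
            f"({selector!r}) actually matches the document."
        )
    return value

print(no_empty("some tag", "p strong"))  # some tag
```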

xml-related functions

Replacing soup with xml in the soup-related functions of ResponseProxy gives the xml functions.

These are identical to the soup functions except that the parser is 'xml'.

For example:

>>> from resoup import requests
>>>
>>> response = requests.get('https://www.w3schools.com/xml/plant_catalog.xml')
>>> selected = response.xml_select('LIGHT', no_empty_result=True)
>>> selected
[<LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Sunny</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sun</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>]

์œ„์˜ ์ฝ”๋“œ๋Š” ์•„๋ž˜์˜ ์ฝ”๋“œ์™€ ๊ฑฐ์˜ ๊ฐ™์Šต๋‹ˆ๋‹ค.

>>> from resoup import requests
>>> from functools import partial
>>>
>>> response = requests.get('https://www.w3schools.com/xml/plant_catalog.xml')
>>> # corresponds to `.xml_select()`
>>> xml_select_partial = partial(response.soup_select, parser='xml')
>>> selected = xml_select_partial('LIGHT', no_empty_result=True)
>>> selected
[<LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Sunny</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sunny</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Sun or Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Sun</LIGHT>, <LIGHT>Mostly Shady</LIGHT>, <LIGHT>Shade</LIGHT>, <LIGHT>Shade</LIGHT>]

BroadcastList

.soup_select() and .xml_select() return a list of tags, which makes it awkward to use attributes such as .text that you can use on the result of .soup() or .soup_select_one().

You can work around this with a for loop or a list comprehension.

>>> from resoup import requests
>>> tags_list = requests.get("https://python.org").soup_select("p strong")
>>> [element.text for element in tags_list]
['Notice:', 'relaunched community-run job board']

But you may not like that. During development in particular, you may want a quicker way to apply .text and friends than writing a loop or comprehension every time.

BroadcastList, the default return type of .soup_select() in this project, is a convenience for exactly that.

With a BroadcastList you can use Tag attributes directly on the list itself.

>>> from resoup import requests
>>> tags_list = requests.get("https://python.org").soup_select("p strong")
>>> tags_list
[<strong>Notice:</strong>, <strong>relaunched community-run job board</strong>]
>>> type(tags_list)
<class 'resoup.broadcast_list.TagBroadcastList'>  # a BroadcastList is used
>>> tags_list.text  # broadcasting
['Notice:', 'relaunched community-run job board']
>>>
>>> tags_list_with_no_broadcast_list = requests.get('https://python.org').soup_select('p', use_broadcast_list=False)
>>> type(tags_list_with_no_broadcast_list)
<class 'bs4.element.ResultSet'>  # BroadcastList is not used
>>> tags_list_with_no_broadcast_list.text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...element.py", line 2428, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

BroadcastList can be turned off as follows:

>>> from resoup import requests
>>>
>>> tags_list = requests.get("https://python.org").soup_select("p", use_broadcast_list=False)
>>> type(tags_list)
bs4.element.ResultSet
>>> tags_list.text  # no broadcasting
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...element.py", line 2428, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Special getitem behaviour

BroadcastList has one more distinctive feature.

Indexing the list with an integer or a slice behaves like a normal list, while indexing with a string broadcasts:

>>> from resoup import requests
>>> # fetch the tag list
>>> tag_broadcast_list = requests.cget("https://www.python.org/community/logos/").soup_select("img")
>>> tag_broadcast_list
[<img alt="Python Software Foundation" class="psf-logo" src="/static/img/psf-logo.png"/>,
...
<img alt="Logo device only" src="https://s3.dualstack.us-east-2.amazonaws.com/pythondotorg-assets/media/community/logos/python-logo-only.png" style="height: 48px;"/>,
<img alt="/static/community_logos/python-powered-w-100x40.png" src="/static/community_logos/python-powered-w-100x40.png"/>,
<img alt="/static/community_logos/python-powered-h-50x65.png" src="/static/community_logos/python-powered-h-50x65.png"/>]
>>> # integer getitem
>>> tag_broadcast_list[0]
<img alt="Python Software Foundation" class="psf-logo" src="/static/img/psf-logo.png"/>
>>> # slicing
>>> tag_broadcast_list[3:5]
[<img alt="/static/community_logos/python-powered-w-100x40.png" src="/static/community_logos/python-powered-w-100x40.png"/>,
 <img alt="/static/community_logos/python-powered-h-50x65.png" src="/static/community_logos/python-powered-h-50x65.png"/>]
>>> # string getitem (broadcasting applies!)
>>> tag_broadcast_list["alt"]
['Python Software Foundation',
 'Combined logo',
 'Logo device only',
 '/static/community_logos/python-powered-w-100x40.png',
 '/static/community_logos/python-powered-h-50x65.png']
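Both behaviours, attribute broadcasting and type-dependent getitem, can be sketched with a small list subclass (a simplified illustration, not resoup's actual TagBroadcastList; `FakeTag` stands in for a bs4 Tag):

```python
class BroadcastListSketch(list):
    """A list that broadcasts attribute access and string indexing."""

    def __getattr__(self, name):
        # Called only for attributes the list itself does not have.
        return [getattr(item, name) for item in self]

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return super().__getitem__(key)   # ordinary list behaviour
        return [item[key] for item in self]   # broadcast string keys

class FakeTag(dict):
    """Minimal stand-in for a bs4 Tag: a .text attribute plus ['attr'] access."""
    def __init__(self, text, **attrs):
        super().__init__(attrs)
        self.text = text

tags = BroadcastListSketch([
    FakeTag("Notice:", alt="first"),
    FakeTag("job board", alt="second"),
])
print(tags.text)     # ['Notice:', 'job board']
print(tags[0].text)  # Notice:
print(tags["alt"])   # ['first', 'second']
```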

CustomDefaults

CustomDefaults lets you set your own defaults. The values you provide effectively become the defaults for the plain get/options/head/post/put/patch/delete functions and for their c../a../ac.. variants.

>>> from resoup import CustomDefaults
>>>
>>> requests = CustomDefaults(headers={'User-Agent': 'User Agent for Test'})
>>> requests.get('https://httpbin.org/headers').json()['headers']['User-Agent']
'User Agent for Test'
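Conceptually, CustomDefaults just stores your keyword arguments and merges them into every call, with per-call arguments taking precedence. A rough sketch (hypothetical names, not resoup's actual class):

```python
def fake_get(url, **kwargs):
    """Hypothetical stand-in for the real request function; echoes its kwargs."""
    return {"url": url, **kwargs}

class CustomDefaultsSketch:
    """Stores default kwargs and merges them into every call."""

    def __init__(self, **defaults):
        self.defaults = defaults

    def get(self, url, **kwargs):
        merged = {**self.defaults, **kwargs}  # call-site kwargs win
        return fake_get(url, **merged)

requests_like = CustomDefaultsSketch(headers={"User-Agent": "User Agent for Test"})
print(requests_like.get("https://httpbin.org/headers")["headers"])
# {'User-Agent': 'User Agent for Test'}
```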

๋ผ์ด์„ ์Šค ์ •๋ณด

์ด ํ”„๋กœ๊ทธ๋žจ์€ MIT ๋ผ์ด์„ ์Šค๋กœ ๊ณต์œ ๋ฉ๋‹ˆ๋‹ค.

์ด ํ”„๋กœ๊ทธ๋žจ์˜ ์ผ๋ถ€๋Š” requests(Apache License 2.0) ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ๋˜ ์ฝ”๋“œ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. Some part of this program contains code from requests library.

์ด ํ”„๋กœ๊ทธ๋žจ์˜ ์ผ๋ถ€๋Š” typeshed(Apache License 2.0 or MIT License) ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ๋˜ ์ฝ”๋“œ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. Some part of this program contains code from typeshed library.

Release Notes

0.5.2 (2023-12-26): made attempts also cover Timeout errors, added variables usable from the package root, improved the build code, general code improvements

0.5.1 (2023-12-09): bug fixes

0.5.0 (2023-12-09): renamed to resoup, enabled the new BroadcastList by default, switched to poetry, removed the old souptools module in favour of the souptoolsclass module, added tests

0.4.1 (2023-11-04): urgent bug fix

0.4.0 (2023-11-04): changed the raise_for_status default, added souptoolsclass, added avoid_sslerror

0.3.0 (2023-10-05): restored BroadcastList, added sessions_with_tools

0.2.3 (2023-09-19): changed the default headers, show only one error on ConnectionError, added a message shown when a retry via attempts succeeds, removed the URL from retry logs, changed setup.py and related files

0.2.2 (2023-09-08): renamed the attempt parameter to attempts, removed BroadcastList

0.2.1 (2023-08-31): added py.typed, added freeze_dict_and_list

0.2.0 (2023-08-27): added CustomDefaults

0.1.1 (2023-08-27): first release