Simple web scraping system to find best prices for medicines


Keywords
beautifulsoup4, chromedriver, geckodriver, phantomjs, python37, selenium, webscraping
License
MIT
Install
pip install druglord==1.0.1

Documentation

Drug Lord

Simple web scraping system to find best prices for medicines.

Requirements

You must download the Firefox webdriver (geckodriver) to work with DrugLord.

If you want to use another browser, find a suitable webdriver for your favorite browser below:

Browser                  Driver
Chrome                   https://sites.google.com/a/chromium.org/chromedriver/downloads
Firefox                  https://github.com/mozilla/geckodriver/releases
Safari                   https://webkit.org/blog/6900/webdriver-support-in-safari-10/
PhantomJS (Deprecated)   http://phantomjs.org/download.html

After downloading your driver, extract the contents to any folder and add that folder to your system PATH.

PS: Windows users may have to restart the computer after adding the webdriver to PATH.

PPS: If you need a headless environment, you must use either Chrome, Firefox or PhantomJS
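
To confirm that the webdriver you added to PATH is actually visible, a quick check along these lines may help. It is only a sketch: it uses the standard library's shutil.which and assumes the drivers keep their usual executable names (geckodriver, chromedriver, safaridriver, phantomjs). The find_drivers helper is hypothetical and not part of druglord.

```python
import shutil

# Assumption: these are the usual executable names; adjust if yours differ.
DRIVERS = {
    "Firefox": "geckodriver",
    "Chrome": "chromedriver",
    "Safari": "safaridriver",
    "PhantomJS": "phantomjs",
}


def find_drivers(drivers=DRIVERS):
    """Return {browser: full path, or None if the executable is not on PATH}."""
    return {browser: shutil.which(exe) for browser, exe in drivers.items()}


if __name__ == "__main__":
    for browser, path in find_drivers().items():
        print(f"{browser: <10}{path or 'not found on PATH'}")
```

Any browser reported as "not found on PATH" will most likely fail to start, so fix PATH (or restart, on Windows) before going further.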


Installation

It's as simple as installing any other pip package:

$ pip install druglord

Usage

Run it as a module using the default Firefox browser in headless mode:

$ python -m druglord tylex

Run it as a module with a different locale:

$ python -m druglord tylex --locale=en_US  # unavailable

Run it as a module using Chrome browser in headless mode:

$ python -m druglord tylex --browser=Chrome

Run it as a module and search using multiple terms:

$ python -m druglord meloxicam 7,5mg

Or create a new project (this one using Chrome browser in headless mode):

$ mkdir druglord_test
$ cd druglord_test
$ python -m venv venv
$ . ./venv/Scripts/activate  # on Linux/macOS: . ./venv/bin/activate
$ pip install druglord

Create a new file named search.py and add the following contents:

import sys
from druglord import Druglord
from druglord.browsers import Chrome as Browser


if __name__ == '__main__':
    browser = Browser()
    dl = Druglord(browser)
    term = sys.argv[1]
    print()

    results = dl.search(term).order_by('price').results

    for product in results:
        if product.available:
            print(f'{product.drugstore.name[:9]: <10}', end='')
            print(f'{product.name[:89]: <90}R$ ', end='')
            print(f'{product.price: >7.2f}')

    print(f'\n{len(results)} products found '
          f'in {dl.search_time:.2f} seconds')

From the command line, you can search for a medicine:

$ python search.py tylex

You can also search using multiple words:

$ python search.py "Valium 10mg"
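
The script above reads only sys.argv[1], which is why a multi-word search has to be quoted. If you would rather pass the words unquoted, as the module form does, one option is to join every argument into a single term. This is a minimal sketch; build_term is a hypothetical helper, not part of druglord.

```python
import sys


def build_term(argv):
    """Join every argument after the script name into one search term."""
    words = argv[1:]
    if not words:
        # Exit with a usage message instead of an IndexError traceback.
        raise SystemExit(f"usage: python {argv[0]} TERM [TERM ...]")
    return " ".join(words)
```

In search.py, term = build_term(sys.argv) would then take the place of term = sys.argv[1].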

If you're a Flask fanboy like myself, you could build an API as simple as this:

from druglord import Druglord
from druglord.browsers import Firefox as Browser
from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)
browser = Browser()


@app.route('/search', methods=['POST'])
def search():
    data = request.get_json()
    if data is None:
        return jsonify({'message': 'Invalid payload'}), 400
    term = data.get('search', None)
    if term is None:
        return jsonify({'message': 'Invalid search term'}), 400

    dl = Druglord(browser)
    results = dl.search(term).order_by('price').results
    products = [product.to_dict() for product in results if product.available]
    return jsonify({
        'message': 'success',
        'products': products,
        'search_time': dl.search_time
    }), 200


if __name__ == '__main__':
    app.run()
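
To try the API above, a small client such as the following could be used. It is a sketch under a few assumptions: the app runs locally on Flask's default address (http://127.0.0.1:5000), and the helper names build_search_request and search are made up for illustration. Only the standard library is used.

```python
import json
import urllib.request

# Assumption: the Flask app is running locally on Flask's default port.
API_URL = "http://127.0.0.1:5000/search"


def build_search_request(term, url=API_URL):
    """Build a POST request carrying {"search": term} as a JSON body."""
    body = json.dumps({"search": term}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(term, url=API_URL, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(term, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, search("tylex")["products"] should hold the available products sorted by price. The generous timeout is a guess, since each search drives a real browser.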

Todo (Help wanted)

  • Add more tests
  • Add drugstores for multiple countries
  • Traverse multiple page results
  • Multi-threading/Parallel Processing
  • Machine learning to normalize product names by similarity