pip install scrape-nhs-conditions==1.0.3


Scrape NHS Conditions


This package uses the NHS Website Developer portal and Scrapy to pull down the text content of the NHS Conditions website into text files for downstream use by data science projects.

This is a simplified version of the work found here:


This repository is maintained by the NHS England Data Science Team.

To contact us, raise an issue on GitHub or get in touch via email.

See our (and our colleagues') other work here:


There is a need for easy access to the text content of NHS Conditions, particularly given the useful work by CogStack in creating lists of NHS Conditions questions and answers.

The NHS Developer API is very useful, but it requires some setup and training to use, which is overkill if all a data science project needs is the NHS Conditions text. In addition, the API's outputs need further processing to extract just the textual components of each page.
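As an illustration of that extra processing step, here is a minimal, standard-library-only sketch that reduces an HTML fragment to its visible text. It assumes a page's body is available as an HTML string; the names `TextExtractor` and `html_to_text` are hypothetical, and this is not the package's own implementation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML fragment, skipping script/style content."""

    SKIP_TAGS = {"script", "style"}
    # Closing any of these block-level tags starts a new line in the output.
    BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6", "div", "section"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br":
            # <br> is a void element, so treat its start tag as a line break.
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        # Ignore text that sits inside <script> or <style>.
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<h1>Asthma</h1><p>Asthma is a common <b>lung</b> condition.</p>")` returns `"Asthma\nAsthma is a common lung condition."`.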

This package aims to make the whole process easier by requiring the user to run only two functions:

  • run_nhs_conditions_scraper: to extract the HTML for each page
  • process_nhs_conditions_json: to extract the text for each page into txt files
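The second step can be pictured with the minimal sketch below, which writes one `.txt` file per page. It assumes a hypothetical input format (a JSON list of objects with `"url"` and `"text"` keys) and uses hypothetical helper names; the real `process_nhs_conditions_json` may read its input and name its files differently.

```python
import json
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def page_filename(url):
    """Build a file name from the URL path, e.g. /conditions/asthma/ -> conditions_asthma.txt."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return ("_".join(segments) or "index") + ".txt"


def write_pages_to_txt(json_path, out_dir) -> List[Path]:
    """Write one UTF-8 .txt file per record in a JSON list of {"url": ..., "text": ...}.

    The record schema is a hypothetical stand-in for the scraper's real output.
    """
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        target = out_dir / page_filename(record["url"])
        target.write_text(record["text"], encoding="utf-8")
        written.append(target)
    return written
```

Building the file name from the whole URL path, rather than only its last segment, keeps subpages such as `.../asthma/symptoms/` and `.../eczema/symptoms/` from overwriting each other.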

An example of how these are used can be seen in the scrape_nhs_conditions.ipynb notebook.


  • Python (> 3.0)

Getting Started

  1. Clone the repository. To learn about what this means, and how to use Git, see the Git guide.
git clone <insert URL>
  2. Set up your environment using pip. For more information on how to use virtual environments and why they are important, see the virtual environments guide.

Using pip

python -m venv .venv
source .venv/bin/activate    # macOS/Linux; on Windows use .venv\Scripts\activate
python -m pip install -r requirements.txt

In Visual Studio Code, you need to change your default interpreter to the virtual environment you just created (.venv). To do this, press Ctrl+Shift+P, search for "Python: Select Interpreter" and select .venv from the list.
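One way to confirm that VS Code (or your terminal) is running the .venv interpreter rather than the system Python is a short check like the one below. The function name is illustrative; the check relies on the standard-library fact that inside a venv `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base Python installation.

```python
import sys


def running_in_virtualenv():
    """Return True when the current interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment directory; outside one it equals sys.base_prefix.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {running_in_virtualenv()}")
```

If the script reports `False`, or the interpreter path does not sit under `.venv`, reselect the interpreter before installing or running anything.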

Project structure

The LICENCE file will need to be updated with the correct year and owner.

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.

Any HTML or Markdown documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
