ukds
A Python package for working with datasets from the UK Data Service (UKDS).
Any problems? Please raise an issue on GitHub.
To install:
pip install ukds
Quick Demo
(This demonstration uses the following dataset: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1)
The following code reads a UK Data Service .tab data file and its associated .rtf data dictionary file, and converts them to a Pandas DataFrame:
import ukds
dt=ukds.DataTable(fp_tab=r'.../uktus15_household.tab',
fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
df=dt.get_dataframe()
The DataFrame looks like this:
User Guide
The ukds package provides two classes:
DataTable class
The DataTable class converts a UKDS .tab data file and .rtf data dictionary file into a single Pandas DataFrame ready for further analysis.
Importing the DataTable class
from ukds import DataTable
Creating an instance of DataTable and reading in the data file and the data dictionary file
Either:
dt=DataTable()
dt.read_tab(r'.../uktus15_household.tab')
dt.read_datadictionary(r'.../uktus15_household_ukda_data_dictionary.rtf')
or:
dt=DataTable(fp_tab=r'.../uktus15_household.tab',
fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
Attributes
As the files are read in, a number of attributes are populated. These are:
dt.tab # a pandas.DataFrame object
dt.datadictionary # a ukds.DataDictionary object
get_dataframe method
The get_dataframe method converts the information in the tab and datadictionary attributes into a new pandas DataFrame.
df=dt.get_dataframe()
See the datatable_demo.ipynb Jupyter Notebook in the 'demo' section for more information.
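To give a rough idea of what combining a tab-separated data file with variable metadata involves, here is a minimal sketch using pandas alone. It is not the ukds implementation: the sample data, the `tenure` column, its labels, and the `read_tab_text` and `apply_value_labels` helpers are all hypothetical, and `get_dataframe` may well work differently internally.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a small tab-separated data file.
SAMPLE_TAB = "serial\ttenure\n1\t1\n2\t2\n3\t1\n"

# Hypothetical value labels: variable name -> {coded value: label}.
VALUE_LABELS = {"tenure": {1: "Owned", 2: "Rented"}}


def read_tab_text(source):
    """Read tab-separated text (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep="\t")


def apply_value_labels(df, value_labels):
    """Return a copy of df with coded values replaced by their labels."""
    labelled = df.copy()
    for column, mapping in value_labels.items():
        if column in labelled.columns:
            # Codes without a label are left unchanged.
            labelled[column] = labelled[column].map(
                lambda value, m=mapping: m.get(value, value)
            )
    return labelled


raw = read_tab_text(io.StringIO(SAMPLE_TAB))
labelled = apply_value_labels(raw, VALUE_LABELS)
print(labelled)
```

The original frame is left untouched, so the raw codes remain available alongside the labelled copy.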
DataDictionary class
The DataDictionary class provides access to UKDS .rtf data dictionary files.
Importing the DataDictionary class
from ukds import DataDictionary
Creating an instance of DataDictionary and reading in the data dictionary file
Either:
dd=DataDictionary()
dd.read_rtf(r'.../uktus15_household_ukda_data_dictionary.rtf')
or:
dd=DataDictionary(fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
Attributes
As the file is read in, a number of attributes are populated. These are:
dd.rtf # a string of the raw contents of the rtf file
dd.variablelist # a list of dictionaries with the variable information
get_variable_dict method
Returns a dictionary with the information for a single variable. For example:
serial=dd.get_variable_dict('serial')
returns:
{'pos': '1',
'variable': 'serial',
'variable_label': 'Household number',
'variable_type': 'numeric',
'SPSS_measurement_level': 'SCALE',
'SPSS_user_missing_values': '',
'value_labels': ''}
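Since the variable information is described as a list of dictionaries, a lookup by name can be pictured roughly as follows. This is only a sketch in plain Python: the `find_variable` helper and the second entry in the sample list are hypothetical, and how ukds itself handles an unknown variable name is not stated here.

```python
# Hypothetical variable list, shaped like the dictionary shown above.
variablelist = [
    {"pos": "1", "variable": "serial", "variable_label": "Household number"},
    {"pos": "2", "variable": "example_var", "variable_label": "Example variable"},
]


def find_variable(variables, name):
    """Return the first dictionary whose 'variable' entry equals name, else None."""
    return next((v for v in variables if v["variable"] == name), None)


serial = find_variable(variablelist, "serial")
missing = find_variable(variablelist, "not_a_variable")
```

Returning `None` for an unknown name is a choice made for this sketch only.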
get_variable_names method
Returns a list of the variable names:
dd.get_variable_names()
See the datadictionary_demo.ipynb Jupyter Notebook in the 'demo' section for more examples based on this class.
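A list of variable names is handy for building quick lookups, for example from name to label. The sketch below works on a hypothetical list of dictionaries in the shape shown earlier. It does not call ukds, and the `variable_names` and `labels_by_name` helpers are illustrative only.

```python
# Hypothetical variable list; only 'serial' comes from the example above.
variablelist = [
    {"variable": "serial", "variable_label": "Household number"},
    {"variable": "example_var", "variable_label": "Example variable"},
]


def variable_names(variables):
    """Return the variable names in their original order."""
    return [v["variable"] for v in variables]


def labels_by_name(variables):
    """Map each variable name to its label."""
    return {v["variable"]: v["variable_label"] for v in variables}


names = variable_names(variablelist)
labels = labels_by_name(variablelist)
```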