pyspa is an object-oriented Python package that enables you to conduct a parametric structural path analysis on square A matrices (process or input-output) for any number of environmental, social or economic satellites/flows and for any number of stages upstream in your supply chain (as long as you have enough RAM). The package produces a SupplyChain object which includes Pathway and Node objects (among others). Results can be exported to the csv format with a single line of code.
The concept behind pyspa was driven by the lack of open source code to conduct structural path analysis in a robust and object-oriented manner.
You will need Python to run this package as well as the following Python packages:
Download and install the package from pip
pip install pyspa
Identify the template files in the installed directory, or download them directly from the github repository. The template files include:
Once you have located these files, you need to run a single function that will read the data, conduct the structural path analysis and return a SupplyChain object, as per the following code.
import pyspa
sc = pyspa.get_spa(target_id=70, max_stage=10, a_matrix_file_path='A_matrix_template.csv', infosheet_file_path='Infosheet_template.csv', thresholds_file_path='Thresholds_template.csv')
This will return your SupplyChain object which has numerous methods. Read the documentation for more information.
To export the structural path analysis to a csv file, use the built-in method.
To save your SupplyChain object and avoid having to recalculate everything (this uses pickle):
To load a previously saved SupplyChain object:
loaded_sc = pyspa.load_instance_from_file('supply_chain.sc', pyspa.SupplyChain)
We have developed the required Python methods on each object so that you can compare them. Thus,
sc == loaded_sc
sc.pathways_list[-1] == loaded_sc.pathways_list[-1]
sc.root_node == loaded_sc.root_node
will return True.
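As a rough illustration of why such comparisons return True for two distinct objects, the sketch below defines value-based equality on a small class. `DemoNode` and its attributes are hypothetical stand-ins invented for this example; they are not pyspa's Node class, and pyspa's own comparison logic may differ.

```python
class DemoNode:
    """Hypothetical stand-in used only for illustration; not pyspa's Node class."""

    def __init__(self, sector_id, name, direct_intensity):
        self.sector_id = sector_id
        self.name = name
        self.direct_intensity = direct_intensity

    def __eq__(self, other):
        # Two nodes are equal when all of their attributes are equal,
        # even if they are different objects in memory.
        if not isinstance(other, DemoNode):
            return NotImplemented
        return vars(self) == vars(other)


node_a = DemoNode(70, "Electricity supply", 0.85)
node_b = DemoNode(70, "Electricity supply", 0.85)   # built separately, same content
node_c = DemoNode(71, "Gas supply", 0.40)

same_content = node_a == node_b      # compares attribute values
same_object = node_a is node_b       # compares identity
different_content = node_a == node_c
print(same_content, same_object, different_content)
```

Without an `__eq__` method, Python falls back to identity comparison, so an object reloaded from disk would never compare equal to the original.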
The detailed documentation is available here
The package requires three csv files to be able to conduct a structural path analysis:
- A square technological matrix, aka an A matrix
- An infosheet listing all sectors or processes, along with the direct and total intensities/multipliers/requirements for any number of environmental/economic/social satellites, and their metadata
- The cut-off thresholds used to trim the supply chain branches for each satellite.
These csv files must be formatted in a certain way for the code to work. The formatting requirements are described below.
Square technological matrix (A matrix)
The A matrix should be provided in a single csv file, regardless of its size (we have tried the code on a 15k×15k matrix so far, and it works fine). It must be formatted as follows:
- The top row must be the indexes of the sectors/processes, numbered from 1 to n.
- The rest of the matrix comes underneath that row.
- No text headers nor text content
Preview of A matrix csv layout ↓
| 1 | ... | n |
| <A matrix: input from 1 into 1> | <A matrix: input from 1 into ...> | <A matrix: input from 1 into n> |
| <A matrix: input from ... into 1> | <A matrix: input from ... into ...> | <A matrix: input from ... into n> |
| <A matrix: input from n into 1> | <A matrix: input from n into ...> | <A matrix: input from n into n> |
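The layout above can be produced with Python's standard csv module. The following is a minimal sketch, assuming a small made-up 3-sector matrix held as a list of lists; the function name `write_a_matrix` and the coefficient values are hypothetical and are not part of pyspa.

```python
import csv
import io


def write_a_matrix(a_matrix, stream):
    """Write a square matrix as an index row (1..n) followed by the coefficients."""
    n = len(a_matrix)
    if any(len(row) != n for row in a_matrix):
        raise ValueError("The A matrix must be square")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(range(1, n + 1))   # top row: sector/process indexes
    writer.writerows(a_matrix)         # numeric content only, no text


# Made-up coefficients: example_a[i][j] is the input from sector i+1 into sector j+1
example_a = [
    [0.0, 0.2, 0.1],
    [0.3, 0.0, 0.0],
    [0.05, 0.1, 0.0],
]

buffer = io.StringIO()
write_a_matrix(example_a, buffer)
a_matrix_csv = buffer.getvalue()
print(a_matrix_csv)
```

To write to disk instead of an in-memory buffer, open the target file with `open(path, "w", newline="")` and pass the file object as `stream`.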
The infosheet must contain mandatory columns and at least one environmental/social/economic satellite/flow. It must be formatted as follows (all headers are case sensitive):
- The first column has a header called "Sector ID" and contains the IDs of each sector/process from 1 to n. These IDs match those included as a header in the A matrix.
- The second column has a header called "Name" and contains the name of each sector/process. It is highly recommended to have unique names as the csv output of the package uses names (not IDs).
- The third column has a header called "Unit" and contains the functional unit of each sector/process. It is usually a financial currency for input-output sectors (e.g. AUD, USD, EUR, YEN, etc.) and can be a physical unit for processes (e.g. kg, m³, tkm, etc.).
- The fourth column has a header called "Region" and contains the region of each sector/process. If you are not working with multiregional data, simply populate this column with the name of the region for your data (for instance, in the template file, the region for all sectors is "Australia").
- From the fifth column onwards you need to include at least one satellite/flow. Satellites/flows are included using two columns:
  + The first column contains the direct intensity/multiplier/requirement for your satellite/flow and has a header in the following format: DR_<satellite/flow_name>_(<satellite/flow_unit>). For example, for greenhouse gas emissions, you can write: DR_GHGe_(kgCO2e)
  + The second column contains the total intensity/multiplier/requirement for your satellite/flow and has a header in the following format: TR_<satellite/flow_name>_(<satellite/flow_unit>). For example, for greenhouse gas emissions, you can write: TR_GHGe_(kgCO2e)
You can add as many satellites as you need to the infosheet. The code will detect them automatically, as long as their headers are formatted as above. You can also add any other metadata column for your sectors/processes, and then access them through manual coding using the predefined method on your Node objects: get_node_attribute. See the detailed documentation for more details.
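Before running an analysis, it can help to check that your infosheet headers follow this convention. The sketch below is one possible way to do so with a regular expression; the function `detect_satellites` and the pattern are assumptions written for this example and do not reproduce pyspa's own detection code.

```python
import re

# Assumed pattern for the DR_/TR_ header convention described above
SATELLITE_HEADER = re.compile(r"^(DR|TR)_(?P<name>.+)_\((?P<unit>[^()]*)\)$")

MANDATORY_HEADERS = ["Sector ID", "Name", "Unit", "Region"]


def detect_satellites(headers):
    """Return {satellite_name: unit} for satellites that have both a DR and a TR column."""
    if headers[:4] != MANDATORY_HEADERS:
        raise ValueError("The first four headers must be: " + ", ".join(MANDATORY_HEADERS))
    kinds_found = {}
    for header in headers[4:]:
        match = SATELLITE_HEADER.match(header)
        if match is None:
            continue  # treated as an extra metadata column
        key = (match["name"], match["unit"])
        kinds_found.setdefault(key, set()).add(match.group(1))
    # Keep only satellites for which both the DR and TR columns are present
    return {name: unit for (name, unit), kinds in kinds_found.items() if kinds == {"DR", "TR"}}


example_headers = [
    "Sector ID", "Name", "Unit", "Region",
    "DR_GHGe_(kgCO2e)", "TR_GHGe_(kgCO2e)",
    "DR_Water_(kL)", "TR_Water_(kL)",
    "Notes",  # hypothetical metadata column
]
satellites = detect_satellites(example_headers)
print(satellites)
```

The keys of the returned dictionary are the names expected in the "Flow" column of the thresholds file.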
The thresholds csv is by far the simplest csv file to provide. It contains only two columns and must be formatted as below:
- The first column has a header called "Flow" which contains the name of each satellite/flow that you are using, e.g. GHGe. The name of the satellite/flow must be exactly the same as what is contained in the DR and TR headers of the infosheet, but without the DR/TR_ prefix and without the _(<satellite/flow_unit>) suffix.
- The second column has a header called "Value" which contains the threshold value of each satellite/flow that you are using. This value is usually very low. For common environmental satellites/flows, such as water (kL), energy (GJ) and greenhouse gas emissions (kgCO2e), we use threshold values for input-output data between 0.000 1 and 0.000 000 000 1. The lower the threshold, the more supply chain nodes are considered and the longer the structural path analysis will take.
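To give a feel for how a threshold trims supply chain branches, here is a generic, pure-Python sketch of a depth-first structural path analysis on a made-up 2-sector system. It is not pyspa's implementation: the function names, the 0-based indexes and the cut-off rule (a branch is dropped when its path weight multiplied by the supplier's total intensity falls below the threshold) are assumptions chosen for illustration.

```python
def total_intensities(a_matrix, direct, iterations=200):
    """Approximate total intensities by iterating TR = DR + TR.A (converges for productive systems)."""
    n = len(direct)
    total = list(direct)
    for _ in range(iterations):
        total = [direct[j] + sum(total[i] * a_matrix[i][j] for i in range(n)) for j in range(n)]
    return total


def structural_paths(a_matrix, direct, total, target, max_stage, threshold):
    """Return (path, value) pairs sorted by the direct contribution of the last node."""
    paths = [((target,), direct[target])]  # stage 0: the target itself

    def expand(path, weight):
        if len(path) - 1 >= max_stage:
            return
        last = path[-1]
        for supplier in range(len(a_matrix)):
            # a_matrix[supplier][last] is the input from supplier into the last node
            new_weight = weight * a_matrix[supplier][last]
            # Assumed cut-off rule: skip branches whose whole upstream contribution is negligible
            if new_weight * total[supplier] < threshold:
                continue
            new_path = path + (supplier,)
            paths.append((new_path, new_weight * direct[supplier]))
            expand(new_path, new_weight)

    expand((target,), 1.0)
    paths.sort(key=lambda item: item[1], reverse=True)
    return paths


# Made-up data: two sectors that supply each other
a = [[0.0, 0.2],
     [0.1, 0.0]]
dr = [1.0, 2.0]
tr = total_intensities(a, dr)

fine = structural_paths(a, dr, tr, target=0, max_stage=10, threshold=1e-9)
coarse = structural_paths(a, dr, tr, target=0, max_stage=10, threshold=1e-2)

print(len(fine), "pathways with the low threshold,", len(coarse), "with the high one")
for path, value in coarse:
    share = 100 * value / tr[0]  # percentage of the target's total intensity
    print(path, round(value, 6), round(share, 2))
```

Lowering the threshold keeps more branches, so the extracted pathways cover more of the total intensity at the cost of a longer run.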
CSV output file
The csv output file contains some metadata on the structural path analysis itself and then lists, for each satellite/flow, the pathways extracted, by order of significance in terms of the direct intensity/multiplier/requirement of the last node in that pathway. The columns for these listings are:
- The percentage contribution of the last node in that pathway to the total intensity/multiplier/requirement of the selected sector/process
- The value of the corresponding direct intensity/multiplier/requirement
- The value of the corresponding total intensity/multiplier/requirement
- The name of each node in the pathway, for each stage of the supply chain (1 to n).
The direct intensity/multiplier/requirement of the selected sector/process is referred to as DIRECT (Stage 0). Stage 1 refers to the first stage upstream in the supply chain, Stage 2 the following stage, all the way to Stage m as selected at the start. Based on our experience, we recommend using around 10 stages upstream for process data and 8 stages upstream for input-output data, although suitable values might differ for your data.
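One way to judge how many stages are enough for your own data is to look at how much each stage adds to the total. The sketch below computes the aggregate contribution of each stage by repeated multiplication with the A matrix, using made-up 2-sector data; `stage_contributions` is a hypothetical helper written for this example, not a pyspa function.

```python
def stage_contributions(a_matrix, direct, target, max_stage):
    """Return the contribution of stages 0..max_stage to the total intensity of `target` (0-based)."""
    n = len(direct)
    # weights[i]: quantity of sector i required at the current stage per unit of the target
    weights = [1.0 if i == target else 0.0 for i in range(n)]
    contributions = []
    for _ in range(max_stage + 1):
        contributions.append(sum(direct[i] * weights[i] for i in range(n)))
        # Move one stage upstream: a_matrix[i][j] is the input from i into j
        weights = [sum(a_matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return contributions


# Made-up data: two sectors that supply each other
a = [[0.0, 0.2],
     [0.1, 0.0]]
dr = [1.0, 2.0]

per_stage = stage_contributions(a, dr, target=0, max_stage=8)
cumulative = 0.0
for stage, value in enumerate(per_stage):
    cumulative += value
    print("Stage", stage, "adds", value, "cumulative", cumulative)
```

When the contributions of the last few stages become negligible compared with the cumulative value, adding more stages is unlikely to change the results much.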
Note: The results for each satellite/flow are listed on the same csv sheet, in the order they appear in the infosheet. You will need to scroll down to identify where the results for each new satellite/flow start, which is indicated by a header and an empty row. For those using Windows, you can click on any pathway for any given satellite/flow and press: "Ctrl + Shift + ↓". This will take you to the last pathway for this satellite/flow.
Authors and contributors
- André Stephan - overall design, implementation, testing and debugging - ORCID
- Paul-Antoine Bontinck - optimisation, implementation, testing and debugging - ORCID
This project is shared under a GNU General Public License v3.0. See the LICENSE file for more information.
This project was funded by the Australian Research Council Discovery Project DP150100962 at the University of Melbourne, Australia. As such, we are indebted to Australian taxpayers for making this work possible and to the University of Melbourne for providing the facilities and intellectual space to conduct this research. The code for the base method for conducting the structural path analysis is inspired by the code of the late A/Prof Graham Treloar at the University of Melbourne, who pioneered a Visual Basic Script in his PhD thesis to conduct a structural path analysis in 1997.