DataStoryPatternLibrabry

Data Story Patterns Library is a repository with pattern analysis designated for Linked Open Statistical Data. Story Patterns were retrieved from literture reserach udenr general subject of "data journalism".

Installation

pip install datastories

Requirements will be automatically installed with package

###Import/Usage

import datastories.analytical as patterns

patterns.DataStoryPattern(sparqlendpointurl, jsonmetadata)

Object created allow to query SPARQL endpoint based on JSON meatadat provided.

JSON Template

{
    "cube_key" : {
		"title":"title of cube",
		"dataset_structure":"URI for cube structure",
        "dimensions":{
            "dimension_key":{
                "dimension_title":"Title of diemnsion",
                "dimension_url":"URI for dimension",
                "dimension_prefix":"URI for dimension's values"
            },
            "dimension_key":{
                "dimension_title":"Title of diemnsion",
                "dimension_url":"URI for dimension",
                "dimension_prefix":"URI for dimension's values"
            }
		},
		"hierarchical_dimensions":{
			"dimension_key":{
                "dimension_title":"Title of diemnsion",
                "dimension_url":"URI for dimension",
                "dimension_prefix":"URI for dimension's values",
				"dimension_levels":
				{
					"level_key":"integer(granularity level)",
					"level_key":"integer(granularity level)"

				}
			}
		},
		"measures":{
			"measure_key":{
			"measure_title":"Title of measure",
			"measure_url":"URI for measure"
			}

		}
    }
}

Patterns Description

Measurement and Counting
League Table
Internal Comprison
Profile Outliers
Dissect Factors
Highlight Contrast
Start Big Drill Down
Start Small Zoom Out
Analysis By Category
Explore Intersection
Narrating Change Over Time

MCounting

Measurement and Counting Arithemtical operators applied to whole dataset - basic information regarding data

Attributes

def MCounting(self,cube="",dims=[],meas=[],hierdims=[],count_type="raw",df=pd.DataFrame() )

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
count_type	`String`	Type of Count to perform
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint

Output

Based on count_type value

Count_type	Description
raw	data without any analysis performed
sum	sum across all numeric columns
mean	mean across all numeric columns
min	minimum values from all numeric columns
max	maximum values from all numeric columns
count	amount of records

LTable

LeagueTable - sorting and extraction specific amount of records

Attributes

def LTable(self,cube=[],dims=[],meas=[],hierdims=[], columns_to_order="", order_type="asc", number_of_records=20,df=pd.DataFrame())

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
columns_to_order	`list[String]`	Set of columns to order by
order_type	`String`	Type of order (asc/desc)
number_of_records	`Integer`	Amount of records to retrieve
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint

Output

Based on sort_type value

Sort_type	Description
asc	ascending order based on columns provided in `columns_to_order`
desc	descending order based on columns provided in `columns_to_order`

InternalComparison

InternalComparison - comparison of numeric values related to textual values within one column

Attributes

def InternalComparison(self,cube="",dims=[],meas=[],hierdims=[],df=pd.DataFrame(), dim_to_compare="",meas_to_compare="",comp_type="")

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
dim_to_compare	`String`	Dimension, which values will be investigated
meas_to_compare	`String`	Measure, which numeric values related to `dim_to_compare` will be processed
comp_type	`String`	Type of comparison to perform

Output

Independent from comp_type selected, output data will have additional column with numerical column meas_to_compare processed in specific way.

Available types of comparison comp_type

Comp_type	Description
diffmax	difference with max value related to specific textual value
diffmean	difference with arithmetic mean related to specific textual values
diffmin	difference with minimum value related to specific textual value

ProfileOutliers

ProfileOutliers - detection of unusual values within data (anomalies)

Attributes

def ProfileOutliers(self,cube=[],dims=[],meas=[],hierdims=[],df=pd.DataFrame(), displayType="outliers_only")

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
display_type	`String`	What information display are bound to display (with/without anomalies)

Output

Pattern analysis using python scipy library will perform quick exploration in serach of unusual values within data.

Based on display_type parameter data will be displayed with/without ddetected unusual values.

Available types of displaying display_type

display_type	Description
outliers_only	returns rows from dataset where unusual values were detected
without_outliers	returns dataset with excluded rows where unusual values were detected

DissectFactors

DissectFactors - decomposition of data based on values in dim_to_dissect

Attributes

def DissectFactors(self,cube="",dims=[],meas=[],hierdims=[],df=pd.DataFrame(),dim_to_dissect="")

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
dim_to_dissect	`String`	Based on which dimension data should be decomposed

Output

As an output, data will be decomposed in a form of a dictionary, where each subset have values only related to specific value. Dictionary of subdataset will be constructed as a series of paiers where key per each susbet will values from dim_to_dissect and this key value will be data, where yhis key value was occurring.

HighlightContrast

HighlightContrast - partial difference within values related to one textual column

Attributes

def HighlightContrast(self,cube="",dims=[],meas=[],hierdims=[],df=pd.DataFrame(),dim_to_contrast="",contrast_type="",meas_to_contrast="")

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
dim_to_contrast	`String`	Textual column, from which values will be contrasted
meas_to_contrast	`String`	Numerical column, which values are contrasted
contrast_type	`String`	Type of contrast to present

Output

Independent from contrast_type selected, output data will have additional column with numerical column meas_to_contrast processed in specific way.

Available types of comparison contrast_type

Contrast_type	Description
partofwhole	difference with max value related to specific textual value
partofmax	difference with arithmetic mean related to specific textual values
partofmin	difference with minimum value related to specific textual value

StartBigDrillDown

StartBigDrillDown - data retrieval from multiple hierachical levels.

This pattern can be only applied to data not stored already in DataFrame

Attributes

def StartBigDrillDown(self,cube="",dims=[],meas=[],hierdim_drill_down=[])

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdim_drill_down	`dict{hierdim:list[str]}`	Hierarchical dimension with list of hierarchy levels to inspect

Output

As an output, data will be retrieved in a form of a dictionary, where each dataset will be retrieved from different hierachy level. List will be provided inhierdim_drill_down. Hierachy levels provided by in parameter will automatically sorted in order from most general to most detailed level based on metadata provided.

StartSmallZoomOut

StartSmallZoomOut - data retrieval from multiple hierachical levels.

This pattern can be only applied to data not stored already in DataFrame

Attributes

def StartSmallZoomOut(self,cube="",dims=[],meas=[],hierdim_zoom_out=[])

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdim_zoom_out	`dict{hierdim:list[str]}`	Hierarchical dimension with list of hierarchy levels to inspect

Output

As an output, data will be retrieved in a form of a dictionary, where each dataset will be retrieved from different hierachy level. List will be provided inhierdim_zoom_out. Hierachy levels provided by in parameter will automatically sorted in order from most detaile to most general level based on metadata provided.

AnalysisByCategory

AnalysisByCategory - ecomposition of data based on values in dim_for_category with analysis performed on each susbet

Attributes

def AnalysisByCategory(self,cube="",dims=[],meas=[],hierdims=[],df=pd.DataFrame(),dim_for_category="",meas_to_analyse="",analysis_type="min"):

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
dim_for_category	`String`	Dimension, based on which input data will be categorised
meas_to_analyse	`String`	Measure, which will be analysed
analysis_type	`String`	Type of analysis to perform

Output

As an output, data will be decomposed in a form of a dictionary, where each subset have values only related to specific value. Such subset will get analysed based on analysis_type parameter

Available types of analysis analysis_type

Analysis_type	Description
min	Minimum per each category
max	Maximum per each category
mean	Arithmetical mean per each category
sum	Total value from each category

ExploreIntersection

Attributes

def ExploreIntersection(self, dim_to_explore=""):

Parameter	Type	Description
dim_to_explore	`String`	Dimension, which existence within enpoint is going to be investigated

Output

Pattern will return series of datasets, where each will represent occurence of dim_to_explore in one cube

NarratingChangeOverTime

Attributes

def NarrChangeOT(self,cube="",dims=[],meas=[],hierdims=[],df=pd.DataFrame(),meas_to_narrate="",narr_type="")

Parameter	Type	Description
cube	`String`	Cube, which dimensions and measures will be investigated
dims	`list[String]`	List of dimensions (from cube) to take into investigation
meas	`list[String]`	List of measures (from cube) to take into investigation
hierdims	`dict{hierdim:{"selected_level":[value]}}`	Hierarchical Dimesion with selected hierarchy level to take into investigation
df	`DataFrame`	DataFrame object, if data is already retrieved from endpoint
meas_to_narrate	`String`	Set of 2 measures, which change will be narrated
narr_type	`String`	Type of narration to perform

Output

Independent from narr_type selected, output data will have additional column with numerical values processed in specific way.

Available types of analysis narr_type

Narr_type	Description
percchange	Percentage change between first nad second property
diffchange	Quantitive change between first and second property

datastories Release 0.3.3

Release 0.3.3 Toggle Dropdown 0.3.11 0.3.10 0.3.9 0.3.8 0.3.7 0.3.6 0.3.5 0.3.4 0.3.3 0.3.2

Documentation

DataStoryPatternLibrabry

Installation

JSON Template

Patterns Description

MCounting

Attributes

Output

LTable

Attributes

Output

InternalComparison

Attributes

Output

ProfileOutliers

Attributes

Output

DissectFactors

Attributes

Output

HighlightContrast

Attributes

Output

StartBigDrillDown

Attributes

Output

StartSmallZoomOut

Attributes

Output

AnalysisByCategory

Attributes

Output

ExploreIntersection

Attributes

Output

NarratingChangeOverTime

Attributes

Output

Stats

Development practices

Releases

Contributors

datastories
Release 0.3.3

Release 0.3.3

0.3.11

0.3.10

0.3.9

0.3.8

0.3.7

0.3.6

0.3.5

0.3.4

0.3.3

0.3.2