spkly

sparklines for pandas dataframes


License
MIT
Install
pip install spkly==0.1

Documentation

sparkly

sparkly

i was gonna replace this with some mamamoo gif in honor of the fact i was listening to hip while fixing this and watching my son jiggle about dancing but nothing does 'em justice

Changes!!!! November 26, 2019

  • Thanks to Professor Brooks' suggestion, all the images are now svgs instead of png, which also conveniently means I can nbconvert the crap out of this to HTML and the images will now render in my html files!!!! I would add a thousand more exclaimation points to this but that would out me as a crazy person. Just because I want to live large like Richard Garriott Lord British, I don't want to be him.
  • This morphed from an extra credit assignment to a legitimate thing I'm going to use. On bus ride home, I thought... if I can do sparklines and this is really just matplotlib under the hood, why can't I also use this to generate the one chart I use mostly for throwaway analysis (histograms, mostly)? Heck, couldn't I just... use anything with matplotlib, for that matter, that I use for throwaway analysis? Could this mean... I don't have to really use matplotlib that often?
  • So now my next step is to do the part of the python experience I've never had to handle before until now: learnin' how to build a package. And writing whatever is python's equivalent of javadoc! Yeah, baby steps here.

Intro to sparkly

Look... I have no imagination for title names, not after spending 18 hours on this, most of it going down deadends.

  • Pandas styler options? Nope. I still think this is a viable route, though.
  • Javascript workarounds messing with __repr__: No, I have spent all of my life not learning Javascript, I will continue to do so
  • Ipywidgets? Wow this seems finicky when all I'm doing is some preliminary analysis and am going to throw away 99% of what I work on
  • Maybe just having a single function that leverages Ipython's HTML? Oh, yes, maybe.

(This all avoids the fact that as much as I'd like to contribute to open source, 1) I suck at git and 2) I completely forgot until nearly the end that I probably have to ask permission from work for project work, even nonprofit work, first.)

Getting Started

These instructions will get your sparkly sparkline thing going. Put the python file somewhere convenient. Then:

import sparkly

Yes, that's it. I will make it fancier in the future, but you'll have to catch me at a time when I have a better handle on python101 stuff like classes, modules, packages...

Dependencies

  • matplotlib >= 3.1.1
  • pandas >= 0.25.1
  • numpy >= 1.16.5
  • ipython >= 7.8.0

Wait, how do I view a sparkline on my data?

sparkly.display(df)

Some of the optional flags available:

sparkly.display(df, order=None, type='line', max_rows=7, max_cols=999)
  • dataframe: pandas dataframe for generating sparklines. Plots generated are based on order within the dataframe (meaning dataframe sorts will generate different sparklines if you do not specify some order), see below for examples of this behavior
  • order: Series of same length as df for x-axis ordering, e.g., use df.index if you want the chart to be sorted by df.index values. Probably can accept more than a series, like a list and suchlike... so long as matplotlib will accept it for an x-axis, then sure, this will accept it too
  • type: Type of chart you want to see. Want to see sparklines? That's 'line' (it's also the default). Want to see histograms? 'histogram'. Default bins is 20 for now.
  • max_rows: max rows to display. Default is 7 now
  • max_cols: max columns to display. Default is basically all because I'm assuming you wanna see the sparklines for all of it at once or something

Examples

1. A straightforward df with no Multiindex columns or ind... indexes? Indices? Is this like the matplotlib axes thing?

import pandas as pd
import numpy as np
import datetime
import sparkly
import random

df = pd.read_csv('samplefile.csv')
df.Date = pd.to_datetime(df.Date, format='%Y%m%d')
df.set_index(['AssetClass', 'AssetClassSubType', 'Date'], inplace=True)
df.replace('*', '', inplace=True)
df = df.apply(pd.to_numeric)
df.reset_index(inplace=True)

sparkly.display(df)
AssetClass AssetClassSubType Date UMBSTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBSUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBS$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMC$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> Other$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style>
0 AGENCY PASS-THRU (TBA, STIP, $ ROLLS) SINGLE FAMILY 15Y 2019-08-01 1029.0 16.0 10367700.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 AGENCY PASS-THRU (TBA, STIP, $ ROLLS) SINGLE FAMILY 30Y 2019-08-01 6773.0 24.0 186632371.9 0.0 0.0 0.0 5.0 4.0 102700.0 2015.0 35.0 41183055.9 0.0 0.0 0.0
2 AGENCY PASS-THRU (TBA, STIP, $ ROLLS) OTHER 2019-08-01 0.0 0.0 0.0 17.0 10.0 372127.4 0.0 0.0 0.0 16.0 9.0 279696.4 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
582 AGENCY PASS-THRU (SPECIFIED) OTHER 2019-10-31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
583 AGENCY CMO P&I 2019-10-31 0.0 0.0 0.0 280.0 145.0 205136.0 297.0 152.0 194484.0 209.0 117.0 78706.8 18.0 11.0 34884.2
584 AGENCY CMO IO/PO 2019-10-31 0.0 0.0 0.0 29.0 17.0 303877.5 34.0 16.0 321216.4 39.0 21.0 336688.5 NaN NaN NaN

2. A straightforward df, but with the frame already sorted in some predetermined way.

df2 = df.copy()
df2.sort_values(by='UMBSTradeCount', inplace=True)

sparkly.display(df2)
AssetClass AssetClassSubType Date UMBSTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBSUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBS$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMC$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> Other$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style>
584 AGENCY CMO IO/PO 2019-10-31 0.0 0.0 0.0 29.0 17.0 303877.5 34.0 16.0 321216.4 39.0 21.0 336688.5 NaN NaN NaN
231 AGENCY PASS-THRU (SPECIFIED) OTHER 2019-09-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
230 AGENCY PASS-THRU (SPECIFIED) ADJUSTABLE/HYBRID 2019-09-06 0.0 0.0 0.0 9.0 6.0 84252.2 38.0 19.0 974.0 7.0 4.0 31394.5 10.0 8.0 18077.3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37 AGENCY PASS-THRU (TBA, STIP, $ ROLLS) SINGLE FAMILY 30Y 2019-08-07 10987.0 28.0 363701507.6 0.0 0.0 0.0 25.0 6.0 1283690.0 2194.0 32.0 46713012.0 0.0 0.0 0.0
415 AGENCY PASS-THRU (TBA, STIP, $ ROLLS) SINGLE FAMILY 30Y 2019-10-07 11254.0 29.0 395799855.8 0.0 0.0 0.0 48.0 10.0 384107.0 1713.0 29.0 41745567.8 0.0 0.0 0.0
463 AGENCY PASS-THRU (SPECIFIED) SINGLE FAMILY 30Y 2019-10-14 NaN NaN NaN NaN NaN NaN NaN NaN NaN 52.0 42.0 355.3 0.0 0.0 0.0

3. A df with a Multiindex on the ind... on the rows.

df3 = df.copy()
df3 = df2.groupby(['AssetClass', 'Date']).mean()

sparkly.display(df3)
UMBSTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBSUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBS$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMC$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> OtherUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> Other$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style>
AssetClass Date
AGENCY CMO 2019-08-01 0.000000 0.000000 0.000000e+00 157.000000 81.000000 125801.70 154.0 75.5 108367.50 134.500000 62.500000 5.923772e+05 48.5 26.5 103661.00
2019-08-02 0.000000 0.000000 0.000000e+00 205.000000 108.000000 121861.70 103.5 53.5 217979.95 197.000000 91.000000 8.559520e+04 6.5 6.5 24734.35
2019-08-05 0.000000 0.000000 0.000000e+00 167.000000 72.000000 145733.15 189.0 105.0 170971.40 110.500000 63.000000 2.223507e+05 7.5 5.0 1004.40
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
AGENCY PASS-THRU (TBA, STIP, $ ROLLS) 2019-10-29 1907.666667 10.333333 4.314584e+07 3.333333 2.666667 18666.00 0.0 0.0 0.00 444.666667 12.000000 8.422877e+06 0.0 0.0 0.00
2019-10-30 2222.000000 12.666667 5.330876e+07 2.333333 2.333333 35673.00 0.0 0.0 0.00 477.333333 14.333333 1.458364e+07 0.0 0.0 0.00
2019-10-31 2477.666667 11.666667 6.939723e+07 5.666667 2.333333 173862.00 0.0 0.0 0.00 584.000000 15.333333 1.589110e+07 0.0 0.0 0.00

4. A df with Multiindex columns.

df4 = df.copy()
df4 = df4.groupby(['AssetClass', 'Date']).agg({
    'UMBSTradeCount': [min, max], 
    'FNMA$Trades': np.mean
})

sparkly.display(df4)
UMBSTradeCount FNMA$Trades
min <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> max <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> mean <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style>
AssetClass Date
AGENCY CMO 2019-08-01 0.0 0.0 125801.70
2019-08-02 0.0 0.0 121861.70
2019-08-05 0.0 0.0 145733.15
... ... ... ... ...
AGENCY PASS-THRU (TBA, STIP, $ ROLLS) 2019-10-29 0.0 4749.0 18666.00
2019-10-30 0.0 5419.0 35673.00
2019-10-31 0.0 6346.0 173862.00

5. A df where we have some predetermined x-axis for ordering, rather than the order the data is already set within the dataframe.

It'll sort by however matplotlib does it.

In this case I generated a randomly ordered index for the axis, which creates kind of a crazy scribbly effect.

df5 = df.copy()
random_counter = [random.randint(1, 2000000) for i in df5.index]
df5['counter'] = random_counter

sparkly.display(df=df5, order=df5['counter'])
AssetClass AssetClassSubType Date UMBSTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBSUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> UMBS$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FNMA$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCTradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMCUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> FHLMC$Trades <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMATradeCount <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style> GNMAUniqueID <style type="text/css">*{stroke-linecap:butt;stroke-linejoin:round;} </style>