PipeKML - ETL Based KML Library for Tabular Data in Python

Purpose

Requirements/Dependencies and Installation

Usage - Ingesting a CSV file

Usage - Working with a List

Usage - Working with a pandas DataFrame

Usage - Why PipeKML is useful for Data Analysis

Versions/History & License Agreement

Purpose

This module takes tabular data and pipes it into KMLs.

This repository was built to make it easy to produce KML files from common data stores, in this case CSV files or lists. You can think of it as a sort of ETL pipeline for data that contains spatial fields. Other Python modules (fastkml) provide a robust set of options to choose from. This module differs in that its goal is to get from a tabular structure to a KML, carrying along the data associated with each object, in the simplest way possible while still keeping some level of functionality (i.e. color, icon, scale, folder name/description).

The idea is that fastkml isn't necessarily built for putting raw data into KML files in a simple manner, and I think there's a demand for a simple KML maker that admittedly takes a lot of shortcuts, sacrificing robustness in KML options for robustness in ingesting data and placing it within a KML that is structured how I generally want it. It also provides integration with existing geospatial libraries and typical geospatial datasets.

Other Repositories I've Created to Be Used with PipeKML

pipegeohash - A tile association and reduction script optimized for ETL processes.
pipeheatmap - A Python Heatmap Library (work in progress)
pipeslice - A library meant for higher level DataFrame abstractions for geospatial data. (work in progress)

Whether it be future updates/merges from my existing repositories for aggregation (left), full fledged data analysis aggregating coal truck traffic (middle), or heatmapping crime clusters in Chicago (right), PipeKML has a lot of uses.

Requirements/Dependencies and Installation.

The modules themselves should only require the Python standard library; however, the examples use pandas and NumPy.

Pandas
Numpy

Install PipeKML

To install PipeKML, execute the following command in a terminal:

pip install pipekml

Ingesting a CSV File

The example below shows how to make a kml from a csv file (assuming it has the proper headers).

from pipekml.points import *
from pipekml.alignments import *

#These two files are located in readme_example_files/

#writing a kml containing points of NY subway stations
#make_points() returns a list of kml lines 
#parselist simply writes whatever is in that list to a kml file
a=make_points('stations.csv',scale=.5) 
parselist(a,'stations.kml')

#writing a kml file containing the alignment of a satellite from a few days ago
#take note: a file location must be passed in as a list, and make_route() will write all of the locations in that list
a=make_route(['satellite.csv'],scale=2,color='light green')
parselist(a,'satellite.kml')

Working With a List

The example below shows how to make a kml from a regular Python list (the first row must be the headers).

Let's assume we have a list that looks something like this in tabular format. As is often the case, this list stores lats and longs as a single string in one column; luckily PipeKML attempts to account for that.

Country	Name	Location	Total number of reactors	Active Reactors	Reactors Under Construction	Shut Down Reactors
UKRAINE	CHERNOBYL	51.383331,30.1	4	1	0	3
UKRAINE	KHMELNITSKI	50.599998,26.549999	4	1	3	0
UKRAINE	ROVNO	51.333328,25.883329	4	3	1	0
UKRAINE	SOUTH UKRAINE	47.816669,31.216669	3	3	0	0
UKRAINE	ZAPOROZHE	47.48333,34.633331	6	6	0	0

from pipekml.points import *
from pipekml.alignments import *

#the list above
mylist=[['Country', 'Name', 'Location', 'Total number of reactors', 'Active Reactors', 'Reactors Under Construction', 'Shut Down Reactors'], ['UKRAINE', 'CHERNOBYL', '51.383331,30.1', '4', '1', '0', '3'], ['UKRAINE', 'KHMELNITSKI', '50.599998,26.549999', '4', '1', '3', '0'], ['UKRAINE', 'ROVNO', '51.333328,25.883329', '4', '3', '1', '0'], ['UKRAINE', 'SOUTH UKRAINE', '47.816669,31.216669', '3', '3', '0', '0'], ['UKRAINE', 'ZAPOROZHE', '47.48333,34.633331', '6', '6', '0', '0']]

#when inputting a list, use the kwarg list=True to indicate that what you're inputting is actually a list
a=make_points(mylist,list=True,scale=1)
parselist(a,'nuclear_plants.kml')
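If the table lives in a CSV file but you would rather hand over a list (for example to filter it first), the standard library csv module produces the same header-plus-rows shape as mylist above. This is only a minimal sketch: the helper name csv_to_list and the inline sample text are made up for illustration and are not part of PipeKML.

```python
import csv
import io

def csv_to_list(handle):
    """Return [header, row, row, ...] from an open CSV file handle."""
    return [row for row in csv.reader(handle)]

# inline sample standing in for a real file handle (hypothetical data layout)
sample = io.StringIO(
    'Country,Name,Location\n'
    'UKRAINE,CHERNOBYL,"51.383331,30.1"\n'
    'UKRAINE,ROVNO,"51.333328,25.883329"\n'
)
table = csv_to_list(sample)
print(table[0])   # header row
print(table[1:])  # data rows, every value is a string
```

Note that the quoted Location value survives as one string, which is the combined lat/long layout described above.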

Working With a DataFrame

A pandas DataFrame is ingested using the exact same list=True boolean as before. Here we ingest a DataFrame and query out the angle-collision accidents in California involving more than one fatality.

import pandas as pd
import numpy as np
from pipekml.points import *

#reading dataframe from csv file
data=pd.read_csv('STSIFARS.csv')

#getting angle collisions in California involving more than one fatality
data=data[(data.FATALS>1)&(data.STANAME=='CALIFORNIA')&(data.VAR23C=='Angle')]
a=make_points(data,list=True,scale=.5)
parselist(a,'fatalsgreaterthan1.kml')

Why PipeKML is useful for Data Analysis

The key to pipekml is that it can ingest common data stores (lists or pandas DataFrames) and carry the data throughout the structure. That means nothing is in between you and the data you want to analyze, so pure Python solutions to GIS are possible. Most tools like ArcPy have a backend that works with all the data you're querying, filtering, or updating; with pipekml, whatever you throw into it is what will be displayed, which makes updating or adding fields much easier. By bringing whatever data you need into memory you can put the analysis on the shoulders of powerful array-oriented data analysis tools like pandas, where queries can usually be done in one line.

The explanation and code below relate to the table containing traffic fatality data in the link here.

The code below queries traffic fatalities involving younger drivers, on the weekend, between the hours of 10PM-2AM. It's worth noting that huge one-line operations like the one below are better avoided because they're hard to read, but I figured it was worth showing how much easier it is to work with tabular data in DataFrames.

import pandas as pd
import numpy as np
from pipekml.points import *

fatalities=pd.read_csv('STSIFARS.csv')

#Querying for young drivers, on the weekend, with crash hours between 10 P.M.-2 A.M.
fatalities=fatalities[((fatalities.VAR24C=='Weekend')&(fatalities.YOUNG_OLD=='Young Driver')&((fatalities.HOUR=='10:00pm-10:59pm')|(fatalities.HOUR=='11:00pm-11:59pm')|(fatalities.HOUR=='0:00am-0:59am')|(fatalities.HOUR=='1:00am-1:59am')))]

#taking fatalities dataframe adding the header row and taking the dataframe to a list
fatalities=[fatalities.columns.tolist()]+fatalities.values.tolist()

kml=make_points(fatalities,list=True,scale=.5)
parselist(kml,'young_drivers_weekend_night.kml')

Output Showing Fatalities related to Young Drivers, On the weekend, between 10PM-2AM

PipeKML gives you just what you need to display the data, nothing more. This enables you to write your own algorithms to deal with your own vector data. Also, because you're bringing data into memory and editing it there, you're not updating or editing any existing files. This makes it perfect for algorithm development/testing.

It also makes it super easy to slice data and display/package it in useful ways. The script below shows an example of what PipeKML can do in regards to viewing and pivoting data to view geospatially.

In other words, the script below takes a table, queries out values I don't need, makes a folder for each unique value in a field (the hour in which the crash occurred), then displays different icons based on the value in another field (crash cause).

from pipekml.points import *
import pandas as pd
import numpy as np
import itertools

#function that given a crash cause will return unique url
#i.e. for a list with causes and icon links to specific cause return link for a cause input
def get_icon(cause,iconmap):
    for row in iconmap:
        if str(row[0])==str(cause):
            return row[1]

#working with a set of points and attributes (25 mb csv file)
#bringing into memory as a dataframe
fatalities=pd.read_csv('STSIFARS.csv')

#getting header values
header=fatalities.columns.values.tolist()

#querying state field for west virginia
fatalities=fatalities[(fatalities.STANAME=='WEST VIRGINIA')]

#selecting only the counties we're analyzing
fatalities=fatalities[((fatalities.CNTYNAME=='Cabell County')|(fatalities.CNTYNAME=='Kanawha County')|(fatalities.CNTYNAME=='Putnam County'))]

#pre-made icon map that maps each cause to a given url containing a png to be displayed
iconmap=[['Angle', 'http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png'], ['Head-On', 'http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png'], ['Not Collision with Motor Vehicle in Transport', 'http://maps.google.com/mapfiles/kml/pushpin/ltblu-pushpin.png'], ['Rear-End', 'http://maps.google.com/mapfiles/kml/pushpin/pink-pushpin.png'], ['Sideswipe', 'http://maps.google.com/mapfiles/kml/pushpin/purple-pushpin.png']]

#getting the unique hour blocks in which crashes occurred
uniquehours=np.unique(fatalities.HOUR).tolist()

totalkml=[]
#iterating through each unique hour block
for row in uniquehours:
    #slicing each unique hour iterated through
    slicedhours=fatalities[(fatalities.HOUR==str(row))]

    #getting unique crash causes for each sliced hour interval
    uniquecrashcauses=np.unique(slicedhours['VAR23C']).tolist()

    temporarylist=[]
    hours=str(row)
    #iterating through each crash cause
    for row in uniquecrashcauses:
        #getting icon url from function made above
        icon=get_icon(str(row),iconmap)

        #slicing the already hour-sliced dataframe by crash cause
        slicedcauses=slicedhours[(slicedhours.VAR23C==str(row))].values.tolist()

        #adding a header to the output list
        slicedcauses=[header]+slicedcauses

        #sending into pipekml for output kml lines
        kml=make_points(slicedcauses,list=True,icon=str(icon),scale=1)
        kml=packagepoints(kml) #removing syntactic header and footer for folder integration

        #adding the sliced causes list to the temporarylist
        temporarylist+=kml

    #packaging list into a newfolder
    newfolder=folder(str(hours),'Crashes Occurring in hours: '+str(hours),temporarylist)

    #adding the list to totalkml
    totalkml+=newfolder

#Now that totalkml has all slices of unique hours folders containing unique icons for each crash cause
#we can now package and write to a file
totalkml=packagefinal('Traffic_Fatalities_Sliced_by_Hour',totalkml)

#writing to kml file
parselist(totalkml,'traffic_fatalities.kml')

Output of Code Above take note of Folders/Icons
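The folder-per-hour, icon-per-cause slicing in the script above is really just grouping rows by one field and then by another. For readers without pandas, the same grouping can be sketched with the standard library. The helper group_rows and the three sample rows are made up for illustration; only the column names HOUR and VAR23C come from the script above.

```python
from collections import defaultdict

def group_rows(table, field):
    """Group the data rows of a [header, row, ...] table by one column."""
    header, rows = table[0], table[1:]
    position = header.index(field)
    groups = defaultdict(list)
    for row in rows:
        groups[row[position]].append(row)
    # each group gets the header back so it is a complete table again
    return {key: [header] + members for key, members in groups.items()}

# made-up sample rows, not taken from the real dataset
crashes = [
    ['HOUR', 'VAR23C', 'LAT', 'LONG'],
    ['1:00am-1:59am', 'Angle', '38.41', '-82.44'],
    ['1:00am-1:59am', 'Head-On', '38.35', '-81.63'],
    ['2:00pm-2:59pm', 'Angle', '38.42', '-82.01'],
]

by_hour = group_rows(crashes, 'HOUR')
for hour, table in sorted(by_hour.items()):
    by_cause = group_rows(table, 'VAR23C')
    print(hour, sorted(by_cause))
```

Each inner table keeps its header row, so it is already in the shape that make_points(...,list=True) expects in the examples above.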

Alignments Module

How It Works

Alignment data now supports raw ingestion, meaning you can simply pass a list of csv file locations, or a list, into the make_route() function and it will make a route from the given alignment. It does this by parsing the header (a labeled header is required at the top of the csv file), finding the positions of the latitude, longitude, and elevation fields, and then parsing those fields into geometric alignment or path data. A header only needs to contain the text 'lat', 'long', or 'elev' to be parsed as geometric data, which means the module may sometimes treat a field as alignment data when it is not. Because alignment data is sometimes stored with two points per line, the script also supports that layout by assuming the alignment field that comes first is the preceding point.
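To make the header matching concrete, here is a rough sketch of substring-based detection. It is not the code PipeKML runs; the helper name find_coordinate_columns and the sample headers are assumptions made for illustration.

```python
def find_coordinate_columns(header):
    """Map 'lat', 'long' and 'elev' to the header positions that contain them."""
    found = {'lat': [], 'long': [], 'elev': []}
    for position, name in enumerate(header):
        lowered = str(name).lower()
        for key in found:
            if key in lowered:
                found[key].append(position)
    return found

# one point per line
single = find_coordinate_columns(['ROUTE', 'LAT', 'LONG', 'ELEVATION'])
print(single)

# two points per line: the first match would be the preceding point
double = find_coordinate_columns(['LAT1', 'LONG1', 'LAT2', 'LONG2'])
print(double)

# substring matching can misfire: 'PLATE' contains 'lat'
misfire = find_coordinate_columns(['PLATE', 'LAT', 'LONG'])
print(misfire)
```

The last call shows why a loosely named column can be mistaken for alignment data, so it is worth keeping 'lat', 'long', and 'elev' out of unrelated header names.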

Another important update concerns file structure and usage assumptions: if you have extra data stored on every line that repeats throughout the csv alignment file, this module assumes the repeating data should be added to the alignment's kml data. In other words, if a value repeats, the module assumes that attribute/field should be associated with the entire alignment and displayed in its popup box.
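One way to picture the repeating-value rule is to look for columns that hold a single value on every row. The sketch below is an illustration under that assumption, not PipeKML's own logic; repeated_fields and the sample route are hypothetical.

```python
def repeated_fields(table):
    """Return {header: value} for columns whose value is identical on every row."""
    header, rows = table[0], table[1:]
    constant = {}
    for position, name in enumerate(header):
        values = {row[position] for row in rows}
        if len(values) == 1:
            constant[name] = values.pop()
    return constant

# made-up alignment: ROUTE and OWNER repeat, the coordinates do not
route = [
    ['ROUTE', 'OWNER', 'LAT', 'LONG'],
    ['US-60', 'State', '38.41', '-82.44'],
    ['US-60', 'State', '38.42', '-82.43'],
    ['US-60', 'State', '38.43', '-82.41'],
]
attributes = repeated_fields(route)
print(attributes)
```

Here ROUTE and OWNER would be the fields attached to the whole alignment, while LAT and LONG vary and remain geometry.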

Alignments Module - Functions

make_route(filelocationlist[,segmentinfocsv,indexs,color,scale,folder,description])

parselist(total,filename) - given the output of the function above and a file name (preferably ending in .kml), writes a kml file.

packageroutes(makerouteoutput) - takes the make_route() output and returns a list without the starting and ending wrapper of most kml files, so it can be added together with subsequent lists or placed in different folders. Use this if you want your folders in specific locations.

packagefinal(nameinside,list) - restores the syntactic sugar of the first few rows and last couple of rows, given a name to use inside the kml and a list of rows that are xml/kml structured. Essentially, packageroutes/packagepoints and packagefinal are structured so that once a list has been taken out of its wrapper, packagefinal must be performed on it before sending it into parselist.

folder(nameoffolder,description,list) - given an output of make_route() or make_points() that has had its wrapper removed (packageroutes/packagepoints), returns a folder for the objects in the given list.
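The unwrap, folder, re-wrap order described above can be pictured with plain lists of KML lines. This is a conceptual sketch only: strip_wrapper, wrap_folder, and wrap_document are hypothetical stand-ins for packageroutes/packagepoints, folder, and packagefinal, and the exact header and footer lines PipeKML writes may differ from the ones assumed here.

```python
# assumed wrapper lines for a complete KML file
HEADER = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<kml xmlns="http://www.opengis.net/kml/2.2">',
    '<Document>',
]
FOOTER = ['</Document>', '</kml>']

def strip_wrapper(lines):
    """Drop the file header and footer, keeping only the inner elements."""
    return lines[len(HEADER):len(lines) - len(FOOTER)]

def wrap_folder(name, description, inner):
    """Nest already-unwrapped lines inside a KML Folder element."""
    opening = ['<Folder>',
               '<name>%s</name>' % name,
               '<description>%s</description>' % description]
    return opening + inner + ['</Folder>']

def wrap_document(name, inner):
    """Put the header and footer back so the lines form a complete file."""
    return HEADER + ['<name>%s</name>' % name] + inner + FOOTER

placemark = ['<Placemark>', '<name>A</name>', '</Placemark>']
complete = HEADER + placemark + FOOTER

inner = strip_wrapper(complete)
rebuilt = wrap_document('Demo', wrap_folder('Group 1', 'One point', inner))
print('\n'.join(rebuilt))
```

The point of the ordering is that only one header and one footer may exist in the final file, which is why unwrapped lists have to pass through the final packaging step before they are written.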

All of the kwargs and argument definitions below are mostly for the make_route() function.

filelocationlist - list of csv file locations to ingest that contain desired road alignment data

Function arguments below are the ones that directly affect the way in which the data is handled.

segmentinfocsv - kwarg, csv file location containing information about all the alignments you wish to parse into kml (i.e. a csv file containing all the alignment data)
indexs - kwarg, unique identifiers within the rows of segmentinfocsv, so that each row's data can be matched with the associated position in the filelocationlist and parsed into data associated with the route (i.e. a list of keys used to draw data from the segmentinfocsv file into the associated alignment)
list=True - kwarg, technically just list is the kwarg, but the only time you will use this argument is if you're inputting a raw list where the csv file location normally would be.

Function arguments below are for aesthetics and naming/description

color - kwarg, currently takes inputs of red, light green, orange, and white; it could support many more, but there is no way of gathering color ids in bulk (that I know of)
scale - kwarg, adjusts the depth of your line; I suggest about 5 for a line that's pretty easy to click but not too large for most applications
folder - kwarg arg, inputs the folder name desired for the paths contained within the filelocationlist segments
description - kwarg arg, inputs the description for the given folder name

Points Module

How It Works

Not unlike the alignments module, the points module was recently updated to ingest csv files or lists with greater ease. As you might have guessed, it uses the header for the names of the data displayed with each point, and the values in each row to represent one point. That means it looks for 'lat', 'long', and/or 'elevation' in each header value; it also accepts a 'location' header whose values have the syntax '(lat, long)' or 'lat, long'. This module takes a csv file or list of point rows and translates each row into a kml structure with the data representing that row. It differs from the alignments module in that here each row of a table is one kml element, whereas in the alignments module each table is one kml element itself. The function make_points() is used to make points from a csv file.

The main difference from the alignments module in the way it functions is that it can take a 'location' header with values commonly in the syntax '(latitude, longitude)' or 'latitude, longitude' and parse them into the KML correctly. I found this implementation helpful as a lot of files or tables simply have a 'LOCATION' header with the values placed in a string like the examples above.
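As a rough idea of what handling the two 'location' syntaxes involves, the sketch below splits either form into a pair of floats. The helper name parse_location is hypothetical and this is not the parser PipeKML ships; it only illustrates the two formats named above.

```python
def parse_location(value):
    """Split '(lat, long)' or 'lat, long' into a (lat, long) tuple of floats."""
    cleaned = str(value).strip().strip('()')
    lat_text, long_text = cleaned.split(',')
    # float() ignores the surrounding spaces left over from the split
    return float(lat_text), float(long_text)

plain = parse_location('51.383331,30.1')
wrapped = parse_location('(47.48333, 34.633331)')
print(plain)
print(wrapped)
```

A value with anything other than exactly two comma-separated numbers raises ValueError in this sketch, which is usually what you want for a malformed row.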

Points Module - Functions

make_points(pointcsvfile[,indexs,color,scale,folder,description,icon,icons])

parselist(total,filename) - given the output of the function above and a file name (preferably ending in .kml), writes a kml file.

packagepoints(makepointsoutput) - takes the make_points() output and returns a list without the starting and ending wrapper of most kml files, so it can be added together with subsequent lists or placed in different folders. Use this if you want your folders in specific locations.

All of the kwargs and argument definitions below are mostly for the make_points() function.

pointcsvfile - string containing file location of table to parse into a kml

The kwarg 'indexs' below should be used when you want to query points based on a list of unique indexes, each of which exists only in the desired row. By taking the initial csv and grabbing only the rows in the desired list of indexs, it saves you from having to write a new csv file and then turn that new csv file into a kml. If you're not too concerned with performance, writing a new csv file is suggested just for ease of use and clarity.

indexs - kwarg, given a list of unique keys, finds the unique row corresponding to each key and writes only those rows to the kml file; basically it queries values in the csv file without your having to write a new csv file with just the desired rows.
list=True - kwarg, technically just list is the kwarg, but the only time you will use this argument is if you're inputting a raw list where the csv file location normally would be.
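The effect of indexs can be approximated by filtering a header-plus-rows list down to the rows that contain one of the keys. This is a sketch of the idea, not PipeKML's implementation; rows_for_keys and the station data are made up for illustration.

```python
def rows_for_keys(table, keys):
    """Keep the header plus every row that contains one of the unique keys."""
    wanted = {str(key) for key in keys}
    header, rows = table[0], table[1:]
    kept = [row for row in rows if any(str(cell) in wanted for cell in row)]
    return [header] + kept

# made-up sample table
stations = [
    ['ID', 'NAME', 'LAT', 'LONG'],
    ['S1', 'First St', '40.70', '-74.01'],
    ['S2', 'Second St', '40.71', '-74.00'],
    ['S3', 'Third St', '40.72', '-73.99'],
]
subset = rows_for_keys(stations, ['S1', 'S3'])
print(subset)
```

Because any cell can match, the keys really do need to be unique to the rows you want, which is the same requirement stated for indexs above.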

Function arguments below are for aesthetics and naming/description

color - kwarg, currently takes inputs of red, light green, orange, and white; it could support many more, but there is no way of gathering color ids in bulk (that I know of)
scale - kwarg, adjusts the depth of your line; I suggest about 5 for a line that's pretty easy to click but not too large for most applications
folder - kwarg, inputs the folder name desired for the paths contained within the filelocationlist segments
description - kwarg, inputs the description for the given folder name
icon - kwarg, the href or link to an image of the icon you want to use. Please note only href icons are supported at this time, not the icon keys that the default Google icons use. To get the href of the Google default icons go here, then simply use that link as the icon value. If the icon argument is given, it is assumed that all points should use this icon.
icons - kwarg, a list of hrefs or links to images of the icons you want to use. This argument assumes that the list of icons is the same size as the list of point rows in the csv file, or the same size as the list of indexs to be written into the kml file.
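Since the icons list has to line up one-to-one with the point rows, it can help to build it from the table itself. The sketch below does that from a value-to-href mapping; icons_for_rows, the fallback choice, and the sample rows are assumptions for illustration, while the pushpin URLs are the ones used in the icon map earlier in this README.

```python
def icons_for_rows(table, field, icon_by_value, default):
    """Build one icon href per data row from the value in one column."""
    header, rows = table[0], table[1:]
    position = header.index(field)
    return [icon_by_value.get(row[position], default) for row in rows]

iconmap = {
    'Angle': 'http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png',
    'Head-On': 'http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png',
}
fallback = 'http://maps.google.com/mapfiles/kml/pushpin/pink-pushpin.png'

# made-up sample rows
crashes = [
    ['VAR23C', 'LAT', 'LONG'],
    ['Angle', '38.41', '-82.44'],
    ['Head-On', '38.35', '-81.63'],
    ['Rear-End', '38.42', '-82.01'],
]
icons = icons_for_rows(crashes, 'VAR23C', iconmap, fallback)

# one icon per point row, excluding the header
if len(icons) != len(crashes) - 1:
    raise ValueError('icons list must match the number of point rows')
print(icons)
```

Using a fallback href avoids passing None for causes that are missing from the mapping.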

Blocks Module

The blocks module takes a list or csv file and parses it into a set of blocks from the four corners of each block given in each row. It's mainly meant to support only colors at this time, but it could be good for visualizing clusters or other things. The blocks module assumes you have a structure like the one below for each block:

COUNT	LAT1	LONG1	LAT2	LONG2	LAT3	LONG3	LAT4	LONG4
1	41.295	-87.832	41.31	-87.832	41.295	-87.817	41.31	-87.817
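For a sense of what a block row has to become, the sketch below turns the four corners into a closed coordinate ring of the kind a KML polygon uses. It is an illustration, not the module's own code: block_ring is a hypothetical helper, and it assumes each block is a rectangle aligned with the lat/long axes, as in the sample row above.

```python
def block_ring(row):
    """Return a closed KML coordinate ring for one block row.

    row is [COUNT, LAT1, LONG1, LAT2, LONG2, LAT3, LONG3, LAT4, LONG4].
    """
    numbers = [float(value) for value in row[1:9]]
    lats, longs = numbers[0::2], numbers[1::2]
    south, north = min(lats), max(lats)
    west, east = min(longs), max(longs)
    # walk the corners in order so the outline does not cross itself,
    # then repeat the first corner to close the ring
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    # KML writes each coordinate as longitude,latitude
    return ' '.join('%s,%s' % (lon, lat) for lon, lat in corners)

sample = ['1', '41.295', '-87.832', '41.31', '-87.832',
          '41.295', '-87.817', '41.31', '-87.817']
ring = block_ring(sample)
print(ring)
```

The two details that tend to trip people up are that KML puts longitude first, and that the corners in the table are not listed in outline order, so they have to be reordered before drawing.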

An example of random colors generated from a high definition partition of Chicago.

The table below shows the default labels that can be used and the actual image associated with them.

Label	Corresponding Image
yellow
white
red
pink
orange
light green
blue
light blue

The "heatmap" table, selected with "heatmap=True" in the kwargs, uses a different approach: right now it has 7 levels, and its labels correspond to the following images.

Block Tier/Label	Corresponding Image
block1
block2
block3
block4
block5
block6
block7
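This README does not say how a value is assigned to one of the seven tiers, so the sketch below is purely an assumption: it spreads counts over block1 to block7 using equal-width bins between the lowest and highest count. The helper heatmap_label and the sample counts are hypothetical.

```python
def heatmap_label(count, lowest, highest, levels=7):
    """Map an integer count onto 'block1'..'block7' using equal-width bins."""
    if highest == lowest:
        return 'block1'
    # integer arithmetic keeps the bin edges exact
    tier = (count - lowest) * levels // (highest - lowest)
    # the highest count would land one past the last bin, so clamp it
    tier = min(tier, levels - 1) + 1
    return 'block%d' % tier

counts = [1, 4, 9, 15, 22]
labels = [heatmap_label(c, min(counts), max(counts)) for c in counts]
print(labels)
```

If your counts are heavily skewed, quantile-based bins may give a more readable map than equal-width ones; that choice is independent of the labels themselves.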

Blocks Module - Functions

make_blocks(csvfile[,color,colorfield,heatmap,list]) - given a structure like the one you see above (4 lat/longs per line), takes it in csv file or list form and turns it into the corresponding kml lines

parselist(total,filename) - given the output of the function above and a file name (preferably ending in .kml), writes a kml file.

packageblocks(makeblocksoutput) - takes the make_blocks() output and returns a list without the starting and ending wrapper of most kml files, so it can be added together with subsequent lists or placed in different folders. Use this if you want your folders in specific locations.

folder(nameoffolder,description,list) - given an output of make_route() or make_points() that has had its wrapper removed (packageroutes/packagepoints/packageblocks), returns a folder for the objects in the given list.

All of the kwargs and argument definitions below are mostly for the make_blocks() function.

list=True - kwarg, technically just list is the kwarg, but the only time you will use this argument is if you're inputting a raw list where the csv file location normally would be.

Function arguments below are for aesthetics and naming/description

color - kwarg, when this kwarg is given it assumes you want all the blocks in the csv file or list given to be this color
folder - kwarg, inputs the folder name desired for the paths contained within the filelocationlist segments
description - kwarg, inputs the description for the given folder name
colorfield - kwarg, assuming you have a field holding the color each block should be, give its position in the row as an integer and all the colors will be processed at once (the counterpart to the color argument)
heatmap=True - kwarg, boolean to switch the color map from the normal one above to the heatmap

Example 1 - Sample Dataset Provided

This example uses a geospatial dataset I've used extensively and just shows a small sample of some of the things I've done with it as well as some of the ways you can integrate the data into a kml file in a useful manner.

To view the example code please go to the directory and view the example1.py script.

Output from Example 1 Displayed Below (w/ MUTCD signage integration)

Example 2 - Public Dataset Pandas/matplotlib Integration

Download csv file

This example utilizes pandas and numpy as well as the modules that exist in this repository. It shows a simple imperative script for breaking down structures so they can be separated into folders. I think it shows that this module can be pretty useful for displaying geospatial data.

To view the example code please go to the directory and view the example2.py script.

Example 3 - Randomly Generating Colors on Earth

This example simply gives a general overview of the things that can be done with the block overlay.

See example3.py script here.

from pipekml.blocks import *
from random import randint

#colors blocks can be at this time
colors=['yellow','white','red','pink','orange','blue','light green','light blue']

#reading the squares into memory 
data=read('squares.csv')

newlist=[data[0]]
#randomizing a color for each block and adding to the end of each row
for row in data[1:]:
    #randomly generating a number corresponding to a row position in colors list
    color=colors[randint(0,7)]
    newrow=row+[color]
    newlist.append(newrow)

#sending into make_blocks()
#note the list=True argument if you're not using a csv file
#the colorfield argument corresponds to the row position of the color column
#because we just added it to the end we can use -1
a=make_blocks(newlist,list=True,colorfield=-1)
parselist(a,'random_colors.kml')

Output of Colors Generated Displayed Below
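If you don't have a squares.csv on hand, a table with the same column layout can be generated in memory. The sketch below is an assumption about how such a grid might be built, not how the example file was produced: make_grid is a hypothetical helper, and COUNT is filled with a running number purely as a placeholder.

```python
def make_grid(south, west, rows, columns, size):
    """Return a [header, row, ...] table of square blocks, one row per block."""
    header = ['COUNT', 'LAT1', 'LONG1', 'LAT2', 'LONG2',
              'LAT3', 'LONG3', 'LAT4', 'LONG4']
    table = [header]
    count = 0
    for r in range(rows):
        for c in range(columns):
            count += 1
            # rounding keeps floating point noise out of the coordinates
            lat_low = round(south + r * size, 6)
            lat_high = round(south + (r + 1) * size, 6)
            long_low = round(west + c * size, 6)
            long_high = round(west + (c + 1) * size, 6)
            # same corner order as the sample row in the Blocks Module section
            table.append([count, lat_low, long_low, lat_high, long_low,
                          lat_low, long_high, lat_high, long_high])
    return table

grid = make_grid(41.295, -87.832, rows=2, columns=2, size=0.015)
for row in grid:
    print(row)
```

The resulting list has a header row followed by one row per block, so a color column can be appended to each row in the same way the example above does.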

Gallery

Complex KML Structures (w/MUTCD signage integration)

Block Image Overlay Aggregation/Pivoting below Shows Crime Clusters in Chicago

Future Updates/Past Uses I've found

Plane Flight Trajectory for Metrojet Flight 9268 showing it most likely broke up in air.

Click here or image below for YouTube video.

Ordinal KML file representing a path from point A to point B with data associated with traffic aggregated.

Contribute/Feedback

If you'd like to give me feedback or contribute, you're more than welcome; if you have feedback or would like to collaborate on a new project, email me at murphy214@live.marshall.edu.

History/Versions

This module was directly ported from Routing-KML-Maker on 11/25/15. I was afraid that simply changing the repository's name would 404 links to the module, so instead I created a new one with a name that I felt fit what this module really does.

No huge structural changes are planned, just continued updates.

Version	Description of Update	Date
Version 1.0	Ported Routing-KML-Maker	11/24/15
Version 1.1	Name change to module lower case	11/29/15
Version 1.2	Added more functionality	12/5/15
Version 1.3	Added Blocks module	12/27/15

License Information

I use the standard Apache license agreement, based on the little I've gathered about what licenses actually mean and whether it's even applicable to my project.

Want more information? Find detailed information about what each license means here.
