PipeKML - ETL Based KML Library for Tabular Data in Python
Table of Contents
Requirements/Dependencies and Installation
Usage - Working with a pandas DataFrame
Usage - Why PipeKML is useful for Data Analysis
Versions/History & License Agreement
Purpose
This module takes tabular data and pipes it into KMLs.
This repository was built to make it easy to turn common data stores, in this case CSV files or lists, into KML files. You can think of it as a sort of ETL pipeline for data that contains spatial fields. Other Python modules (fastkml) provide all the functionality one could ever use, in the sense of offering a robust set of options to choose from. This module differs in that its goal is to get from a tabular structure to a KML, carrying along the data associated with each object, in the simplest way possible while still maintaining some level of functionality (i.e. color, icon, scale, folder name/description).
The idea is that fastkml isn't necessarily built for putting raw data into KML files in a simple manner, and I think there's a demand for a simple KML maker. PipeKML admittedly takes a lot of shortcuts that sacrifice robustness in KML options for robustness in ingesting data, placing it in a KML structured the way I generally want it, and integrating with existing geospatial libraries and typical geospatial datasets.
Other Repositories I've Created to Be Used with PipeKML
- pipegeohash - A tile association and reduction script optimized for ETL processes.
- pipeheatmap - A Python Heatmap Library (work in progress)
- pipeslice - A library meant for higher level DataFrame abstractions for geospatial data. (work in progress)
Whether it's future updates/merges from my existing repositories for aggregation (left), full-fledged data analysis aggregating coal truck traffic (middle), or heatmapping crime clusters in Chicago (right), PipeKML has a lot of uses.
Requirements/Dependencies and Installation
The modules themselves should only require the Python standard library; however, the examples use pandas and NumPy.
Pandas
Numpy
Install PipeKML
To install PipeKML in terminal execute the following command:
pip install pipekml
Ingesting a CSV File
The example below shows the process to make a kml from a csv file. (assuming you have the proper headers)
from pipekml.points import *
from pipekml.alignments import *
#These two files are located in readme_example_files/
#writing a kml containing points of NY subway stations
#make_points() returns a list of kml lines
#parselist simply writes whatever is in that list to a kml file
a=make_points('stations.csv',scale=.5)
parselist(a,'stations.kml')
#writing a kml file containing the alignment of a satellite from a few days ago
#note: file locations must be passed in as a list, and a route is written for each location in it
a=make_route(['satellite.csv'],scale=2,color='light green')
parselist(a,'satellite.kml')
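As the comments above note, make_points() returns a list of KML lines and parselist() writes that list to a file. The sketch below is a hypothetical stand-in (write_kml_lines is not a PipeKML function) that shows the idea with a minimal one-placemark document; the exact lines PipeKML emits may differ.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_kml_lines(lines, filename):
    """Hypothetical parselist()-style writer: one KML line per file line."""
    with open(filename, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line.rstrip("\n") + "\n")


# A minimal KML document expressed as a list of lines, similar in spirit
# to what make_points() returns (illustrative only).
kml_lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<kml xmlns="http://www.opengis.net/kml/2.2">',
    "<Document>",
    "<Placemark>",
    "<name>Example station</name>",
    "<Point><coordinates>-73.9857,40.7484,0</coordinates></Point>",
    "</Placemark>",
    "</Document>",
    "</kml>",
]

out_path = os.path.join(tempfile.mkdtemp(), "example.kml")
write_kml_lines(kml_lines, out_path)

# Parse the file back to confirm it is well-formed KML/XML.
root = ET.parse(out_path).getroot()
ns = {"kml": "http://www.opengis.net/kml/2.2"}
names = [el.text for el in root.findall(".//kml:name", ns)]
print(names)
```

Keeping the output as a plain list of lines is what makes the package*/folder/packagefinal functions described later possible: they only slice and concatenate lists.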
Working With a List
The example shows the process to make a kml from a regular Python list (must have headers)
Let's assume we have a list that looks something like the table below. As is often the case, this list stores the latitude and longitude together as a single string in one column; luckily, PipeKML has attempted to account for that.
Country | Name | Location | Total number of reactors | Active Reactors | Reactors Under Construction | Shut Down Reactors |
---|---|---|---|---|---|---|
UKRAINE | CHERNOBYL | 51.383331,30.1 | 4 | 1 | 0 | 3 |
UKRAINE | KHMELNITSKI | 50.599998,26.549999 | 4 | 1 | 3 | 0 |
UKRAINE | ROVNO | 51.333328,25.883329 | 4 | 3 | 1 | 0 |
UKRAINE | SOUTH UKRAINE | 47.816669,31.216669 | 3 | 3 | 0 | 0 |
UKRAINE | ZAPOROZHE | 47.48333,34.633331 | 6 | 6 | 0 | 0 |
from pipekml.points import *
from pipekml.alignments import *
#the list above
mylist=[['Country', 'Name', 'Location', 'Total number of reactors', 'Active Reactors', 'Reactors Under Construction', 'Shut Down Reactors'], ['UKRAINE', 'CHERNOBYL', '51.383331,30.1', '4', '1', '0', '3'], ['UKRAINE', 'KHMELNITSKI', '50.599998,26.549999', '4', '1', '3', '0'], ['UKRAINE', 'ROVNO', '51.333328,25.883329', '4', '3', '1', '0'], ['UKRAINE', 'SOUTH UKRAINE', '47.816669,31.216669', '3', '3', '0', '0'], ['UKRAINE', 'ZAPOROZHE', '47.48333,34.633331', '6', '6', '0', '0']]
#when passing a list, use list=True to indicate that the input is a list rather than a csv file location
a=make_points(mylist,list=True,scale=1)
parselist(a,'nuclear_plants.kml')
Working With a DataFrame
A pandas DataFrame is ingested using the same list=True keyword as before. Here we ingest a DataFrame and query out the angle collisions in California involving more than one fatality.
import pandas as pd
import numpy as np
from pipekml.points import *
#reading dataframe from csv file
data=pd.read_csv('STSIFARS.csv')
#getting angle-collision accidents in California involving more than one fatality
data=data[(data.FATALS>1)&(data.STANAME=='CALIFORNIA')&(data.VAR23C=='Angle')]
a=make_points(data,list=True,scale=.5)
parselist(a,'fatalsgreaterthan1.kml')
Why PipeKML is useful for Data Analysis
The key to PipeKML is that it can ingest common data stores (lists or pandas DataFrames) and carry the data throughout the KML structure. That means nothing sits between you and the data you want to analyze, so pure Python solutions to GIS are possible. Most tools like ArcPy have a backend that works with all the data you're querying, filtering, or updating; PipeKML simply displays whatever you throw into it, which makes updating or adding fields much easier. By bringing the data you need into memory, you can put the analysis on the shoulders of powerful array-oriented data analysis tools like pandas, where queries can usually be done in one line.
The explanation and code below relate to the table containing traffic fatality data in the link here.
The code below queries traffic fatalities involving young drivers, on the weekend, between 10 PM and 2 AM. It's worth noting that long one-line operations like the one below are better avoided because they're hard to read, but I figured it's worth showing how much easier it is to work with tabular data in DataFrames.
import pandas as pd
import numpy as np
from pipekml.points import *
fatalities=pd.read_csv('STSIFARS.csv')
#Querying for young drivers, on the weekend, with crash hours between 10 P.M. and 2 A.M.
fatalities=fatalities[((fatalities.VAR24C=='Weekend')&(fatalities.YOUNG_OLD=='Young Driver')&((fatalities.HOUR=='10:00pm-10:59pm')|(fatalities.HOUR=='11:00pm-11:59pm')|(fatalities.HOUR=='0:00am-0:59am')|(fatalities.HOUR=='1:00am-1:59am')))]
#taking fatalities dataframe adding the header row and taking the dataframe to a list
fatalities=[fatalities.columns.tolist()]+fatalities.values.tolist()
kml=make_points(fatalities,list=True,scale=.5)
parselist(kml,'young_drivers_weekend_night.kml')
Output Showing Fatalities related to Young Drivers, On the weekend, between 10PM-2AM
PipeKML gives you just what you need to display the data, nothing more. This lets you write your own algorithms to deal with your own vector data. Also, because you're bringing data into memory and then editing it, you're not updating or editing any existing files. This makes it well suited to algorithm development and testing.
It also makes it easy to slice data and display or package it in useful ways. The script below shows what PipeKML can do when it comes to pivoting data and viewing it geospatially.
In other words, the script below takes a table, filters out the values I don't need, makes a folder for each unique value in one field (the hour in which the crash occurred), and then displays different icons based on the value in another field (the crash cause).
from pipekml.points import *
import pandas as pd
import numpy as np
#function that returns the icon url mapped to a given crash cause
#i.e. given a list of [cause, icon link] pairs, return the link for the input cause
#(returns None if the cause is not in the icon map)
def get_icon(cause,iconmap):
    for row in iconmap:
        if str(row[0])==str(cause):
            return row[1]
    return None
#working with a set of points and attributes (25 MB csv file)
#bringing it into memory as a dataframe
fatalities=pd.read_csv('STSIFARS.csv')
#getting header values
header=fatalities.columns.values.tolist()
#filtering the state field to West Virginia
fatalities=fatalities[(fatalities.STANAME=='WEST VIRGINIA')]
#selecting only the counties we're analyzing
fatalities=fatalities[((fatalities.CNTYNAME=='Cabell County')|(fatalities.CNTYNAME=='Kanawha County')|(fatalities.CNTYNAME=='Putnam County'))]
#pre-made icon map that maps each cause to a url containing a png to be displayed
iconmap=[['Angle', 'http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png'], ['Head-On', 'http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png'], ['Not Collision with Motor Vehicle in Transport', 'http://maps.google.com/mapfiles/kml/pushpin/ltblu-pushpin.png'], ['Rear-End', 'http://maps.google.com/mapfiles/kml/pushpin/pink-pushpin.png'], ['Sideswipe', 'http://maps.google.com/mapfiles/kml/pushpin/purple-pushpin.png']]
#getting the unique hour blocks in which crashes occurred
uniquehours=np.unique(fatalities.HOUR).tolist()
totalkml=[]
#iterating through each unique hour block
for hour in uniquehours:
    #slicing out the rows for the current hour block
    slicedhours=fatalities[(fatalities.HOUR==str(hour))]
    #getting unique crash causes for the sliced hour block
    uniquecrashcauses=np.unique(slicedhours['VAR23C']).tolist()
    temporarylist=[]
    hours=str(hour)
    #iterating through each crash cause
    for cause in uniquecrashcauses:
        #getting the icon url from the function defined above
        icon=get_icon(str(cause),iconmap)
        #slicing the already sliced dataframe by crash cause
        slicedcauses=slicedhours[(slicedhours.VAR23C==str(cause))].values.tolist()
        #adding a header to the output list
        slicedcauses=[header]+slicedcauses
        #sending into pipekml for output kml lines
        kml=make_points(slicedcauses,list=True,icon=str(icon),scale=1)
        kml=packagepoints(kml) #removing the document header and footer so the lines can go inside a folder
        #adding the sliced causes to temporarylist
        temporarylist+=kml
    #packaging the list into a new folder
    newfolder=folder(hours,'Crashes Occurring in hours: '+hours,temporarylist)
    #adding the folder to totalkml
    totalkml+=newfolder
#totalkml now holds one folder per unique hour, each with a unique icon for each crash cause
#so we can now package it and write it to a file
totalkml=packagefinal('Traffic_Fatalities_Sliced_by_Hour',totalkml)
#writing to kml file
parselist(totalkml,'traffic_fatalities.kml')
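The same folder-per-hour, icon-per-cause pivot can be sketched in plain Python for readers who want to see the grouping logic without pandas. group_by_two_fields is a hypothetical helper (not part of PipeKML), the rows are toy values, and a dict stands in for the get_icon() loop. Each inner list, prefixed with the header, has the same header-plus-rows shape the script above passes to make_points(..., list=True).

```python
from collections import defaultdict


def group_by_two_fields(header, rows, outer_field, inner_field):
    """Nest rows as {outer value: {inner value: [rows]}} with sorted keys.

    Hypothetical helper for illustration; not part of PipeKML.
    """
    outer_i = header.index(outer_field)
    inner_i = header.index(inner_field)
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[row[outer_i]][row[inner_i]].append(row)
    return {outer: dict(sorted(inner.items()))
            for outer, inner in sorted(groups.items())}


# Toy rows using the HOUR and VAR23C fields from the script above.
header = ["ID", "HOUR", "VAR23C"]
rows = [
    [1, "10:00pm-10:59pm", "Angle"],
    [2, "10:00pm-10:59pm", "Head-On"],
    [3, "11:00pm-11:59pm", "Angle"],
    [4, "10:00pm-10:59pm", "Angle"],
]

# A dict lookup replaces the linear search in get_icon().
icon_for = dict([
    ["Angle", "http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png"],
    ["Head-On", "http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png"],
])

groups = group_by_two_fields(header, rows, "HOUR", "VAR23C")
for hour, causes in groups.items():          # one folder per hour
    for cause, cause_rows in causes.items():  # one icon per cause
        table = [header] + cause_rows         # header + rows, as in the script
        print(hour, cause, icon_for.get(cause), len(table) - 1)
```

Using a dict for the icon map also makes a missing cause explicit: icon_for.get() returns None instead of silently falling through a loop.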
Output of the Code Above (note the Folders/Icons)
Alignments Module
How It Works
Alignment data now supports raw ingestion: you can simply pass a list of CSV file locations, or a list, to the make_route() function and it will build a route from the given alignment. It parses the header (a labeled header is required at the top of the CSV file), finds the column positions of the latitudes, longitudes, and elevations, and then parses each row into geometric alignment (path) data. A header only needs to contain the text 'lat', 'long', or 'elev' to be parsed as geometric data, which means it may sometimes treat a column as alignment data when it is not. Because alignment data is sometimes stored with two points per line, the script also supports that layout by assuming the alignment field that comes first holds the preceding point.
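The header matching and two-points-per-line rule described above can be sketched as follows. This is a hypothetical illustration, not PipeKML's code: find_geometry_columns and rows_to_path are made-up names, matching is assumed to be a case-insensitive substring test, and the sample rows are toy values.

```python
def find_geometry_columns(header):
    """Return column positions whose header contains 'lat', 'long', or 'elev'.

    A case-insensitive substring test, so a header such as 'PLATFORM'
    (which contains 'lat') would also match, as the text above warns.
    """
    columns = {"lat": [], "long": [], "elev": []}
    for position, name in enumerate(header):
        lowered = str(name).lower()
        for key in columns:
            if key in lowered:
                columns[key].append(position)
    return columns


def rows_to_path(rows, header):
    """Turn rows into an ordered list of (long, lat, elev) points.

    When a row holds two points (e.g. LAT1/LONG1 and LAT2/LONG2), the first
    pair is taken as the preceding point; consecutive duplicates are skipped
    so the shared joint between rows appears only once.
    """
    cols = find_geometry_columns(header)
    pairs = len(cols["lat"])
    elevs = cols["elev"] if len(cols["elev"]) == pairs else [None] * pairs
    path = []
    for row in rows:
        for lat_i, long_i, elev_i in zip(cols["lat"], cols["long"], elevs):
            elev = float(row[elev_i]) if elev_i is not None else 0.0
            point = (float(row[long_i]), float(row[lat_i]), elev)
            if not path or path[-1] != point:
                path.append(point)
    return path


# Toy rows stored two points per line.
header = ["SEGMENT", "LAT1", "LONG1", "LAT2", "LONG2"]
rows = [["A", "38.40", "-82.45", "38.41", "-82.44"],
        ["A", "38.41", "-82.44", "38.42", "-82.43"]]
path = rows_to_path(rows, header)

# KML LineString coordinates are written as longitude,latitude,altitude.
coordinates = " ".join(f"{lon},{lat},{elev}" for lon, lat, elev in path)
print(coordinates)
```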
Another important update concerns file structure and usage assumptions: if your CSV alignment file stores attribute data that repeats on every line, the module assumes that repeating data belongs to the alignment as a whole and adds it to the alignment's KML data. In other words, if a value repeats on every row, it is treated as an attribute of the entire alignment and displayed in its popup box.
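A minimal sketch of the repeated-value rule, assuming "repeats" means the value is identical on every row of the file. split_constant_fields is a hypothetical name and the rows are toy values; PipeKML's own check may differ.

```python
def split_constant_fields(header, rows):
    """Separate fields whose value is identical on every row.

    Constant fields describe the whole alignment (shown once in its popup);
    varying fields are left as per-row data. Hypothetical sketch.
    """
    constant, varying = {}, []
    for position, name in enumerate(header):
        values = {row[position] for row in rows}
        if len(values) == 1:
            constant[name] = values.pop()
        else:
            varying.append(name)
    return constant, varying


header = ["ROUTE", "COUNTY", "LAT", "LONG"]
rows = [["I-64", "Cabell", "38.40", "-82.45"],
        ["I-64", "Cabell", "38.41", "-82.44"]]
constant, varying = split_constant_fields(header, rows)
print(constant)  # attributes for the whole alignment
print(varying)   # fields that change point to point
```

Note that a geometry column could also be constant (e.g. a perfectly north-south road has one longitude), so a real implementation would likely exclude the detected lat/long/elev columns before applying this test.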
Alignments Module - Functions
make_route(filelocationlist[,segmentinfocsv,indexs,color,scale,folder,description])
parselist(total,filename) - given the output of the function above and a file name (hopefully ending in .kml), writes the KML file.
packageroutes(makerouteoutput) - takes the make_route() output and returns the list without the starting and ending wrapper of the KML file, so it can be combined with subsequent lists or placed in different folders. Use this if you want your directories in specific locations.
packagefinal(nameinside,list) - adds back the boilerplate of the first few and last couple of lines, given a name to use inside the KML and a list of XML/KML-structured lines. Essentially, packageroutes/packagepoints and packagefinal are structured so that once output has been taken out of its wrapper, packagefinal must be performed on the list before sending it into parselist.
folder(nameoffolder,description,list) - given make_route() or make_points() output that has had its wrapper removed (packageroutes/packagepoints), returns a folder (directory) containing the objects in the given list.
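The unwrap, folder, re-wrap cycle described above can be pictured with plain lists. In this sketch strip_wrapper, make_folder, and wrap_document are hypothetical stand-ins for packagepoints/packageroutes, folder(), and packagefinal(), and it assumes the wrapper is everything up to and including `<Document>` plus everything from `</Document>` on.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

HEADER = ['<?xml version="1.0" encoding="UTF-8"?>',
          '<kml xmlns="http://www.opengis.net/kml/2.2">',
          "<Document>"]
FOOTER = ["</Document>", "</kml>"]


def strip_wrapper(lines):
    """Drop the document header/footer (the role of packagepoints/packageroutes)."""
    start = lines.index("<Document>") + 1
    end = lines.index("</Document>")
    return lines[start:end]


def make_folder(name, description, body):
    """Wrap already-stripped KML lines in a <Folder> (the role of folder())."""
    return (["<Folder>",
             f"<name>{escape(name)}</name>",
             f"<description>{escape(description)}</description>"]
            + body + ["</Folder>"])


def wrap_document(name, body):
    """Restore a single header and footer (the role of packagefinal())."""
    return HEADER + [f"<name>{escape(name)}</name>"] + body + FOOTER


# Two complete single-point documents, as separate make_points() calls might return.
point_a = HEADER + ["<Placemark><name>A</name></Placemark>"] + FOOTER
point_b = HEADER + ["<Placemark><name>B</name></Placemark>"] + FOOTER

body = (make_folder("Morning", "Crashes 6-9 AM", strip_wrapper(point_a))
        + make_folder("Evening", "Crashes 6-9 PM", strip_wrapper(point_b)))
final = wrap_document("Example", body)

# Check that the combined document is well-formed and has both folders.
ns = {"kml": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring("\n".join(final).encode("utf-8"))
doc_name = root.find("kml:Document/kml:name", ns).text
folder_names = [el.text for el in root.findall("kml:Document/kml:Folder/kml:name", ns)]
print(doc_name, folder_names)
```

This is why skipping the final re-wrap step breaks the output: concatenated folders with no `<kml>`/`<Document>` around them are not a valid KML file.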
The arguments and keyword arguments defined below mostly apply to the make_route() function.
- filelocationlist - list of CSV file locations to ingest that contain the desired road alignment data
The function arguments below directly affect how the data is handled.
- segmentinfocsv - kwarg, CSV file location containing information about all the alignments you wish to parse into KML (i.e. a CSV file containing data for all the alignments)
- indexs - kwarg, list of unique identifiers found in the rows of segmentinfocsv, in the same order as filelocationlist, so that each row's data can be matched to the alignment at the same position in filelocationlist and attached to that route (i.e. a list of keys used to pull data from the segmentinfocsv file into the associated alignment)
- list=True - kwarg, technically the kwarg is just list; you only need it when passing a raw list in place of a CSV file location.
Function arguments below are for aesthetics and naming/description
- color - kwarg, currently accepts red, light green, orange, and white; more could be supported, but I don't know of a way to gather color IDs in bulk
- scale - kwarg, adjusts the width of your line; I suggest about 5 for a line that's easy to click but not too large for most applications
- folder - kwarg, the folder name for the paths contained in the filelocationlist segments
- description - kwarg, the description for the given folder name
Points Module
How It Works
Much like the alignments module, the points module was recently updated to ingest CSV files or lists with greater ease. As you might have guessed, it uses the header row for the data displayed with each point, along with the values in each row, which each represent a point. It looks for 'lat', 'long', and/or 'elevation' in each header value, and it also accepts a 'location' header with values in the syntax '(lat, long)' or 'lat, long'. This module takes a CSV file or list of point rows and translates each row into a KML element carrying that row's data. That differs from the alignments module: in the points module each row of a table is one KML element, while in the alignments module each table is one KML element. The function make_points() makes points from a given CSV file.
The main functional difference between this module and the alignments module is that it can take a 'location' header with values commonly in the syntax '(latitude, longitude)' or 'latitude, longitude' and parse them into the KML correctly. I found this helpful because many files or tables simply have a 'LOCATION' header with the values stored in a string like the examples above.
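Splitting a 'location' string into coordinates takes only a few lines. This is a hypothetical sketch (parse_location and to_kml_coordinates are illustrative names, not PipeKML functions) that accepts both syntaxes mentioned above and assumes exactly one comma separates latitude from longitude.

```python
def parse_location(value):
    """Split '(lat, long)' or 'lat,long' into a (lat, long) tuple of floats.

    Raises ValueError if the string does not contain exactly one comma.
    """
    cleaned = str(value).strip().strip("()")
    lat_text, long_text = cleaned.split(",")
    return float(lat_text), float(long_text)


def to_kml_coordinates(value, elevation=0):
    """KML expects longitude first, so the order is swapped here."""
    lat, lon = parse_location(value)
    return f"{lon},{lat},{elevation}"


# Values from the nuclear plant table earlier in this README.
print(parse_location("51.383331,30.1"))
print(parse_location("(47.48333, 34.633331)"))
print(to_kml_coordinates("51.383331,30.1"))
```

The latitude/longitude swap is the easy thing to get wrong here: location strings usually list latitude first, while KML coordinates put longitude first.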
Points Module - Functions
make_points(pointcsvfile[,indexs,color,scale,folder,description,icon,icons])
parselist(total,filename) - given the output of the function above and a file name (hopefully ending in .kml), writes the KML file.
packagepoints(makepointsoutput) - takes the make_points() output and returns the list without the starting and ending wrapper of the KML file, so it can be combined with subsequent lists or placed in different folders. Use this if you want your directories in specific locations.
packagefinal(nameinside,list) - adds back the boilerplate of the first few and last couple of lines, given a name to use inside the KML and a list of XML/KML-structured lines. Essentially, packageroutes/packagepoints and packagefinal are structured so that once output has been taken out of its wrapper, packagefinal must be performed on the list before sending it into parselist.
folder(nameoffolder,description,list) - given make_route() or make_points() output that has had its wrapper removed (packageroutes/packagepoints), returns a folder (directory) containing the objects in the given list.
The arguments and keyword arguments defined below mostly apply to the make_points() function.
- pointcsvfile - string containing the file location of the table to parse into a KML
The 'indexs' kwarg below should be used when you want to query points based on a list of unique indexes that each exist only in the desired row. Instead of writing a new CSV file and turning that into a KML, it takes the initial CSV and grabs only the rows matching the list of indexs. If you're not too concerned with performance, writing a new CSV file is suggested for ease of use and clarity.
- indexs - kwarg, given a list of unique keys, finds the unique row corresponding to each key and writes only those rows to the KML; basically, it queries values in the CSV file without you having to write a new CSV file containing just the desired rows.
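The in-memory filtering idea behind indexs can be sketched as a dictionary lookup. This assumes the keys live in one named column; select_rows is a hypothetical helper, and PipeKML may instead search every field of each row.

```python
def select_rows(table, key_column, keys):
    """Keep the header plus rows whose key is in `keys`, in the order of `keys`.

    Hypothetical sketch of the 'indexs' idea: filter in memory instead of
    writing an intermediate CSV file. Keys with no matching row are skipped.
    """
    header, rows = table[0], table[1:]
    position = header.index(key_column)
    by_key = {str(row[position]): row for row in rows}
    return [header] + [by_key[str(key)] for key in keys if str(key) in by_key]


# A few rows from the nuclear plant table earlier in this README.
plants = [
    ["Country", "Name", "Location"],
    ["UKRAINE", "CHERNOBYL", "51.383331,30.1"],
    ["UKRAINE", "KHMELNITSKI", "50.599998,26.549999"],
    ["UKRAINE", "ROVNO", "51.333328,25.883329"],
]
subset = select_rows(plants, "Name", ["ROVNO", "CHERNOBYL", "NOT A PLANT"])
for row in subset:
    print(row)
```

The result keeps the header-plus-rows shape, so it can be handed to make_points(..., list=True) directly.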
- list=True - kwarg, technically the kwarg is just list; you only need it when passing a raw list in place of a CSV file location.
Function arguments below are for aesthetics and naming/description
- color - kwarg, currently accepts red, light green, orange, and white; more could be supported, but I don't know of a way to gather color IDs in bulk
- scale - kwarg, adjusts the size (scale) of each point's icon
- folder - kwarg, the folder name for the points in the file
- description - kwarg, the description for the given folder name
- icon - kwarg, the href (link) of the image to use as the icon. Note that only href icons are supported at this time, not the icon keys that Google's default icons use. To get the href of Google's default icons go here, then simply use that link as the icon value. If the icon argument is given, all points are drawn with this icon.
- icons - kwarg, a list of hrefs (links) to images to use as icons, one per point. This argument assumes the list of icons is the same size as the list of point rows in the CSV file, or the same size as the list of indexs to be written into the KML file.
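A sketch of what a per-row icon list implies for the generated KML: one IconStyle per point, referenced from its Placemark, with every field of the row carried along in ExtendedData. placemarks_with_icons is a hypothetical name, it assumes separate LAT/LONG columns, and it emits one Style per row for simplicity (a real implementation might share one Style per distinct href).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def placemarks_with_icons(header, rows, icons, lat_col, long_col):
    """Pair one icon href with each row and emit Style + Placemark lines.

    Raises ValueError when len(icons) != len(rows), mirroring the rule that
    the icons list must be the same size as the rows being written.
    """
    if len(icons) != len(rows):
        raise ValueError("icons must contain exactly one href per row")
    lat_i, long_i = header.index(lat_col), header.index(long_col)
    lines = []
    for number, (row, href) in enumerate(zip(rows, icons)):
        style_id = f"icon{number}"
        lines.append(f'<Style id="{style_id}"><IconStyle><Icon>'
                     f"<href>{escape(href)}</href></Icon></IconStyle></Style>")
        lines.append(f"<Placemark><styleUrl>#{style_id}</styleUrl><ExtendedData>")
        for name, value in zip(header, row):  # carry every field into the popup
            lines.append(f"<Data name={quoteattr(str(name))}>"
                         f"<value>{escape(str(value))}</value></Data>")
        lines.append(f"</ExtendedData><Point><coordinates>{row[long_i]},{row[lat_i]},0"
                     "</coordinates></Point></Placemark>")
    return lines


header = ["Name", "LAT", "LONG"]
rows = [["CHERNOBYL", "51.383331", "30.1"], ["ROVNO", "51.333328", "25.883329"]]
icons = ["http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png",
         "http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png"]

body = placemarks_with_icons(header, rows, icons, "LAT", "LONG")
document = ('<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(body) + "</Document></kml>")
ns = {"kml": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(document)
placemark_count = len(root.findall(".//kml:Placemark", ns))
print(placemark_count)
```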
Blocks Module
The blocks module takes a list or CSV file and parses it into a set of blocks from the four corners of each block stored in each row. At this time it's mainly meant to support colors, but it could be good for visualizing clusters or other things. The blocks module assumes a structure like the one below for each block:
COUNT | LAT1 | LONG1 | LAT2 | LONG2 | LAT3 | LONG3 | LAT4 | LONG4 |
---|---|---|---|---|---|---|---|---|
1 | 41.295 | -87.832 | 41.31 | -87.832 | 41.295 | -87.817 | 41.31 | -87.817 |
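Turning one such row into a KML polygon mostly means putting the four corners in ring order. In the sample row, corners 1 and 2 share a longitude, so listing them in column order would not trace the block's outline. This hypothetical sketch (block_ring is not a PipeKML function) sorts the corners by angle around their centroid, which yields the counterclockwise order Google's KML reference asks for on polygon boundaries, and closes the ring by repeating the first point. It assumes each block is convex, as rectangular grid cells are.

```python
import math


def block_ring(row):
    """Return a closed, counterclockwise KML coordinate ring for one block row.

    Expects [COUNT, LAT1, LONG1, LAT2, LONG2, LAT3, LONG3, LAT4, LONG4].
    Sorting by angle around the centroid makes the result independent of
    the order in which the corners are listed (convex blocks only).
    """
    corners = [(float(row[i + 1]), float(row[i])) for i in (1, 3, 5, 7)]  # (long, lat)
    center_x = sum(x for x, _ in corners) / len(corners)
    center_y = sum(y for _, y in corners) / len(corners)
    corners.sort(key=lambda p: math.atan2(p[1] - center_y, p[0] - center_x))
    corners.append(corners[0])  # KML rings repeat the first point to close
    return " ".join(f"{x},{y},0" for x, y in corners)


# The sample row from the table above.
sample = [1, 41.295, -87.832, 41.31, -87.832, 41.295, -87.817, 41.31, -87.817]
ring = block_ring(sample)
polygon = ("<Polygon><outerBoundaryIs><LinearRing><coordinates>"
           + ring + "</coordinates></LinearRing></outerBoundaryIs></Polygon>")
print(ring)
```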
An example of random colors generated from a high-definition partition of Chicago.
The table below shows the default labels that can be used and the actual image associated with each.
Label | Corresponding Image |
---|---|
yellow | |
white | |
red | |
pink | |
orange | |
light green | |
blue | |
light blue | |
The "heatmap" color scheme, enabled with heatmap=True in the kwargs, uses a different approach: it currently has 7 levels, and its labels correspond to the following images.
Block Tier/Label | Corresponding Image |
---|---|
block1 | |
block2 | |
block3 | |
block4 | |
block5 | |
block6 | |
block7 | |
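The README doesn't say how values should be assigned to the seven heatmap tiers, so the sketch below assumes one simple choice: equal-width bins between the smallest and largest value, with block1 as the lowest tier. heat_tier is a hypothetical helper and the counts are toy values.

```python
def heat_tier(value, low, high, levels=7):
    """Map a value onto block1..blockN using equal-width bins from low to high.

    Assumes block1 is the lowest tier; values at or above `high` land in the
    top tier, and a degenerate range (high <= low) maps everything to block1.
    """
    if high <= low:
        return "block1"
    fraction = (value - low) / (high - low)
    index = min(levels - 1, max(0, int(fraction * levels)))
    return f"block{index + 1}"


counts = [1, 5, 10, 20, 40]  # toy per-block counts, e.g. from a COUNT field
tiers = [heat_tier(c, min(counts), max(counts)) for c in counts]
print(tiers)
```

Presumably the resulting labels could then be appended as a column and passed through colorfield with heatmap=True, in the same way Example 3 appends a color column; skewed data such as crime counts may be better served by quantile bins than by equal-width ones.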
Blocks Module - Functions
make_blocks(csvfile[,color,colorfield,heatmap,list]) - given a structure like the one above (4 lat/long pairs per line), takes it in CSV file or list form and turns it into the corresponding KML lines
parselist(total,filename) - given the output of the function above and a file name (hopefully ending in .kml), writes the KML file.
packageblocks(makeblocksoutput) - takes the make_blocks() output and returns the list without the starting and ending wrapper of the KML file, so it can be combined with subsequent lists or placed in different folders. Use this if you want your directories in specific locations.
packagefinal(nameinside,list) - adds back the boilerplate of the first few and last couple of lines, given a name to use inside the KML and a list of XML/KML-structured lines. Essentially, packageroutes/packagepoints/packageblocks and packagefinal are structured so that once output has been taken out of its wrapper, packagefinal must be performed on the list before sending it into parselist.
folder(nameoffolder,description,list) - given make_route(), make_points(), or make_blocks() output that has had its wrapper removed (packageroutes/packagepoints/packageblocks), returns a folder (directory) containing the objects in the given list.
The arguments and keyword arguments defined below mostly apply to the make_blocks() function.
- list=True - kwarg, technically the kwarg is just list; you only need it when passing a raw list in place of a CSV file location.
Function arguments below are for aesthetics and naming/description
- color - kwarg, when given, all blocks in the CSV file or list are drawn in this color
- folder - kwarg, the folder name for the blocks
- description - kwarg, the description for the given folder name
- colorfield - kwarg, if you have a field holding each block's color, pass that field's position in the row as an integer and all the colors will be processed at once (the counterpart to the color argument)
- heatmap=True - boolean that switches the color map from the normal one above to the heatmap
Example 1 - Sample Dataset Provided
This example uses a geospatial dataset I've used extensively. It shows a small sample of some of the things I've done with it, as well as some of the ways you can integrate the data into a KML file in a useful manner.
To view the example code please go to the directory and view the example1.py script.
Output from Example 1 Displayed Below (w/ MUTCD signage integration)
Example 2 - Public Dataset Pandas/matplotlib Integration
This example uses pandas and NumPy as well as the modules in this repository. It shows a simple imperative script that breaks the data down into structures separated by folder directories, and I think it shows that this module can be quite useful for displaying geospatial data.
To view the example code please go to the directory and view the example2.py script.
Example 3 - Randomly Generating Colors on Earth
This example simply gives an overview of what can be done with the block overlay.
See example3.py script here.
from pipekml.blocks import *
from random import randint
#colors blocks can currently be
colors=['yellow','white','red','pink','orange','blue','light green','light blue']
#reading the squares into memory
data=read('squares.csv')
newlist=[data[0]]
#randomizing a color for each block and adding it to the end of each row
for row in data[1:]:
    #randomly picking a position in the colors list
    color=colors[randint(0,len(colors)-1)]
    newrow=row+[color]
    newlist.append(newrow)
#sending into make_blocks
#note the list=True argument when you're not using a csv file
#the colorfield argument is the row position of the color column
#because we just added it to the end we can use -1
a=make_blocks(newlist,list=True,colorfield=-1)
parselist(a,'random_colors.kml')
Output of Colors Generated Displayed Below
Gallery
Complex KML Structures (w/MUTCD signage integration)
Block Image Overlay Aggregation/Pivoting below Shows Crime Clusters in Chicago
Future Updates/Past Uses I've found
Plane Flight Trajectory for Metrojet Flight 9268 showing it most likely broke up in mid-air.
Click here or the image below for the YouTube video.
Ordinal KML file representing a path from point A to point B with data associated with traffic aggregated.
Contribute/Feedback
If you'd like to give me feedback or contribute, you're more than welcome. If you have feedback or would like to collaborate on a new project, email me at murphy214@live.marshall.edu.
History/Versions
This module was directly ported from Routing-KML-Maker on 11/25/15. I was afraid that simply renaming the repository would break (404) links to the module, so instead I created a new one with a name that better fits what this module really does.
No major structural changes are planned, just continued updates.
Version | Description of Update | Date |
---|---|---|
Version 1.0 | Ported Routing-KML-Maker | 11/24/15 |
Version 1.1 | Name change to module lower case | 11/29/15 |
Version 1.2 | Added more functionality | 12/5/15 |
Version 1.3 | Added Blocks module | 12/27/15 |
License Information
I use the standard Apache license, based on the little I've gathered about what licenses actually mean and whether one is even applicable to my project.
Want more information? Find detailed information about what each license means here.