Tag Clustering for Tag Maps


Keywords
clustering, datascience, landscape, perception, socialmedia, visual-analytics
License
GPL-3.0
Install
pip install tagmaps==0.22.34


Tag Maps

Spatio-Temporal Tag and Photo Location Clustering for generating Tag Maps

Tag Maps are similar to Tag Clouds, but Tag Maps use the spatial information that is attached to geotagged photographs, in addition to tag frequency, to visualize tags on a map. This library uses the single-linkage tree that is available from HDBSCAN to cut trees at a specific, automatic or user-defined distance for all available tags in the given dataset. Afterwards, alpha shapes are generated as a means to allow 'soft' placement of tags on a map, according to their area of use. Two shapefiles are generated that can be used to visualize results, for example, in ESRI ArcGIS or Mapnik.
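The idea of cutting a single-linkage tree at a fixed distance can be illustrated without any dependencies. The following is only a minimal sketch, not the code used by this package: it works on plain Euclidean distances, whereas HDBSCAN builds its tree on mutual reachability distances and can also enforce a minimum cluster size. The function name, the sample coordinates and the cut distance are made up for illustration.

```python
from math import dist


def single_linkage_cut(points, cut_distance):
    """Label points by cutting a single-linkage hierarchy at cut_distance.

    Cutting a single-linkage tree at distance d yields the connected
    components of the graph that links every pair of points no farther
    apart than d, so a small union-find is enough for this sketch.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cut_distance:
                parent[find(i)] = find(j)

    # Map each root to a compact cluster label in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical projected coordinates (metres) of photos sharing one tag.
tag_points = [(0, 0), (30, 10), (55, 20), (400, 400), (420, 390), (900, 50)]
print(single_linkage_cut(tag_points, cut_distance=50))  # prints [0, 0, 0, 1, 1, 2]
```

A larger cut distance merges more points into fewer clusters, while a smaller one splits them up. This is the trade-off behind choosing the distance automatically or by hand.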

Tag Map Example

Based on the papers:

Dunkel, A. (2015). Visualizing the perceived environment using crowdsourced photo geodata. Landscape and Urban Planning, 142.

Dunkel, A. (2016). Assessing the perceived environment through crowdsourced spatial photo content for application to the fields of landscape and urban planning. Thesis, TU Dresden Landscape and Environmental Planning.

Dunkel, A. (2020). Tag Maps in der Landschaftsplanung. In: Handbuch Methoden Visueller Kommunikation in der Räumlichen Planung.

Overview of processing steps (Toronto High Park example):

  • a) individual photo locations (raw data)
  • b) photo locations combined to clusters
  • c) tag location clustering (HDBSCAN) and alpha-shape generation
  • d) soft placement of all relevant tag clusters using alpha shapes

Tag Map Example
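Steps c) and d) derive an area from the points of each tag cluster. Alpha shapes (concave hulls) are too long to sketch here, so the example below shows the simpler convex case that the Built With section mentions for simple point clusters. It is a dependency-free stand-in using Andrew's monotone chain algorithm, not SciPy's implementation and not this package's code. The function name and the sample points are assumptions.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Drop points that would create a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # the last point starts the other half

    return half_hull(pts) + half_hull(reversed(pts))


# Hypothetical projected coordinates of one tag cluster.
cluster = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
print(convex_hull(cluster))  # prints [(0, 0), (4, 0), (4, 3), (0, 3)]
```

A convex hull always covers the empty space between outlying points. An alpha shape can follow the point pattern more closely, which is why it suits the 'soft' placement described above.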

The label placement based on descending importance is currently implemented in ArcGIS and Mapnik. See the folder resources for information regarding ArcGIS and a Jupyter Notebook for Mapnik. The following animation illustrates the ArcMap label placement algorithm for the TU Dresden Campus.

Label Placement Example

Installation

The recommended way to install the package is with conda install tagmaps -c conda-forge.

For a detailed guide to setting up the tagmaps package on Windows 10, see the documentation.

Documentation

See the tagmaps documentation for additional information, guides and tutorials. There is also an external API reference available.

Quickstart

  1. Clone the resources folder to a local directory
    • git clone https://github.com/Sieboldianus/TagMaps.git && cd TagMaps && git filter-branch --subdirectory-filter resources
  2. Place geotagged data in the /01_Input sub-folder
    • information on how to structure data is available in the documentation
  3. Run tagmaps within the resources folder. Output files will be saved to /02_Output
    • 2 shapefiles in an auto-selected UTM projection, one containing all tag clusters and one with the overall location clusters
  4. Visualize shapefiles, e.g. using ESRI ArcGIS
    • download BasemapLayout_World.mxd from the resources folder and replace missing links with the 2 resulting shapefiles in /02_Output
    • adjust minimum and maximum font sizes, weighting formula or other metrics to your needs.
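Step 3 mentions an auto-selected UTM projection. How the package picks the zone is not described here, but one common approach is to derive the zone from the longitude of the data's centre and the hemisphere from its latitude. The sketch below follows that approach. The function name is hypothetical, and the special zones around Norway and Svalbard are ignored.

```python
def utm_epsg_for(lon, lat):
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    # UTM zones are 6 degrees wide and numbered 1..60 starting at 180° W.
    zone = int((lon + 180) // 6) + 1
    zone = min(max(zone, 1), 60)  # lon == 180 would otherwise give zone 61
    # EPSG 326xx covers the northern hemisphere, 327xx the southern one.
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_for(-79.46, 43.65))  # Toronto High Park, prints 32617
print(utm_epsg_for(13.73, 51.03))   # Dresden, prints 32633
```

The resulting code can be passed to a projection library to convert geographic coordinates into metres before clustering.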

Some background:

The tagmaps package can be used with any tagged and spatially referenced data, but it has been developed specifically with social media data in mind (Flickr, Twitter etc.). There are two ways to load input data:

  1. Unfiltered raw data

    • Use tagmaps.add_record(record), where record is of type PostStructure (see shared_structure.py)
    • How to clean up the data depends on its source; see the LoadData class in load_data.py for the Twitter and Flickr cleanup
  2. Filtered data

    • The result from 1 is a UserPostLocation (UPL), which is a reference of type 'CleanedPost'. In a UPL, all posts of a single user at a single coordinate are merged, resulting in a reduced list of terms, tags and emoji based on global occurrence (i.e. no duplicates).

The filtered data that is used for tagmaps can be exported using tagmaps.write_cleaned_data(). Because this removes all terms, tags and emoji that do not appear in the global list of most frequent items (the top 1000 by default), the export is a highly pseudonymized set of information in which only collectively relevant terms remain. The default value (1000) can be adjusted with the max_items argument: the smaller max_items, the stronger the effect of anonymization/generalization.
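The two ideas above, merging posts per user and coordinate and keeping only globally frequent terms, can be sketched in a few lines. This is an illustration under assumptions, not the package's implementation. The tuple layout, both function names and the sample posts are hypothetical and do not mirror PostStructure or CleanedPost, and counting each tag once per merged record is an assumption as well.

```python
from collections import Counter, defaultdict


def merge_to_upls(posts):
    """Merge all posts of one user at one coordinate into a single tag set."""
    upls = defaultdict(set)
    for user, lat, lon, tags in posts:
        upls[(user, lat, lon)].update(tags)  # sets drop duplicate tags
    return dict(upls)


def keep_top_tags(upls, max_items):
    """Drop every tag that is not among the max_items most frequent ones."""
    # Each tag counts once per merged record, not once per post.
    counts = Counter(tag for tags in upls.values() for tag in tags)
    top = {tag for tag, _ in counts.most_common(max_items)}
    return {key: tags & top for key, tags in upls.items()}


# Hypothetical posts: (user, latitude, longitude, tags)
posts = [
    ("u1", 43.65, -79.46, ["park", "cherry"]),
    ("u1", 43.65, -79.46, ["park", "picnic"]),
    ("u2", 43.65, -79.46, ["park", "cherry"]),
    ("u3", 43.66, -79.45, ["park", "mydog"]),
]
upls = merge_to_upls(posts)
filtered = keep_top_tags(upls, max_items=2)
for key, tags in filtered.items():
    print(key, sorted(tags))
```

With max_items=2 only "park" and "cherry" survive, so rare and potentially identifying terms such as "mydog" disappear from the output.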

Code

The code was completely refactored in January 2019, but there are still some missing pieces. In particular, the API (that is: import tagmaps) is still at an early stage. See the main() method in main.py for examples of how to use the tagmaps package.

Resources

Contributors

Some future goals:

  • include topic modeling
  • improve automatic detection of general vs specific tags for an area (e.g. chi square)
  • improve unit testing (pytest) for tagmaps package
  • move from tkinter interface to browser based solution

Built With

This project includes and makes use of several other projects/libraries/frameworks:

Alpha Shapes: Kevin Dwyer / Sean Gillies. Generating Concave Hull for Point Clouds

HDBSCAN: McInnes, J. Healy, S. Astels (BSD licensed). A high performance implementation of HDBSCAN clustering.

Shapely: Manipulation and analysis of geometric objects

SciPy and Convex Hull: Simple shapes for point clusters are generated using SciPy's excellent Convex Hull functions

Fiona: OGR's neat and nimble API for Python programmers.

Mapnik: Mapnik combines pixel-perfect image output with lightning-fast cartographic algorithms, backing OpenStreetMap

License

GNU GPLv3

Changelog & Download

This is a high-level summary of version progress. See CHANGELOG.md for a full list of changes.

2022-05-10: TagMaps v0.22.0

  • the project has finally migrated to a pyproject.toml-only based packaging system, as described in the declarative config (pyproject.toml)
  • the code structure now follows the src-layout.
  • fiona was pinned to 1.8.22 in conda until #213 (Windows installations only) is solved

2022-07-27: TagMaps v0.21.0

2021-02-22: TagMaps v0.20.10

  • fix emoji grapheme detection issue with emoji>=1.01
  • several fixes in cx-freeze build, re-compile with python 3.9

2020-01-24: TagMaps v0.20.4

  • mainly improvements of type annotations and code legibility
  • include type hints in api-docs
  • migration from namedtuples to (awesome) new dataclass from Python 3.7 (this is the minimum requirement from v0.20.2 onwards)
  • fix projection resulting in flipped geometries for pyproj>2.0.0
  • fix various other small bugs

2019-05-08: TagMaps v0.17.6

  • as of this version, tagmaps package is available on conda-forge
  • fixed a bug with newer versions of pyproj (>2.0.0) that would result in very slow projection performance

2019-03-08: TagMaps v0.17.4

  • First version of public API, e.g. load tagmaps to other packages with import tagmaps or from tagmaps import TagMaps
  • Refactor of LoadData and PrepareData in separate classes, use of contextmanager/ pipeline generator
  • Improved generation of Alpha Shapes
  • Basic system integration test pipeline
  • Jupyter Notebook compatibility

2019-01-23: TagMaps v0.11.1

  • complete refactor of code with improved encapsulation, code now largely follows PEP conventions
  • bugfix: emoji handling now accurately recognizes grapheme clusters consisting of multiple unicode codepoints.
  • interface: add feature to filter based on toplists for tags, emoji and locations
  • added sample CC-BY dataset

2018-01-31: TagMaps v0.9.2

  • because Tag Maps can be generated from local to regional to continental scale, finding an algorithm that fits all was not straightforward. The current implementation will produce shapes for all of these scales without any user input.
  • this alpha shape implementation is based on Kevin Dwyer / Sean Gillies' great base code
  • auto-projection from geographic to projected Coordinate System: select the most suitable UTM Zone for projecting data.

2018-01-17: TagMaps v0.9.1

  • first build with python
  • initial commit, still lots of unnecessary comments/code parts left in code

2010-03-30: TagMaps v0.0.1

  • first implementation of tagmaps concept in ArcGIS Model Builder