Distant Viewing Toolkit for the Analysis of Visual Culture
The Distant Viewing Toolkit is a Python package that facilitates the computational analysis of still and moving images. The most recent version of the package focuses on providing a minimal set of functions that require only a small number of dependencies. Examples of how to use the toolkit are given in the following section.
For more information about setting up the toolkit on your own machine, please see INSTALL.md. More information about the toolkit and project is available on the following pages:
- Example analysis using aggregated metadata: "Visual Style in Two Network Era Sitcoms"
- Theory of the project: "Distant Viewing: Analyzing Large Visual Corpora."
- Software Whitepaper: A Python Package for the Analysis of Visual Culture
If you have any trouble using the toolkit, please open a GitHub issue. If you have additional questions or are interested in collaborating, please contact us at tarnold2@richmond.edu and ltilton@richmond.edu.
Example Usage
To use the toolkit on a still image, we first use the load_image
function to load the image in Python. Then, we create an annotation
model; below we'll use an annotation that detects and identifies
faces. Finally, we apply the annotation to the image and save the
results.
import dvt
img = dvt.load_image("input/obama2.jpg")
anno_face = dvt.AnnoFaces()
out_face = anno_face.run(img, visualize=True)
The object out_face
is a dictionary with elements that tell us
about the detected faces. These have been designed so that they
can be easily converted to a pandas data frame, as follows.
import pandas as pd
pd.DataFrame(out_face['boxes'])
face_id x xend y yend prob
0 0 992 1112 292 458 1.000000
1 1 631 749 237 397 1.000000
2 2 422 589 232 470 0.999998
3 3 1124 1247 161 330 0.999997
4 4 1161 1278 719 861 0.921809
The algorithm has detected five faces, four with a very high confidence
and a fifth with a reasonable level of confidence. We can look at the output
of the algorithm by saving the annotated image using the save_image
function.
dvt.save_image("faces.png", out_face['img'])
This produces an image like the following:
You can see that the algorithm has correctly identified four faces, but that the fifth is actually a shadow. While many demonstrations of computer vision algorithms show only the "perfect" examples that seem to work without errors, keep in mind that even good algorithms still make frequent mistakes.
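One way to guard against such false positives is to filter detections by their reported probability. The sketch below rebuilds the table printed above rather than calling the toolkit, and the 0.95 cutoff is an illustrative choice, not a toolkit default:

```python
import pandas as pd

# Detected boxes as printed above; in practice this would be
# pd.DataFrame(out_face['boxes'])
boxes = pd.DataFrame({
    "face_id": [0, 1, 2, 3, 4],
    "x":    [992, 631, 422, 1124, 1161],
    "xend": [1112, 749, 589, 1247, 1278],
    "y":    [292, 237, 232, 161, 719],
    "yend": [458, 397, 470, 330, 861],
    "prob": [1.000000, 1.000000, 0.999998, 0.999997, 0.921809],
})

# Keep only high-confidence detections; 0.95 is an arbitrary cutoff
confident = boxes[boxes["prob"] > 0.95]
print(confident["face_id"].tolist())  # the shadow (face 4) is dropped
```

The same boolean-filtering pattern applies to any of the box outputs produced by the toolkit.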
Another analysis that we can do with the face detection annotations is to use the output to identify the individuals in an image. Let's run the annotation over another image consisting of a portrait of President Obama. We can then see how close this face is to those detected in the family photo.
import numpy as np
img_portrait = dvt.load_image("input/obama1.jpg")
out_portrait = anno_face.run(img_portrait)
np.sum(out_portrait['embed'][0] * out_face['embed'], axis=1)
array([-0.05723718, 0.04770625, 0.86795366, 0.11890081, 0.05184552],
dtype=float32)
You can see that the closest match, by far, is the third face. Looking back at the metadata, the third face has the smallest x value, and therefore (correctly) identifies Obama as being on the far left of the frame.
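To make the comparison programmatic rather than read off by eye, we can take the index of the largest dot product. The sketch below reuses the similarity values printed above; with the toolkit loaded, `sims` would instead come from the `np.sum(...)` expression shown.

```python
import numpy as np

# Dot products between the portrait embedding and the five
# family-photo faces, as printed above
sims = np.array(
    [-0.05723718, 0.04770625, 0.86795366, 0.11890081, 0.05184552],
    dtype=np.float32,
)

best = int(np.argmax(sims))  # index of the most similar face
print(best)                  # prints 2: the third face is the closest match
```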
Other Annotations and Inputs
There are currently four different annotations that extract information from a still image. These work just like the face annotation above and include the following:
anno_keypoints = dvt.AnnoKeypoints()
anno_segment = dvt.AnnoSegmentation()
anno_embed = dvt.AnnoEmbed()
anno_face = dvt.AnnoFaces()
There is also a special annotation type that takes a path to a video file and estimates the location of the video shots.
anno_breaks = dvt.AnnoShotBreaks()
out_breaks = anno_breaks.run("input/tm_short.mp4")
Finally, there is also a helper function to extract frames from a video file. The example below shows how to select every 25th frame from a video file and apply the face annotation to each selected frame.
anno_face = dvt.AnnoFaces()
output = []
for img, frame, msec in dvt.yield_video("input/tm_short.mp4"):
    if (frame % 25) == 0:
        anno = anno_face.run(img, visualize=True)
        if anno:
            anno['frame'] = frame
            output += [anno]
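The resulting list can then be collapsed into a single data frame with one row per detected face and a column recording the frame it came from. The sketch below substitutes two hand-made annotation dictionaries for real toolkit output, but the same pattern applies to the `output` list built above:

```python
import pandas as pd

# Simulated annotation results; real entries would come from
# anno_face.run(img) inside the loop above
output = [
    {"boxes": [{"face_id": 0, "x": 10, "xend": 50,
                "y": 20, "yend": 60, "prob": 0.99}],
     "frame": 0},
    {"boxes": [{"face_id": 0, "x": 12, "xend": 52,
                "y": 22, "yend": 62, "prob": 0.98},
               {"face_id": 1, "x": 80, "xend": 120,
                "y": 30, "yend": 70, "prob": 0.97}],
     "frame": 25},
]

# One data frame per annotated frame, tagged with its frame number
frames = []
for anno in output:
    df = pd.DataFrame(anno["boxes"])
    df["frame"] = anno["frame"]
    frames.append(df)

# Stack into a single table for analysis
faces = pd.concat(frames, ignore_index=True)
print(faces[["frame", "face_id", "prob"]])
```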
We are working on longer tutorials that we hope to release in early 2023. These will be posted here as soon as they are ready.
The Distant Viewing Toolkit is supported by the National Endowment for the Humanities through a Digital Humanities Advancement Grant.
Citation
If you make use of the toolkit in your work, please cite the relevant papers describing the tool and its application to the study of visual culture:
@article{arnold2019distant,
title = "Distant Viewing: Analyzing Large Visual Corpora",
author = "Arnold, Taylor B and Tilton, Lauren",
journal = "Digital Scholarship in the Humanities",
year = "2019",
doi = "10.1093/digitalsh/fqz013",
url = "http://dx.doi.org/10.1093/digitalsh/fqz013"
}
@article{arnold2019visual,
title = "Visual Style in Two Network Era Sitcoms",
author = "Arnold, Taylor B and Tilton, Lauren and Berke, Annie",
journal = "Cultural Analytics",
year = "2019",
doi = "10.22148/16.043",
url = "http://dx.doi.org/10.22148/16.043"
}
Contributing
Contributions to the toolkit, including bug fixes and new features, are welcome. When contributing to this repository, please first discuss the change you wish to make with the maintainers via a GitHub issue or email before making the change. Small bug fixes can be submitted directly as pull requests.