John Wiedenhöft 2025-01-22
Kite-square plots (Figure 1) are a convenient way to visualize contingency tables, uniting various quantities of interest (Table 1). They get their name for two reasons:
- If the variables are independent, the plot resembles a kite inside a square (Figure 1 (a)). The more dependent the variables are, the more the plot deviates from that shape (Figure 1 (b)). This allows the user to quickly grasp variable dependence visually.
- It rhymes with $\chi^2$, a measure of statistical dependence and the statistic in the eponymous test, which is visualized directly in the plot as the area of so-called patches (Figure 4 (b)).
Figure 1: Kite-square plots for independent and dependent variables.
The R package kitesquare implements these plots using ggplot2. It is available at https://github.com/HUGLeipzig/kitesquare.
The relationship between two categorical random variables, say
From either form, a number of interesting and statistically relevant quantities can be computed (Table 1).
Table 1: Different quantities derived from contingency tables.
quantity | unnormalized (counts) | normalized (probabilities, percentages) |
---|---|---|
marginal | ||
expected joint | ||
observed joint | ||
(observed) conditional | ||
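As a rough illustration of the quantities listed in Table 1, the following base-R sketch computes them for the counts of the dependent example (Table 2 (b)). The variable names are illustrative assumptions and not part of the kitesquare package.

```r
# Observed counts from the dependent example (Table 2 (b)),
# with X in rows and Y in columns.
counts <- matrix(
  c(30,  15,
    30, 135),
  nrow = 2, byrow = TRUE,
  dimnames = list(X = c("A", "B"), Y = c("U", "V"))
)
n <- sum(counts)

# Marginals, as counts and as probabilities.
marg_x_n <- rowSums(counts)
marg_y_n <- colSums(counts)
marg_x_p <- marg_x_n / n
marg_y_p <- marg_y_n / n

# Expected joint under independence: product of the marginals.
exp_p <- outer(marg_x_p, marg_y_p)
exp_n <- exp_p * n

# Observed joint.
obs_n <- counts
obs_p <- counts / n

# Observed conditionals.
p_y_given_x <- prop.table(counts, margin = 1)  # each row sums to 1
p_x_given_y <- prop.table(counts, margin = 2)  # each column sums to 1
```

Comparing `obs_p` with `exp_p` cell by cell shows how far the table is from independence.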
Visualizing subsets of these quantities is easy. For instance, observed
quantities are often shown using heatmaps, with each cell representing a
unique combination of values of
Kite-square plots attempt to solve these issues, displaying all relevant quantities in a sensible way while minimizing visual clutter, and providing a gestalt from which the user can quickly grasp the degree of dependence between the variables.
The following sections explain the visual elements of a kite-square plot in detail.
The corners of the kite
(Figure 2 (a)) represent
the theoretical, expected joint probabilities of
The spars
(Figure 2 (b)) represent
the actual observed joint probabilities
Figure 2: Elements related to joint quantities.
In the case of independence, the points are exactly at the corners of
the kite, since
The square
(Figure 3 (a)) is
composed of line segments intersecting the axes at the value of their
respective marginal counts or probabilities. For instance, the
corners of cell
The end points of the bars
(Figure 3 (b)) indicate
conditional probabilities
Figure 3: Elements related to conditional and marginal probabilities.
In the case of independence, the bars match the side of the square
perfectly, since in that case
Figure 4: Additional plot elements.
Note that the axis labels are colored according to the bars with which
they are associated. For clarity, kite-square plots have a colored point
at the intersections of bars and axes, representing marginal
probabilities/counts
(Figure 4 (a)); notice
that the intersections for
Intuitively, the discrepancy between the square and the bars provides a
measure of association between
with
we have
and hence
In other words, the edges of each patch represent the difference
between an expected (marginal) and an observed conditional, and the area
represents the contribution of each cell to the total
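The link between patch areas and $\chi^2$ can be checked numerically. The base-R sketch below assumes normalized quantities and takes the edges of each patch to be the differences between observed conditional and marginal probabilities, as described above. Under that assumption, the patch areas summed over all cells and multiplied by the total count should reproduce the Pearson statistic reported by `chisq.test()`. The variable names are illustrative, and this is not the package's internal code.

```r
# Counts of the dependent example (Table 2 (b)), X in rows, Y in columns.
counts <- matrix(c(30, 15, 30, 135), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("A", "B"), Y = c("U", "V")))
n <- sum(counts)

# Marginal probabilities.
p_x <- rowSums(counts) / n
p_y <- colSums(counts) / n

# Observed conditional probabilities.
p_x_given_y <- prop.table(counts, margin = 2)  # P(X | Y), columns sum to 1
p_y_given_x <- prop.table(counts, margin = 1)  # P(Y | X), rows sum to 1

# Patch edges: conditional minus marginal, per cell.
edge_x <- sweep(p_x_given_y, 1, p_x)  # subtract p_x[i] from row i
edge_y <- sweep(p_y_given_x, 2, p_y)  # subtract p_y[j] from column j

# Patch area per cell and the resulting statistic.
patch_area <- edge_x * edge_y
chi2_from_patches <- n * sum(patch_area)

# Reference value without continuity correction.
chi2_reference <- unname(chisq.test(counts, correct = FALSE)$statistic)

isTRUE(all.equal(chi2_from_patches, chi2_reference))
```

Both edges of a patch have the same sign, namely that of the difference between observed and expected joint probability, so every area is non-negative.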
Creating kite-square plots in R is easy:
kitesquare(df, X, Y, count)
The function kitesquare() expects a contingency table as a data frame
or tibble df in long form, i.e. one column for each variable
containing the different category labels, as well as a column containing
counts (see Table 2 for the tables that generate Figure 1). The second
and third arguments are the names of the columns containing the categories
for each variable. The fourth argument is the name of the count column.
The table may contain multiple lines per category combination; the
counts are added together in that case. Missing category combinations
are assumed to have a count of 0. The count column is optional; if none
is provided, the number of occurrences of each category combination is
assumed as counts instead.
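The counting rules described above can be mimicked in base R. The sketch below uses a small made-up table, chosen only for illustration, and relies on `xtabs()` and `table()` to show the described behavior. It does not show how kitesquare() processes its input internally.

```r
# Hypothetical long-form table: (A, U) appears twice, (B, V) is absent.
df_long <- data.frame(
  X     = c("A", "A", "A", "B"),
  Y     = c("U", "U", "V", "U"),
  count = c(4, 6, 15, 30)
)

# With a count column: duplicate rows are summed,
# and missing combinations get a count of 0.
tab_counts <- xtabs(count ~ X + Y, data = df_long)

# Without a count column: each row counts as one occurrence.
tab_rows <- table(X = df_long$X, Y = df_long$Y)

tab_counts
tab_rows
```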
Table 2: Contingency tables with counts for variables $X$ and $Y$.
(a) Independent variables
X | Y | count |
---|---|---|
A | U | 10 |
A | V | 15 |
B | U | 30 |
B | V | 45 |
(b) Dependent variables
X | Y | count |
---|---|---|
A | U | 30 |
A | V | 15 |
B | U | 30 |
B | V | 135 |
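The two tables above can be written as long-form data frames as follows. The names `df_indep` and `df_dep` are assumptions made for this sketch. The plotting calls use the kitesquare(df, X, Y, count) form shown earlier and are skipped if the package is not installed.

```r
# Table 2 (a): independent variables.
df_indep <- data.frame(
  X     = c("A", "A", "B", "B"),
  Y     = c("U", "V", "U", "V"),
  count = c(10, 15, 30, 45)
)

# Table 2 (b): dependent variables.
df_dep <- data.frame(
  X     = c("A", "A", "B", "B"),
  Y     = c("U", "V", "U", "V"),
  count = c(30, 15, 30, 135)
)

# Check for independence: observed counts equal the product of the
# marginal counts divided by the total.
is_independent <- function(df) {
  tab <- xtabs(count ~ X + Y, data = df)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  isTRUE(all.equal(as.vector(tab), as.vector(expected)))
}
is_independent(df_indep)
is_independent(df_dep)

# Plot only if the package is available.
if (requireNamespace("kitesquare", quietly = TRUE)) {
  library(kitesquare)
  kitesquare(df_indep, X, Y, count)
  kitesquare(df_dep, X, Y, count)
}
```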
Individual plotting elements can be turned on and off by setting the following arguments to TRUE or FALSE:
kite
spars
square
chi2
bars_x
bars_y
bars
intersect_x
intersect_y
intersect
Axes can be labeled as percentages or counts by setting normalize to TRUE or FALSE, respectively.
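A call combining these options might look like the sketch below. It uses only argument names listed above. The data frame reproduces Table 2 (b), and the call is skipped if the package is not installed.

```r
# Counts of the dependent example (Table 2 (b)).
df <- data.frame(
  X     = c("A", "A", "B", "B"),
  Y     = c("U", "V", "U", "V"),
  count = c(30, 15, 30, 135)
)

if (requireNamespace("kitesquare", quietly = TRUE)) {
  library(kitesquare)
  # Hide the kite and the chi-square patches, and label axes with counts.
  kitesquare(df, X, Y, count, kite = FALSE, chi2 = FALSE, normalize = FALSE)
}
```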
For 2x2 tables, the kite-square plot is centered by default, i.e. the left and bottom axes are reversed so that the elements of each cell meet in the middle. This is not possible for variables with more than two levels. The Boolean options
center_x
center_y
center
control whether
(Figure 5) or not
(Figure 6) centering
should be performed for binary
fill_x
fill_y
fill
kitesquare(df_2x4, X, Y, count, fill=TRUE)
kitesquare(df_2x4, X, Y, count, fill=TRUE, center=FALSE)
For details and further plotting options, please refer to the function
documentation using ?kitesquare.