This project was originally made to calculate which country's shape most resembles a square. The report generated by the last version shows the results.
It was inspired by the Map Men video on the same topic, where they conclude that Egypt is the most square country. This conclusion seemed ridiculous to me: although Egypt has two straight borders along the west and south, its north-eastern parts are very unbalanced, and it certainly seems that other countries like Côte d'Ivoire, which have less straight borders but a more balanced box-like shape, should be considered more square. The main problem I suspected was that their evaluation of "squareness" is not formally quantified; instead they simply look at the shapes to judge how square they are. Hence, I used a quantitative approach to measure squareness, which is described further in Approach. This motivation backfired completely when it turned out Egypt really is the most square, even with my quantitative method, but results are results so I'm sharing this anyway.
Although the original goal was to test countries for squareness, the tool is easily extended to use shapes other than squares. See Shapes for more information.
The actual synopsis for the tool is:

    usage: Country Shape Tester [-h] [--report-output REPORT_OUTPUT]
                                [--json-output JSON_OUTPUT]
                                [--country-file COUNTRY_FILE]
                                [--target-countries TARGET_COUNTRIES [TARGET_COUNTRIES ...]]
                                [--shape {square}]
                                [--method {basin-hop,dual-annealing}]
                                [--tolerance TOLERANCE]

    Program to test which country is best approximated by a certain shape.

    options:
      -h, --help            show this help message and exit
      --report-output REPORT_OUTPUT
                            Path to write the markdown report to. (default is 'report')
      --json-output JSON_OUTPUT
                            File to write the resulting scores and optimal shape parameters to.
      --country-file COUNTRY_FILE
                            GeoJSON file containing the country shapes. (default is 'country_shapes.geojson')
      --target-countries TARGET_COUNTRIES [TARGET_COUNTRIES ...]
                            List of countries to test. (default is all countries in the file)
      --shape {square}      Name of shape to use, in snake_case. (default is 'square')
      --method {basin-hop,dual-annealing}
                            Optimization method to use: Dual Annealing (default) or Basin-Hopping (faster).
      --tolerance TOLERANCE
                            Tolerance for simplifying the country shapes, as a portion of the minimum of the width and height. (default is 0.01, 0 for no simplification)

The tool always outputs a report like the one in report. With the --json-output option it can also output the results directly (name of the country, parameters of the optimal shape, and error score). You can give a different country file in GeoJSON format with the --country-file option, restrict the results to specific countries from the file with the --target-countries option, and choose which shape to fit to the countries with the --shape option.
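As a rough illustration of how these options and their defaults fit together, here is a minimal argparse sketch reconstructed from the help text above. It is not the project's actual parser; the function name build_parser, the country names, and the file name scores.json are placeholders chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI, based only on the help text."""
    parser = argparse.ArgumentParser(
        prog="Country Shape Tester",
        description="Program to test which country is best approximated by a certain shape.",
    )
    parser.add_argument("--report-output", default="report",
                        help="Path to write the markdown report to.")
    parser.add_argument("--json-output", default=None,
                        help="File to write the scores and optimal shape parameters to.")
    parser.add_argument("--country-file", default="country_shapes.geojson",
                        help="GeoJSON file containing the country shapes.")
    # None is used here to mean "all countries in the file" (an assumption).
    parser.add_argument("--target-countries", nargs="+", default=None,
                        help="List of countries to test.")
    parser.add_argument("--shape", choices=["square"], default="square",
                        help="Name of shape to use, in snake_case.")
    parser.add_argument("--method", choices=["basin-hop", "dual-annealing"],
                        default="dual-annealing",
                        help="Optimization method to use.")
    parser.add_argument("--tolerance", type=float, default=0.01,
                        help="Simplification tolerance, 0 for no simplification.")
    return parser


# Example: test two countries with the faster method and also write JSON.
# The country names must match whatever names the GeoJSON file uses.
args = build_parser().parse_args(
    ["--target-countries", "Egypt", "Ghana",
     "--method", "basin-hop",
     "--json-output", "scores.json"]
)
print(args.target_countries, args.method, args.tolerance)
```

Options that are not given keep the defaults listed in the synopsis, so a run without arguments would test every country in country_shapes.geojson against a square using Dual Annealing.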
The --method option lets you choose an optimization algorithm. Both Basin-Hopping and Dual Annealing are "global" optimization algorithms, which means they are designed to produce a relatively good estimate of the global best solution instead of converging on some local optimum that may not be very good. This is important since countries with many disconnected parts essentially have many local optima, separated by regions with so much non-land that any shape placed there would have an error of almost 100%. The Basin-Hopping algorithm runs somewhat faster, but the Dual Annealing algorithm tends to produce more reliable results, especially on small island nations.
The --tolerance option specifies the error tolerance for simplifying the country shapes with the Douglas-Peucker algorithm. The shapes are simplified since small details, especially on the coastline for detailed maps, can significantly slow down the calculations but do not significantly impact to what degree the country matches a shape. The tolerance specifies the maximum distance between the simplification and the actual shape at any point, relative to the minimum of the width and height of the country. For instance, if the rectangular bounding box of a country is wider than it is tall and the tolerance has the default value of 0.01, then the maximum error is 1% of the height of the bounding box.
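To make the relative tolerance concrete, the following is a small pure-Python sketch of Douglas-Peucker simplification on an open polyline, with the tolerance scaled by the smaller side of the bounding box. The function names are made up for this example, and the tool itself may well rely on a library routine rather than code like this.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping it within epsilon of the original."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify both halves recursively.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


def absolute_tolerance(points, relative_tolerance):
    """Scale the tolerance by the smaller side of the bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return relative_tolerance * min(max(xs) - min(xs), max(ys) - min(ys))


# A 10 x 4 outline with one tiny bump on its bottom edge.
outline = [(0, 0), (5, 0.01), (10, 0), (10, 4), (0, 4)]
epsilon = absolute_tolerance(outline, 0.01)
simplified = douglas_peucker(outline, epsilon)
```

Here the bounding box is 10 wide and 4 tall, so a tolerance of 0.01 allows deviations of up to 0.04. The bump of height 0.01 falls below that and is removed, while the corners are kept.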
New shapes can be defined in the shapes directory. You can also see the Square class in square.py for reference.
Every shape should be defined by a class with the name of that shape in PascalCase, which should be in a separate file with the name of that shape in snake_case followed by .py. For instance, for an equilateral triangle this would be the class EquilateralTriangle in equilateral_triangle.py.
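The naming convention can be expressed as a small mapping from the snake_case name given to --shape to the expected file and class names. This is only a sketch: the helper names are invented, and load_shape_class assumes the shapes directory is importable as a package called shapes, which may not match how the tool actually loads shapes.

```python
import importlib


def shape_file_name(shape_name: str) -> str:
    """File expected to hold the shape, e.g. 'square' -> 'square.py'."""
    return f"{shape_name}.py"


def shape_class_name(shape_name: str) -> str:
    """Convert a snake_case shape name to the PascalCase class name."""
    return "".join(part.capitalize() for part in shape_name.split("_"))


def load_shape_class(shape_name: str):
    """Look up the class by convention (assumes an importable 'shapes' package)."""
    module = importlib.import_module(f"shapes.{shape_name}")
    return getattr(module, shape_class_name(shape_name))


print(shape_file_name("equilateral_triangle"), shape_class_name("equilateral_triangle"))
```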
Every shape should be defined by an array of numbers, which are the variables to optimize over for the optimization algorithm. For the square, these numbers are:
- the x-coordinate of the bottom-left point,
- the y-coordinate of the bottom-left point,
- the side length of the square,
- and the rotation of the square with respect to the midpoint (in degrees),
respectively. However, it does not matter what these numbers are or how many there are, as long as you use them consistently. This representation is used in the definition of the 3 required methods, explained in the next paragraph.
The shape class needs to implement 3 methods. The first two take as arguments the sides of the "bounding box" of the country's shape: the leftmost x-coordinate minx, the lowest y-coordinate miny, the rightmost x-coordinate maxx, and the highest y-coordinate maxy in the shape, respectively.
- first_guess(minx, miny, maxx, maxy) returns a first guess for the parameters. The guess should cover the country's entire bounding box, otherwise the optimizer may not converge on a good solution. For the square, the x-coordinate of the bottom-left point becomes minx and the y-coordinate becomes miny, the side length becomes the maximum of maxx - minx and maxy - miny, and the rotation is 0.
- get_bounds(minx, miny, maxx, maxy) returns a list of tuples that give the bounds on each of the shape's numbers, so that the ranges cover the entire bounding box. For the square the bounds are (minx, maxx) for the x-coordinate of the bottom-left point, (miny, maxy) for the y-coordinate, (0, min(maxx - minx, maxy - miny)) for the side length, and (0, 360) for the rotation.
- from_parameters(parameters) takes the array of numbers used to define the shape, and returns a Geometry object from the shapely library that represents this shape. How this is done depends completely on what shape you are implementing. For the square the method is to calculate the positions of the four corner points, then create a Polygon object with the list of those points, and finally rotate it around the midpoint with the rotate function from shapely.affinity. For simple polygons this process may be somewhat similar, but for more complex shapes it can look quite different. For instance, for circular shapes you may want to use the .buffer method on a Point, for shapes with two or more disconnected parts you may have to use a MultiPolygon object instead, and generally for more complex shapes it may be necessary to combine multiple simpler shapes with .union or .intersection.
To be sure you implemented the correct methods, you can make your class inherit from the abstract Shape class in shape.py, but this is not a strict requirement.
Be sure to check the existing shape classes for reference if you have doubts.
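As an illustration of the three methods, here is a sketch of what a circle shape might look like, following the suggestion to use .buffer on a Point. The Circle class, its file name circle.py, and its parameter layout (centre x, centre y, radius) are hypothetical and not part of the project. The sketch uses static methods because the text does not say how the tool calls them, and it does not inherit from Shape so that it stays self-contained.

```python
import math

from shapely.geometry import Point
from shapely.geometry.base import BaseGeometry


class Circle:
    """Hypothetical shape for circle.py, defined by [center_x, center_y, radius]."""

    @staticmethod
    def first_guess(minx, miny, maxx, maxy):
        # Start at the centre of the bounding box with a radius of half the
        # diagonal, so the ideal circle covers the entire bounding box.
        return [
            (minx + maxx) / 2,
            (miny + maxy) / 2,
            math.hypot(maxx - minx, maxy - miny) / 2,
        ]

    @staticmethod
    def get_bounds(minx, miny, maxx, maxy):
        # The centre may lie anywhere in the bounding box; the radius is
        # capped at half the diagonal (a choice made for this sketch).
        max_radius = math.hypot(maxx - minx, maxy - miny) / 2
        return [(minx, maxx), (miny, maxy), (0, max_radius)]

    @staticmethod
    def from_parameters(parameters) -> BaseGeometry:
        center_x, center_y, radius = parameters
        # buffer() turns the point into a polygon approximating a circle.
        return Point(center_x, center_y).buffer(radius)


guess = Circle.first_guess(0, 0, 4, 3)
bounds = Circle.get_bounds(0, 0, 4, 3)
circle = Circle.from_parameters(guess)
```

Note that buffer produces a polygon with a finite number of vertices, so its area is slightly smaller than that of a true circle with the same radius.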
The dataset of country shapes was obtained from opendatasoft, although the tool can also accept other GeoJSON data sets.
The tool finds out how well a country matches a shape by measuring the "Jaccard distance" between the country and the shape, which is given by the area of the symmetric difference of the shapes (area in one shape but not both) divided by the area of the union of the shapes (area in either shape or both).
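For reference, the definition above translates almost directly into shapely operations. The function below is a minimal sketch, not necessarily how the tool computes it; the name jaccard_distance and the handling of two empty geometries are assumptions made for this example.

```python
from shapely.geometry import box
from shapely.geometry.base import BaseGeometry


def jaccard_distance(country: BaseGeometry, shape: BaseGeometry) -> float:
    """Area of the symmetric difference divided by the area of the union."""
    union_area = country.union(shape).area
    if union_area == 0:
        # Both geometries are empty; treating them as identical is an assumption.
        return 0.0
    return country.symmetric_difference(shape).area / union_area


# Two 2 x 2 squares that overlap in a 1 x 2 strip.
country = box(0, 0, 2, 2)
candidate = box(1, 0, 3, 2)
distance = jaccard_distance(country, candidate)
```

In this example the symmetric difference has area 4 and the union has area 6, so the distance is 2/3. Identical shapes give 0 and shapes that do not overlap at all give 1.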
The Jaccard distance is used as the cost function for the optimization algorithm, which tries to optimize the parameters of the shape to minimize the Jaccard distance. Once the optimization is complete the distance from the "best fit" shape is taken as the measure of how much a country differs from the shape. The main idea here is that the more a country looks like e.g. a square, the closer it must be to the best fitting, or most similar, square.
Finally, these error scores and images of the country with the best fitting shape are output to the report, and optionally to a separate JSON file.