SergeJohanns/square-countries

On the Squareness of Nations

This project was originally made to calculate which country's shape most resembles a square. The report generated by the latest version shows the results.

Motivation

It was inspired by the Map Men video on the same topic, which concludes that Egypt is the most square country. This conclusion seemed ridiculous to me: although Egypt has two straight borders along the West and South, its North-Eastern parts are very unbalanced, and it certainly seemed that other countries like Côte d'Ivoire, which have less straight borders but a more balanced box-like shape, should be considered more square. The main problem, I suspected, was that their evaluation of "squareness" is not formally quantified; they simply look at the shapes to judge how square they are. Hence, I used a quantitative approach to measure squareness, which is described further in Approach. This motivation backfired completely when it turned out Egypt really is the most square, even with my quantitative method, but results are results, so I'm sharing this anyway.

Usage

Although the original goal was to test countries for squareness, the tool is easily extended to use shapes other than squares. See Shapes for more information.

The actual synopsis for the tool is

usage: Country Shape Tester [-h] [--report-output REPORT_OUTPUT]
                            [--json-output JSON_OUTPUT]
                            [--country-file COUNTRY_FILE]
                            [--target-countries TARGET_COUNTRIES [TARGET_COUNTRIES ...]]
                            [--shape {square}]
                            [--method {basin-hop,dual-annealing}]
                            [--tolerance TOLERANCE]

Program to test which country is best approximated by a certain shape.

options:
  -h, --help            show this help message and exit
  --report-output REPORT_OUTPUT
                        Path to write the markdown report to. (default is 'report')
  --json-output JSON_OUTPUT
                        File to write the resulting scores and optimal shape parameters to.
  --country-file COUNTRY_FILE
                        GeoJSON file containing the country shapes. (default is 'country_shapes.geojson')
  --target-countries TARGET_COUNTRIES [TARGET_COUNTRIES ...]
                        List of countries to test. (default is all countries in the file)
  --shape {square}      Name of shape to use, in snake_case. (default is 'square')
  --method {basin-hop,dual-annealing}
                        Optimization method to use: Dual Annealing (default) or Basin-Hopping (faster).
  --tolerance TOLERANCE
                        Tolerance for simplifying the country shapes, as a portion of the minimum of the width and height. (default is 0.01, 0 for no simplification)

The tool always outputs a report like the one in report. With the --json-output option it can also output the results directly (name of the country, parameters of the optimal shape, and error score).

You can give a different country file in GeoJSON format with the --country-file option, specify only some countries from the file you want results for with the --target-countries option, and choose which shape you want to fit on the countries with the --shape option.

The --method option lets you choose an optimization algorithm. Both Basin-Hopping and Dual Annealing are "global" optimization algorithms, meaning they are designed to produce a relatively good estimate of the global best solution instead of converging on some local optimum that may not be very good. This is important since countries with many disconnected parts essentially have many local optima, separated by regions with so much non-land that any shape placed there would have an error of almost 100%. The Basin-Hopping algorithm runs somewhat faster, but the Dual Annealing algorithm tends to produce more reliable results, especially on small island nations.

The --tolerance option specifies the error tolerance for simplifying the country shapes with the Douglas-Peucker algorithm. The shapes are simplified since small details, especially on the coastline for detailed maps, can significantly slow down the calculations but do not significantly impact to what degree the country matches a shape. The tolerance specifies the maximum distance between the simplification and the actual shape at any point, relative to the minimum of the width and height of the country. For instance, if the rectangular bounding box of a country is wider than it is tall and the tolerance has the default value of 0.01, then the maximum error is 1% of the height of the bounding box.
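
To make the tolerance concrete, here is a minimal pure-Python sketch of the idea, not the tool's own code. The helper names (absolute_tolerance, point_segment_distance, douglas_peucker) are made up for this illustration, and the sketch simplifies an open polyline rather than a closed country outline.

```python
import math


def absolute_tolerance(minx, miny, maxx, maxy, tolerance=0.01):
    """Convert the relative tolerance into map units using the smaller bounding-box side."""
    return tolerance * min(maxx - minx, maxy - miny)


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline; dropped points lie within epsilon of the result."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord between the endpoints.
    index, farthest = max(
        ((i, point_segment_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if farthest <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


# A bounding box 10 wide and 4 tall with the default tolerance of 0.01.
epsilon = absolute_tolerance(0, 0, 10, 4)
coastline = [(0, 0), (1, 0.02), (2, 0), (3, 1), (4, 0)]
simplified = douglas_peucker(coastline, epsilon)
```

Here the small bump at (1, 0.02) is dropped because it deviates less than the tolerance, while the larger feature at (3, 1) is kept.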

Shapes

New shapes can be defined in the shapes directory. You can also see the Square class in square.py for reference.

Every shape should be defined by a class with the name of that shape in PascalCase, which should be in a separate file with the name of that shape in snake_case followed by .py. For instance, for an equilateral triangle this would be the class EquilateralTriangle in equilateral_triangle.py.
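
The naming convention above can be sketched as follows. This is only an illustration of how a shape name given to --shape could map to a module and class; shape_class_name and load_shape_class are hypothetical helpers, and the package name "shapes" is assumed from the directory name.

```python
import importlib


def shape_class_name(shape_name):
    """Convert a snake_case shape name to the PascalCase class name."""
    return "".join(part.capitalize() for part in shape_name.split("_"))


def load_shape_class(shape_name, package="shapes"):
    """Import <package>.<shape_name> and return the class named after the shape."""
    module = importlib.import_module(f"{package}.{shape_name}")
    return getattr(module, shape_class_name(shape_name))


# "equilateral_triangle" maps to the class name "EquilateralTriangle".
class_name = shape_class_name("equilateral_triangle")
```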

Every shape should be defined by an array of numbers, which are the variables to optimize over for the optimization algorithm. For the square, these numbers are:

  • the x-coordinate of the bottom-left point,
  • the y-coordinate of the bottom-left point,
  • the side length of the square,
  • and the rotation of the square about its midpoint (in degrees),

respectively. However, it does not matter what these numbers are or how many there are, as long as you use them consistently. This representation is used in the definition of the 3 required methods, explained in the next paragraph.

The shape class needs to implement 3 methods. The first two take as arguments the sides of the "bounding box" of the country's shape, so the leftmost x-coordinate minx, the lowest y-coordinate miny, the rightmost x-coordinate maxx, and the highest y-coordinate maxy in the shape, respectively.

  • first_guess(minx, miny, maxx, maxy) returns a first guess for the parameters. These values should describe a shape that covers the country's entire bounding box; otherwise the optimizer may not converge on a good solution. For the square, the x-coordinate of the bottom-left point becomes minx, the y-coordinate becomes miny, the side length becomes the maximum of maxx - minx and maxy - miny, and the rotation is 0.
  • get_bounds(minx, miny, maxx, maxy) returns a list of tuples that give the bounds on each of the shape's numbers so that the ranges cover the entire bounding box. For the square the bounds are (minx, maxx) for the x-coordinate of the bottom left point, (miny, maxy) for the y-coordinate, (0, min(maxx - minx, maxy - miny)) for the side length, and (0, 360) for the rotation.
  • from_parameters(parameters) takes the array of numbers used to define the shape, and returns a Geometry object from the shapely library that represents this shape. How this is done depends completely on what shape you are implementing. For the square the method is to calculate the positions of the four corner points, then create a Polygon object with the list of those points, and finally rotate it around the midpoint with the rotate function from shapely.affinity. For simple polygons this process may be somewhat similar, but for more complex shapes it can look quite different. For instance, for circular shapes you may want to use the .buffer method on a Point, for shapes with two or more disconnected parts you may have to use a MultiPolygon object instead, and generally for more complex shapes it may be necessary to combine multiple simpler shapes with .union or .intersection.
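
As a rough illustration of the three methods for the square, here is a sketch that follows the description above. It is not the code in square.py: SquareSketch and corner_points are hypothetical names, and the rotation is done by hand in pure Python so the example has no dependencies. A real from_parameters would wrap such corner points in a shapely Polygon.

```python
import math


class SquareSketch:
    """Hypothetical stand-in for a square shape; parameters are (x, y, side, rotation)."""

    @staticmethod
    def first_guess(minx, miny, maxx, maxy):
        # Start with an unrotated square that covers the whole bounding box.
        return [minx, miny, max(maxx - minx, maxy - miny), 0.0]

    @staticmethod
    def get_bounds(minx, miny, maxx, maxy):
        # One (low, high) pair per parameter, in parameter order.
        # The side bound follows the text above; the first guess uses max(...) instead.
        return [(minx, maxx), (miny, maxy), (0, min(maxx - minx, maxy - miny)), (0, 360)]

    @staticmethod
    def corner_points(parameters):
        """Corners of the square after rotating it counter-clockwise about its midpoint."""
        x, y, side, rotation = parameters
        corners = [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]
        cx, cy = x + side / 2, y + side / 2
        angle = math.radians(rotation)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        return [
            (cx + (px - cx) * cos_a - (py - cy) * sin_a,
             cy + (px - cx) * sin_a + (py - cy) * cos_a)
            for px, py in corners
        ]


guess = SquareSketch.first_guess(0, 0, 4, 2)
bounds = SquareSketch.get_bounds(0, 0, 4, 2)
rotated = SquareSketch.corner_points([0, 0, 2, 90])
```

Rotating by 90 degrees maps each corner onto the next one, so the rotated square covers the same area as the unrotated one.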

To be sure you implemented the correct methods, you can make your class inherit the abstract Shape class in shape.py, but this is not a strict requirement. Be sure to check the existing shape classes for reference if you have doubts.

Approach

The dataset of country shapes was obtained from opendatasoft, although the tool can also accept other GeoJSON data sets.

The tool finds out how well a country matches a shape by measuring the "Jaccard distance" between the country and the shape, which is given by the area of the symmetric difference of the shapes (area in one shape but not both) divided by the area of the union of the shapes (area in either shape or both).
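
In other words, for shapes A and B the distance is area(A symmetric difference B) / area(A union B), which equals 1 - area(A intersection B) / area(A union B). The sketch below works this out for two axis-aligned rectangles in pure Python, so the arithmetic is easy to follow. It is only an illustration with made-up helper names (rect_area, jaccard_distance); for general shapes the same quantity can be computed from shapely geometries via their symmetric_difference and union methods and the area attribute.

```python
def rect_area(rect):
    """Area of a rectangle (minx, miny, maxx, maxy); zero if it is empty."""
    minx, miny, maxx, maxy = rect
    return max(0.0, maxx - minx) * max(0.0, maxy - miny)


def jaccard_distance(a, b):
    """Jaccard distance between two non-empty axis-aligned rectangles."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    intersection = rect_area(overlap)
    union = rect_area(a) + rect_area(b) - intersection
    symmetric_difference = union - intersection
    return symmetric_difference / union


# Two 2x2 boxes that overlap in a 1x2 strip:
# intersection = 2, union = 6, symmetric difference = 4.
distance = jaccard_distance((0, 0, 2, 2), (1, 0, 3, 2))
```

A distance of 0 means the shapes coincide, and a distance of 1 means they do not overlap at all.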

The Jaccard distance is used as the cost function for the optimization algorithm, which tries to optimize the parameters of the shape to minimize the Jaccard distance. Once the optimization is complete the distance from the "best fit" shape is taken as the measure of how much a country differs from the shape. The main idea here is that the more a country looks like e.g. a square, the closer it must be to the best fitting, or most similar, square.
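
The optimization loop can be sketched with a toy problem. This is a simplified illustration under several assumptions, not the tool's implementation: it assumes SciPy's dual_annealing as the optimizer, the "country" is a hypothetical 2x2 box, and the shape is an axis-aligned square with parameters (x, y, side) and no rotation, so the areas can be computed without shapely.

```python
from scipy.optimize import dual_annealing

# Hypothetical "country": a 2x2 box given as (minx, miny, maxx, maxy).
TARGET = (1.0, 1.0, 3.0, 3.0)


def area(rect):
    """Area of a rectangle (minx, miny, maxx, maxy); zero if it is empty."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def jaccard_cost(params):
    """Jaccard distance between TARGET and an axis-aligned square (x, y, side)."""
    x, y, side = params
    square = (x, y, x + side, y + side)
    overlap = area((max(square[0], TARGET[0]), max(square[1], TARGET[1]),
                    min(square[2], TARGET[2]), min(square[3], TARGET[3])))
    union = area(square) + area(TARGET) - overlap
    return (union - overlap) / union


# Search region and starting point chosen for this toy example only.
bounds = [(0.0, 4.0), (0.0, 4.0), (0.0, 4.0)]
first_guess = [0.0, 0.0, 4.0]

result = dual_annealing(jaccard_cost, bounds, x0=first_guess, maxiter=200)
best_parameters, error_score = result.x, result.fun
```

The value in error_score plays the role of the error score in the report: the lower it is, the better the best-fitting square matches the target.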

Finally, these error scores and images of the country with the best fitting shape are output to the report, and optionally to a separate JSON file.
