TST: Replace test datasets with pyogrio-generated files where possible #441

Merged: 8 commits, Aug 30, 2024
159 changes: 149 additions & 10 deletions pyogrio/tests/conftest.py
@@ -2,6 +2,8 @@
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

import numpy as np

from pyogrio import (
__gdal_version_string__,
__version__,
@@ -126,28 +128,165 @@ def naturalearth_lowres_vsi(tmp_path, naturalearth_lowres):


@pytest.fixture(scope="session")
def test_fgdb_vsi():
    return f"/vsizip/{_data_dir}/test_fgdb.gdb.zip"
Comment on lines -129 to -130
Member

Does it matter that we don't have a direct replacement for a zipped FGDB?

Member Author

I don't think it matters. We have other tests that use the /vsizip/ interface for working with a zipped shapefile, which should be a reasonable proxy for zip files containing other formats.
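
For context, a zipped dataset of this kind can be produced with the standard library alone. The following is a minimal sketch, not code from this PR: `zip_for_vsi` is a hypothetical helper name, and it assumes the `/vsizip/` prefix is followed by the archive path and then the member name, in the same way the removed `test_fgdb_vsi` fixture built its path.

```python
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def zip_for_vsi(files, zip_path):
    """Zip `files` and return a /vsizip/ path to the first member.

    Hypothetical helper for illustration only; not part of pyogrio.
    """
    files = [Path(f) for f in files]
    with ZipFile(zip_path, mode="w", compression=ZIP_DEFLATED) as out:
        for f in files:
            # Store each member at the archive root under its own file name.
            out.write(f, arcname=f.name)

    # Assumed layout: /vsizip/<archive path>/<member name>
    return f"/vsizip/{Path(zip_path).as_posix()}/{files[0].name}"
```

The returned string is only a path; whether a given driver can read the member through it depends on the GDAL build in use.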

def line_zm_file():
    return _data_dir / "line_zm.gpkg"


@pytest.fixture(scope="session")
def test_gpkg_nulls():
    return _data_dir / "test_gpkg_nulls.gpkg"
def curve_file():
    return _data_dir / "curve.gpkg"


@pytest.fixture(scope="session")
def test_ogr_types_list():
    return _data_dir / "test_ogr_types_list.geojson"
def curve_polygon_file():
    return _data_dir / "curvepolygon.gpkg"


@pytest.fixture(scope="session")
def test_datetime():
    return _data_dir / "test_datetime.geojson"
def multisurface_file():
    return _data_dir / "multisurface.gpkg"


@pytest.fixture(scope="session")
def test_datetime_tz():
    return _data_dir / "test_datetime_tz.geojson"
def test_gpkg_nulls():
    return _data_dir / "test_gpkg_nulls.gpkg"


@pytest.fixture(scope="function")
def no_geometry_file(tmp_path):
    # create a GPKG layer that does not include geometry
    filename = tmp_path / "test_no_geometry.gpkg"
    write(
        filename,
        layer="no_geometry",
        geometry=None,
        field_data=[np.array(["a", "b", "c"])],
        fields=["col"],
    )

    return filename


@pytest.fixture(scope="function")
def list_field_values_file(tmp_path):
    # Create a GeoJSON file with list values in a property
    list_geojson = """{
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": { "int64": 1, "list_int64": [0, 1] },
                "geometry": { "type": "Point", "coordinates": [0, 2] }
            },
            {
                "type": "Feature",
                "properties": { "int64": 2, "list_int64": [2, 3] },
                "geometry": { "type": "Point", "coordinates": [1, 2] }
            },
            {
                "type": "Feature",
                "properties": { "int64": 3, "list_int64": [4, 5] },
                "geometry": { "type": "Point", "coordinates": [2, 2] }
            },
            {
                "type": "Feature",
                "properties": { "int64": 4, "list_int64": [6, 7] },
                "geometry": { "type": "Point", "coordinates": [3, 2] }
            },
            {
                "type": "Feature",
                "properties": { "int64": 5, "list_int64": [8, 9] },
                "geometry": { "type": "Point", "coordinates": [4, 2] }
            }
        ]
    }"""

    filename = tmp_path / "test_ogr_types_list.geojson"
    with open(filename, "w") as f:
        _ = f.write(list_geojson)

    return filename


@pytest.fixture(scope="function")
def nested_geojson_file(tmp_path):
    # create GeoJSON file with nested properties
    nested_geojson = """{
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [0, 0]
                },
                "properties": {
                    "top_level": "A",
                    "intermediate_level": {
                        "bottom_level": "B"
                    }
                }
            }
        ]
    }"""

    filename = tmp_path / "test_nested.geojson"
    with open(filename, "w") as f:
        _ = f.write(nested_geojson)

    return filename


@pytest.fixture(scope="function")
def datetime_file(tmp_path):
    # create GeoJSON file with millisecond precision
    datetime_geojson = """{
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": { "col": "2020-01-01T09:00:00.123" },
                "geometry": { "type": "Point", "coordinates": [1, 1] }
            },
            {
                "type": "Feature",
                "properties": { "col": "2020-01-01T10:00:00" },
                "geometry": { "type": "Point", "coordinates": [2, 2] }
            }
        ]
    }"""

    filename = tmp_path / "test_datetime.geojson"
    with open(filename, "w") as f:
        _ = f.write(datetime_geojson)

    return filename


@pytest.fixture(scope="function")
def datetime_tz_file(tmp_path):
    # create GeoJSON file with datetimes with timezone
    datetime_tz_geojson = """{
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": { "datetime_col": "2020-01-01T09:00:00.123-05:00" },
                "geometry": { "type": "Point", "coordinates": [1, 1] }
            },
            {
                "type": "Feature",
                "properties": { "datetime_col": "2020-01-01T10:00:00-05:00" },
                "geometry": { "type": "Point", "coordinates": [2, 2] }
            }
        ]
    }"""

    filename = tmp_path / "test_datetime_tz.geojson"
    with open(filename, "w") as f:
        f.write(datetime_tz_geojson)

    return filename


@pytest.fixture(scope="function")
45 changes: 32 additions & 13 deletions pyogrio/tests/fixtures/README.md
@@ -1,13 +1,28 @@
# Test datasets

## Natural Earth lowres
## Obtaining / creating test datasets

`naturalearth_lowres.shp` was copied from GeoPandas.
If a test dataset can be created in code, do that instead. If it is used in a
single test, create the test dataset as part of that test. If it is used in
more than a single test, add it to `pyogrio/tests/conftest.py` instead, as a
function-scoped test fixture.

If you need to obtain third-party test files:

- add a section below that describes the source location and processing steps
  to derive that dataset
- make sure the license is compatible with inclusion in Pyogrio (public domain or
  open-source) and record that license below

Please keep the test files no larger than necessary to use in tests.
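
As an illustration of creating a dataset in code, the sketch below builds a small point GeoJSON file with the standard `json` module instead of a hand-written string. `write_point_geojson` and its `rows` argument are hypothetical names chosen for this example; they are not part of Pyogrio or its test suite.

```python
import json
from pathlib import Path


def write_point_geojson(path, rows):
    """Write a small GeoJSON FeatureCollection of points and return its path.

    Hypothetical helper for illustration only. `rows` is a list of
    ((x, y), properties) tuples.
    """
    features = [
        {
            "type": "Feature",
            "properties": properties,
            "geometry": {"type": "Point", "coordinates": [x, y]},
        }
        for (x, y), properties in rows
    ]
    collection = {"type": "FeatureCollection", "features": features}

    path = Path(path)
    path.write_text(json.dumps(collection), encoding="utf-8")
    return path
```

Inside a function-scoped fixture, such a helper would typically be called with a path under pytest's built-in `tmp_path`, so each test gets its own copy of the file.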

## FGDB test dataset
## Included test datasets

### Natural Earth lowres

`naturalearth_lowres.shp` was copied from GeoPandas.

`test_fgdb.gdb.zip`
Downloaded from http://trac.osgeo.org/gdal/raw-attachment/wiki/FileGDB/test_fgdb.gdb.zip
License: public domain

### GPKG test dataset with null values

@@ -75,15 +90,19 @@ NOTE: Reading boolean values into GeoPandas using Fiona backend treats those
values as `None` and column dtype as `object`; Pyogrio treats those values as
`np.nan` and column dtype as `float64`.

### GPKG test with MultiSurface

This was extracted from https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHDPlusHR/Beta/GDB/NHDPLUS_H_0308_HU4_GDB.zip
`NHDWaterbody` layer using ogr2ogr:

```bash
ogr2ogr test_mixed_surface.gpkg NHDPLUS_H_0308_HU4_GDB.gdb NHDWaterbody -where '"NHDPlusID" = 15000300070477' -select "NHDPlusID"
```
License: same as Pyogrio

### OSM PBF test

This was downloaded from https://github.com/openstreetmap/OSM-binary/blob/master/resources/sample.pbf

License: [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/)

### Test files for geometry types that are downgraded on read

`line_zm.gpkg` was created using QGIS to digitize a LineString GPKG layer with Z and M enabled. Downgraded to LineString Z on read.
`curve.gpkg` was created using QGIS to digitize a Curve GPKG layer. Downgraded to LineString on read.
`curvepolygon.gpkg` was created using QGIS to digitize a CurvePolygon GPKG layer. Downgraded to Polygon on read.
`multisurface.gpkg` was created using QGIS to digitize a MultiSurface GPKG layer. Downgraded to MultiPolygon on read.

License: same as Pyogrio
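
The expectations listed in this section could be kept in one place, for example to parametrize a test. This is a sketch only: `DOWNGRADED_ON_READ` and `expected_geometry_type` are hypothetical names, and the type strings repeat the wording of this README rather than the exact strings any reader is guaranteed to return.

```python
# Expected geometry type after reading each fixture, as described above.
# The keys are the fixture file names; the values use this README's wording.
DOWNGRADED_ON_READ = {
    "line_zm.gpkg": "LineString Z",
    "curve.gpkg": "LineString",
    "curvepolygon.gpkg": "Polygon",
    "multisurface.gpkg": "MultiPolygon",
}


def expected_geometry_type(filename):
    """Return the documented post-read geometry type for a fixture file."""
    return DOWNGRADED_ON_READ[filename]
```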
Binary file added pyogrio/tests/fixtures/curve.gpkg
Binary file not shown.
Binary file not shown.
Binary file added pyogrio/tests/fixtures/line_zm.gpkg
Binary file not shown.
Binary file added pyogrio/tests/fixtures/multisurface.gpkg
Binary file not shown.
Binary file not shown.
7 changes: 0 additions & 7 deletions pyogrio/tests/fixtures/test_datetime.geojson

This file was deleted.

8 changes: 0 additions & 8 deletions pyogrio/tests/fixtures/test_datetime_tz.geojson

This file was deleted.

Binary file removed pyogrio/tests/fixtures/test_fgdb.gdb.zip
Binary file not shown.
18 changes: 0 additions & 18 deletions pyogrio/tests/fixtures/test_nested.geojson

This file was deleted.

12 changes: 0 additions & 12 deletions pyogrio/tests/fixtures/test_ogr_types_list.geojson

This file was deleted.
