
DataFrame -> ESRI Shapefile: UTF-8/16 mangled to ?????????. DataFrame -> CSV: UTF-8/16 mangled to Latin-1 characters #99

@alecStewart1

Hello!

Firstly, thank you for this package!

At work, we do a lot with Esri products and deal with shapefiles a lot. I read a CSV file and a shapefile into two separate dataframes and combined them with vcat. The CSV file comes from calculating polygon centroids for a layer we have on ArcGIS; the shapefile contains points from another source:

import GeoDataFrames as GDF

centroid_df = GDF.read("/home/my-user/centroids.csv")
point_df = GDF.read("/home/my-user/points.shp")

combined_df = vcat(centroid_df, point_df, cols=:union)

GDF.write("/home/my-user/combined_points.shp", combined_df)

This does create a valid shapefile, but any column values that contain Chinese (Mandarin) or Cyrillic text are shown as ????????? or "?????????" whenever I load the new combined shapefile with GeoDataFrames or into ArcGIS.

Something similar happens when writing to a CSV file: even with options=Dict("bom"=>"true"), the Mandarin and Cyrillic characters are mangled into what look like Latin-1 characters:

# same dataframes as above

GDF.write("/home/my-user/combined_points.csv", combined_df, options=Dict("bom"=>"true"))
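
To make the two symptoms concrete, here is a minimal sketch in plain Julia of how each kind of mangling can arise. It assumes (this is a guess, not something confirmed for GDAL or GeoDataFrames) that the ? marks come from a lossy conversion to a single-byte encoding, and that the Latin-1-looking text comes from UTF-8 bytes being read back as Latin-1. The helper names are hypothetical and only Base functions are used:

```julia
# Hypothetical helpers, Base Julia only; not part of GeoDataFrames or GDAL.

# Reinterpret every UTF-8 byte as one Latin-1 character (classic mojibake).
utf8_bytes_as_latin1(s::AbstractString) = join(Char(b) for b in codeunits(s))

# Replace every character outside the Latin-1 range with '?',
# which is what a lossy conversion to a single-byte encoding would do.
lossy_to_latin1(s::AbstractString) = map(c -> c <= '\u00ff' ? c : '?', s)

sample = "Привет"

println(isvalid(sample))               # is the in-memory string valid UTF-8?
println(utf8_bytes_as_latin1(sample))  # resembles the CSV symptom
println(lossy_to_latin1(sample))       # resembles the shapefile symptom
```

If the strings are valid UTF-8 in the dataframe, as checked with isvalid, the damage would have to be introduced at write time or when the file is read back.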

Is there an option I can pass to the shapefile driver, is there something I'm missing for both drivers, or is there something else I can do?
