
[WIP] shared perimeter-weighted contiguity #507


Closed · wants to merge 11 commits

Conversation

knaaptime
Member

@knaaptime knaaptime commented Jan 12, 2023

Supersedes #506; resolves #80

@knaaptime knaaptime requested a review from sjsrey January 12, 2023 00:10
@knaaptime knaaptime changed the title from "rebase" onto latest to shared perimeter-weighted contiguity Jan 12, 2023
@knaaptime
Member Author

knaaptime commented Jan 12, 2023

@sjsrey the last commit implements the logic we talked about today. I can add a test case, or we could have you sanity-check again?

also not too sure what that new kwarg should be called

knaaptime and others added 3 commits January 12, 2023 08:26
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
@knaaptime knaaptime changed the title from shared perimeter-weighted contiguity to [WIP] shared perimeter-weighted contiguity Jan 13, 2023
@knaaptime
Member Author

I've got a notebook to go with this, but I'm running into issues when I create a perimeter-weighted W with use_index=True after setting a meaningful index. I think the issue has to do with how ids are handled in to_adjlist, but I'm still investigating.

cc: @martinfleis @ljwolf

@knaaptime
Member Author

knaaptime commented Jan 13, 2023

import geopandas as gpd
from libpysal import examples
from libpysal.weights import Rook

us = gpd.read_file(examples.get_path("us48.shp"))

old = Rook.from_dataframe(us)
new = Rook.from_dataframe(us.set_index('STATE_FIPS'), use_index=True)

(screenshots: the adjacency tables produced by old and new)

I think the neighbor relationships here are correct, but the indices get screwed up (which causes issues for the perimeter weights, since they rely on a join)

@martinfleis
Member

Did I mess up #477?

@knaaptime
Member Author

Can't say for certain, but I think all of that is OK. We may just need to update the to_adjlist method.

@knaaptime
Member Author

I'm not sure whether it's the to_adjlist method that's to blame, actually. I can get those tables above to match if I reset the index prior to return in the current to_adjlist method, but the problem persists.

This notebook runs fine as-is. But if I explicitly set the index to something like STATE_FIPS in cell 25 and then use use_index=True with the perimeter weights, the code will run without error but produce incorrect results. Still not sure why.

@sjsrey
Member

sjsrey commented Jan 16, 2023

It could be due to to_adjlist relying on id_order

names = np.asarray(self.id_order)

which we have not yet deprecated.

@sjsrey
Member

sjsrey commented Jan 16, 2023

Maybe changing:

    focal_ix, neighbor_ix = self.sparse.nonzero()
    names = np.asarray(self.id_order)
    focal = names[focal_ix]
    neighbor = names[neighbor_ix]
    weights = self.sparse.data
    adjlist = pandas.DataFrame(
        {focal_col: focal, neighbor_col: neighbor, weight_col: weights}
    )

to:

    focal_ix, neighbor_ix = self.sparse.nonzero()
    weights = self.sparse.data
    adjlist = pandas.DataFrame(
        {focal_col: focal_ix, neighbor_col: neighbor_ix, weight_col: weights}
    )

?

@sjsrey
Member

sjsrey commented Jan 16, 2023

I think the to_adjlist issue is sorted with #510:

(screenshot: to_adjlist output after #510)

@knaaptime
Member Author

That gets the alignment back in shape so the weights are correct, but we lose the index itself (in the updated example 'focal' is an integer sequence, not FIPS codes).

@sjsrey
Member

sjsrey commented Jan 16, 2023

This is because to_adjlist is grabbing the indices from the w.sparse attribute, which only has integer indices.

We also don't have a way back to the df from new once new = Rook.from_dataframe(us.set_index('STATE_FIPS'), use_index=True) is created. We are going to deprecate id_order which currently has the values, so we can't/shouldn't use that.

We could get the STATE_FIPS values out from calling new.full() but that has a smell to it as it defeats the purpose of using the sparse attribute inside of to_adjlist.

In to_adjlist we could jettison focal_ix, neighbor_ix = self.sparse.nonzero()

and instead do something like:

focal_ix = []
neighbor_ix = []
for key, value in new.neighbors.items():
    focal_ix.extend([key] * len(value))
    neighbor_ix.extend(value)

But I think we should get @ljwolf's view on these issues.
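As a quick sanity check (a toy sketch, not the library code), the loop suggested above can be exercised on a small label-keyed neighbors dict, a stand-in for w.neighbors, to confirm the original ids survive into the adjacency list:

```python
# Toy sketch: build the focal/neighbor columns directly from a label-keyed
# neighbors dict instead of the integer positions from sparse.nonzero(),
# so the original ids (here, fake FIPS-like strings) are preserved.
import pandas as pd

neighbors = {"53": ["16", "41"], "16": ["53"], "41": ["53"]}

focal_ix = []
neighbor_ix = []
for key, value in neighbors.items():
    focal_ix.extend([key] * len(value))
    neighbor_ix.extend(value)

adjlist = pd.DataFrame({"focal": focal_ix, "neighbor": neighbor_ix})
print(adjlist)
```

Since Python dicts preserve insertion order, the rows come out grouped by focal unit, matching the ordering of w.neighbors.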

@sjsrey sjsrey requested a review from ljwolf January 17, 2023 00:47
@knaaptime
Member Author

got it

so in the new API, we take the ids kwarg and deprecate idvariable and id_order. But then we never expose W.ids publicly. I'm not sure whether that was intentional, but it means when we go back and forth from sparse, we have trouble redoing the logic, e.g. here. We could use w.neighbors.keys() in place of id_order there, since we know the dict is sorted (but in that case, are we keeping id_order_set or some other indicator that we could test against? otherwise we need to update the dict blindly even if unnecessary)

@sjsrey
Member

sjsrey commented Jan 17, 2023

(quoting @knaaptime's comment above)

This was a good catch, as it raises a number of issues that are related to the planned deprecation.

@knaaptime
Member Author

If we are getting rid of id_order_set, and we've decided that exposing W.ids is redundant, I guess when roundtripping to sparse we could do a simple test with something like list(w.neighbors.keys()) == list(range(len(w.neighbors))) and remap ids if false?
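A minimal sketch of that test-and-remap idea (the function name and shape are hypothetical, not the eventual API):

```python
# Hypothetical sketch of the roundtrip check: if the neighbors dict is keyed
# 0..n-1, the positional ids coming back from sparse are already correct;
# otherwise, remap positions back to the original labels.
def remap_ids(neighbors, positional_pairs):
    keys = list(neighbors.keys())
    if keys == list(range(len(neighbors))):
        return positional_pairs  # already positional, nothing to do
    return [(keys[i], keys[j]) for i, j in positional_pairs]

labeled = {"53": ["16"], "16": ["53", "41"], "41": ["16"]}
print(remap_ids(labeled, [(0, 1), (1, 0), (1, 2), (2, 1)]))
# [('53', '16'), ('16', '53'), ('16', '41'), ('41', '16')]
```

The same call is a no-op when the weights were built without a meaningful index, so positional ids pass through untouched.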

@sjsrey
Member

sjsrey commented Jan 17, 2023

(quoting @knaaptime's example and screenshots above)

to_adjlist sorts before returning. If it did not sort, the returned df would align with what you are expecting:

    focal neighbor  weight
0      53       16     1.0
1      53       41     1.0
2      30       38     1.0
3      30       46     1.0
4      30       56     1.0
..    ...      ...     ...
205    12       01     1.0
206    12       13     1.0
207    26       55     1.0
208    26       18     1.0
209    26       39     1.0

We could add a kw argument to implement this.
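Sketched with a hypothetical kwarg name (not the actual to_adjlist signature), the idea would look something like:

```python
# Hypothetical kwarg sketch: let callers opt out of the final sort so the
# adjacency list keeps the order the pairs were produced in (e.g. by
# sparse.nonzero()), instead of always sorting before returning.
import pandas as pd

def to_adjlist_sketch(pairs, sort_joins=True):
    df = pd.DataFrame(pairs, columns=["focal", "neighbor"])
    if sort_joins:
        df = df.sort_values(["focal", "neighbor"]).reset_index(drop=True)
    return df

pairs = [("53", "16"), ("30", "38"), ("12", "01")]
print(to_adjlist_sketch(pairs, sort_joins=False))  # original order preserved
print(to_adjlist_sketch(pairs))                    # sorted, as today
```

Defaulting the new kwarg to the current (sorting) behavior would keep existing callers unaffected.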

@knaaptime
Member Author

I think we also need to store (and expose publicly) the ids property. We will need that as a mapping any time we roundtrip through a WSP.

@knaaptime
Member Author

I think #511 solves the core issue, so this is probably ready as well.

@martinfleis martinfleis changed the base branch from master to main February 27, 2023 08:45
@ljwolf
Member

ljwolf commented Dec 10, 2023

To make sure, this has already landed in libpysal.graph, right? This should be closed if I'm reading that correctly, imho, since we should probably freeze new features in libpysal.weights.

@knaaptime
Member Author

yup, this is stale

@knaaptime knaaptime closed this Dec 10, 2023
Review comment by @knaaptime on the diff, near:

    # Putting it back to a matrix
    if perimeter_standardize:

@ljwolf do we want to keep the option to standardize when the boundary isn't exhausted? (so the denominator is boundary_i instead of \sum_j shared_boundary_{ij})?
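For context, the two denominators under discussion differ only when a unit's boundary is not fully shared with neighbors. A numeric sketch with made-up lengths (names illustrative, not the PR's actual API):

```python
# Illustrative sketch of the two standardizations: divide each shared-
# boundary length either by the focal unit's total shared boundary (rows
# sum to 1) or by its full perimeter (rows sum to < 1 when the boundary
# is not exhausted, e.g. a coastline). All numbers are made up.
shared = {("a", "b"): 2.0, ("a", "c"): 2.0}   # shared boundary lengths for "a"
perimeter = {"a": 10.0}                       # full perimeter of unit "a"

total_shared = sum(shared.values())           # 4.0 of a's 10.0 perimeter
row_std = {ij: w / total_shared for ij, w in shared.items()}
perim_std = {ij: w / perimeter[ij[0]] for ij, w in shared.items()}

print(row_std)    # weights sum to 1.0
print(perim_std)  # weights sum to 0.4: the unshared 60% of the boundary shows up
```

Under the perimeter denominator, the row sums carry information about how much of each unit's boundary is interior to the study area, which is exactly what the standardization choice trades away or keeps.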


Successfully merging this pull request may close these issues.

ENH: shared perimeter contiguity weighting.
4 participants