Reimplement emd #48

Merged
merged 11 commits into from
May 6, 2025

Conversation

ghar1821
Contributor

@ghar1821 ghar1821 commented Apr 16, 2025

Describe your changes

Reimplemented EMD. The current implementation relies on cytonormpy, which we no longer need after this change.

Also added EMD mean and max computed across all donors; these probably need a better name.
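
For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute a 1D EMD with NumPy alone, using histograms on shared bin edges. It is an illustration only and not necessarily what this PR implements; the function name `emd_1d` and the choice of bins are assumptions.

```python
import numpy as np

def emd_1d(a, b, bins):
    """Histogram-based 1D earth mover's distance (illustrative sketch).

    `emd_1d` and `bins` are hypothetical names, not taken from the PR.
    Values outside the bin range are ignored by np.histogram, and each
    sample is assumed to have at least one value inside the range.
    """
    bins = np.asarray(bins, dtype=float)
    # Histograms on shared bin edges, normalised so each sums to 1.
    p, _ = np.histogram(a, bins=bins)
    q, _ = np.histogram(b, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    # In 1D, EMD is the area between the two cumulative distributions.
    widths = np.diff(bins)
    return float(np.sum(np.abs(np.cumsum(p - q)) * widths))

# Two samples whose mass sits two unit-width bins apart.
distance = emd_1d([0.5, 0.5], [2.5, 2.5], np.arange(0, 5))
```

With unit-width bins, moving all of the mass two bins to the right costs two units, so `distance` is 2 here. Finer bins trade speed for resolution.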

Checklist before requesting a review

  • I have performed a self-review of my code

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI Tests succeed and look good!

@ghar1821 ghar1821 requested a review from LuLeom April 16, 2025 23:01
@ghar1821 ghar1821 marked this pull request as ready for review April 16, 2025 23:01
# A relatively short label, used when rendering visualisations (required)
label: EMD Mean
label: EMD Mean CT
Contributor Author

@LuLeom I expanded the return values to four. We now have the mean and max calculated across all cell types, and the mean and max calculated across donors (see EMD Mean DN). See my comment on the implementation in the script.py file; I'll explain there.

emd_mean_ct = np.nanmean(emd_per_donor_per_ct.drop(columns=['cell_type', 'donor']).values)
emd_max_ct = np.nanmax(emd_per_donor_per_ct.drop(columns=['cell_type', 'donor']).values)

emd_mean_dn = np.nanmean(emd_per_donor_all_ct.drop(columns=['cell_type', 'donor']).values)
Contributor Author

So the mean across donors is essentially the EMD computed over all cell types, then averaged across donors. I figured this gives different information from emd_mean_ct: emd_mean_ct is the mean EMD across the different cell types regardless of donor, while emd_mean_dn is the mean EMD across the different donors regardless of cell type. I think it is a bit too complicated, and we may have to think about whether it makes sense to split it this way.
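
To make the difference between the two summaries concrete, here is a small sketch on toy data. The table layout follows the snippet under review (a `cell_type` column, a `donor` column, and one EMD column per marker), but the marker names, the EMD values, the `"all"` placeholder, the `summarise` helper, and the name `emd_max_dn` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one row per (donor, cell type), EMD per marker.
emd_per_donor_per_ct = pd.DataFrame({
    "cell_type": ["T", "B", "T", "B"],
    "donor": ["d1", "d1", "d2", "d2"],
    "CD3": [1.0, 3.0, 2.0, np.nan],  # NaN: EMD could not be computed
    "CD19": [2.0, 4.0, 2.0, 6.0],
})

# Hypothetical table: one row per donor, EMD computed on all of the
# donor's cells pooled together, ignoring cell type labels.
emd_per_donor_all_ct = pd.DataFrame({
    "cell_type": ["all", "all"],
    "donor": ["d1", "d2"],
    "CD3": [1.5, 2.5],
    "CD19": [3.0, 5.0],
})

def summarise(emd_table):
    """Return (mean, max) over every marker EMD value, skipping NaNs."""
    values = emd_table.drop(columns=["cell_type", "donor"]).values
    return np.nanmean(values), np.nanmax(values)

emd_mean_ct, emd_max_ct = summarise(emd_per_donor_per_ct)
emd_mean_dn, emd_max_dn = summarise(emd_per_donor_all_ct)
```

In this toy case the per-cell-type summary averages seven non-missing values and the per-donor summary averages four, so the two means can differ even when they describe the same cells.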

@ghar1821 ghar1821 merged commit dc032fd into main May 6, 2025
4 checks passed
@ghar1821 ghar1821 mentioned this pull request May 6, 2025
@rcannood rcannood deleted the reimplement_emd branch May 22, 2025 10:39