
Add test cases #227


Draft · wants to merge 61 commits into base: develop
Changes from all commits · 61 commits
84ec619
WIP PDC historical data
sudan45 Apr 7, 2025
d5b1623
PDC historical extraction and chunking
sudan45 Apr 9, 2025
6c0fbea
Add extraction and transformation queue
sudan45 Apr 10, 2025
ae10752
Class based for gdacs.
Rup-Narayan-Rajbanshi Mar 27, 2025
9c36613
Merge pull request #243 from IFRCGo/feature/pdc-historical-data
Rup-Narayan-Rajbanshi Apr 10, 2025
49d932b
Merge pull request #244 from IFRCGo/feature/class-based-gdacs
sudan45 Apr 10, 2025
4d517ca
Use new geocoder
tnagorra Mar 28, 2025
bd7cb9d
amend! Use new geocoder
tnagorra Mar 28, 2025
3258287
amend! Use new geocoder
tnagorra Mar 28, 2025
6c2e4a9
amend! Use new geocoder
tnagorra Mar 28, 2025
8a99fe0
amend! amend! Use new geocoder
tnagorra Mar 28, 2025
633febc
Fix emdat after rebase.
Rup-Narayan-Rajbanshi Apr 11, 2025
c4fd024
Merge pull request #249 from IFRCGo/fix/historical-gdacs-pdc-geocoding
sudan45 Apr 11, 2025
bd61008
Chunking glide historical data
sudan45 Apr 2, 2025
521f0cb
Fix emdat historical data.
Rup-Narayan-Rajbanshi Apr 21, 2025
07b5d76
Fix emdat historical extraction.
Rup-Narayan-Rajbanshi Apr 22, 2025
ae78590
Merge pull request #251 from IFRCGo/fix/emdat-historical-extraction
sudan45 Apr 24, 2025
5a623f0
Rate limit for usgs
sudan45 Apr 11, 2025
5116568
fix(celery): Replace hardcoded queue names with variables
thenav56 Apr 12, 2025
62a7a9a
fix(usgs): Minor fixes
thenav56 Apr 13, 2025
f83847b
feat(logging): Show more info in logging
thenav56 Apr 13, 2025
75251aa
feat(model): Add ON_RETRY for ExtractionData Status
thenav56 Apr 13, 2025
c161f6a
feat(celery): Add RetryableTask
thenav56 Apr 13, 2025
1f411f9
refactor(usgs): Move extraction logic to handler
thenav56 Apr 13, 2025
fd5e0d3
feat(helm): Add queueDefaultResources
thenav56 Apr 13, 2025
95bf639
feat(extract): Add NoDataException support in BaseClass
thenav56 Apr 14, 2025
ed85e45
Handle end date and remove NoDataException
sudan45 Apr 17, 2025
37f6a18
Fix issue for losses if not exists in pdc.
Rup-Narayan-Rajbanshi Apr 25, 2025
9150cbd
Merge pull request #250 from IFRCGo/feature/rate-limit-for-usgs
Rup-Narayan-Rajbanshi Apr 25, 2025
b684add
Update ifrc base extraction
sudan45 Apr 28, 2025
2b58860
Update ibtracs base extraction
sudan45 Apr 29, 2025
b299390
Update pagination limit
sudan45 Apr 29, 2025
82f7bca
Merge pull request #257 from IFRCGo/feature/ibtracs-base-extraction
Rup-Narayan-Rajbanshi Apr 30, 2025
158487a
Refactor the IDU source extraction based on version2
ranjan-stha Apr 29, 2025
f3d6367
Set add queue to False
ranjan-stha Apr 30, 2025
1d52a3b
Merge pull request #256 from IFRCGo/feature/ifrc-dref-base-extraction
Rup-Narayan-Rajbanshi Apr 30, 2025
bce9523
Refactor the GIDD source extraction based on version2
ranjan-stha Apr 29, 2025
6a1a372
Set add queue to False
ranjan-stha Apr 30, 2025
60b7fdd
Merge pull request #259 from IFRCGo/refactor-gidd-extraction-v2
Rup-Narayan-Rajbanshi Apr 30, 2025
d76aedd
Merge pull request #258 from IFRCGo/refactor-idu-extraction-v2
Rup-Narayan-Rajbanshi Apr 30, 2025
bc95b76
Refactor glide extraction, transform to get triggered by extraction_id.
Rup-Narayan-Rajbanshi Apr 28, 2025
600bc71
Refactor emdat extraction to get triggered by extraction id.
Rup-Narayan-Rajbanshi Apr 28, 2025
b824273
Refactor glide and emdat extraction.
Rup-Narayan-Rajbanshi Apr 29, 2025
01fd56f
Update PDC extractions
sudan45 Apr 27, 2025
047b38a
Update pdc extraction
sudan45 Apr 28, 2025
b83eddd
Add pydantic object in metadata
sudan45 Apr 30, 2025
2d72e01
Refactor extraction and transform method
sudan45 Apr 30, 2025
031f8d7
Update desinventar extraction
sudan45 Apr 30, 2025
2b55046
Update desinventar extraction and pydantic object
sudan45 Apr 30, 2025
7f270a7
Refactor the GFD source extraction based on version2
ranjan-stha Apr 30, 2025
ff4317e
Handle the rate limit exception
ranjan-stha May 2, 2025
4bb6ab3
Fix the types
ranjan-stha May 2, 2025
5f0a3de
Update the specific field
ranjan-stha May 5, 2025
5133a78
Merge pull request #262 from IFRCGo/refactor-gfd-extraction-v2
Rup-Narayan-Rajbanshi May 5, 2025
ccea71f
Refactor gdacs extractionV2.
Rup-Narayan-Rajbanshi Apr 30, 2025
9822b47
Refactor cron jobs.
Rup-Narayan-Rajbanshi May 6, 2025
7c5a1d8
Merge pull request #263 from IFRCGo/fix/refactor-corn-job
sudan45 May 6, 2025
cec3fc8
Add additional package to build for arm64
Shhhhhubh Mar 21, 2025
367e56a
Setup for running tests
Shhhhhubh Mar 21, 2025
7b9279a
Basic setup to test IDU
Shhhhhubh Mar 21, 2025
bf8f878
Test case for all sources
Shhhhhubh Apr 16, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -2,6 +2,7 @@
__pycache__/
*.py[cod]
*$py.class
.DS_Store

# C extensions
*.so
@@ -32,6 +33,7 @@ MANIFEST
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
*.sqlite

# Installer logs
pip-log.txt
@@ -132,4 +134,7 @@ dmypy.json

# Media/Data
assets/
media/
*.gpkg

apps/etl/Dataset
4 changes: 2 additions & 2 deletions Dockerfile
@@ -20,14 +20,14 @@ RUN --mount=type=cache,target=/root/.cache/uv \
apt-get update -y \
&& apt-get install -y --no-install-recommends \
# Build required packages
gcc libc-dev gdal-bin libproj-dev \
build-essential gcc libc-dev gdal-bin libgdal-dev libproj-dev \
# Helper packages
procps \
wait-for-it \
# FIXME: Add condition to skip dev dependencies
&& uv sync --frozen --no-install-project --all-groups \
# Clean-up
&& apt-get remove -y gcc libc-dev libproj-dev \
&& apt-get remove -y gcc libc-dev libproj-dev build-essential libgdal-dev \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*

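The Dockerfile change adds build-essential and libgdal-dev to the build stage so that native extensions (notably the GDAL Python bindings) can compile against the system libgdal, then removes them again in the clean-up step to keep the final image small. A quick post-build sanity check might look like the following sketch; it assumes the gdal Python package is among the project's dependencies, which this diff does not show:

```python
# Hypothetical post-build sanity check: confirm the GDAL Python bindings
# import cleanly and report the libgdal they were compiled against.
# Assumes the `gdal` package is in the dependency set (not shown here).
from osgeo import gdal

print("GDAL bindings:", gdal.__version__)
print("libgdal runtime:", gdal.VersionInfo("RELEASE_NAME"))
```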
49 changes: 49 additions & 0 deletions Dockerfile.save
@@ -0,0 +1,49 @@
<<<<<<< HEAD
FROM python:3.12-slim-bookworm AS base
COPY --from=ghcr.io/astral-sh/uv:0.5.29 /uv /uvx /bin/
||||||| parent of 074f51d (Upgrade to bookworm)
FROM python:3.12-slim-bullseye AS base
COPY --from=ghcr.io/astral-sh/uv:0.5.29 /uv /uvx /bin/
=======
FROM python:3.13-slim-bookworm AS base
COPY --from=ghcr.io/astral-sh/uv:0.6.8 /uv /uvx /bin/
>>>>>>> 074f51d (Upgrade to bookworm)

LABEL maintainer="Montandon Dev"
LABEL org.opencontainers.image.source="https://github.com/IFRCGo/montandon-etl/"

ENV PYTHONUNBUFFERED=1

ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy
ENV UV_PROJECT_ENVIRONMENT="/usr/local/"

WORKDIR /code

COPY libs /code/libs

RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
apt-get update -y \
&& apt-get install -y --no-install-recommends \
# Build required packages
build-essential gcc libc-dev gdal-bin libgdal-dev libproj-dev \
# Helper packages
procps \
wait-for-it \
<<<<<<< HEAD
&& uv sync --frozen --no-install-project --all-groups \
||||||| parent of 074f51d (Upgrade to bookworm)
&& uv sync --frozen --no-install-project --no-dev \
=======
&& uv lock --locked --offline \
# FIXME: Add condition to skip dev dependencies
&& uv sync --frozen --no-install-project --all-groups \
>>>>>>> 074f51d (Upgrade to bookworm)
# Clean-up
&& apt-get remove -y gcc libc-dev libproj-dev build-essential libgdal-dev \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*

COPY . /code/
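Dockerfile.save is committed with unresolved diff3-style conflict markers (<<<<<<<, |||||||, =======, >>>>>>>) still in place, which suggests a leftover merge backup rather than an intentional build file. Purely as an illustration (none of this is in the PR), a small scanner that flags such files before they land:

```python
# Illustrative pre-merge check (not part of this PR): flag files that still
# contain Git conflict markers like the ones preserved in Dockerfile.save.
import re
import sys
from pathlib import Path

# diff3-style markers: ours (<<<<<<<), base (|||||||), separator (=======),
# theirs (>>>>>>>), each at the start of a line.
MARKER = re.compile(r"^(<{7}|\|{7}|={7}|>{7})( |$)", re.MULTILINE)


def find_conflicts(root: str = ".") -> list[Path]:
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            if MARKER.search(text):
                hits.append(path)
    return hits


if __name__ == "__main__":
    conflicted = find_conflicts()
    for p in conflicted:
        print(f"conflict markers found: {p}")
    sys.exit(1 if conflicted else 0)
```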
8 changes: 7 additions & 1 deletion apps/etl/admin.py
@@ -108,7 +108,7 @@ class TransformAdmin(EtlResourceAdminMixin, AdminReadOnlyMixin, DjangoQLSearchMi
"success_percentage",
"total_rows",
)
list_filter = ("status",)
list_filter = ("status", "extraction_id__source")
autocomplete_fields = ["extraction"]
search_fields = ["extraction"]

@@ -140,10 +140,16 @@ class PyStacLoadDataAdmin(AdminReadOnlyMixin, DjangoQLSearchMixin, admin.ModelAd
list_filter = (
"item_type",
"status",
"transform_id__extraction_id__source",
)
list_filter = ("item_type", "status", "transform_id__extraction_id__source")
autocomplete_fields = ["transform_id"]
search_fields = ["transform_id"]

@admin.display(description="Source", ordering="source")
def source(self, instance):
return ExtractionData.Source(instance.source).label

def get_queryset(self, request):
# NOTE: item contains heavy json data
return super().get_queryset(request).defer("item")
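Two details in this hunk are worth calling out: the list_filter entries traverse relations (transform_id__extraction_id__source) so loads can be filtered by their originating source, and get_queryset() defers the heavy item JSON column so the admin list page never fetches it. A minimal sketch of the defer pattern, with illustrative model and field names rather than the project's actual definitions:

```python
# Minimal sketch of the defer() pattern used above: keep a heavy column
# out of admin list queries. Names here are illustrative, not the
# project's actual models.
from django.contrib import admin
from django.db import models


class LoadRecord(models.Model):
    status = models.CharField(max_length=32)
    item = models.JSONField()  # large payload; expensive to fetch per row


class LoadRecordAdmin(admin.ModelAdmin):
    list_display = ("id", "status")

    def get_queryset(self, request):
        # defer() issues the SELECT without `item`; Django lazily fetches
        # it only if something touches the field on a specific instance.
        return super().get_queryset(request).defer("item")
```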
45 changes: 24 additions & 21 deletions apps/etl/etl_tasks/desinventar.py
@@ -1,9 +1,14 @@
import logging

from celery import chain, shared_task
from celery import shared_task

from apps.etl.extraction.sources.desinventar.extract import DesinventarExtraction, DesInventarExtractionInputMetadata
from apps.etl.transform.sources.desinventar import DesinventarTransformHandler
from apps.etl.extraction.sources.desinventar.extract import (
DesInventarExtraction,
DesInventarExtractionMetadata,
DesInventarExtractionParamsMetadata,
DesInventarMetadataType,
)
from main.configs import etl_config

logger = logging.getLogger(__name__)

@@ -119,22 +124,20 @@
@shared_task
def ext_and_transform_desinventar_historical_data():
for country_code in country_code_iso3_list:
metadata = DesInventarExtractionInputMetadata(
country_code=country_code,
iso3=country_code,
).model_dump()
chain(
DesinventarExtraction.task.s(metadata),
DesinventarTransformHandler.task.s(),
).apply_async()

# FIXME: country_code should be region_code
url = f"{etl_config.DESINVENTAR_DATA_URL}/DesInventar/download/DI_export_{country_code}.zip"
DesInventarExtraction.init_extraction(
metadata=DesInventarExtractionMetadata(
url=url,
type=DesInventarMetadataType.QUERY,
params=DesInventarExtractionParamsMetadata(country_code=country_code, iso3=country_code),
)
)
for country_code, iso3 in additional_region_code_to_iso3_map.items():
metadata = DesInventarExtractionInputMetadata(
country_code=country_code,
iso3=iso3,
).model_dump()
chain(
DesinventarExtraction.task.s(metadata),
DesinventarTransformHandler.task.s(),
).apply_async()
url = f"{etl_config.DESINVENTAR_DATA_URL}/DesInventar/download/DI_export_{country_code}.zip"
DesInventarExtraction.init_extraction(
metadata=DesInventarExtractionMetadata(
url=url,
type=DesInventarMetadataType.QUERY,
params=DesInventarExtractionParamsMetadata(country_code=country_code, iso3=country_code),
)
)
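Across these task modules the refactor drops the explicit chain(extract, transform) calls in favour of init_extraction(metadata=...), which, judging by the add_to_queue=False argument in the emdat hunk below, persists an extraction record and optionally enqueues it by id. A hypothetical sketch of that contract; the real base class is not part of this diff, and ExtractionData is the project's model while cls.source and cls.task are assumed class attributes:

```python
# Hypothetical sketch of the init_extraction() contract that callers now
# rely on; not the actual implementation.


class BaseExtractionSketch:
    @classmethod
    def init_extraction(cls, metadata, add_to_queue: bool = True):
        # Persist the run first, so every extraction is traceable and
        # retryable by id.
        extraction = ExtractionData.objects.create(  # noqa: F821 (sketch)
            source=cls.source,
            metadata=metadata.model_dump(),
        )
        if add_to_queue:
            # Default path: enqueue the extraction task by id. Callers that
            # build their own chain (extract -> transform) pass
            # add_to_queue=False, as the emdat tasks below do.
            cls.task.delay(extraction.id)
        return extraction
```

On that reading, transforms are no longer handed the extracted payload directly; they are triggered by extraction id, matching the "transform to get triggered by extraction_id" commits above.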
72 changes: 40 additions & 32 deletions apps/etl/etl_tasks/emdat.py
@@ -2,8 +2,14 @@

from celery import chain, shared_task

from apps.etl.extraction.sources.emdat.extract import EMDATExtraction, EmdatExtractionInputMetadata
from apps.etl.extraction.sources.emdat.extract import (
EmdatExtraction,
EmdatExtractionMetadata,
EmdatExtractionMetadataType,
EmdatExtractionParamsMetadata,
)
from apps.etl.transform.sources.emdat import EMDATTransformHandler
from apps.etl.utils import get_cluster_codes
from main.configs import etl_config

QUERY = """
@@ -13,11 +19,12 @@
$include_hist: Boolean
$from: Int
$to: Int
$classif: [String!]
) {
api_version
public_emdat(
cursor: { offset: $offset, limit: $limit }
filters: { include_hist: $include_hist, from: $from, to: $to }
filters: { include_hist: $include_hist, from: $from, to: $to, classif: $classif }
) {
total_available
info {
@@ -78,39 +85,40 @@
"""


# FIXME: Remove kwargs?
@shared_task
def ext_and_transform_emdat_latest_data(**kwargs):
# FIXME: Why are we getting data from etl_config.EMDAT_START_YEAR to get the latest data?
# Also, the filtering only filters using year so we might have lot of duplicate data
variables = EmdatExtractionInputMetadata.model_validate(
{
"limit": -1,
"from": etl_config.EMDAT_START_YEAR,
"to": datetime.now().year,
"include_hist": None,
}
).model_dump(by_alias=True)

chain(
EMDATExtraction.task.s(QUERY, variables),
EMDATTransformHandler.task.s(),
).apply_async()
extraction_object = EmdatExtraction.init_extraction(
metadata=EmdatExtractionMetadata(
params=EmdatExtractionParamsMetadata(
limit=-1,
from_=etl_config.EMDAT_START_YEAR,
to=datetime.now().year,
include_hist=None,
classif=get_cluster_codes(),
),
url=f"{etl_config.EMDAT_URL}/v1",
type=EmdatExtractionMetadataType.QUERY,
),
add_to_queue=False,
)
chain(EmdatExtraction.task.s(extraction_object.id), EMDATTransformHandler.task.s()).apply_async()


# FIXME: Remove kwargs?
@shared_task
def ext_and_transform_emdat_historical_data(**kwargs):
variables = EmdatExtractionInputMetadata.model_validate(
{
"limit": -1,
"from": None,
"to": None,
"include_hist": True,
}
).model_dump(by_alias=True)

chain(
EMDATExtraction.task.s(QUERY, variables),
EMDATTransformHandler.task.s(),
).apply_async()
for i in range(etl_config.EMDAT_START_YEAR, etl_config.EMDAT_END_YEAR + 1):
extraction_object = EmdatExtraction.init_extraction(
metadata=EmdatExtractionMetadata(
params=EmdatExtractionParamsMetadata(
limit=-1,
from_=i,
to=i,
include_hist=True,
classif=get_cluster_codes(),
),
url=f"{etl_config.EMDAT_URL}/v1",
type=EmdatExtractionMetadataType.QUERY,
),
add_to_queue=False,
)
chain(EmdatExtraction.task.s(extraction_object.id), EMDATTransformHandler.task.s()).apply_async()
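The removed code built the GraphQL variables with model_dump(by_alias=True), and the new EmdatExtractionParamsMetadata is constructed with from_ as a keyword, which suggests the field is aliased to the GraphQL variable name from (a Python keyword, so it cannot be a plain field name). A minimal Pydantic sketch of that aliasing, with illustrative field names; the real model lives in apps/etl/extraction/sources/emdat/extract.py and is not shown in this diff:

```python
# Minimal sketch of the from_ -> "from" aliasing the params metadata
# appears to use. Field names are illustrative, not the project's model.
from pydantic import BaseModel, Field


class EmdatParamsSketch(BaseModel):
    limit: int
    from_: int | None = Field(default=None, alias="from")
    to: int | None = None
    include_hist: bool | None = None
    classif: list[str] | None = None

    # Allow construction by field name (from_=...) as well as by alias.
    model_config = {"populate_by_name": True}


params = EmdatParamsSketch(limit=-1, from_=2000, to=2000, include_hist=True)
print(params.model_dump(by_alias=True))
# {'limit': -1, 'from': 2000, 'to': 2000, 'include_hist': True, 'classif': None}
```

Note also that the historical task now fans out one bounded extraction per year from EMDAT_START_YEAR through EMDAT_END_YEAR instead of issuing a single all-years query.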