Commit 22d9067

Dev (#873)
* New pipeline runner image. (#841)
* Push
* This works!
* cleanup
* lint
* lint
* Tweak spark config location
* add dockerignore
* vep bumps
* Finish download
* lines
* correct path
* bump vep version
* path updates
* Add reference genome to path
* rename to REFERENCE_GENOME
* refactor yet again
* Finish
* cleanup
* redo reference data
* more refactoring
* more refactor
* bump
* absolute path
* Fix var
* default back to /vep_data
* rearrange vep data
* lint
* small tweaks
* rename
* Benb/rg38 locus (#864)
* rg38 locus
* fix property
* lint
* update test
* ruff
* Fix test
* empty release commit
* a few bugs
* fix config
* Fix vep_data
* fix bin
* change up args again
* ignore warnings
* ht instead of mt
* use contains instead of index
* Fix bad commit
1 parent 81249e6 commit 22d9067

34 files changed, with 516 additions and 140 deletions

.cloudbuild/vep-docker.cloudbuild.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
 # Run locally with:
 #
-# gcloud builds submit --quiet --substitutions='_VEP_VERSION=110' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
+# gcloud builds submit --quiet --substitutions='_REFERENCE_GENOME=GRCh38' --config .cloudbuild/vep-docker.cloudbuild.yaml v03_pipeline/deploy
 steps:
 - name: 'gcr.io/kaniko-project/executor:v1.3.0'
   args:
-  - --destination=gcr.io/seqr-project/vep-docker-image:${_VEP_VERSION}
-  - --dockerfile=Dockerfile.vep
+  - --destination=gcr.io/seqr-project/vep-docker-image:${_REFERENCE_GENOME}
+  - --dockerfile=Dockerfile.vep_${_REFERENCE_GENOME}
   - --cache=true
   - --cache-ttl=168h
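
The change above keys both the image tag and the Dockerfile name on a single `_REFERENCE_GENOME` substitution. As a rough illustration of what the two kaniko arguments expand to, here is a minimal Python sketch. It uses `string.Template` as a stand-in for the substitution step, which is an assumption for illustration only and not how Cloud Build itself is implemented; `render_args` is a hypothetical helper name.

```python
from string import Template

# Argument templates copied from the updated build config.
KANIKO_ARGS = [
    "--destination=gcr.io/seqr-project/vep-docker-image:${_REFERENCE_GENOME}",
    "--dockerfile=Dockerfile.vep_${_REFERENCE_GENOME}",
]


def render_args(reference_genome: str) -> list[str]:
    """Expand ${_REFERENCE_GENOME} as a --substitutions value would (illustrative only)."""
    values = {"_REFERENCE_GENOME": reference_genome}
    return [Template(arg).substitute(values) for arg in KANIKO_ARGS]


for arg in render_args("GRCh38"):
    print(arg)
```

Under this reading, each reference genome gets its own Dockerfile and its own image tag, so a build for GRCh37 would not overwrite the GRCh38 image.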

.dockerignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+*test*
+.git
+.vscode
+.idea

.github/workflows/prod-release.yml

Lines changed: 4 additions & 2 deletions
@@ -52,9 +52,11 @@ jobs:
         shell: bash
         run: |-
           gcloud storage rm -r gs://seqr-luigi/releases/prod/latest/ || echo 'No latest release'
-          gcloud storage cp v03_pipeline/bin/* gs://seqr-luigi/releases/prod/latest/
+          gcloud storage cp v03_pipeline/bin gs://seqr-luigi/releases/prod/latest/bin/
+          gcloud storage cp v03_pipeline/var/vep_config gs://seqr-luigi/releases/prod/latest/var/vep_config
           gcloud storage cp dist/*.whl gs://seqr-luigi/releases/prod/latest/pyscripts.zip
-          gcloud storage cp v03_pipeline/bin/* gs://seqr-luigi/releases/prod/$TAG_NAME/
+          gcloud storage cp v03_pipeline/bin gs://seqr-luigi/releases/prod/$TAG_NAME/bin/
+          gcloud storage cp v03_pipeline/var/vep_config gs://seqr-luigi/releases/prod/$TAG_NAME/var/vep_config
           gcloud storage cp dist/*.whl gs://seqr-luigi/releases/prod/$TAG_NAME/pyscripts.zip

       - name: Create tag

requirements.in

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 elasticsearch==7.9.1
 google-api-python-client>=1.8.0
-hail==0.2.130
+hail==0.2.132
 luigi>=3.4.0
 gnomad==0.6.4
 google-cloud-storage>=2.14.0

requirements.txt

Lines changed: 6 additions & 7 deletions
@@ -148,7 +148,7 @@ grpcio==1.63.0
     #   grpcio-status
 grpcio-status==1.48.2
     # via google-api-core
-hail==0.2.130
+hail==0.2.132
     # via -r requirements.in
 hdbscan==0.8.33
     # via gnomad
@@ -221,7 +221,7 @@ numpy==1.26.2
     #   scipy
 oauthlib==3.2.2
     # via requests-oauthlib
-orjson==3.9.10
+orjson==3.10.6
     # via hail
 packaging==23.2
     # via
@@ -254,12 +254,13 @@ protobuf==3.20.2
     #   googleapis-common-protos
     #   grpc-google-iam-v1
     #   grpcio-status
+    #   hail
     #   proto-plus
 ptyprocess==0.7.0
     # via pexpect
 pure-eval==0.2.2
     # via stack-data
-py4j==0.10.9.5
+py4j==0.10.9.7
     # via pyspark
 pyasn1==0.5.1
     # via
@@ -276,12 +277,10 @@ pygments==2.17.2
     #   ipython
     #   rich
 pyjwt[crypto]==2.8.0
-    # via
-    #   msal
-    #   pyjwt
+    # via msal
 pyparsing==3.1.1
     # via httplib2
-pyspark==3.3.3
+pyspark==3.5.1
     # via hail
 python-daemon==3.0.1
     # via luigi

v03_pipeline/api/__init__.py

Whitespace-only changes.

v03_pipeline/api/__main__.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+from aiohttp import web
+
+from v03_pipeline.api.app import init_web_app
+from v03_pipeline.lib.logger import get_logger
+
+
+def run():
+    app = init_web_app()
+    logger = get_logger(__name__)
+    web.run_app(
+        app,
+        host='0.0.0.0',  # noqa: S104
+        port=5000,
+        access_log=logger,
+    )
+
+
+run()

v03_pipeline/api/app.py

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+from aiohttp import web
+
+from v03_pipeline.lib.tasks import *  # noqa: F403
+
+
+async def status(_: web.Request) -> web.Response:
+    return web.json_response({'success': True})
+
+
+async def init_web_app():
+    app = web.Application()
+    app.add_routes(
+        [
+            web.get('/status', status),
+        ],
+    )
+    return app
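
The new API exposes a single `/status` route that returns `{'success': True}`, served on port 5000 according to `__main__.py`. A caller that wants to poll it could look roughly like the following standard-library sketch. The `localhost` host, the `is_healthy` and `check_status` names, and the timeout are assumptions for illustration and are not part of this commit.

```python
import json
from urllib.request import urlopen

# Assumption: the service is reachable locally on the port set in __main__.py.
STATUS_URL = "http://localhost:5000/status"


def is_healthy(body: bytes) -> bool:
    """Return True only for a JSON object whose 'success' field is true."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def check_status(url: str = STATUS_URL, timeout: float = 5.0) -> bool:
    """Fetch the endpoint and interpret the response; False on any network error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:
        return False
```

Keeping the payload check separate from the network call makes it possible to exercise the parsing logic without a running server.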
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+#!/bin/bash
+
+#
+# VEP init action for dataproc
+#
+# adapted/copied from
+# https://github.com/broadinstitute/gnomad_methods/blob/main/init_scripts/vep105-init.sh
+# and gs://hail-common/hailctl/dataproc/0.2.128/vep-GRCh38.sh
+#
+# NB: This is code used for initializing a dataproc cluster and runs as an initialization
+# action when the rest of our code is unavailable.
+#
+
+set -x
+
+export PROJECT="$(gcloud config get-value project)"
+export ENVIRONMENT="$(/usr/share/google/get_metadata_value attributes/ENVIRONMENT)"
+export VEP_CONFIG_PATH="$(/usr/share/google/get_metadata_value attributes/VEP_CONFIG_PATH)"
+export REFERENCE_GENOME="$(/usr/share/google/get_metadata_value attributes/REFERENCE_GENOME)"
+
+# Install docker
+apt-get update
+apt-get -y install \
+    apt-transport-https \
+    ca-certificates \
+    curl \
+    gnupg2 \
+    software-properties-common \
+    tabix
+curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
+sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
+sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
+apt-get update
+apt-get install -y --allow-unauthenticated docker-ce
+
+# https://github.com/hail-is/hail/issues/12936
+sleep 60
+sudo service docker restart
+
+gcloud storage cp gs://seqr-luigi/releases/$ENVIRONMENT/latest/var/vep_config/vep-$REFERENCE_GENOME.json $VEP_CONFIG_PATH
+
+cat >/vep.c <<EOF
+#include <unistd.h>
+#include <stdio.h>
+
+int
+main(int argc, char *const argv[]) {
+  if (setuid(geteuid()))
+    perror( "setuid" );
+
+  execv("/vep.bash", argv);
+  return 0;
+}
+EOF
+gcc -Wall -Werror -O2 /vep.c -o /vep
+chmod u+s /vep
+
+gcloud storage cp gs://seqr-luigi/releases/$ENVIRONMENT/latest/bin/download_vep_data.bash /download_vep_data.bash
+chmod +x /download_vep_data.bash
+./download_vep_data.bash $REFERENCE_GENOME
+
+gcloud storage cp gs://seqr-luigi/releases/$ENVIRONMENT/latest/bin/vep /vep.bash
+chmod +x /vep.bash
+
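
The init action fetches three release artifacts whose locations depend on the `ENVIRONMENT` and `REFERENCE_GENOME` metadata attributes. The sketch below spells out those paths in Python to make the expected bucket layout easier to check against the release workflow. The `init_sources` helper and its dictionary keys are hypothetical, and `prod` is used as the environment value only because the release workflow in this commit writes under `releases/prod`.

```python
# Bucket prefix as it appears in the init script and release workflow.
RELEASE_BUCKET = "gs://seqr-luigi/releases"


def init_sources(environment: str, reference_genome: str) -> dict[str, str]:
    """Return the gs:// objects the init action copies for a given cluster."""
    base = f"{RELEASE_BUCKET}/{environment}/latest"
    return {
        # Copied to $VEP_CONFIG_PATH on the node.
        "vep_config": f"{base}/var/vep_config/vep-{reference_genome}.json",
        # Copied to /download_vep_data.bash and run with the reference genome.
        "download_script": f"{base}/bin/download_vep_data.bash",
        # Copied to /vep.bash, which the setuid wrapper /vep executes.
        "vep_wrapper": f"{base}/bin/vep",
    }


for name, uri in init_sources("prod", "GRCh38").items():
    print(f"{name}: {uri}")
```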
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+
+set -eux
+
+REFERENCE_GENOME=$1
+SEQR_REFERENCE_DATA=/seqr-reference-data
+
+case $REFERENCE_GENOME in
+  GRCh38)
+    ;;
+  GRCh37)
+    ;;
+  *)
+    echo "Invalid reference genome $REFERENCE_GENOME, should be GRCh37 or GRCh38"
+    exit 1
+esac
+
+mkdir -p $SEQR_REFERENCE_DATA/$REFERENCE_GENOME;
+gcloud storage cp -r "gs://seqr-reference-data/v03/$REFERENCE_GENOME/*" $SEQR_REFERENCE_DATA/$REFERENCE_GENOME/
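
The script above accepts only `GRCh37` or `GRCh38` and then mirrors the matching `v03` prefix into a local directory. The same validation and path construction can be sketched in Python as follows. The `reference_data_paths` helper is hypothetical and only computes the source and destination; it does not perform the copy.

```python
from pathlib import PurePosixPath

# Values taken from the script: the accepted genomes and the local root.
VALID_REFERENCE_GENOMES = ("GRCh37", "GRCh38")
SEQR_REFERENCE_DATA = PurePosixPath("/seqr-reference-data")


def reference_data_paths(reference_genome: str) -> tuple[str, str]:
    """Validate the genome build and return (gs:// source glob, local destination)."""
    if reference_genome not in VALID_REFERENCE_GENOMES:
        raise ValueError(
            f"Invalid reference genome {reference_genome}, should be GRCh37 or GRCh38"
        )
    source = f"gs://seqr-reference-data/v03/{reference_genome}/*"
    destination = f"{SEQR_REFERENCE_DATA / reference_genome}/"
    return source, destination


source, destination = reference_data_paths("GRCh38")
print(source, "->", destination)
```

Rejecting unknown builds before any path is built matches the script's behavior of exiting before `mkdir` and the copy run.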
