diff --git a/.gitignore b/.gitignore
Lines changed: 4 additions & 1 deletion
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 6 additions & 0 deletions
diff --git a/DEPLOYING.md b/DEPLOYING.md
Lines changed: 91 additions & 40 deletions
diff --git a/Dockerfile b/Dockerfile
Lines changed: 6 additions & 5 deletions
diff --git a/ModulectorBackend/settings.py b/ModulectorBackend/settings.py
Lines changed: 5 additions & 1 deletion
diff --git a/ModulectorBackend/settings_ci.py b/ModulectorBackend/settings_ci.py
Lines changed: 2 additions & 3 deletions
diff --git a/ModulectorBackend/settings_prod.py b/ModulectorBackend/settings_prod.py
Lines changed: 13 additions & 3 deletions
@@ -144,5 +144,8 @@ src/api_service/experiments/venv
 docker-compose.yml
 /secretkey.txt
 /modulector/database_versions/
-/modulector/files/
+modulector/files/EPIC-8v2-0_A1.csv
+modulector/files/mirDIP_Unidirectional_search_v.5.txt
 *.sql.gz
+docker-compose.mauri_dev.yml
+modulector/files/tmp_db.csv
@@ -29,6 +29,12 @@ The entire contributing process consists in the following steps:
     1. `python3 -m venv venv`
     1. `source venv/bin/activate` (this command must be run every time you want to start the Django server, otherwise we won't have the dependencies available)
     1. `pip install -r config/requirements.txt`
+1. Database download:  
+Due to their size, two database files are not included in the Modulector repository and must be downloaded manually:  
+    1. mirDIP version 5.2. Download [mirDIP_Unidirectional_search_v.5.txt](https://ophid.utoronto.ca//mirDIPweb/mirDIP_Unidirectional_search_v_5_2.zip) file.  
+    1. [EPIC-8v2-0_A1.csv](https://support.illumina.com/content/dam/illumina-support/documents/downloads/productfiles/methylationepic/MethylationEPIC_v2%20Files.zip) file of the Infinium MethylationEPIC array version 2.0.  
+Once both files are downloaded, unzip the archives and move the two files into the `modulector/files/` directory. 
+Note: there may be more than one file inside the ZIP. Be sure to move only the two files mentioned above.
 1. Apply migrations and create super user:
     1. `python3 manage.py makemigrations`
     1. `python3 manage.py migrate`
 
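The manual download step above can be double-checked before running the migrations. A minimal sketch, assuming it is run from the project root; the helper `missing_source_files` is hypothetical and not part of Modulector:

```python
from pathlib import Path

# File names taken from the download step; the directory is assumed to be
# relative to the project root.
REQUIRED_FILES = (
    "mirDIP_Unidirectional_search_v.5.txt",
    "EPIC-8v2-0_A1.csv",
)


def missing_source_files(files_dir: str = "modulector/files") -> list:
    """Return the required file names that are not present in files_dir."""
    base = Path(files_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means both files are in place; otherwise the returned names tell you which download or unzip step was missed.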
@@ -16,23 +16,26 @@ Below are the steps to perform a production deploy.
 1. Set the environment variables that are empty with data. They are listed below by category:
     - Django:
         - `DJANGO_SETTINGS_MODULE`: indicates the `settings.py` file to read. In production we set in `docker-compose_dist.yml` the value `ModulectorBackend.settings_prod` which contains several production properties.
+        - `ALLOWED_HOSTS`: list of allowed hosts separated by commas. Default `['web', '.localhost', '127.0.0.1', '[::1]']`.
+        - `ENABLE_SECURITY`: set it to the string `true` (lowercase) to enable secure session and CSRF cookies and to trust the `X-Forwarded-Proto` header sent by the proxy. To have a secure site you must also configure the HTTPS server; for more information see the [Enable SSL/HTTPS](#enable-sslhttps) section. Default `false`.
+        - `CSRF_TRUSTED_ORIGINS`: since Django 4.0 this must be defined in production when Daphne is served through NGINX. The value is a single origin or a comma-separated list of origins, and each one must include its `http://` or `https://` prefix. Examples: `http://127.0.0.1`, `http://127.0.0.1,https://127.0.0.1:8000`. You can read more [here](#csrf-trusted-issue).
         - `SECRET_KEY`: Django's secret key. If not specified, one is generated with [generate-secret-key application](https://github.com/MickaelBergem/django-generate-secret-key) automatically.
         - `MEDIA_ROOT`: absolute path where will be stored the uploaded files. By default `<project root>/uploads`.
         - `MEDIA_URL`: URL of the `MEDIA_ROOT` folder. By default `<url>/media/`.
         - `CUSTOM_ALLOWED_HOSTS`: list of allowed hosts (separated by commas) to access to Modulector (
           ex. `192.168.11.1,10.10.10.2,localhost`). If it is not defined, `web` (which is the alias of the Modulector host running in Docker) is used.
-    - Healthchecks and alerts:
-        - `HEALTH_URL` : indicates the url that will be requested on Docker healthchecks. By default it is http://localhost:8000/drugs/. The healthcheck makes a GET request on it. Any HTTP code value greatear or equals than 400 is considered an error.
-        - `HEALTH_ALERT_URL` : if you want to receive an alert when healthchecks failed, you can set this variable to a webhook endpoint that will receive a POST request and a JSON body with the field **content** that contains the fail message.
-1. Set the environment variables for the database connection if the default values don't match your `db` service scenario:
+    - Postgres:
         - `POSTGRES_USERNAME` : Database username. By default the docker image uses `modulector`.
         - `POSTGRES_PASSWORD` : Database username's password. By default the docker image uses `modulector`.
         - `POSTGRES_PORT` : Database server listen port. By default the docker image uses `5432`.
         - `POSTGRES_DB` : Database name to be used. By default the docker image uses `modulector`.
+    - Healthchecks and alerts:
+        - `HEALTH_URL` : indicates the URL that will be requested on Docker healthchecks. By default it is http://localhost:8000/drugs/. The healthcheck makes a GET request on it. Any HTTP status code greater than or equal to 400 is considered an error.
+        - `HEALTH_ALERT_URL` : if you want to receive an alert when a healthcheck fails, set this variable to a webhook endpoint. It will receive a POST request with a JSON body whose **content** field contains the failure message.
 1. Go back to the project's root folder and run the following commands:
     - Docker Compose:
-        - Start: `docker-compose up -d`. The service will available in `127.0.0.1`.
-        - Stop: `docker-compose down`
+        - Start: `docker compose up -d`. The service will be available at `127.0.0.1`.
+        - Stop: `docker compose down`
     - [Docker Swarm](https://docs.docker.com/engine/swarm/):
         - Start: `docker stack deploy --compose-file docker-compose.yml modulector`
         - Stop: `docker stack rm modulector`
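The healthcheck behaviour described by `HEALTH_URL` and `HEALTH_ALERT_URL` can be sketched with the standard library as follows. This is a hypothetical, simplified stand-in, not the project's `tools/healthcheck/check.py`; in particular, the Dockerfile initializes a `tools/healthcheck/tries.txt` counter that this sketch does not model.

```python
import json
import os
import urllib.error
import urllib.request

# Same defaults as documented above; both values come from the environment.
HEALTH_URL = os.getenv("HEALTH_URL", "http://localhost:8000/drugs/")
HEALTH_ALERT_URL = os.getenv("HEALTH_ALERT_URL")


def is_healthy(status_code: int) -> bool:
    """Any HTTP status code greater than or equal to 400 is a failure."""
    return status_code < 400


def build_alert_body(message: str) -> bytes:
    """JSON body with the failure message in the 'content' field."""
    return json.dumps({"content": message}).encode("utf-8")


def send_alert(message: str) -> None:
    """POST the failure message to the webhook, if one is configured."""
    if not HEALTH_ALERT_URL:
        return
    request = urllib.request.Request(
        HEALTH_ALERT_URL,
        data=build_alert_body(message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        urllib.request.urlopen(request, timeout=10).close()
    except OSError:
        pass  # a failed alert must not hide the healthcheck result


def check_once(timeout: float = 10.0) -> bool:
    """GET HEALTH_URL once and send an alert on failure."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # urlopen raises for 4xx/5xx responses
    except OSError as error:  # refused connection, DNS failure, timeout
        send_alert(f"Healthcheck could not reach {HEALTH_URL}: {error}")
        return False
    if is_healthy(status):
        return True
    send_alert(f"Healthcheck on {HEALTH_URL} returned HTTP {status}")
    return False


def main() -> int:
    # Docker's HEALTHCHECK treats exit code 0 as healthy and 1 as unhealthy
    return 0 if check_once() else 1
```

Used as a `HEALTHCHECK` command, the script would end with `raise SystemExit(main())`.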
@@ -44,8 +47,35 @@ Below are the steps to perform a production deploy.
 
 ### Start delays
 
-Due the database restoration in the first start, the container modulectordb may take a while to be up an ready. We can follow the status of the startup process in the logs by doing: `docker-compose logs --follow`.
-Sometimes this delay makes django server throws database connection errors. If it is still down and not automatically fixed when database finally's up, we can restart the services by doing: `docker-compose up -d`.
+Due to the database restoration on the first start, the `modulectordb` container may take a while to be up and ready. You can follow the startup progress in the logs with: `docker compose logs --follow`.
+Sometimes this delay makes the Django server throw database connection errors. If the errors persist after the database is finally up, restart the services with: `docker compose up -d`.
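If the `web` service keeps failing while `modulectordb` restores its data, one option is to wait until the database port accepts connections before starting Django. A minimal sketch, assuming the `POSTGRES_HOST` and `POSTGRES_PORT` variables point at the database service (`db` is used as an assumed fallback host); `wait_for_port` is a hypothetical helper. An open port does not by itself guarantee that the restoration has finished.

```python
import os
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main() -> int:
    # "db" is an assumed fallback; adjust to your service name
    host = os.getenv("POSTGRES_HOST", "db")
    port = int(os.getenv("POSTGRES_PORT", "5432"))
    return 0 if wait_for_port(host, port) else 1
```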
+
+
+## Enable SSL/HTTPS
+
+To enable HTTPS, follow the steps below:
+
+1. Set the `ENABLE_SECURITY` parameter to `true` as explained in the [Instructions](#instructions) section.
+1. Copy the file `config/nginx/multiomics_intermediate_safe_dist.conf` and paste it into `config/nginx/conf.d/` with the name `multiomics_intermediate.conf`.
+1. Get the `.crt` and `.pem` files for both the certificate and the private key and put them in the `config/nginx/certificates` folder.
+1. Edit the `multiomics_intermediate.conf` file created in step 2 and uncomment the lines where the `.crt` and `.pem` files must be specified.
+1. Edit the `docker-compose.yml` file so that the `nginx` service publishes both container port 8000 (mapped to host port 80 in the example) and port 443. You also need to mount the `certificates` folder in the `volumes` section. It should look something like this:
+
+```yaml
+...
+nginx:
+    image: nginx:1.19.3
+    ports:
+        - 80:8000
+        - 443:443
+    # ...
+    volumes:
+        ...
+        - ./config/nginx/certificates:/etc/nginx/certificates
+...
+```
+
+6. Redo the deployment with Docker.
 
 
 ## Perform security checks
@@ -65,16 +95,16 @@ python3 manage.py check --deploy --settings ModulectorBackend.settings_prod
 
 ## Restart/stop the services
 
-If the configuration of the `docker-compose.yml` file has been changed, you can apply the changes without stopping the services, just running the `docker-compose restart` command.
+If the configuration of the `docker-compose.yml` file has been changed, you can apply the changes without stopping the services, just running the `docker compose restart` command.
 
-If you want to stop all services, you can execute the command `docker-compose down`.
+If you want to stop all services, you can execute the command `docker compose down`.
 
 
 ## See container status
 
 To check the different services' status you can run:
 
-`docker-compose logs <service's name>`
+`docker compose logs <service's name>` (or `docker service logs <stack name>_<service's name>` if you deployed with Docker Swarm)
 
 Where  *\<service's name\>* could be `nginx`, `web` or `db`.
 
@@ -85,52 +115,69 @@ Where  *\<service's name\>* could be `nginx`, `web` or `db`.
 
 In order to create a database dump you can execute the following command:
 
-`docker exec -t [name of DB container] pg_dump [db name] --data-only | gzip > modulector.sql.gz`
+`docker exec -t [name of DB container] pg_dump [db name] --no-owner -U modulector | gzip > modulector.sql.gz`
 
-That command will create a compressed file with the database dump inside. **Note** that `--data-only` flag is present as DB structure is managed by Django Migrations so they are not necessary.
+That command will create a compressed file with the database dump inside.
 
 
 ### Import
 
-Use the followings steps if you manually set your postgres environment. Otherwise, you can just use the `modulector-db:<version>` and avoid all this steps. If you move between release versions it's very probable that the db image exists with the previous name mentioned.
+There are three ways to set up the Modulector DB.
 
-1. **Optional but recommended**: due to major changes, it's probably that an import thrown several errors when importing. To prevent that you could do the following steps before doing the importation:
-    1. Drop all the tables from the DB:
-        1. Log into docker container: `docker container exec -it [name of DB container] bash`
-        1. Log into Postgres: `psql -U [username] -d [database]`
-        1. Run to generate a `DELETE CASCADE` query for all
-           tables: `select 'drop table if exists "' || tablename || '" cascade;' from pg_tables where schemaname = 'public';`
-        1. (**Danger, will drop tables**) Run the generated query in previous step to drop all tables
-    1. Run the Django migrations to create the empty tables with the correct structure: `docker exec -i [name of django container] python3 manage.py migrate`
-1. Download `.sql.gz` from [Modulector releases pages](https://github.com/multiomics-datascience/modulector-backend/releases) or use your own export file
-1. Restore the db running:
 
-`zcat modulector.sql.gz | docker exec -i [name of DB container] psql [db name]`
+### Using official Docker image (recommended)
+
+You can simply use the [modulector-db][modulector-db-docker] image and skip all the import steps. This is the default setting in `docker-compose_dist.yml`.
+
 
-That command will restore the database using a compressed dump as source
+### Importing an existing database dump
+
+1. Comment out the service in your `docker-compose.yml` that uses the `omicsdatascience/modulector-db` image.
+1. Start up a PostgreSQL service. You can use the same service listed in the `docker-compose.dev.yml` file. 
+1. **Optional but recommended (you can skip these steps if this is the first time you are deploying Modulector)**: due to major changes between versions, an import will probably throw several errors. To prevent that, do the following before importing:
+    1. Drop and recreate the database:
+        1. Log into docker container: `docker container exec -it [name of the DB container] bash`
+        1. Log into Postgres: `psql -U [username]`
+        1. (**Danger, will drop all the data**) Remove the `modulector` database: `DROP DATABASE modulector;`
+        1. Create an empty database: `CREATE DATABASE modulector;`
+1. Download `.sql.gz` from [Modulector releases pages](https://github.com/multiomics-datascience/modulector-backend/releases) or use your own export file.
+1. Restore the db: `zcat modulector.sql.gz | docker exec -i [name of the DB container] psql modulector -U modulector`
+
+That command will restore the database using a compressed dump as source.
 
 
-### Regenerate data
+### Regenerating the data manually
 
-Use the followings steps if you manually set your postgres environment. Otherwise, you can just regenerate all the db data deleting or stoping the db container and bring it up. It's because the image has all the data you need. But if you are deploying your custom postgres the next steps are valid.
+1. Download the files for the mirDIP database (version 5.2) and the Illumina 'Infinium MethylationEPIC 2.0' array. The files can be freely downloaded from their respective web pages.  
+   **For the mirDIP database**:
+   - Go to the [mirDIP download web page](https://ophid.utoronto.ca/mirDIP/download.jsp) and download the file called *"mirDIPweb/mirDIP Unidirectional search ver. 5.2"*.
+   - Unzip the file.
+   - Find the file called *"mirDIP_Unidirectional_search_v.5.txt"* and move it into the **"modulector/files/"** directory.  
 
-**If you need to regenerate all, or some of the data**
+   **For the EPIC Methylation array**:
+   - Go to the [Illumina product files web page](https://support.illumina.com/downloads/infinium-methylationepic-v2-0-product-files.html) and download the ZIP file called "*Infinium MethylationEPIC v2.0 Product Files (ZIP Format)*".
+   - Unzip the file.
+   - Within the unzipped files you will find one called "*EPIC-8v2-0_A1.csv*". Move this file into the **"modulector/files/"** directory.  
+   
+   **NOTE:** The total size of both files is about 5 GB.
+1. Start up a PostgreSQL service. You can use the same service listed in the docker-compose.dev.yml file.
+1. Run the migrations. Use `python3 manage.py migrate` to run all the migrations (**NOTE:** this can take a long time to finish)
 
-1. Download `files.zip` from [Modulector releases pages](https://github.com/multiomics-datascience/modulector-backend/releases) and place it inside the files folder **(this folder is ignored by git)**
-2. http://127.0.0.1:8000/process/?commands=
-3. If you don't sent the query param all the commands will be executed
-4. If you want a specific set you can combine the following `drugs, mature_mirna, diseases, ref_seq, gene, sequence, mirDIP`
 
+## Update databases
 
-## If you are using your own postgres server
+Modulector currently works with the mirDIP (version 5.2) and miRBase (version 22.1) databases for miRNA data, and with the Illumina 'Infinium MethylationEPIC 2.0' array for methylation site data.  
+If new versions are released for these databases and you want to update them, follow these steps:  
 
-It's important that if you are using another postgres server, and not the modulector-db image for getting up the services, you must provide the next parameters on db and web services to assure their communication.
+ - For **mirDIP** and the **Illumina EPIC array**, follow the same steps described in the [Regenerating the data manually](#regenerating-the-data-manually) section, replacing the files named there with the most recent versions published on their sites.
+ - For **miRBase**, follow the instructions below:
+   1. Go to the [_Download_ section on the website][mirbase-download-page].
+   1. Download the files named _hairpin.fa_ and _mature.fa_ from the latest version of the database.
+   1. Replace the files inside the _modulector/files/_ directory with the ones downloaded in the previous step.
+   1. Start up a PostgreSQL service. You can use the same service listed in the _docker-compose.dev.yml_ file.
+   1. Run the command `python3 manage.py migrate` to apply all the migrations (**NOTE:** this can take a long time to finish).
 
-- `POSTGRES_USERNAME`: DB username. **Must be equal to** `POSTGRES_USER` in `db` service.
-- `POSTGRES_PASSWORD`: DB user's password. **Must be equal to** `POSTGRES_PASSWORD` in `db` service.
-- `POSTGRES_HOST`: DB host.
-- `POSTGRES_PORT`: DB host's port.
-- `POSTGRES_DB`: DB's name. **Must be equal to** `POSTGRES_DB` in in `db` service.
+**Note:** These updates will work correctly only as long as the new source files keep the same data format as the current ones.
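Since these updates rely on the new files keeping the same format, a quick sanity check of the miRBase `hairpin.fa` and `mature.fa` files before running the migrations can catch a broken download. A minimal sketch that only checks the basic FASTA layout (a `>` header line followed by sequence lines); it does not validate the fields Modulector actually parses, and `count_fasta_records` is a hypothetical helper:

```python
def count_fasta_records(path: str) -> int:
    """Count FASTA records, raising ValueError on obviously broken structure."""
    records = 0
    has_sequence = True  # no record is open yet
    with open(path, encoding="utf-8") as handle:
        for line_number, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # blank lines are ignored
            if line.startswith(">"):
                if records and not has_sequence:
                    raise ValueError(f"{path}:{line_number}: previous record has no sequence")
                records += 1
                has_sequence = False
            elif records == 0:
                raise ValueError(f"{path}:{line_number}: sequence data before the first header")
            else:
                has_sequence = True
    if records and not has_sequence:
        raise ValueError(f"{path}: last record has no sequence")
    return records
```

For example, `count_fasta_records("modulector/files/mature.fa")` should return a positive count for a well-formed file.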
 
 
 ## Configure your API key
@@ -140,3 +187,7 @@ When we notify user about updates of pubmeds they are subscribed to we interact
 
 ## Cron job configuration
 For cron jobs we use the following [library](https://github.com/kraiz/django-crontab). In our settings file we configured our cron jobs inside the `CRONJOBS = []`
+
+
+[modulector-db-docker]: https://hub.docker.com/r/omicsdatascience/modulector-db
+[mirbase-download-page]: https://www.mirbase.org/ftp.shtml
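For reference, django-crontab expects each `CRONJOBS` entry to be a tuple of a cron schedule and the dotted path of the function to run. The job path below is hypothetical and does not correspond to an actual Modulector task:

```python
# settings.py (excerpt). Each entry: (cron schedule, dotted path to a function).
# "modulector.cron.hypothetical_daily_task" is a placeholder, not a real task.
CRONJOBS = [
    # Every day at 03:00
    ("0 3 * * *", "modulector.cron.hypothetical_daily_task"),
]
```

After changing `CRONJOBS`, django-crontab's `python manage.py crontab add` command registers the jobs in the system crontab.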
@@ -1,4 +1,4 @@
-FROM python:3.8-buster
+FROM python:3.11-bullseye
 
 # Default value for deploying with modulector-db image
 ENV POSTGRES_USERNAME "modulector"
@@ -11,16 +11,17 @@ RUN mkdir /src
 WORKDIR /src/
 ENV BASEDIR=/src
 
+# Install app python requirements
+COPY config/requirements.txt /config/
+RUN pip3 install -r /config/requirements.txt
+
 # Copy all source data
 COPY . .
 
-# Install app python requirements
-RUN pip3 install -r config/requirements.txt
-
 RUN echo 0 > tools/healthcheck/tries.txt
 HEALTHCHECK CMD python tools/healthcheck/check.py
 CMD ["/bin/bash","-c","tools/run.sh"]
 
-# modulector port
+# Modulector port
 EXPOSE 8000
 
@@ -15,6 +15,10 @@
 # Modulector version
 VERSION: str = '1.4.4'
 
+# Default primary key field type
+# https://docs.djangoproject.com/en/4.0/ref/settings/#default-auto-field
+DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'
+
 # Build paths inside the project like this: os.path.join(BASE_DIR, ...)
 BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 
@@ -27,7 +31,7 @@
 # SECURITY WARNING: don't run with debug turned on in production!
 DEBUG = True
 
-ALLOWED_HOSTS = ['.localhost', '127.0.0.1', '[::1]','*']
+ALLOWED_HOSTS = ['web', '.localhost', '127.0.0.1', '[::1]']
 
 # Modulector unsubscribe endpoint
 UNSUBSCRIBE_URL = 'http://localhost:8000/unsubscribe-pubmeds/?token='
 
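With `'*'` removed, only requests whose `Host` header matches one of the listed patterns are accepted; `web` is the Docker Compose alias of the Django service. The following is a simplified re-implementation of the matching rules described in Django's `ALLOWED_HOSTS` documentation, for illustration only: it is not Django's code and skips some validation Django performs.

```python
from typing import Sequence


def split_host(host: str) -> str:
    """Lowercase the Host header value and drop an optional port."""
    host = host.lower()
    if host.startswith("[") and "]" in host:
        return host[: host.index("]") + 1]  # IPv6 literal such as [::1]:8000
    domain = host.rsplit(":", 1)[0]
    # A single trailing dot refers to the same host
    return domain[:-1] if domain.endswith(".") else domain


def host_is_allowed(host: str, allowed_hosts: Sequence[str]) -> bool:
    domain = split_host(host)
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or domain == pattern:
            return True
        # A leading dot matches the domain itself and any of its subdomains
        if pattern.startswith(".") and (domain == pattern[1:] or domain.endswith(pattern)):
            return True
    return False
```

With the new list, `web:8000`, `localhost:8000`, `api.localhost` and `[::1]:8000` are accepted, while an arbitrary host such as `example.com` is rejected.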
@@ -27,7 +27,7 @@
 # SECURITY WARNING: don't run with debug turned on in production!
 DEBUG = True
 
-ALLOWED_HOSTS = ['.localhost', '127.0.0.1', '[::1]','*']
+ALLOWED_HOSTS = ['.localhost', '127.0.0.1', '[::1]', '*']
 
 # Modulector unsubscribe endpoint
 UNSUBSCRIBE_URL = 'http://localhost:8000/unsubscribe-pubmeds/?token='
@@ -101,8 +101,7 @@
 DATABASES = {
     'default': {
         'ENGINE': 'django.db.backends.sqlite3',
-        'TEST':
-        {
+        'TEST': {
             'MIGRATE': False
         }
     },
 
@@ -27,15 +27,25 @@
 # 'web' is the name of the docker-compose service which serves Django
 custom_allowed_hosts: Optional[str] = os.getenv('CUSTOM_ALLOWED_HOSTS')
 if custom_allowed_hosts is None:
-    ALLOWED_HOSTS = ['web','.localhost', '127.0.0.1', '[::1]']
+    ALLOWED_HOSTS = ['web', '.localhost', '127.0.0.1', '[::1]']
 else:
     # Gets all the hosts declared by the user (separated by commas)
     allowed_host_list = custom_allowed_hosts.split(',')
     allowed_host_list_stripped = [x.strip() for x in allowed_host_list]
     ALLOWED_HOSTS = allowed_host_list_stripped
 
+# From Django 4 this must be set to avoid CSRF failures behind NGINX (empty entries are skipped)
+csrf_trusted_origins_env = os.getenv('CSRF_TRUSTED_ORIGINS', '')
+CSRF_TRUSTED_ORIGINS = [origin.strip() for origin in csrf_trusted_origins_env.split(',') if origin.strip()]
+
 # Security settings
-# SESSION_COOKIE_SECURE = True  # TODO: set when configured a SSL Cert.
-# CSRF_COOKIE_SECURE = True  # TODO: set when configured a SSL Cert.
+ENABLE_SECURITY: bool = os.getenv('ENABLE_SECURITY', 'false') == 'true'
+SESSION_COOKIE_SECURE = ENABLE_SECURITY
+CSRF_COOKIE_SECURE = ENABLE_SECURITY
 SECURE_REFERRER_POLICY = 'same-origin'
 
+# Trust the X-Forwarded-Proto header from the proxy so absolute URLs (e.g. FileField/ImageField) use https
+if ENABLE_SECURITY:
+    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
+
+
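Both `CUSTOM_ALLOWED_HOSTS` and `CSRF_TRUSTED_ORIGINS` are comma-separated lists read from the environment. Note that `''.split(',')` returns `['']` rather than `[]`, so an unset variable has to be filtered explicitly. A minimal sketch of a shared helper, assuming the same comma-separated format; `env_list` and `invalid_csrf_origins` are hypothetical names:

```python
import os
from typing import List, Optional


def env_list(name: str, default: Optional[List[str]] = None) -> List[str]:
    """Read a comma-separated environment variable as a list of non-empty items."""
    raw = os.getenv(name, "")
    items = [item.strip() for item in raw.split(",") if item.strip()]
    # Fall back to the default only when the variable is unset or blank
    return items if items else list(default or [])


def invalid_csrf_origins(origins: List[str]) -> List[str]:
    """Return origins without a scheme, e.g. '127.0.0.1' instead of 'http://127.0.0.1'."""
    return [origin for origin in origins if "://" not in origin]
```

For example, `ALLOWED_HOSTS = env_list('CUSTOM_ALLOWED_HOSTS', ['web', '.localhost', '127.0.0.1', '[::1]'])`. Since Django 4.0 reports a system check error for trusted origins without a scheme, checking `invalid_csrf_origins(CSRF_TRUSTED_ORIGINS)` gives an earlier, clearer message.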