Loading Study Data to cbioportal.ca

Live on cbioportal2

Introduction

This document outlines the steps you need to follow to load your study data into cbioportal.ca. The data flow diagram below shows the inputs and outputs of each process.

General Overview

Getting your study data automatically loaded into cBioPortal requires four steps:

  1. Prepare your study
  2. Validate your study
  3. Archive your study
  4. Load your study

Prepare your study

A study to be loaded into cBioPortal is essentially a directory containing all of the data files. Each data file needs a meta file that refers to it, and both files must comply with the format required for the specific data type. The format and fields expected for each file are documented on the File Formats page of the cBioPortal website. Below is an example of the files in such a directory.

dir
|-meta_study.txt
|-meta_cancer_type.txt -> cancer_type.txt
|-meta_clinical.txt -> data_clinical.txt
|-meta_[expression|mutations|CNA|etc] -> data_[expression|mutations|CNA|etc]
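
For reference, a minimal meta_study.txt typically looks like the sketch below. The values shown here are illustrative placeholders; the authoritative field list is on the File Formats page.

type_of_cancer: brca
cancer_study_identifier: example_study_2024
name: Example Breast Cancer Study
description: Short description shown on the study landing page.
add_global_case_list: true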

Rules

There are a few rules you will need to follow:

  • meta_study, meta_clinical, and the respective clinical data files are the only mandatory files.
  • Cancer type files become mandatory if the study refers to a cancer type that does not yet exist in the database.
  • Meta files can be named anything, as long as 'meta' appears as a separate token at the start or end of the name. (E.g. meta_test, meta.test, and test.meta are all fine; metal_test and metastudy are not.)
  • Data files can be named anything; each one is referenced by the data_filename property set in its meta file (see the example below).
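
As an illustration of the last rule, a clinical sample meta file might look like the following. The study identifier and filename are placeholders, but the fields shown are the ones the clinical data type expects.

cancer_study_identifier: example_study_2024
genetic_alteration_type: CLINICAL
datatype: SAMPLE_ATTRIBUTES
data_filename: data_clinical_samples.txt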

Validate your study

Once all files are in place and follow the proper format, it is recommended that you validate them using a local staging instance. You can skip this step if you do not have a local instance running. Validation is integrated into the cBioPortal data loading Python script: the importing script validates the data and then loads it only if validation succeeds.
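
If you do have a local instance, a typical invocation of the validator bundled with cBioPortal looks something like the sketch below. The script location and flags can differ between cBioPortal versions, so adjust the command to match your installation: -s points at the study directory, -n skips checks against a live portal, and -html writes a human-readable report.

python validateData.py -s ./study_folder -n -html validation_report.html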

Archive your study

To create a tar.gz archive from your study folder, you can use the following command. Currently, tar.gz is the only supported archive format.

tar zcvf study_name.tar.gz ./study_folder
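
Before dropping the archive off, you can sanity-check it by listing its contents:

tar ztvf study_name.tar.gz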

Load your study

To load your study into cBioPortal, all you need to do is drop your archived data file into one of the designated drop-off sites.

Currently, there are two drop-off sites:

  • OICR Users: /data/cbioportal_uploads on ociwire transfer data nodes
  • UHN Users: /data/cbioportal/cbio-env/dropoff on merry

The cbio watchdog, a Python module running as a background process, will pick up the archive and execute the cBioPortal importing script. This script first validates the data and, if validation succeeds, starts loading it. Any errors that occur during validation or loading are logged for troubleshooting. If all goes well, your study becomes available at cbioportal.ca. All drop-offs are deleted after the loading process finishes.
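
As an example, an OICR user could copy the archive into the drop-off directory with a command along these lines; the user and host below are placeholders for your own account and the transfer node you normally connect to.

scp study_name.tar.gz <user>@<transfer-node>:/data/cbioportal_uploads/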

Notes

For a new study, you will need to contact your project manager to help with the following:

  1. registering your new study with the UHN KeyCloak Server
  2. granting you access to the new study
