split_cesar_jobs error

Hi, I am running into an error at the split_cesar_jobs step when I run TOGA.

My command is:
./toga.py /work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain /work/dk_lab/Cactus-447way/Input_chain/toga.transcripts.bed /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Homo_sapiens.2bit /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Lemur_catta.2bit --kt --pn TOGA_Lemur_minimap -i /work/dk_lab/Cactus-447way/Input_chain/toga.isoforms.tsv --nc /work/dk_lab/Cactus-447way/TOGA/TOGA/nextflow_config_files/ --cb 16,32 --cjn 300 --msc 100 --ncf

I also tried changing the memory buckets to --buckets 8,16,32,64,128,256,512,999, following the suggestion in issue #121.

The error message stays the same either way and looks like this:
* selected chain class to annotate transcript ENST00000491939.TXNRD2: ORTH
* selected chain class to annotate transcript ENST00000370655.ANKRD2: ORTH
split_cesar_jobs: number of transcripts to create CESAR jobs: 37734
split_cesar_jobs: total number of 104782 transcript/chain pairs
split_cesar_jobs: skipped total of 3 transcripts
split_cesar_jobs: out of them, transcripts not intersected by chains: 3
split_cesar_jobs: assigning MISSING class to 3 transcripts not intersected by any chain
split_cesar_jobs: creating a list of RAM-limit buckets based on user arguments
split_cesar_jobs: defined memory limit: 999, RAM-limit buckets: {8: [], 16: [], 32: [], 64: [], 128: [], 256: [], 512: [], 999: []} (to be filled with CESAR jobs)
split_cesar_jobs: reading bed file /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/toga_filt_ref_annot.bed
split_cesar_jobs: got data for 39664 transcripts
split_cesar_jobs: reading transcript fragments data from /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/gene_fragments.txt
split_cesar_jobs: got data for 22965 transcripts potentially fragmented in the query genome
split_cesar_jobs: precomputing query regions for each transcript/chain pair
split_cesar_jobs: batch size: 37734
split_cesar_jobs: first, invert gene-to-chains dict to chain-to-genes
split_cesar_jobs: for each of 42342 involved chains, precompute regions
Error! Requested range num 0 lies in the chromosome chr1 meanwhile the chain covers chrom ????U in target genome
Traceback (most recent call last):
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 1054, in <module>
    main()
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 873, in main
    regions, skipped_2, predef_glp = precompute_regions(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 505, in precompute_regions
    chain_coords_conv_out.append(raw_ch_conv_out[i].decode("utf-8"))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 6: invalid start byte
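
One check that might narrow this down, assuming the garbled chromosome name ("????U", byte 0xa0) originates in the chain file itself rather than inside TOGA: scan the chain headers in bytes mode and report any target or query sequence name that is not plain ASCII. This is only a sketch; the function name find_bad_chain_names is hypothetical and not part of TOGA, and it relies on the standard UCSC chain header layout (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id). An empty result would suggest the names in the file are fine and the corruption happens later.

```python
def find_bad_chain_names(chain_path):
    """Return (line_number, field, raw_bytes) for suspicious chain headers.

    Hypothetical helper, not part of TOGA. Assumes the UCSC chain layout:
    chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    """
    bad = []
    # Open in bytes mode so no implicit decoding can fail while reading.
    with open(chain_path, "rb") as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields or fields[0] != b"chain":
                continue  # alignment block lines and blanks carry no names
            if len(fields) < 13:
                # Header with missing fields: report the whole line.
                bad.append((line_no, "header", line.rstrip(b"\n")))
                continue
            for label, idx in (("tName", 2), ("qName", 7)):
                if not fields[idx].isascii():
                    bad.append((line_no, label, fields[idx]))
    return bad


def show(raw):
    """Render raw bytes safely; invalid bytes appear as \\xNN escapes."""
    return raw.decode("utf-8", errors="backslashreplace")
```

It could be run on the chain from the command above, for example `for n, field, raw in find_bad_chain_names("/work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain"): print(n, field, show(raw))`.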

Is there anything you'd suggest to fix the issue?
Thank you



split_cesar_jobs error #211

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

split_cesar_jobs error #211

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions