Description
Hi, I am running into an error at the split_cesar_jobs step when I run TOGA.
My command is:
./toga.py /work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain /work/dk_lab/Cactus-447way/Input_chain/toga.transcripts.bed /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Homo_sapiens.2bit /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Lemur_catta.2bit --kt --pn TOGA_Lemur_minimap -i /work/dk_lab/Cactus-447way/Input_chain/toga.isoforms.tsv --nc /work/dk_lab/Cactus-447way/TOGA/TOGA/nextflow_config_files/ --cb 16,32 --cjn 300 --msc 100 --ncf
Based on the suggestion in issue #121, I also tried changing the buckets to --buckets 8,16,32,64,128,256,512,999.
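For context, the bucket list ends up as the RAM-limit dictionary printed in the log (`defined memory limit: 999, RAM-limit buckets: {8: [], 16: [], ...}`). My understanding, which is an assumption and not taken from the TOGA source, is that each CESAR job then goes into the smallest bucket that fits its memory estimate, roughly like this (`parse_buckets` and `pick_bucket` are hypothetical names):

```python
from __future__ import annotations


def parse_buckets(arg: str) -> dict[int, list]:
    """Turn "8,16,...,999" into {8: [], 16: [], ..., 999: []}.

    Hypothetical helper that mirrors the log line; not TOGA's own code.
    """
    limits = sorted({int(x) for x in arg.split(",") if x.strip()})
    return {limit: [] for limit in limits}


def pick_bucket(mem_gb: float, buckets: dict[int, list]) -> int | None:
    """Assumed rule: the smallest bucket whose limit covers the job's memory."""
    for limit in sorted(buckets):
        if mem_gb <= limit:
            return limit
    return None  # job needs more than the largest bucket (the memory limit)


buckets = parse_buckets("8,16,32,64,128,256,512,999")
print("defined memory limit:", max(buckets))  # 999
print(pick_bucket(20.5, buckets))  # 32
print(pick_bucket(1200, buckets))  # None
```

If that is how it works, the bucket values only matter once jobs are being assigned, which would explain why changing them has no effect on a crash inside precompute_regions.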
The error message stays the same regardless of the bucket setting, and looks like this:
- selected chain class to annotate transcript ENST00000491939.TXNRD2: ORTH
- selected chain class to annotate transcript ENST00000370655.ANKRD2: ORTH
split_cesar_jobs: number of transcripts to create CESAR jobs: 37734
split_cesar_jobs: total number of 104782 transcript/chain pairs
split_cesar_jobs: skipped total of 3 transcripts
split_cesar_jobs: out of them, transcripts not intersected by chains: 3
split_cesar_jobs: assigning MISSING class to 3 transcripts not intersected by any chain
split_cesar_jobs: creating a list of RAM-limit buckets based on user arguments
split_cesar_jobs: defined memory limit: 999, RAM-limit buckets: {8: [], 16: [], 32: [], 64: [], 128: [], 256: [], 512: [], 999: []} (to be filled with CESAR jobs)
split_cesar_jobs: reading bed file /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/toga_filt_ref_annot.bed
split_cesar_jobs: got data for 39664 transcripts
split_cesar_jobs: reading transcript fragments data from /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/gene_fragments.txt
split_cesar_jobs: got data for 22965 transcripts potentially fragmented in the query genome
split_cesar_jobs: precomputing query regions for each transcript/chain pair
split_cesar_jobs: batch size: 37734
split_cesar_jobs: first, invert gene-to-chains dict to chain-to-genes
split_cesar_jobs: for each of 42342 involved chains, precompute regions
Error! Requested range num 0 lies in the chromosome chr1 meanwhile the chain covers chrom ????U in target genome
Traceback (most recent call last):
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 1054, in <module>
    main()
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 873, in main
    regions, skipped_2, predef_glp = precompute_regions(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 505, in precompute_regions
    chain_coords_conv_out.append(raw_ch_conv_out[i].decode("utf-8"))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 6: invalid start byte
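Since the crash is a UTF-8 decode error on text derived from the chains, and the line just before the traceback reports a garbled target chrom name (????U) where chr1 was expected, one thing worth ruling out is a malformed chain file. Below is a quick check, a sketch assuming the standard UCSC chain format (header `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd` plus an optional id, followed by block lines of one or three integers); `scan_chain` is a hypothetical helper, not part of TOGA:

```python
import gzip
import sys


def scan_chain(path: str, max_report: int = 10) -> dict:
    """Count lines with non-ASCII bytes and malformed lines in a UCSC chain file."""
    opener = gzip.open if path.endswith(".gz") else open
    stats = {"chains": 0, "non_ascii": 0, "bad_header": 0, "bad_block": 0}
    reported = 0

    def report(lineno: int, kind: str, raw: bytes) -> None:
        # Print only the first few offending lines to keep output short.
        nonlocal reported
        if reported < max_report:
            print(f"line {lineno}: {kind}: {raw[:80]!r}")
            reported += 1

    with opener(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                line = raw.decode("ascii")
            except UnicodeDecodeError:
                stats["non_ascii"] += 1
                report(lineno, "non-ASCII byte", raw)
                continue
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # blank line between chains, or a comment
            if fields[0] == "chain":
                stats["chains"] += 1
                # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd [id]
                ints = [fields[i] for i in (3, 5, 6, 8) if i < len(fields)] + fields[10:]
                if (len(fields) not in (12, 13)
                        or fields[4] not in ("+", "-")
                        or fields[9] not in ("+", "-")
                        or not all(f.isdigit() for f in ints)):
                    stats["bad_header"] += 1
                    report(lineno, "malformed header", raw)
            elif len(fields) not in (1, 3) or not all(f.isdigit() for f in fields):
                # Block lines are "size dt dq", or just "size" for the last block.
                stats["bad_block"] += 1
                report(lineno, "malformed block line", raw)
    return stats


if __name__ == "__main__":
    print(scan_chain(sys.argv[1]))
```

Saved as, say, scan_chain.py, it would be run as `python scan_chain.py /work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain`. A clean result (no non-ASCII bytes, no malformed lines) would suggest the problem lies somewhere other than the chain file itself.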
Is there anything you'd suggest to fix the issue?
Thank you