split_cesar_jobs error #211

Open
@KabitaBaral1

Description

Hi, I am running into an error when I run TOGA. It fails at the split_cesar_jobs step.

The command I ran is:
./toga.py /work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain /work/dk_lab/Cactus-447way/Input_chain/toga.transcripts.bed /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Homo_sapiens.2bit /work/dk_lab/Cactus-447way/All_fasta/new_Twobit/Lemur_catta.2bit --kt --pn TOGA_Lemur_minimap -i /work/dk_lab/Cactus-447way/Input_chain/toga.isoforms.tsv --nc /work/dk_lab/Cactus-447way/TOGA/TOGA/nextflow_config_files/ --cb 16,32 --cjn 300 --msc 100 --ncf

I also tried changing the buckets to --buckets 8,16,32,64,128,256,512,999, based on the suggestion in issue #121.

The error message is the same either way and looks like this:

  • selected chain class to annotate transcript ENST00000491939.TXNRD2: ORTH
  • selected chain class to annotate transcript ENST00000370655.ANKRD2: ORTH
    split_cesar_jobs: number of transcripts to create CESAR jobs: 37734
    split_cesar_jobs: total number of 104782 transcript/chain pairs
    split_cesar_jobs: skipped total of 3 transcripts
    split_cesar_jobs: out of them, transcripts not intersected by chains: 3
    split_cesar_jobs: assigning MISSING class to 3 transcripts not intersected by any chain
    split_cesar_jobs: creating a list of RAM-limit buckets based on user arguments
    split_cesar_jobs: defined memory limit: 999, RAM-limit buckets: {8: [], 16: [], 32: [], 64: [], 128: [], 256: [], 512: [], 999: []} (to be filled with CESAR jobs)
    split_cesar_jobs: reading bed file /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/toga_filt_ref_annot.bed
    split_cesar_jobs: got data for 39664 transcripts
    split_cesar_jobs: reading transcript fragments data from /work/dk_lab/Cactus-447way/TOGA/TOGA/TOGA_Lemur_minimap/temp/gene_fragments.txt
    split_cesar_jobs: got data for 22965 transcripts potentially fragmented in the query genome
    split_cesar_jobs: precomputing query regions for each transcript/chain pair
    split_cesar_jobs: batch size: 37734
    split_cesar_jobs: first, invert gene-to-chains dict to chain-to-genes
    split_cesar_jobs: for each of 42342 involved chains, precompute regions
    Error! Requested range num 0 lies in the chromosome chr1 meanwhile the chain covers chrom ????U in target genome
    Traceback (most recent call last):
    File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 1054, in <module>
    main()
    File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 873, in main
    regions, skipped_2, predef_glp = precompute_regions(
    ^^^^^^^^^^^^^^^^^^^
    File "/work/dk_lab/Cactus-447way/TOGA/TOGA/./split_exon_realign_jobs.py", line 505, in precompute_regions
    chain_coords_conv_out.append(raw_ch_conv_out[i].decode("utf-8"))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 6: invalid start byte

Is there anything you'd suggest to fix the issue?
Thank you
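
In case it helps with triage: the log reports a chain covering chrom ????U, so one thing worth ruling out is an unusual sequence name in the chain file itself. Below is a minimal sketch for scanning the chain headers. It assumes the standard UCSC chain header layout (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id). The function name `scan_chain_headers` is hypothetical and not part of TOGA, and I do not know whether the chain file is actually the cause.

```python
import gzip
from collections import Counter


def scan_chain_headers(path):
    """Collect sequence names from chain header lines and flag odd ones.

    Hypothetical helper, not part of TOGA. Assumes UCSC chain headers with
    13 whitespace-separated fields, where field 2 is tName and field 7 is qName.
    Returns (names, suspicious): a Counter keyed by (label, name) and a list
    of (line_number, message) tuples.
    """
    names = Counter()
    suspicious = []
    # Read bytes so that invalid UTF-8 cannot raise while reading.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            fields = raw.split()
            if not fields or fields[0] != b"chain":
                continue  # alignment block line or blank line
            if len(fields) < 13:
                suspicious.append(
                    (line_no, f"header has {len(fields)} fields, expected 13")
                )
                continue
            for label, idx in (("tName", 2), ("qName", 7)):
                try:
                    name = fields[idx].decode("ascii")
                except UnicodeDecodeError:
                    suspicious.append(
                        (line_no, f"{label} is not ASCII: {fields[idx]!r}")
                    )
                    continue
                names[(label, name)] += 1
    return names, suspicious
```

Run against the chain file from the command above, this would look like:

```python
names, suspicious = scan_chain_headers(
    "/work/dk_lab/Cactus-447way/paf2chain/Lemur_catta.chain"
)
for line_no, message in suspicious:
    print(line_no, message)
# Longest names first, in case very long names turn out to matter.
for (label, name), count in sorted(names.items(), key=lambda kv: -len(kv[0][1]))[:10]:
    print(label, name, count)
```

If nothing is flagged, the names in the chain file are at least plain ASCII, and the garbled name is more likely produced somewhere downstream.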
