Problems in preprocessing step with nanopolish

Hello,

I am trying to replicate your results using CHEUI, but I'm encountering issues with nanopolish during the preprocessing steps.

When generating the index, I receive the following error (example for one sample):
```
[readdb] indexing ${WORKDIR}/fast5/HeLa_all_fast5_M11A_KO/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11A_KO/  
[readdb] num reads: 1260168, num reads with path to fast5: 630084  
fast5 files could not be located for 630084 reads  
```

If I then run nanopolish eventalign, the output file is empty. Do you know what might be causing this issue? I'm unsure whether it’s related to the downloaded samples or an error in my code. Below, I’ve included the code I used, in case it helps identify any potential problems. Thank you very much for your help!

I downloaded both the fastq and fast5 files (HeLa WT and KO samples) from this link: https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP393373&o=acc_s%3Aa
I also downloaded the transcriptome version you used:
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.transcripts.fa.gz

For the alignment step, I followed the instructions in your documentation. For nanopolish, I used version 0.14.0 (this is the only version available on the cluster that I'm using):

```
#!/bin/bash
#SBATCH -t 36:00:00 # execution time
#SBATCH --mem=30G
#SBATCH -c 24

module load cesga/2020 gcccore/system nanopolish/0.14.0

FASTQ_DIR=${WORKDIR}/fastq
FAST5_DIR=${WORKDIR}/fast5
BAM_DIR=${WORKDIR}/01_aligments
REF=${WORKDIR}/reference/gencode.v38.transcripts.fa.gz
OUTDIR_BASE=${WORKDIR}/02_nanopolish

mkdir -p ${OUTDIR_BASE}

# index for nanopolish

nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11A_KO/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11A_KO/ ${FASTQ_DIR}/HeLa_M11A_KO.fastq
nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11A_WT/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11A_WT/ ${FASTQ_DIR}/HeLa_M11A_WT.fastq

nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11B_KO/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11B_KO/ ${FASTQ_DIR}/HeLa_M11B_KO.fastq
nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11B_WT/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11B_WT/ ${FASTQ_DIR}/HeLa_M11B_WT.fastq

nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11C_KO/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11C_KO/ ${FASTQ_DIR}/HeLa_M11C_KO.fastq
nanopolish index -d ${FAST5_DIR}/HeLa_all_fast5_M11C_WT/data/xc17/pm1122/SRA/HeLa/fast5s/all_fast5_M11C_WT/ ${FASTQ_DIR}/HeLa_M11C_WT.fastq

if compgen -G "${FASTQ_DIR}/*.fastq" > /dev/null; then
    for FASTQ_FILE in "${FASTQ_DIR}"/*.fastq; do
        
        SAMPLE_NAME=$(basename "${FASTQ_FILE}" .fastq)
        BAM_FILE="${BAM_DIR}/${SAMPLE_NAME}_s.bam"

        if [[ -f "${BAM_FILE}" ]]; then
            OUTFILE="${OUTDIR_BASE}/${SAMPLE_NAME}.txt"

            mkdir -p "$(dirname "${OUTFILE}")"

            # Launch nanopolish
            echo "Running Nanopolish for $SAMPLE_NAME"
            nanopolish eventalign\
                -t 24 \
                --reads "${FASTQ_FILE}" \
                --bam "${BAM_FILE}" \
                --genome "${REF}" \
                --scale-events --signal-index --samples --print-read-names > "${OUTFILE}" 2> "${OUTFILE}.err"
            echo "Nanopolish eventalign ended"
        fi
    done
fi
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Problems in preprocessing step with nanopolish #44

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Problems in preprocessing step with nanopolish #44

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions