moved from Psy-Fer/buttery-eel#76 (comment)
Hi, Hasindu! Here you go:
I generated synthetic data using squigulator and then basecalled it with buttery-eel. The synthetic data seem to have many more indels and lower base quality scores than the actual data. Below is an IGV screenshot (upper panel: squigulator + buttery-eel data; bottom panel: actual amplicon data sequenced on an R10 flowcell, library prep kit NBD114, basecalled with SUP).
Here are the commands used to generate the synthetic data:
config=dna_r10.4.1_e8.2_400bps_sup.cfg
#create artificial datasets
time $squigulator -x dna-r10-prom -f ${d} -t 8 -r ${r} -q $outdir/${i}_${d}"x"_${r}ideal_${n}.fasta \
--bps 400 --ont-friendly=yes $ref/${i}.fasta -o $datadir/${i}_${d}"x"_${r}_${n}.blow5
#basecall
time buttery-eel -g $basecaller --config $config --device cuda:1 -i $datadir/${i}_${d}"x"_${r}_${n}.blow5 -o $outdir/${i}_${d}"x"_${r}_buttery-eel_${n}.fastq \
--port auto --use_tcp --dorado_download_path $dorado_download_path
The mean baseQs are 13.3 for the synthetic data and 34.8 for the actual data.
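For reference, one way a mean per-base quality could be computed from a FASTQ file is sketched below. This is only an illustration: the `mean_q` function name is hypothetical, it assumes four-line FASTQ records with Phred+33 encoding, and it takes a plain arithmetic mean over all bases. That may not be how the 13.3 and 34.8 figures above were obtained (some tools average error probabilities instead, which generally gives lower values).

```shell
# Hypothetical helper: arithmetic mean of Phred scores over all bases in a FASTQ.
# Assumes 4-line records (header, sequence, '+', quality) and Phred+33 encoding.
mean_q() {
  awk '
    BEGIN {
      # awk has no ord(); build a character -> ASCII code lookup table
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 0 {
      # every 4th line is the quality string
      n = length($0)
      for (j = 1; j <= n; j++) {
        sum += ord[substr($0, j, 1)] - 33
        total++
      }
    }
    END {
      if (total > 0) printf "%.1f\n", sum / total
    }
  ' "$1"
}
```

It would be called as `mean_q reads.fastq`, where `reads.fastq` is a placeholder for the basecalled output file.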
Thank you in advance for your help.