This repository contains the scripts generated in the project "Genomic characterization of the wild-to-domesticated complex of Gossypium hirsutum in Mexico", which is subdivided into three sections:
- Analysis of chloroplast genomes
- Analysis of nuclear genomes
- Location of transgenes in the genomes
Sequencing data:
- Whole-genome sequencing (WGS)
- Sequenced on Illumina NovaSeq 6000
- TruSeq DNA PCR-Free library
- Paired-end reads
Software to analyze the chloroplast genomes:
Software to analyze the nuclear genome:
R packages:
- ggplot2
- gtable
- grid
- lattice
- phyloch
- strap
- phytools
- tidyverse
- GenomicRanges
Directory organization. The letter c or n after the step number indicates whether a script belongs to the chloroplast or the nuclear genome analysis, respectively:
+-- genomic_cotton
| +--bin/
| +--1_download_seq.sh
| +--2_quality_samples.sh
| +--3_transgenes_blast.sh
| +--3c_assembly_chloroplast.sh
| +--3n_clean_data.sh
| +--4c_mapping.sh
| +--4n_mapping.sh
| +--5c_sort_sequences.sh
| +--5n_sort_sequences.sh
| +--6c_quality_mapping.sh
| +--6n_quality_mapping.sh
| +--7c_find_variants.sh
| +--7n_find_variants.sh
| +--8c_variant_annotation.sh
| +--8n_short_variant_annotation.sh
| +--9_boxplot.R
| +--data/
| +-- transgen_db.fasta
| +-- README_data.md
| +--annotation/
| +--README_annotation.md
| +--Predicted gene alignments_TM-1_V2.1.gene.gff.gz
| +--blat_AD1_transcript_ZJU_g.hirsutum_cottongen_reftransV1.p97.len97.gff3.gz
| +--marker_alignment_blat_AD1_ZJU_SNP.p90.len97.gaplt2.gapszlt2.gff3.gz
| +--other_transcript_blat_AD1_TxJGI_g.hirsutum_cottongen_refTransV1.p97.len97.gff3.gz
| +--predicted_gene_alignment_Tx-JGI_G.hirsutum_v1.1.gene.gff3.gz
| +--annotation.bed
| +--meta/
| +--id_samples.txt
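The c/n naming convention described above can be checked mechanically. The following is a minimal sketch and not part of the repository: `classify_script` is a hypothetical helper name, and treating scripts without a letter as shared by both analyses is an assumption based on the file names in the tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): classify a pipeline
# script by the letter that follows its step number.
#   c -> chloroplast, n -> nuclear, no letter -> shared (assumption)
classify_script() (
  name=$(basename "$1")
  prefix=${name%%_*}            # text before the first underscore, e.g. "3c"
  case "$prefix" in
    *c) kind=chloroplast; step=${prefix%c} ;;
    *n) kind=nuclear;     step=${prefix%n} ;;
    *)  kind=shared;      step=$prefix ;;
  esac
  # What remains must be a step number such as "3" or "1.1".
  case "$step" in
    ""|*[!0-9.]*) echo "unknown" ;;
    *)            echo "$kind" ;;
  esac
)

# Example: label a few of the file names listed in the tree.
for f in 1_download_seq.sh 3c_assembly_chloroplast.sh 4n_mapping.sh README_data.md; do
  printf '%s\t%s\n' "$f" "$(classify_script "$f")"
done
```

Files that do not start with a step number, such as the README files, are reported as `unknown`, so the helper can be pointed at a whole directory listing.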
This directory contains the scripts needed for the structural variant analysis. Each script corresponds to one step of the workflow. In the descriptions below, an asterisk (*) stands for c or n:
- 1_download_seq.sh: downloads the sequences of the projects from NCBI
- 1.1_transgenes_blast.sh: searches for transgene sequences in the sample genomes
- 2_fastqc_samples.sh: runs the sequencing quality analysis with FastQC
- 3c_assembly_chloroplast.sh: assembles the chloroplast genomes
- 3n_clean_data.sh: removes low-quality sequences
- 4*_mapping.sh: aligns the sequences to the reference genomes
- 5*_sort_sequences.sh: converts .sam files to .bam and sorts the aligned genomes with Picard
- 6*_quality_mapping.sh: evaluates the mapping quality
- 7*_find_variants.sh: identifies genomic variants
- 8*_variant_annotation.sh: annotates each genomic variant
- 9_boxplot.R: generates plots of the number of genomic variants found
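The step order described above lends itself to a small dry-run wrapper. The following is a minimal sketch under stated assumptions, not the authors' driver: `select_steps` is a hypothetical name, scripts without a c or n letter are assumed to apply to both analyses, and the function only prints script names without executing them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): read script names on
# stdin and print, in step order, those that apply to one analysis.
#   $1 = c (chloroplast) or n (nuclear)
select_steps() (
  genome=$1
  while IFS= read -r script; do
    prefix=${script%%_*}        # step prefix, e.g. "1", "1.1", "4c"
    case "$prefix" in
      [0-9]|[0-9]*[0-9]|[0-9]*"$genome") printf '%s\n' "$script" ;;
    esac
  done | sort -n                # numeric sort on the leading step number
)

# Dry run: list the nuclear workflow without executing anything.
printf '%s\n' 7n_find_variants.sh 4c_mapping.sh 1_download_seq.sh 4n_mapping.sh |
  select_steps n
```

Printing the names first makes it easy to review the order before piping the list into a loop that actually runs each script.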
The data directory contains a description of the data obtained and links to the data repository where the .fastq files can be found.
The meta directory contains additional information about data management.
M. S. Melania Vega