- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8
Open
Description
I was going through your code for calculating feature length but I see you use exon rather than the gene field of the gtf. I was trying to use this code for NOISeq so was wondering which is the right way. I see gene would be more appropriate, wouldn't it?
Answers/SEQanswers_42420/GTF2LengthGC.R
Line 10 in d2d7366
| GTF <- import.gff(GTFfile, format="gtf", genome="GRCm38.71", asRangedData=F, feature.type="exon") | 
Besides elementNROWS is now used instead of elementLengths. I understand what you wrote was almost a decade ago but people refer to your post regularly as they are good and reliable.
library(GenomicRanges)
library(rtracklayer)
library(Rsamtools)
GTFfile = "../../resource/gencode.v45.primary_assembly.annotation.gtf"
FASTAfile = "../../resource/GRCh38.primary_assembly.genome.fa"
GTF <- import.gff(GTFfile, format="gtf", genome="GRCh38", feature.type="gene")
grl <- reduce(split(GTF, elementMetadata(GTF)$gene_id))
reducedGTF <- unlist(grl, use.names=T)
elementMetadata(reducedGTF)$gene_id <- rep(names(grl), elementNROWS(grl))
#Open the fasta file
FASTA <- FaFile(FASTAfile)
open(FASTA)
#Add the GC numbers
elementMetadata(reducedGTF)$nGCs <- letterFrequency(getSeq(FASTA, reducedGTF), "GC")[,1]
elementMetadata(reducedGTF)$widths <- width(reducedGTF)
#Create a list of the ensembl_id/GC/length
calc_GC_length <- function(x) {
  nGCs = sum(elementMetadata(x)$nGCs)
  width = sum(elementMetadata(x)$widths)
  c(width, nGCs/width)
}
output <- t(sapply(split(reducedGTF, elementMetadata(reducedGTF)$gene_id), calc_GC_length))
colnames(output) <- c("Length", "GC")
Metadata
Metadata
Assignees
Labels
No labels