GeneStruct user manual
Author list
- rongzhengqin@basepedia.com
- zhush@basepedia.com
Most important classes
- Transcript
- Gene
Transcript object
1. To create a Transcript instance:
- snippet.python
transcript = Transcript(transcriptid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)
Here, leftmost and rightmost are the leftmost and rightmost positions of the ORF on the genome, 0-based and half-open: [leftmost,rightmost). exons = [[s1,e1],[s2,e2],…], where each pair [s,e] is likewise a 0-based, half-open interval [s,e) on the genome.
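To make the coordinate convention concrete, here is a minimal sketch in plain Python that does not depend on GeneStruct. The helper names `interval_length` and `to_one_based_closed` are illustrative only and are not part of the library. It shows how a 0-based, half-open interval relates to its length and to the 1-based, closed coordinates used in GTF files.

```python
def interval_length(start, end):
    """Length of a 0-based, half-open interval [start, end)."""
    if start >= end:
        raise ValueError("start must be smaller than end")
    return end - start


def to_one_based_closed(start, end):
    """Convert 0-based [start, end) to 1-based, closed [start + 1, end]."""
    return start + 1, end


# Hypothetical exons, 0-based half-open, in the same shape as the exons argument.
exons = [[100, 200], [300, 450]]
lengths = [interval_length(s, e) for s, e in exons]
one_based = [to_one_based_closed(s, e) for s, e in exons]
print(lengths)
print(one_based)
```

With half-open intervals the length is simply end minus start, and adjacent features share a boundary value without overlapping.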
2. Add an exon to a transcript that was initialized without exons:
- snippet.python
transcript.add_exon(start,end) # 0-based [start,end) on genome, start < end
3. After adding all exons to a transcript, parse the transcript to get its UTRs, introns, CDS, ...:
- snippet.python
transcript.parse_transcript()
The parsed regions are then available as:
- parsed utr3 ⇒ transcript.utr3 (a list of intervals [[s1,e1),[s2,e2),[s3,e3), … ], each 0-based, left-inclusive, right-exclusive)
- parsed intron ⇒ transcript.intron (a list like utr3)
- parsed exon ⇒ transcript.exon (a list like utr3)
- parsed utr5 ⇒ transcript.utr5 (a list like utr3)
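The relationship between exons, the ORF bounds, and the parsed regions can be sketched as follows. This is not the code behind parse_transcript(); it is a standalone illustration under stated assumptions: the function name `split_regions` and its return layout are hypothetical, and a strand of "." is treated like "+".

```python
def split_regions(exons, leftmost, rightmost, strand="+"):
    """Split exons into introns, CDS and UTRs.

    exons: list of [start, end) pairs, 0-based half-open.
    leftmost, rightmost: ORF bounds [leftmost, rightmost) on the genome.
    """
    exons = sorted(exons)
    # Introns are the gaps between consecutive exons.
    introns = [[a[1], b[0]] for a, b in zip(exons, exons[1:]) if a[1] < b[0]]
    cds, left_utr, right_utr = [], [], []
    for s, e in exons:
        if s < leftmost:                      # part left of the ORF
            left_utr.append([s, min(e, leftmost)])
        if e > rightmost:                     # part right of the ORF
            right_utr.append([max(s, rightmost), e])
        cs, ce = max(s, leftmost), min(e, rightmost)
        if cs < ce:                           # part inside the ORF
            cds.append([cs, ce])
    # On the minus strand the 5' end lies on the right side of the genome.
    if strand == "-":
        utr5, utr3 = right_utr, left_utr
    else:
        utr5, utr3 = left_utr, right_utr
    return {"intron": introns, "cds": cds, "utr5": utr5, "utr3": utr3}


example_exons = [[100, 200], [300, 400], [500, 600]]
plus = split_regions(example_exons, 150, 550, strand="+")
minus = split_regions(example_exons, 150, 550, strand="-")
print(plus)
print(minus)
```

The only strand-dependent step is the final assignment: the genomic intervals are identical on both strands, and only the utr5/utr3 labels swap.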
Other information:
- transcript.transcriptid
- transcript.transcriptname ⇒ name or None
- transcript.chrom
- transcript.strand ⇒ [“+”,“-”,“.”]
- transcript.leftmost
- transcript.rightmost ⇒ record ORF region [leftmost,rightmost), leftmost < rightmost on genome
Gene object
1. Create a Gene instance:
- snippet.python
gene = Gene(geneid,genename="",gene_type=None,gene_source=None,gene_status=None)
2. Add a transcript to a gene:
- snippet.python
gene.add_transcript(Transcript_instance) # return the Transcript_instance
3. Get a transcript from a gene:
- snippet.python
# If tid already exists in the gene, the existing transcript is returned; otherwise a new
# Transcript instance is created from the optional parameters and added to this gene.
gene.get_transcript(tid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)
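The get-or-create behaviour described above can be illustrated with a small stand-in. `MiniTranscript` and `MiniGene` are hypothetical classes written for this sketch; they are not GeneStruct's Transcript and Gene, and how the real method treats extra arguments for an existing id is an assumption here.

```python
class MiniTranscript(object):
    """Hypothetical stand-in for Transcript; stores only an id and a strand."""

    def __init__(self, transcriptid, strand="."):
        self.transcriptid = transcriptid
        self.strand = strand


class MiniGene(object):
    """Hypothetical stand-in for Gene, with transcripts keyed by id."""

    def __init__(self, geneid):
        self.geneid = geneid
        self.transcripts = {}

    def get_transcript(self, tid, **kwargs):
        # Return the stored transcript if the id is known; otherwise create
        # one from the optional keyword arguments and register it.
        if tid not in self.transcripts:
            self.transcripts[tid] = MiniTranscript(tid, **kwargs)
        return self.transcripts[tid]


mini_gene = MiniGene("G1")
first = mini_gene.get_transcript("T1", strand="+")
# Assumption of this sketch: for an existing id the extra arguments are ignored.
second = mini_gene.get_transcript("T1", strand="-")
print(first is second)
```

This pattern is convenient when reading an annotation file line by line, because the caller does not need to check whether a transcript has been seen before.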
4. Iterate over all transcripts of a gene:
- snippet.python
# gene.transcripts is a dict: transcript id => Transcript instance
for tid, transcript in gene.transcripts.items():
    print(transcript)
5. After adding all transcripts to a gene, parse the gene and its transcripts:
- snippet.python
gene.parse_gene()
The following attributes are then available:
- gene.geneid ⇒ gene id
- gene.genename ⇒ gene name, official symbol
- gene.gene_type ⇒ gene type
- gene.gene_source ⇒ gene source
- gene.gene_status ⇒ status
- gene.transcripts ⇒ the gene's transcripts, a dict object, transcript id ⇒ transcript instance
- gene.gene_start ⇒ gene leftmost position on the genome
- gene.gene_stop ⇒ gene rightmost position on the genome
- gene.chrom ⇒ chromosome on which the gene is located
- gene.strand ⇒ strand of the gene on the genome
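One plausible way to obtain a gene's leftmost and rightmost positions is to take the minimum start and maximum end over the exons of all its transcripts. The sketch below shows that idea only. Whether parse_gene() computes gene_start and gene_stop exactly this way is an assumption, and `gene_span` is a hypothetical helper, not a GeneStruct function.

```python
def gene_span(transcript_exons):
    """Return (leftmost, rightmost) over all exons of all transcripts.

    transcript_exons: dict of transcript id -> list of [start, end) exons,
    0-based half-open.
    """
    starts = [s for exons in transcript_exons.values() for s, _ in exons]
    ends = [e for exons in transcript_exons.values() for _, e in exons]
    if not starts:
        raise ValueError("no exons given")
    return min(starts), max(ends)


# Two hypothetical transcripts of the same gene.
span = gene_span({
    "T1": [[100, 200], [300, 400]],
    "T2": [[50, 200], [300, 380]],
})
print(span)
```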
6. Output a gene as a GTF-formatted string:
- snippet.python
gene.togtf()
7. Output a gene as a refGene-formatted string:
- snippet.python
gene.torefgene()
Useful functions
In addition to the above, we supply two useful functions to parse GTF and refGene files.
1. Read a GTF file:
- snippet.python
annoregion2 = Annoregion2(fmt="gtf",gattr="gene_name",tattr="transcript_name",gidattr="gene_id",tidattr="transcript_id",genetypeattr="gene_type",transcripttypeattr="transcript_type",tanno="exon,CDS,UTR")
annoregion2.gtf2exons(gtf_filename)
# All genes are then in annoregion2.h, a dict that maps
# geneid => the corresponding Gene_instance.
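The attribute names passed to Annoregion2 (gene_id, transcript_id, gene_name, ...) refer to keys in the ninth column of a GTF line. As a rough illustration of what such a line contains, the sketch below parses one exon line by hand. It is not how gtf2exons is implemented; `parse_gtf_line` is a hypothetical helper, the sample line is made up, and unquoted attribute values are deliberately not handled.

```python
import re


def parse_gtf_line(line):
    """Parse one GTF line into a dict with 0-based, half-open coordinates."""
    cols = line.rstrip("\n").split("\t")
    chrom, source, feature, start, end, score, strand, frame, attrs = cols
    # Attributes look like: key "value"; key "value"; ...
    attributes = dict(re.findall(r'(\w+)\s+"([^"]*)"', attrs))
    return {
        "chrom": chrom,
        "feature": feature,
        "strand": strand,
        "start": int(start) - 1,  # GTF is 1-based, closed
        "end": int(end),          # the closed end equals the half-open end
        "attributes": attributes,
    }


sample = "\t".join([
    "chr1", "HAVANA", "exon", "101", "200", ".", "+", ".",
    'gene_id "G1"; transcript_id "T1"; gene_name "ABC";',
])
record = parse_gtf_line(sample)
print(record["start"], record["end"], record["attributes"]["gene_id"])
```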
2. Read a refGene file:
- snippet.python
hgene = readrefgene(refgene_filename) # to return a dict, {geneid => Gene_instance}
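For orientation, a line of the UCSC refGene table carries the transcript name, chromosome, strand, transcript and CDS bounds, and comma-separated exon starts and ends, all as 0-based, half-open coordinates. The sketch below splits such a line into exon pairs. It assumes the 16-column UCSC layout with a leading bin column; whether readrefgene expects exactly this layout is not stated in this manual. `parse_refgene_line` is a hypothetical helper and the sample line is made up.

```python
def parse_refgene_line(line):
    """Parse one UCSC-style refGene line (assumed to start with a bin column)."""
    cols = line.rstrip("\n").split("\t")
    name, chrom, strand = cols[1], cols[2], cols[3]
    cds_start, cds_end = int(cols[6]), int(cols[7])
    # exonStarts and exonEnds are comma-separated with a trailing comma.
    starts = [int(x) for x in cols[9].split(",") if x]
    ends = [int(x) for x in cols[10].split(",") if x]
    return {
        "transcriptid": name,
        "genename": cols[12],        # the name2 column
        "chrom": chrom,
        "strand": strand,
        "leftmost": cds_start,       # CDS bounds, 0-based half-open
        "rightmost": cds_end,
        "exons": [[s, e] for s, e in zip(starts, ends)],
    }


sample_line = "\t".join([
    "585", "NM_000001", "chr1", "+", "100", "600", "150", "550", "3",
    "100,300,500,", "200,400,600,", "0", "ABC", "cmpl", "cmpl", "0,1,2,",
])
parsed = parse_refgene_line(sample_line)
print(parsed["exons"])
```

Because refGene coordinates are already 0-based and half-open, the exon pairs can be used without any offset, unlike GTF coordinates.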
3. Write genes to GTF or refGene files:
We supply a helper to output the hgene dict, where hgene maps geneid ⇒ Gene_instance:
- snippet.python
gene2file(hgene,fmt="gtf",outputprefix = "test.") # fmt = 'gtf' or 'refgene'