
GeneStruct user manual

Author list

rongzhengqin@basepedia.com
zhush@basepedia.com

Main classes

Transcript
Gene

Transcript object

1. Create a Transcript instance:

transcript = Transcript(transcriptid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)

Here, leftmost and rightmost are the leftmost and rightmost positions of the ORF on the genome, 0-based and half-open: [leftmost,rightmost). exons = [[s1,e1],[s2,e2],…], where each pair such as [s1,e1] is likewise a 0-based, half-open interval [s1,e1) on the genome.
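
To make the coordinate convention concrete, here is a small standalone sketch. It does not use GeneStruct; the helper names interval_length and to_one_based are hypothetical and only illustrate how 0-based, half-open intervals behave.

```python
# Standalone illustration of 0-based, half-open [start, end) intervals.
# These helpers are hypothetical and are not part of GeneStruct.

def interval_length(start, end):
    """Length of a 0-based, half-open interval [start, end)."""
    return end - start

def to_one_based(start, end):
    """Convert [start, end) to a 1-based, inclusive (start, end) pair."""
    return start + 1, end

exons = [[100, 200], [300, 450]]
lengths = [interval_length(s, e) for s, e in exons]
one_based = [to_one_based(s, e) for s, e in exons]
```

With half-open intervals the length is simply end - start, and two adjacent intervals [a,b) and [b,c) never overlap.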

2. Add an exon to a transcript (for example, one initialized with no exons)

snippet.python

transcript.add_exon(start,end) # 0-based [start,end) on genome, start < end
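
As an illustration of the start < end requirement, the following standalone sketch validates an exon and keeps a plain list of exons sorted by position. This function is a hypothetical stand-in; whether the real add_exon validates or sorts its input is not stated in this manual.

```python
import bisect

# Hypothetical stand-in for exon bookkeeping; not GeneStruct's implementation.
def add_exon(exons, start, end):
    """Insert [start, end) into a sorted exon list, rejecting empty intervals."""
    if not start < end:
        raise ValueError("exon start must be less than end")
    bisect.insort(exons, [start, end])  # keeps the list ordered by start
    return exons

exons = []
add_exon(exons, 300, 400)
add_exon(exons, 100, 200)
```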

3. After adding all exons to a transcript, parse the transcript to get the UTRs, introns, CDS ...

snippet.python

transcript.parse_transcript()

After parsing:

parsed utr3 ⇒ transcript.utr3 (a list of intervals [[s1,e1],[s2,e2],[s3,e3], … ], each 0-based, left-inclusive, right-exclusive)
parsed intron ⇒ transcript.intron (a list like utr3)
parsed exon ⇒ transcript.exon (a list like utr3)
parsed utr5 ⇒ transcript.utr5 (a list like utr3)
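
The kind of splitting that parsing performs can be sketched as follows. This is a minimal standalone example under stated assumptions, not GeneStruct's own code: the function split_transcript and its "cds" key are hypothetical, and a strand of "." is treated like "+".

```python
# Hypothetical sketch of splitting exons into UTR/CDS/intron intervals.
# All intervals are 0-based, half-open [start, end).

def split_transcript(exons, leftmost, rightmost, strand):
    exons = sorted(exons)
    # Introns are the gaps between consecutive exons.
    introns = [[a_end, b_start]
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])
               if a_end < b_start]
    left_utr, cds, right_utr = [], [], []
    for start, end in exons:
        if start < leftmost:                      # part left of the ORF
            left_utr.append([start, min(end, leftmost)])
        if end > leftmost and start < rightmost:  # part inside the ORF
            cds.append([max(start, leftmost), min(end, rightmost)])
        if end > rightmost:                       # part right of the ORF
            right_utr.append([max(start, rightmost), end])
    # On the minus strand the 5' end is the genomic right side.
    if strand == "-":
        utr5, utr3 = right_utr, left_utr
    else:
        utr5, utr3 = left_utr, right_utr
    return {"utr5": utr5, "cds": cds, "utr3": utr3, "intron": introns}

exon_list = [[100, 200], [300, 400], [500, 600]]
plus = split_transcript(exon_list, 150, 550, "+")
minus = split_transcript(exon_list, 150, 550, "-")
```

The only strand-dependent step is the final swap: the genomic intervals are identical on both strands, and only the utr5/utr3 labels change.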

Other information

transcript.transcriptid
transcript.transcriptname ⇒ name or None
transcript.chrom
transcript.strand ⇒ one of "+", "-", "."
transcript.leftmost
transcript.rightmost ⇒ together with leftmost, records the ORF region [leftmost,rightmost) on the genome, with leftmost < rightmost

Gene object

1. Create a Gene instance

snippet.python

gene = Gene(geneid,genename="",gene_type=None,gene_source=None,gene_status=None)

2. Add a transcript to a gene

snippet.python

gene.add_transcript(Transcript_instance) 
# returns the Transcript_instance

3. Get a transcript from a gene

snippet.python

gene.get_transcript(tid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)
# If tid is already present, the existing transcript is returned by id; if not, a new Transcript instance is created with the optional parameters and added to this gene
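
The "get or create" behaviour described above can be sketched with a plain dict. This is a standalone illustration: TranscriptStub and get_or_create are hypothetical names that stand in for the real Transcript class and gene.get_transcript.

```python
# Hypothetical sketch of "get or create" semantics using a plain dict.
# TranscriptStub stands in for the real Transcript class.

class TranscriptStub(object):
    def __init__(self, transcriptid, strand=".", chrom=None):
        self.transcriptid = transcriptid
        self.strand = strand
        self.chrom = chrom

def get_or_create(transcripts, tid, **kwargs):
    """Return transcripts[tid], creating and storing it first if missing."""
    if tid not in transcripts:
        transcripts[tid] = TranscriptStub(tid, **kwargs)
    return transcripts[tid]

transcripts = {}
first = get_or_create(transcripts, "T1", strand="+", chrom="chr1")
# T1 already exists, so the optional arguments are not used here.
second = get_or_create(transcripts, "T1", strand="-")
```

In this sketch the optional parameters only matter on the first call for a given id; later calls return the stored instance unchanged.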

4. Iterate over all transcripts of a gene

snippet.python

# gene.transcripts is a dict (transcript id => Transcript instance)
for tid, transcript in gene.transcripts.items():
	print(transcript)

5. After adding all transcripts to a gene, parse the gene and its transcripts

snippet.python

gene.parse_gene()

After parsing, the following attributes are available:

gene.geneid      => gene id
gene.genename    => gene name, official symbol
gene.gene_type   => gene type
gene.gene_source => gene source
gene.gene_status => status
gene.transcripts => the gene's transcripts, a dict object, transcript id => transcript instance
gene.gene_start  => gene leftmost on genome
gene.gene_stop   => gene rightmost on genome
gene.chrom       => gene located chromosome
gene.strand      => location strand on genome
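
One plausible way to derive a gene's leftmost and rightmost positions is to take the minimum start and maximum end over all exons of all its transcripts. The sketch below assumes exactly that; the manual does not say how GeneStruct computes gene_start and gene_stop, and gene_bounds is a hypothetical helper.

```python
# Hypothetical sketch: gene bounds as the span covering every exon.
# Assumes 0-based, half-open [start, end) exon intervals.

def gene_bounds(transcript_exons):
    """transcript_exons: dict of transcript id -> list of [start, end] exons."""
    starts = [s for exons in transcript_exons.values() for s, _ in exons]
    ends = [e for exons in transcript_exons.values() for _, e in exons]
    return min(starts), max(ends)

gene_start, gene_stop = gene_bounds({
    "T1": [[100, 200], [300, 400]],
    "T2": [[50, 200], [300, 350]],
})
```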

6. Output a gene as a GTF-formatted string

snippet.python

gene.togtf()
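
GTF uses 1-based, inclusive coordinates, while the objects here use 0-based, half-open ones, so writing GTF involves a coordinate shift. The standalone sketch below shows that conversion for one exon record; exon_to_gtf_line is a hypothetical helper, and the exact lines that togtf() emits are not documented here.

```python
# Hypothetical sketch of formatting one exon as a 9-column GTF line.
# Input coordinates are 0-based, half-open; GTF is 1-based, inclusive.

def exon_to_gtf_line(chrom, source, start, end, strand, gene_id, transcript_id):
    attributes = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
    fields = [
        chrom, source, "exon",
        str(start + 1),  # shift the start to 1-based
        str(end),        # the half-open end equals the 1-based inclusive end
        ".", strand, ".",
        attributes,
    ]
    return "\t".join(fields)

line = exon_to_gtf_line("chr1", "example", 100, 200, "+", "G1", "T1")
```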

7. Output a gene as a refGene-formatted string

snippet.python

gene.torefgene()
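
In the UCSC refGene table layout, exon coordinates are stored as an exon count plus two comma-terminated lists (exonStarts and exonEnds) in 0-based, half-open coordinates. The sketch below builds those three fields from a list of exons; refgene_exon_fields is a hypothetical helper, and the full row produced by torefgene() may contain more than is shown.

```python
# Hypothetical sketch of the exon fields of a refGene-style row.
# Exons are 0-based, half-open [start, end), so no coordinate shift is needed.

def refgene_exon_fields(exons):
    exons = sorted(exons)
    exon_starts = "".join("%d," % s for s, _ in exons)
    exon_ends = "".join("%d," % e for _, e in exons)
    return len(exons), exon_starts, exon_ends

count, starts, ends = refgene_exon_fields([[300, 400], [100, 200]])
```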

Useful functions

In addition to the classes above, we supply two functions to parse GTF and refGene files, plus a helper to write genes back out.

1. Read a GTF file

snippet.python

annoregion2 = Annoregion2(fmt="gtf",gattr="gene_name",tattr="transcript_name",gidattr="gene_id",tidattr="transcript_id",genetypeattr="gene_type",transcripttypeattr="transcript_type",tanno="exon,CDS,UTR")
annoregion2.gtf2exons(gtf_filename)
# all genes are then available in
annoregion2.h  # a dict holding all genes: key is geneid, value is the corresponding Gene_instance
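
The attribute-name parameters above (gidattr, tidattr, gattr and so on) presumably name keys in the ninth GTF column; that reading is an assumption, not something this manual states. The standalone sketch below shows how such quoted key/value pairs can be pulled out of an attribute string; parse_gtf_attributes is a hypothetical helper, not part of GeneStruct.

```python
import re

# Matches pairs of the form: key "value"
# Unquoted values (e.g. numeric tags) are deliberately skipped in this sketch.
ATTR_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')

def parse_gtf_attributes(column9):
    """Return a dict of the quoted key/value pairs in a GTF attribute column."""
    return dict(ATTR_PATTERN.findall(column9))

attrs = parse_gtf_attributes(
    'gene_id "G1"; transcript_id "T1"; gene_name "ABC";'
)
gene_id = attrs["gene_id"]             # the key named by gidattr, under this assumption
transcript_id = attrs["transcript_id"]  # the key named by tidattr, under this assumption
```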

2. Read a refGene file

snippet.python

hgene = readrefgene(refgene_filename) # returns a dict, {geneid => Gene_instance}
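
For orientation, reading exon coordinates from a refGene-style row amounts to pairing up two comma-terminated lists. The sketch below shows that step in isolation; parse_refgene_exons is a hypothetical helper, and readrefgene itself may handle additional columns and cases.

```python
# Hypothetical sketch of turning refGene exonStarts/exonEnds strings
# into [start, end) pairs; coordinates are already 0-based, half-open.

def parse_refgene_exons(exon_starts, exon_ends):
    starts = [int(x) for x in exon_starts.split(",") if x]  # drop the trailing empty item
    ends = [int(x) for x in exon_ends.split(",") if x]
    return [[s, e] for s, e in zip(starts, ends)]

exons = parse_refgene_exons("100,300,", "200,400,")
```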

3. Write genes to GTF or refGene files

We supply a helper to write out the hgene dict, where hgene maps geneid ⇒ Gene_instance.

snippet.python

gene2file(hgene,fmt="gtf",outputprefix = "test.") # fmt = 'gtf' or 'refgene'