GeneStruct user manual


Author list

  • rongzhengqin@basepedia.com
  • zhush@basepedia.com

Main classes

  • Transcript
  • Gene


Transcript object

1. To create a Transcript instance:
snippet.python
transcript = Transcript(transcriptid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)

Here, leftmost and rightmost are the leftmost and rightmost positions of the ORF on the genome, given as a 0-based, half-open interval [leftmost,rightmost). exons is a list of the form [[s1,e1],[s2,e2],…], where each exon [s,e) is likewise 0-based and half-open on the genome.
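As a minimal sketch of this coordinate convention (not part of GeneStruct; the helper names gtf_to_halfopen and interval_length are hypothetical), the following converts a 1-based, inclusive interval, as used in GTF files, into the 0-based, half-open form described above. One convenient property of half-open intervals is that the length is simply end minus start.

```python
def gtf_to_halfopen(start, end):
    """Convert a 1-based, inclusive [start, end] interval to 0-based, half-open [start, end)."""
    return start - 1, end


def interval_length(start, end):
    """Length of a 0-based, half-open interval [start, end)."""
    return end - start


# A GTF exon spanning bases 101..200 (1-based, inclusive) covers 100 bases.
s, e = gtf_to_halfopen(101, 200)
exons = [[s, e]]  # same shape as the exons argument described above
```

Only the start coordinate shifts by one; the end coordinate keeps its numeric value because the inclusive 1-based end and the exclusive 0-based end coincide.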

2. To add exons to a transcript that was initialized with no exons:
snippet.python
transcript.add_exon(start,end) # 0-based [start,end) on genome, start < end 
3. After adding all exons to a transcript, you can parse the transcript to get its UTRs, introns, CDS ...
snippet.python
transcript.parse_transcript()

After parsing, the following attributes are available:

  • parsed utr3 ⇒ transcript.utr3 (a list of intervals [[s1,e1),[s2,e2),[s3,e3), … ], 0-based, left-inclusive, right-exclusive)
  • parsed intron ⇒ transcript.intron (a list like utr3)
  • parsed exon ⇒ transcript.exon (a list like utr3)
  • parsed utr5 ⇒ transcript.utr5 (a list like utr3)
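To illustrate what these parsed lists contain, here is a hedged sketch of how introns and UTR/CDS segments can be derived from exons and the ORF bounds. It is not GeneStruct's actual implementation; introns_from_exons and split_by_orf are hypothetical helpers, and all intervals are 0-based and half-open as above.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as half-open intervals."""
    exons = sorted(exons)
    return [[prev_end, next_start]
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if prev_end < next_start]


def split_by_orf(exons, leftmost, rightmost):
    """Split exons into (left_utr, cds, right_utr) around the ORF [leftmost, rightmost)."""
    left_utr, cds, right_utr = [], [], []
    for start, end in sorted(exons):
        if start < leftmost:  # part of the exon lies left of the ORF
            left_utr.append([start, min(end, leftmost)])
        cds_start, cds_end = max(start, leftmost), min(end, rightmost)
        if cds_start < cds_end:  # part of the exon overlaps the ORF
            cds.append([cds_start, cds_end])
        if end > rightmost:  # part of the exon lies right of the ORF
            right_utr.append([max(start, rightmost), end])
    return left_utr, cds, right_utr


exons = [[100, 200], [300, 400], [500, 600]]
introns = introns_from_exons(exons)
left_utr, cds, right_utr = split_by_orf(exons, leftmost=150, rightmost=550)
```

On the "+" strand the left UTR corresponds to the 5' UTR and the right UTR to the 3' UTR; on the "-" strand the two are swapped.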

Other information

  • transcript.transcriptid
  • transcript.transcriptname ⇒ name or None
  • transcript.chrom
  • transcript.strand ⇒ one of "+", "-", "."
  • transcript.leftmost
  • transcript.rightmost ⇒ together with leftmost, records the ORF region [leftmost,rightmost) on the genome, with leftmost < rightmost

Gene object

1. To create a Gene instance:
snippet.python
gene = Gene(geneid,genename="",gene_type=None,gene_source=None,gene_status=None)
2. To add a transcript to a gene:
snippet.python
gene.add_transcript(Transcript_instance)
# returns the Transcript_instance
3. To get a transcript from a gene:
snippet.python
gene.get_transcript(tid,transcriptname="",strand = ".",chrom=None,exons = [], transcript_type=None,leftmost=None,rightmost=None,transcript_source=None,parsed=0,transcript_status=None)
# If tid already exists in this gene, the transcript is looked up by id;
# otherwise a new Transcript instance is created with the optional parameters and added to this gene.
4. To iterate over all transcripts of a gene:
snippet.python
for tid, transcript in gene.transcripts.items():  # gene.transcripts is a dict: transcript id => Transcript instance
    print(transcript)
5. After adding all transcripts to the gene, parse the gene and its transcripts:
snippet.python
gene.parse_gene()

The following attributes are then available:

gene.geneid      => gene id
gene.genename    => gene name, official symbol
gene.gene_type   => gene type
gene.gene_source => gene source
gene.gene_status => gene status
gene.transcripts => the gene's transcripts, a dict object, transcript id => Transcript instance
gene.gene_start  => leftmost position of the gene on the genome
gene.gene_stop   => rightmost position of the gene on the genome
gene.chrom       => chromosome on which the gene is located
gene.strand      => strand on which the gene is located
6. To output the gene as a GTF string:
snippet.python
gene.togtf()
7. To output the gene as a refGene string:
snippet.python
gene.torefgene()

Useful functions

In addition to the classes above, we supply helpers for reading GTF and refGene files and for writing genes back to either format.

1. To read a GTF file:

snippet.python
annoregion2 = Annoregion2(fmt="gtf",gattr="gene_name",tattr="transcript_name",gidattr="gene_id",tidattr="transcript_id",genetypeattr="gene_type",transcripttypeattr="transcript_type",tanno="exon,CDS,UTR")
annoregion2.gtf2exons(gtf_filename) 
# All genes are then available in annoregion2.h,
# a dict mapping geneid => Gene_instance.
hgene = annoregion2.h
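The attribute names passed to Annoregion2 (gene_id, transcript_id, gene_name, and so on) refer to keys in the ninth column of a GTF record. The sketch below shows one way such a column can be parsed. It is an illustration only, not GeneStruct's parser: parse_gtf_attributes is a hypothetical helper, the sample record is made up, and only quoted attribute values are handled.

```python
import re

# Matches pairs of the form: key "value"
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Parse the 9th GTF column into a dict of attribute name -> value (quoted values only)."""
    return {key: value for key, value in _ATTR_RE.findall(field)}


# A made-up GTF exon record for illustration.
line = ('chr1\tHAVANA\texon\t101\t200\t.\t+\t.\t'
        'gene_id "G1"; transcript_id "T1"; gene_name "ABC"; gene_type "protein_coding";')
cols = line.split("\t")
attrs = parse_gtf_attributes(cols[8])

# GTF coordinates are 1-based and inclusive; convert to 0-based, half-open.
start, end = int(cols[3]) - 1, int(cols[4])
```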

2. To read a refGene file:

snippet.python
hgene = readrefgene(refgene_filename) # returns a dict, {geneid => Gene_instance}

3. To write genes to GTF or refGene files:

We supply a function to write out the hgene dict (geneid ⇒ Gene_instance):

snippet.python
gene2file(hgene,fmt="gtf",outputprefix = "test.") # fmt = 'gtf' or 'refgene'