GENESEQ

 


Q: What does GENESEQ contain and when did coverage start?

A: GENESEQ contains sequence information taken from patent applications and granted basic patents. The database includes all nucleic acid sequences of 10 or more bases, all amino acid sequences of four or more residues, and all PCR primers and probes of any length.

Sequences from patents from 1981 onwards are included in GENESEQ.
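
The length thresholds in this answer can be written as a small check. This is only an illustrative sketch: the function name and its arguments are hypothetical, and it covers only the length criterion, not the other selection rules GENESEQ applies.

```python
def meets_length_rule(kind: str, length: int, is_primer_or_probe: bool = False) -> bool:
    """Return True if a sequence passes the length thresholds quoted in the FAQ.

    kind is "nucleic" or "protein" (labels chosen for this sketch only).
    """
    if length < 1:
        return False
    # PCR primers and probes are included whatever their length.
    if is_primer_or_probe:
        return True
    if kind == "nucleic":
        return length >= 10  # 10 or more bases
    if kind == "protein":
        return length >= 4   # four or more residues
    raise ValueError(f"unknown sequence kind: {kind!r}")


if __name__ == "__main__":
    print(meets_length_rule("nucleic", 9))
    print(meets_length_rule("nucleic", 9, is_primer_or_probe=True))
```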


Q: How does GENESEQ on STN (DGENE) differ from GENESEQ as a flat file or in GCG format?

A: The content of the GENESEQ database is identical on all platforms. What differs is the software used to search it and, in some cases, the record format. In particular, sequence accession numbers in DGENE are prefixed with YYYYN for nucleotides and YYYYP for polypeptides/peptides, where YYYY is the year of the WPIL accession number for the patent.
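
When DGENE results are matched against records from another GENESEQ format, it can help to separate the year and type prefix from the rest of the accession number. The sketch below assumes the prefix is four digits followed by N or P, optionally followed by a hyphen or space. The layout of the remainder is not described in this FAQ, so it is returned unchanged. The function name and the sample value are invented for illustration.

```python
import re

# Four-digit year, then N (nucleotide) or P (polypeptide/peptide),
# then an optional separator. The separator handling is an assumption.
_PREFIX = re.compile(r"^(\d{4})([NP])[\s-]*(.+)$")

_TYPES = {"N": "nucleotide", "P": "polypeptide/peptide"}


def split_dgene_accession(accession: str):
    """Split a DGENE-style accession into (year, sequence type, remainder)."""
    match = _PREFIX.match(accession.strip())
    if match is None:
        raise ValueError(f"no YYYYN/YYYYP prefix found in {accession!r}")
    year, type_code, remainder = match.groups()
    return int(year), _TYPES[type_code], remainder


if __name__ == "__main__":
    # "1995N123456" is a made-up value, not a real accession number.
    print(split_dgene_accession("1995N123456"))
```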


Q: What are the main differences between the GENESEQ software packages?

A:

  • GENESEQ on STN uses the standard STN command language to search the entire record apart from the SEQ field, i.e. the sequence. To search the sequence part of the record, the run GETSEQ and run GETSIM packages are used. GETSEQ and GETSIM do not have to be purchased as separate software packages. GETSEQ allows the following types of string search: exact polypeptide searches, exact polypeptide searches with family-equivalent substitutions, subsequence polypeptide searches with family-equivalent substitutions, exact nucleic acid searches and subsequence nucleic acid searches.

    Subsequence searches allow the use of variability symbols (e.g. repeat the preceding sequence m times) and specified gaps.

    GETSIM performs similarity (homology) sequence searches: it retrieves polypeptide sequences that contain the query sequence or a similar one, and assigns each a similarity score.

  • GENESEQ in GCG format is available for use with the Wisconsin Package produced by GCG (Genetics Computer Group).


    Q: What are the differences between GENESEQ and other sequence databases?

    A: The major difference is that GENESEQ only covers sequences which appear in patents. Every patent is intellectually scanned and processed, which means that there is no unnecessary duplication in the database. All records have the same kind of content and format, and only the most relevant information is provided. GENESEQ also covers more patent-issuing authorities than other databases that include patent information: 40 in total.

    GENESEQ concentrates on both the bioinformatic and the IP aspects of the sequence, whereas the other databases tend to concentrate only on the bioinformatic aspects.


    Q: How can I search for D-amino acid residues?

    A: D-amino acid residues are represented as the parent amino acid in the sequence, and the Feature Table shows the position of any D-modified amino acids (annotated by "Misc-difference").


    Q: What are the selection rules for GENESEQ and GENESEQ FASTAlert?

    A: All sequences appearing in the claims section of a patent are included in GENESEQ and GENESEQ FASTAlert. All other sequences are included unless they are specifically stated to be known or to have been published. Specific examples of generically claimed peptides are also included.

    Note: GENESEQ and GENESEQ FASTAlert only cover basic patents. A basic patent is one in which Thomson Scientific has seen the details of an invention published for the first time.


    Q: How can I search cyclic peptides in GENESEQ?

    A: All cyclic peptides have the word "cyclic" in the Keyword field. Cyclic peptides can only be searched by carrying out a series of searches, with each amino acid residue in turn acting as the first residue of the query sequence.
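
The series of queries described in this answer is the set of rotations of the peptide sequence. A minimal sketch of generating them is shown here. The function name is hypothetical, and the resulting strings would still have to be submitted one by one through whichever search tool is in use.

```python
from typing import List


def cyclic_rotations(sequence: str) -> List[str]:
    """Return every rotation of a peptide sequence, each residue leading once.

    Duplicate rotations (possible for repetitive sequences) are dropped so
    that the same query is not run twice.
    """
    seq = sequence.strip().upper()
    seen = set()
    rotations = []
    for start in range(len(seq)):
        # Move the first `start` residues to the end of the sequence.
        rotated = seq[start:] + seq[:start]
        if rotated not in seen:
            seen.add(rotated)
            rotations.append(rotated)
    return rotations


if __name__ == "__main__":
    for query in cyclic_rotations("ACDE"):
        print(query)
```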


    Q: Does GENESEQ have single or double stranded DNA?

    A: GENESEQ covers both single-stranded and double-stranded DNA. The two types are distinguished by the terms "ss" and "ds" in the keyword field. However, in the case of double-stranded DNA, only the 5' to 3' strand is input. This follows the USPTO requirements for all nucleotide sequences in the sequence listing section, which have to be presented on diskette as single-stranded, 5' to 3', left to right (for more information see "An overview of the PTO sequence rules", The Law Works, February 1996, page 4).
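
Because only one strand of a double-stranded sequence is stored, a query taken from the complementary strand may need to be converted to its reverse complement before searching. The sketch below shows that conversion for plain DNA letters. The function name is hypothetical, and whether a given search tool already checks both strands is not covered in this FAQ.

```python
# Complement table for unambiguous DNA bases plus N (any base).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    seq = dna.strip().upper()
    unknown = set(seq) - set("ACGTN")
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```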