Fault-Tolerant DNA and Protein Retrieval

Fast Pattern Recognition in DNA and Protein Sequences

(Fast Retrieval of DNA and Protein Sequences)

An innovative associative memory model called SpaCAM is used to solve information retrieval problems (pattern matching, pattern completion, pattern extraction) in a uniform, effective and efficient way. Its basic concepts apply naturally to the problem of concordances in long strings. Applied to biochemistry, the model points to improvements in DNA and protein sequence search and in information representation.

The new technique provides very fast retrieval and fast alignment of sequences in a database (e.g. the complete EMBL Nucleotide Sequence Database).

Usually the output consists of:

  1. a list of hits (e.g. accession numbers and scores
    of the most similar sequences);
  2. alignments (e.g. a list of alignments with the most
    similar sequences);
  3. a distribution tree (e.g. an x/y graph that shows the
    number of hits per similarity level).
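
As a rough illustration of output items 1 and 3, the following Python sketch ranks hits by score and counts hits per similarity level. It is not the output format of the actual engine: the `Hit` record, the accession strings, the score scale of 0 to 1, and the ten equal-width similarity levels are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str  # hypothetical accession number
    score: float    # similarity score, assumed to lie in [0, 1]


def top_hits(hits, n=10):
    """Return the n highest-scoring hits, best first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)[:n]


def hits_per_level(hits, levels=10):
    """Count hits per similarity level (equal-width bins over [0, 1]).

    A score of exactly 1.0 is placed in the highest level.
    """
    counts = Counter(min(int(h.score * levels), levels - 1) for h in hits)
    return [counts.get(i, 0) for i in range(levels)]
```

The list returned by `hits_per_level` holds the y values of the x/y graph, with the similarity level as the x axis.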

Participants in the Forschungsforum will be given the opportunity to test the retrieval engine on a workstation.

In contrast to many conventional systems, where the search operates on k-tuples only, the concept of associative memory is far more flexible in its choice of basic information units. This flexibility was used to detect and analyze relevant patterns and to improve the pattern sets that describe syntactic properties of sequences. Furthermore, the technology will be used to introduce patterns that carry semantic information as well.
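
For comparison, the conventional k-tuple approach mentioned above can be sketched in a few lines of Python. This is a generic baseline, not the SpaCAM method: every sequence is cut into overlapping substrings of fixed length k, and a query is compared by counting the k-tuples it shares with each stored sequence. The function names and the sequence ids are hypothetical.

```python
from collections import defaultdict


def build_ktuple_index(sequences, k):
    """Map every k-tuple to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def shared_ktuples(index, query, k):
    """Count the distinct query k-tuples shared with each indexed sequence."""
    counts = defaultdict(int)
    query_tuples = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kt in query_tuples:
        for seq_id in index.get(kt, ()):
            counts[seq_id] += 1
    return dict(counts)
```

Because the unit of information is fixed to exact substrings of length k, such an index cannot directly represent patterns of varying length or patterns with gaps, which is the limitation the paragraph above contrasts with associative memory.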

In cooperation with researchers at the University of Hildesheim, the industrial developers connex software GmbH (Hildesheim) and Magic Works GmbH (Teltow) have implemented MW-ANN, an engine based on this technique. MW-ANN is the core module of a commercial software package.