Position-specific scoring matrix

A position weight matrix (PWM), also called a position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly used representation of motifs (patterns) in biological sequences. A PWM is a matrix of score values that gives a weighted match to any given substring of fixed length. It has one row for each symbol of the alphabet and one column for each position in the pattern. The score assigned by a PWM to a substring $s = (s_{1}, \ldots, s_{N})$ is defined as

$$\sum_{j=1}^{N} m_{s_{j},j}$$

where $j$ represents the position in the substring, $s_{j}$ is the symbol at position $j$ in the substring, and $m_{\alpha,j}$ is the score in row $\alpha$, column $j$ of the matrix. In other words, a PWM score is the sum of position-specific scores for each symbol in the substring.
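As a minimal sketch of the scoring rule described above, the following Python snippet sums one matrix entry per position of the substring. The matrix values, the name `pwm_score`, and the row order A, C, G, T are assumptions made for illustration only; they do not come from any real motif.

```python
# Hypothetical 4 x 3 PWM over the DNA alphabet (rows A, C, G, T).
# The numbers are invented for illustration.
ALPHABET = "ACGT"
PWM = [
    [1.0, -2.0, 0.5],    # row for A
    [-1.0, 1.5, -0.5],   # row for C
    [-0.5, -1.0, 1.0],   # row for G
    [0.0, 0.5, -2.0],    # row for T
]


def pwm_score(pwm, substring, alphabet=ALPHABET):
    """Return the sum over positions j of pwm[row of s_j][j]."""
    width = len(pwm[0])
    if len(substring) != width:
        raise ValueError("substring length must equal the number of PWM columns")
    row_of = {symbol: i for i, symbol in enumerate(alphabet)}
    return sum(pwm[row_of[symbol]][j] for j, symbol in enumerate(substring))


print(pwm_score(PWM, "ACG"))  # 1.0 + 1.5 + 1.0 = 3.5
```

Each position contributes exactly one entry, chosen by the symbol observed there, which is why the score of one position never depends on its neighbours.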
Basic PWM with log-likelihoods

A PWM assumes independence between positions in the pattern, as it calculates the score at each position independently of the symbols at other positions. The score of a substring aligned with a PWM can be interpreted as the log-likelihood of the substring under a product multinomial distribution. Each column holds the log-likelihoods of the different symbols, and the likelihoods in a column sum to one, so each column corresponds to a multinomial distribution. A PWM score is a sum of log-likelihoods, which corresponds to a product of likelihoods, so the PWM as a whole defines a product multinomial distribution. The PWM scores can also be interpreted in a physical framework as the sum of binding energies for all nucleotides (symbols of the substring) aligned with the PWM.

Incorporating background distribution

Instead of using log-likelihood values in the PWM, as described in the previous section, several methods use log-odds scores. An element of the PWM is then calculated as $m_{i,j} = \log(p_{i,j} / b_{i})$, where $p_{i,j}$ is the probability of observing symbol $i$ at position $j$ of the motif, and $b_{i}$ is the probability of observing symbol $i$ in a background model. The PWM score then corresponds to the log-odds of the substring being generated by the motif versus being generated by the background, in a generative model of the sequence.

Information content of a PWM

The information content (IC) of a PWM is sometimes of interest, as it says something about how different a given PWM is from a uniform distribution. The self-information of observing a particular symbol at a particular position of the motif is $-\log(p_{i,j})$. The expected (average) self-information of a particular element in the PWM is then $-p_{i,j} \log(p_{i,j})$. Finally, the IC of the PWM is the sum of the expected self-information of every element:

$$-\sum_{i,j} p_{i,j} \log(p_{i,j})$$
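The log-odds elements and the sum of expected self-information described above can be sketched in Python as follows. This is an illustration under stated assumptions: the probabilities in `P`, the uniform `BACKGROUND`, and the function names are hypothetical, and base-2 logarithms are chosen so that values come out in bits (the text does not fix a base). The log-odds function also assumes every probability is strictly positive.

```python
import math

# Hypothetical column probabilities p[i][j]: rows A, C, G, T; three positions.
# Each column sums to one. Values are invented for illustration.
P = [
    [0.7, 0.1, 0.25],   # A
    [0.1, 0.7, 0.25],   # C
    [0.1, 0.1, 0.25],   # G
    [0.1, 0.1, 0.25],   # T
]
BACKGROUND = [0.25, 0.25, 0.25, 0.25]  # assumed background model b[i]


def log_odds_pwm(p, background):
    """Return m[i][j] = log2(p[i][j] / b[i]); assumes all p[i][j] > 0."""
    return [
        [math.log2(p_ij / b_i) for p_ij in row]
        for row, b_i in zip(p, background)
    ]


def information_content(p):
    """Return -sum over i, j of p[i][j] * log2(p[i][j]), taking 0 * log 0 as 0."""
    return -sum(p_ij * math.log2(p_ij) for row in p for p_ij in row if p_ij > 0)


m = log_odds_pwm(P, BACKGROUND)
print(m[2][2])                 # log2(0.25 / 0.25) = 0.0
print(information_content(P))
```

An entry equal to the background probability gets a log-odds score of zero, so it neither favours the motif nor the background.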
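Often, it is more useful to calculate the information content relative to the background letter frequencies of the sequences being studied rather than assuming equal probabilities for each letter (e.g. the GC-content of DNA of thermophilic bacteria ranges from 65.3% to 70.8% [1], so a motif of ATAT would contain much more information than a motif of CCGG). The equation for information content thus becomes

$$\sum_{i,j} p_{i,j} \log(p_{i,j} / p_{b})$$

where $p_{b}$ is the background frequency for that letter (symbol $i$). Written this way, the quantity is the relative entropy of the motif with respect to the background: it is never negative, and it is zero when every column matches the background frequencies.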
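The ATAT versus CCGG comparison can be checked numerically. The sketch below assumes the relative-entropy form, the sum over all elements of p * log2(p / p_b), and a hypothetical background with 70% GC split evenly between G and C, which lies inside the range quoted above. The function names and the use of base-2 logarithms are likewise assumptions made for illustration.

```python
import math

# Assumed GC-rich background: 70% GC, split evenly (illustrative values).
BACKGROUND = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}


def relative_information(columns, background):
    """Sum p * log2(p / p_b) over all columns and symbols (0 * log 0 taken as 0).

    `columns` is a list of dicts mapping symbol -> probability at that position.
    """
    total = 0.0
    for column in columns:
        for symbol, p in column.items():
            if p > 0:
                total += p * math.log2(p / background[symbol])
    return total


def exact_motif(word):
    """Columns of a motif that matches `word` with probability 1 at each position."""
    return [{symbol: 1.0} for symbol in word]


atat = relative_information(exact_motif("ATAT"), BACKGROUND)
ccgg = relative_information(exact_motif("CCGG"), BACKGROUND)
print(atat > ccgg)  # True: the AT-only motif is rarer against this background
```

Under these assumed frequencies the ATAT motif scores roughly 10.9 bits and CCGG roughly 6.1 bits, because each A or T is less likely to occur by chance in a GC-rich background than each C or G.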


This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Position-specific_scoring_matrix". A list of authors is available in Wikipedia.