This program calculates correlation scores between columns in a multiple
alignment according to the new pair-to-pair substitution matrix.

Platforms:

  The program runs on various Unix/Linux platforms. Specifically, it was
  tested successfully on Linux RedHat, Linux Suse, Sun Unix and Silicon
  Graphics. The program was also successfully tested on Windows and on
  Cygwin installed on Windows.

Files included in this distribution:

  readme.txt:          This documentation file
  PF00180_seed.fasta:  An example input multiple sequence alignment in
                       fasta format
  P2PConPred:          Executable version of the program for Linux
  Makefile:            Generic compilation program for Linux/Unix
  P2Pmat:              The pair-to-pair substitution matrix. The format of
                       this file must be kept as is, because the file is
                       read by the program. The only freedom is that lines
                       starting with "#" are remark lines and are not read
                       by the program.
  P2Pmat_light:        An easily parsable version of the matrix in which
                       the scores are separated by tabs. This version is
                       not read by the p2pConPred program, but can easily
                       be read by other applications.
  make_sg:             Compilation command for Silicon Graphics
  make_sun:            Compilation command for SUN
  make_linux:          Compilation command for Linux
  make_cygwin:         Compilation command for Cygwin
  P2PConPreWin.exe:    Executable version for Windows
  P2PConPred.cpp:      C++ source code. This source was successfully
                       compiled on all platforms mentioned here.

Compilation:

  Run:
  > make_x
  where x is the specific platform you are working on (sg, sun, linux or
  cygwin).

To run the program:

  > p2pConPred -i [multiple sequence alignment file] -m [pair to pair substitution matrix] -o [output file]

Program options:

  -i : name/path of the input multiple sequence alignment file. This file
       should be in fasta format.
  -m : name/path of the pair-to-pair substitution matrix. If the -m flag is
       not specified, the program uses the matrix in the P2Pmat file, which
       should be located in the same directory as the executable file.
  -o : optional output file.
       The default, if the -o flag is not specified, is the standard
       output (the screen).

The format of this output file is:

          1         2         3         4         5         6         7         8         9
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
  23 E  279 T   54   54   54   54  2.42  2.79  0.19  0.07   0.10   0.54
  23 E  280 T   54   54   54   54  2.42  2.49  0.19  0.17   0.05   0.52
   ^ ^    ^ ^    ^    ^    ^    ^     ^     ^     ^     ^      ^      ^
   1 2    3 4    5    6    7    8     9    10    11    12     13     14

  field  1  columns  0 -  3  sequence index of residue i (the first position
                             of the sequence is assigned index 1)
  field  2  column   5       amino acid type of residue i
  field  3  columns  7 - 10  sequence index of residue j
  field  4  column  12       amino acid type of residue j
  field  5  columns 14 - 17  number of sequences having an amino acid (and
                             not a gap) at position i
  field  6  columns 19 - 22  number of sequences having an amino acid (and
                             not a gap) at position j
  field  7  columns 24 - 27  number of sequences having an amino acid (and
                             not a gap) at both positions i and j
  field  8  columns 29 - 32  number of sequences in the alignment
  field  9  columns 34 - 38  sequence entropy at position i (based on
                             Shannon's entropy definition)
  field 10  columns 40 - 44  sequence entropy at position j
  field 11  columns 46 - 50  sequence conservation at position i (on a scale
                             from 0 to 1, where 1 is most conserved)
  field 12  columns 52 - 56  sequence conservation at position j (on a scale
                             from 0 to 1)
  field 13  columns 59 - 63  correlation score
  field 14  columns 66 - 70  standard deviation of the score with respect to
                             all pairwise scores

A value of -9 for the correlation score indicates that fewer than 10% of the
sequences have an amino acid present at one of the positions. Calculating the
pair-to-pair substitution is not feasible or not meaningful in this case.

The program creates an additional output file, "cormat", which contains only
the correlation scores, arranged as a matrix in an easily parsed format in
which the scores are separated by tabs.

For questions and suggestions please contact: eyal@ccbb.pitt.edu
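
Reading the fixed-column output from a script:

  The field table in this file can be turned into a small parser for
  downstream analysis. The following is a minimal Python sketch and is not
  part of the distribution. The function name parse_p2p_line and the
  dictionary keys are hypothetical names chosen for illustration, and the
  sketch assumes that every output line follows the documented column
  positions exactly.

```python
# Hypothetical helper, NOT part of the P2PConPred distribution.
# Each entry is (key, first column, last column, converter); the column
# numbers are the inclusive, 0-based positions from the field table.
# The key names are illustrative assumptions.
FIELDS = [
    ("i",         0,  3, int),    # field 1: sequence index of residue i
    ("aa_i",      5,  5, str),    # field 2: amino acid type of residue i
    ("j",         7, 10, int),    # field 3: sequence index of residue j
    ("aa_j",     12, 12, str),    # field 4: amino acid type of residue j
    ("n_i",      14, 17, int),    # field 5: non-gap sequences at i
    ("n_j",      19, 22, int),    # field 6: non-gap sequences at j
    ("n_both",   24, 27, int),    # field 7: non-gap sequences at both i and j
    ("n_seqs",   29, 32, int),    # field 8: sequences in the alignment
    ("entropy_i", 34, 38, float), # field 9: entropy at i
    ("entropy_j", 40, 44, float), # field 10: entropy at j
    ("cons_i",   46, 50, float),  # field 11: conservation at i
    ("cons_j",   52, 56, float),  # field 12: conservation at j
    ("score",    59, 63, float),  # field 13: correlation score
    ("score_sd", 66, 70, float),  # field 14: standard deviation of the score
]


def parse_p2p_line(line):
    """Parse one fixed-column output line into a dictionary."""
    record = {}
    for key, first, last, convert in FIELDS:
        # Python slices exclude the end index, hence last + 1.
        record[key] = convert(line[first:last + 1].strip())
    # A score of -9 marks pairs for which no score was calculated.
    record["valid"] = record["score"] != -9
    return record
```

  A whole output file could then be read with, for example,
  [parse_p2p_line(l) for l in open("out.txt") if l.strip()], where out.txt
  stands for whatever file name was given to the -o flag. Filtering on the
  "valid" key skips pairs that carry the -9 marker.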