SUPERFAMILY 1.75 HMM library and genome assignments server

Evaluation Ruleset for SCOP 1.61 Benchmarks

SCOP contains certain evolutionary relationships that cross superfamily boundaries. This web page lists those exceptions for use in benchmarking or evaluating sequence comparison procedures. The work is part of the SUPERFAMILY project, so please read and cite the reference below.

Caveat

These rules are derived by Julian Gough from SCOP but have not been checked or approved by the SCOP authors. They are provided 'as is' and may well contain mistakes and limitations, for which I take no responsibility.

Rules

The major rules are summarised here; if you are doing a SCOP-based evaluation, at least take these into consideration. There are other minor rules, but they make little or no difference.

  • Relationships detected between domains belonging to the same superfamily are generally correct.

  • Relationships between domains belonging to the same fold are ambiguous and should be neither penalised nor rewarded.

  • Relationships between domains belonging to different folds should generally be penalised.

  • Members of the TIM barrel fold are related. Treat this fold like a superfamily.

  • Members of the Rossmann folds are also distantly related; treat these as a single superfamily.

  • Families belonging to the Membrane all-alpha superfamily are not necessarily related. These families should be treated like folds.
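
As a rough illustration of how the three outcomes might feed into a benchmark score, the sketch below tallies a list of flags in which 1 means a correct relationship, 0 means ambiguous (neither penalised nor rewarded) and -1 means incorrect. The helper name tally_flags and the example flags are hypothetical and are not part of the original ruleset.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original page):
# count how many flags are correct (1), ambiguous (0) or wrong (-1).
sub tally_flags {
    my @flags = @_;
    my %count = (correct => 0, ignored => 0, wrong => 0);
    for my $flag (@flags) {
        if    ($flag == 1) { $count{correct}++ }    # rewarded
        elsif ($flag == 0) { $count{ignored}++ }    # neither penalised nor rewarded
        else               { $count{wrong}++ }      # penalised
    }
    return %count;
}

# Made-up flags for six detected relationships.
my %count = tally_flags(1, 1, 0, -1, 1, 0);
print "correct=$count{correct} wrong=$count{wrong} ignored=$count{ignored}\n";
```

Only the correct and wrong counts would normally enter an error rate; the ignored count is kept so that ambiguous pairs can still be reported.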


Assessments

Here are some assessments which have been carried out using these rules, or rules of this style:

  • Evaluation of CAFASP3 uses the exact sub-routine below.

  • Evaluation of LiveBench4 results uses the exact sub-routine below.

  • The following paper uses an earlier ruleset on an older version of SCOP. (Madera, M. and Gough, J. 2002. "A comparison of profile hidden Markov model procedures for remote homology detection." Nucl. Acids Res., 30(19), 4321-4328.)

  • The following paper was where this particular ruleset was first started. Keeping SUPERFAMILY up to date requires updating and improving this ruleset with each new SCOP release. (Gough, J., Karplus, K., Hughey, R., and Chothia, C. 2001. "Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that Represent all Proteins of Known Structure." J. Mol. Biol., 313(4), 903-919.)

  • An earlier comparison also using the same major rules. (Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T. and Chothia, C. 1998. "Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods." J. Mol. Biol., 284(4), 1201-10.)

  • I believe the original major rules were developed for this paper. (Brenner, S. E., Chothia, C. and Hubbard, T. J. 1998. "Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships." Proc. Natl. Acad. Sci. U S A, 95(11), 6073-8.)


Perl sub-routine

A ruleset for SCOP 1.69 is also available.


sub Criteria {
#Gough, J., Karplus, K., Hughey, R., and Chothia, C. 2001. 
#"Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that Represent all Proteins of Known Structure." 
#J. Mol. Biol., 313(4), 903-919.
#
#Takes as input two SCOP classifications and returns a flag (ruleset for SCOP 1.61):
#1 if they're the same, 0 if it's ambiguous, and -1 if they're different
my $flag=-1;
#The classifications are read from the second and third arguments; $_[0] is not used
my $one=$_[1];
my $two=$_[2];
my $cf1;
my $cf2;
my $sf1;
my $sf2;
my $fa1;
my $fa2;
my %rossmann=('c.2',0,'c.3',0,'c.4',0,'c.5',0,'c.27',0,'c.28',0,'c.30',0,'c.31',0);
#these all have notes in SCOP

if ($one =~ /^(\w\.\d+)(\.\d+)(\.\d+)/){
$fa1="$1$2$3";
$sf1="$1$2";
$cf1="$1";
}
else{
print STDERR "Error parsing classification: $one\n";
}
if ($two =~ /^(\w\.\d+)(\.\d+)(\.\d+)/){
$fa2="$1$2$3";
$sf2="$1$2";
$cf2="$1";
}
else{
print STDERR "Error parsing classification: $two\n";
}
#A classification that failed to parse cannot be compared; return the default flag (-1)
return ($flag) unless defined $cf1 and defined $cf2;

#Same fold ambiguous
if ($cf1 eq $cf2){
$flag=0;
}
#plain right
if ($sf1 eq $sf2){
$flag=1;
}
#Unless Membrane all-alpha
if (($sf1 eq 'f.2.1' or $sf2 eq 'f.2.1') and $fa1 ne $fa2){
$flag=-1;
}

#TIM barrels
if (($cf1 eq 'c.1' and $cf2 eq 'c.1')){
$flag=1;
}
#Rossmann-like
if (exists($rossmann{$cf1}) and exists($rossmann{$cf2})){
$flag=1;
}
#Rossmanns
if ((exists($rossmann{$cf1}) and $sf2 eq 'c.23.12') or (exists($rossmann{$cf2}) and $sf1 eq 'c.23.12')){
$flag=1;
}
#As of SCOP 1.57, c.23.12 looks like it superposes OK
if ((exists($rossmann{$cf1}) and $cf2 eq 'c.66') or (exists($rossmann{$cf2}) and $cf1 eq 'c.66')){
$flag=1;
}
#Old note -correspondence checked-
if ((exists($rossmann{$cf1}) and $cf2 eq 'c.32') or (exists($rossmann{$cf2}) and $cf1 eq 'c.32')){
$flag=1;
}
#Old note -correspondence checked-
if ((exists($rossmann{$cf1}) and $cf2 eq 'c.108') or (exists($rossmann{$cf2}) and $cf1 eq 'c.108')){
$flag=1;
}
#There was a note in SCOP, 2 domains (1) alpha/beta with a Rossmann-fold topology, (2) 4-helical bundle 

#Other rules
if ((exists($rossmann{$cf1}) and $cf2 eq 'c.111') or (exists($rossmann{$cf2}) and $cf1 eq 'c.111')){
$flag=1;
}
#note: the ATP nucleotide-binding site is similar to that of the NAD-binding Rossmann-folds
if (($cf1 eq 'b.67' or $cf1 eq 'b.68' or $cf1 eq 'b.69' or $cf1 eq 'b.70') and ($cf2 eq 'b.67' or $cf2 eq 'b.68' or $cf2 eq 'b.69' or $cf2 eq 'b.70')){
$flag=1;
}
#beta propellers 5-8 blades
if (($cf1 eq 'c.94' and $cf2 eq 'c.93') or( $cf2 eq 'c.94' and $cf1 eq 'c.93')){
$flag=1;
}
#Note in SCOP, Similar in architecture but partly differs in topology
if (($sf1 eq 'c.23.9' and $sf2 eq 'c.69.1') or ($sf2 eq 'c.23.9' and $sf1 eq 'c.69.1')){
$flag=1;
}
#Cutinase-like
if (($fa1 eq 'f.2.1.10' and $sf2 eq 'c.108.1') or ($fa2 eq 'f.2.1.10' and $sf1 eq 'c.108.1')){
$flag=1;
}
# -provisional classification-

if (($sf1 eq 'a.118.8' and $sf2 eq 'a.118.6') or ($sf2 eq 'a.118.8' and $sf1 eq 'a.118.6')){
$flag=0;
}
#Very similar alpha super-helix
if (($sf1 eq 'd.58.1' and $sf2 eq 'a.1.2') or ($sf2 eq 'd.58.1' and $sf1 eq 'a.1.2')){
$flag=0;
}
#Similar motif sulphur binding
if (($sf1 eq 'a.137.4' and $sf2 eq 'c.96.1') or ($sf2 eq 'a.137.4' and $sf1 eq 'c.96.1')){
$flag=0;
}
#OK note in SCOP one of the previous cases
if (($sf1 eq 'b.42.5' and $sf2 eq 'b.42.1') or ($sf2 eq 'b.42.5' and $sf1 eq 'b.42.1')){
$flag=0;
}
#OK same fold and general look
if (($sf1 eq 'c.10.1' and $sf2 eq 'c.10.2') or ($sf2 eq 'c.10.1' and $sf1 eq 'c.10.2')){
$flag=0;
}
#Leucine rich repeats both of them,  structures look the same OK
if (($sf1 eq 'c.11.1' and $sf2 eq 'c.10.2') or ($sf2 eq 'c.11.1' and $sf1 eq 'c.10.2')){
$flag=0;
}
#Obvious sequence homology with BLAST, one is beta-beta-alpha superhelix, and one is beta-alpha, together in Pfam
if (($sf1 eq 'c.11.1' and $sf2 eq 'c.10.1') or( $sf2 eq 'c.11.1' and $sf1 eq 'c.10.1')){
$flag=0;
}
#Obvious sequence homology with BLAST, one is beta-beta-alpha superhelix, and one is beta-alpha, together in Pfam
if (($sf1 eq 'c.91.1' and $sf2 eq 'c.37.1') or( $sf2 eq 'c.91.1' and $sf1 eq 'c.37.1')){
$flag=0;
}
#Note in SCOP, contains P-loop
if (($sf1 eq 'd.51.1' and $sf2 eq 'd.52.3') or( $sf2 eq 'd.51.1' and $sf1 eq 'd.52.3')){
$flag=0;
}
#Note in SCOP, shared motif
if (($cf1 eq 'c.6' and $cf2 eq 'c.1') or( $cf2 eq 'c.6' and $cf1 eq 'c.1')){
$flag=0;
}
#Note in SCOP, shared motif
#-----------------------------------

return ($flag);
}
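
The sub-routine compares classifications at three levels: fold, superfamily and family. The sketch below shows how a classification string such as 'c.2.1.3' breaks down into those levels, using the same pattern as the sub-routine above. The helper name split_classification is hypothetical and is not part of the original ruleset; it returns an empty list for a string that cannot be parsed, so a caller can skip such entries.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original page):
# split a SCOP classification into fold, superfamily and family
# identifiers with the same regular expression used by Criteria.
sub split_classification {
    my ($classification) = @_;
    return unless defined $classification;
    if ($classification =~ /^(\w\.\d+)(\.\d+)(\.\d+)/) {
        return ("$1", "$1$2", "$1$2$3");
    }
    return;    # empty list: the string could not be parsed
}

my ($fold, $superfamily, $family) = split_classification('c.2.1.3');
print "$fold $superfamily $family\n";    # prints "c.2 c.2.1 c.2.1.3"
```

Here 'c.2' is the fold, 'c.2.1' the superfamily and 'c.2.1.3' the family, which correspond to the $cf, $sf and $fa variables in the sub-routine.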