Scaligner implements several scores to evaluate the humanness of a sequence:
- H-score: expressed human antibody sequences from the latest Kabat database (2000) are grouped by isotype (IGHV, IGKV, IGLV). The ‘raw humanness’ of an antibody variable domain is defined as the mean sequence identity of the variable region sequence against all distinct human variable regions of the same isotype. For each isotype, the procedure calculates a Z-fold that compares the raw humanness of the test sequence with the raw humanness of all expressed sequences in the group. The H-score is the highest Z-fold. A sequence with an H-score above zero therefore has a higher-than-average similarity to the expressed sequences and is more representative of the expressed repertoire in Kabat. Conversely, a sequence with a negative score has a lower-than-average similarity to the expressed sequences.
- G-score: similar in essence to the H-score, except that expressed sequences are grouped by most probable germline (e.g., IGHV1-39) instead of by isotype (e.g., IGHV).
- T20 score: sequence identity is calculated against a database of expressed human antibody sequences, and the T20 score is the average of the 20 highest identities. Sequence identity can be evaluated on the complete V domain (framework regions and CDRs) or only on framework regions.
- Germinality index: the highest sequence identity found against all human germlines. Sequence identity can be evaluated on the complete V domain (framework regions and CDRs) or only on framework regions.
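The definitions above can be illustrated with a minimal Python sketch. It is not Scaligner's implementation. All function names are hypothetical, sequences are assumed to be pre-aligned to equal length, and the Z-fold is assumed to behave like a standard z-score (distance from the group mean in units of standard deviation), which this section does not specify.

```python
from statistics import mean, pstdev
from typing import Mapping, Sequence


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def raw_humanness(seq: str, group: Sequence[str]) -> float:
    """Mean identity of seq against every sequence in the group."""
    return mean(identity(seq, ref) for ref in group)


def z_value(seq: str, group: Sequence[str]) -> float:
    """Assumed z-score form of the Z-fold for one group.

    The background is the raw humanness of each group member
    against the other members (leave-one-out).
    """
    members = list(group)
    background = [
        raw_humanness(m, members[:i] + members[i + 1:])
        for i, m in enumerate(members)
    ]
    mu, sigma = mean(background), pstdev(background)
    if sigma == 0:
        raise ValueError("group has no spread; z-score is undefined")
    return (raw_humanness(seq, members) - mu) / sigma


def h_score(seq: str, groups: Mapping[str, Sequence[str]]) -> float:
    """Highest z-value over all groups (isotypes for H, germlines for G)."""
    return max(z_value(seq, g) for g in groups.values())


def t20_score(seq: str, database: Sequence[str], top: int = 20) -> float:
    """Average of the `top` highest identities against the database."""
    ids = sorted((identity(seq, ref) for ref in database), reverse=True)
    return mean(ids[:top])


def germinality_index(seq: str, germlines: Sequence[str]) -> float:
    """Highest identity against any germline sequence."""
    return max(identity(seq, g) for g in germlines)
```

Under these assumptions, a G-score would reuse `h_score` with the expressed sequences keyed by most probable germline rather than by isotype. Restricting any score to framework regions amounts to masking the CDR positions before calling `identity`.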
Scores are displayed in the sequence window. Each score is shown alongside the score distributions of human antibodies (in blue) and mouse antibodies (in red), which helps estimate how human the sequence is.