SVRMHC Download Page

The current URL of this page is http://supplementary.biolead.org/SVRMHC/download.

SVRMHC is an easy-to-use command-line-based package for predicting epitope binding with class I and class II MHC molecules. It was developed with the LibSVM library. SVRMHC integrates an automatic parameter selection procedure, two sequence encoding schemes, outlier removal functions and other capabilities.

SVRMHC can be downloaded as Windows (DOS) or Linux executables. The source code and Makefile are also provided for compiling under other operating systems.

1. Quick start

To use this package efficiently, you need to know the formats of the input and output files.

1.1 Input file

For the class I training program, the input file should contain pairs of a peptide sequence and
its corresponding pIC50 value. The input data should be organized as follows:

peptide_sequence_1 pIC50_value_1
peptide_sequence_2 pIC50_value_2
..
..
..
peptide_sequence_n pIC50_value_n

Note: All peptide sequences in a class I input file should have the same length; otherwise a warning message is displayed on standard output. Each pIC50 value should be a non-negative real number.
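
The two rules in the note above can be checked before training. The following Python sketch is not part of SVRMHC; it assumes the two fields on each line are separated by whitespace, and the function name and the sample peptides are made up for illustration.

```python
def check_class1_training_lines(lines):
    """Parse 'peptide pIC50' lines and report format problems.

    Returns (records, problems): records is a list of (peptide, pIC50)
    tuples, problems is a list of human-readable messages.
    ASSUMPTION: the two fields are separated by whitespace.
    """
    records, problems = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            problems.append(f"line {number}: expected 2 fields, got {len(fields)}")
            continue
        peptide, value_text = fields
        try:
            value = float(value_text)
        except ValueError:
            problems.append(f"line {number}: pIC50 is not a number: {value_text!r}")
            continue
        if value < 0:
            problems.append(f"line {number}: pIC50 is negative: {value}")
            continue
        records.append((peptide, value))
    # Class I training requires every peptide to have the same length.
    lengths = {len(peptide) for peptide, _ in records}
    if len(lengths) > 1:
        problems.append(f"peptides differ in length: {sorted(lengths)}")
    return records, problems


if __name__ == "__main__":
    # Made-up sample lines; the third peptide is one residue short.
    sample = ["ALAKAAAAM 7.2", "ALAKAAAAN 6.5", "ALAKAAAA 5.0"]
    records, problems = check_class1_training_lines(sample)
    print(len(records), "records")
    for message in problems:
        print("problem:", message)
```

For a class II training file the same check could be used with the length comparison ignored, since class II peptides may differ in length.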

For the class II training program, the input file format is essentially the same as for class I training. The only difference is that there is no length restriction on the peptide sequences, although in general a class II peptide sequence is no longer than 25 residues.

For the class I and class II prediction programs, the input file should contain peptide sequences only (for class I, all sequences must have the same length; for class II, there is no such restriction).
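
One way to obtain equal-length peptides for a class I prediction file is to slide a fixed-length window over a longer sequence. The Python sketch below is an illustration only and is not something the package requires; the window length of 9, the one-peptide-per-line layout and both function names are assumptions.

```python
def fixed_length_peptides(protein, length=9):
    """Return every contiguous peptide of the given length, in order.

    ASSUMPTION: length=9 is only an illustrative default.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return [protein[start:start + length]
            for start in range(len(protein) - length + 1)]


def write_prediction_input(peptides, path):
    """Write the peptides to a file.

    ASSUMPTION: one peptide per line, mirroring the training file layout.
    """
    with open(path, "w") as handle:
        for peptide in peptides:
            handle.write(peptide + "\n")
```

A sequence shorter than the window simply yields an empty list, so nothing is written for it.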

1.2 Output file

Users do not need to know much about the model file and can treat it as a black box. However, its content differs from that of a LibSVM model file: the first few lines hold the parameters used in the training process, such as the encoding factor (for both class I and class II) and the anchor position and anchor residue characters (class II only). Users are therefore not required to re-enter these parameters during prediction.

The program writes each sequence and its predicted value to the result file. (For class II, users may need to specify the alignment protocol used to output the predicted value.)

Additionally, the training program writes a details file named "info.txt" to the directory that contains the executable. This file lists the parameter ranges (C, gamma, epsilon), whether generated automatically or entered manually, as well as the optimal parameters, the removed outliers and so on.
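
The exact layout of the result file is not specified here beyond "the sequence and its predicted value". Assuming one whitespace-separated pair per line, a result file could be read and ranked with a Python sketch like the following; the layout and both function names are assumptions, not a description of the package.

```python
def read_predictions(lines):
    """Return a list of (sequence, predicted_value) tuples.

    ASSUMPTION: one 'sequence value' pair per line, separated by whitespace.
    Lines that do not fit this shape are skipped.
    """
    predictions = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        sequence, value_text = fields
        try:
            predictions.append((sequence, float(value_text)))
        except ValueError:
            continue  # not a numeric prediction, skip the line
    return predictions


def rank_by_prediction(predictions):
    """Sort predictions from highest to lowest predicted value."""
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up lines standing in for the content of a result file.
    sample = ["ALAKAAAAM 6.8", "ALAKAAAAN 7.4"]
    for sequence, value in rank_by_prediction(read_predictions(sample)):
        print(sequence, value)
```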

2. Package usage


When the command line contains a syntax error, a help message describing every usage detail is displayed on the screen.

The four subsections below describe each mode in more detail.

2.1 Class I model training

a. General command line template:
svrmhc -train [options] -i input_file_name -m model_file_name
b. Options:
-train
-classi (default)

Note: classi is the default mode of this package, so to train a class I model you can enter -train without -classi.

-e encoding factor: set the encoding factor for each peptide
11_factor -- 11-factor encoding (default)
sparse -- sparse encoding

-v cross-validation mode: set the cross-validation mode used to find the optimal parameters (default value: 5)
LOO -- leave-one-out cross-validation
n -- n-fold cross-validation (n is an integer and should be less than the number of input sequences)

-k svm kernel type: set the type of kernel function
linear -- linear kernel: u'*v
polynomial -- polynomial kernel: (gamma*u'*v + coef0)^degree
rbf -- radial basis function: exp(-gamma*|u-v|^2) (default)

-r remove outliers or not
removal -- remove the outliers in the data (default)
no_removal -- do not remove outliers

-o outlier_threshold: set the residual threshold for outliers
n -- residual threshold (n is a real number; default value: 2)

-h parameter selection: specify how parameter selection is done (default: automatic)
automatic -- select parameters automatically; the parameter ranges are calculated by the package
manual -- select parameters manually; users specify the parameter ranges themselves

Note: If you choose manual selection, you must enter c_min, c_max, c_step, epsilon_min, epsilon_max, epsilon_step and so on explicitly, following the messages on the screen.

-i input file name: set the path for the input file (must be entered explicitly)
-m model file name: set the path for the model file (must be entered explicitly)
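
The three kernel formulas listed under -k can be written out directly. The Python sketch below only restates those formulas for plain lists of numbers; it is not the package's own code, and the function names are chosen for illustration.

```python
import math


def linear_kernel(u, v):
    """u'*v: the dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma, coef0, degree):
    """(gamma*u'*v + coef0)^degree"""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    """exp(-gamma*|u-v|^2), where |u-v|^2 is the squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)
```

Here u and v stand for the numeric vectors produced by encoding two peptides; the rbf kernel equals 1 for identical vectors and decays toward 0 as they move apart.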

c. Examples:

>svrmhc -train -v 7 -k linear -o 1 -i data_file -m model_file

Train a class I model with a linear kernel, select the optimal parameters automatically by 7-fold cross-validation, and remove sequences whose residual exceeds 1.

>svrmhc -train -v LOO -r no_removal -h manual -i data_file -m model_file

Train a class I model with an rbf kernel, select the optimal parameters manually by leave-one-out cross-validation, and do not remove outliers.
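
When training is driven from a script, the command line can be assembled from the documented flags. The Python sketch below builds only the argument list; the function name is hypothetical, it assumes the executable is reachable as "svrmhc" on the PATH, and it does not check option values.

```python
# Flags documented for class I training, and the extra ones for class II.
CLASS_I_FLAGS = {"e", "v", "k", "r", "o", "h"}
CLASS_II_FLAGS = CLASS_I_FLAGS | {"n", "a", "l"}


def build_train_command(input_file, model_file, class_ii=False, **options):
    """Build an argv list for 'svrmhc -train'.

    options maps a single-letter flag (for example v, k, o) to its value.
    """
    allowed = CLASS_II_FLAGS if class_ii else CLASS_I_FLAGS
    argv = ["svrmhc", "-train"]  # ASSUMPTION: svrmhc is on the PATH
    if class_ii:
        argv.append("-classii")
    for flag, value in options.items():
        if flag not in allowed:
            raise ValueError(f"option -{flag} is not listed for this mode")
        argv += [f"-{flag}", str(value)]
    argv += ["-i", input_file, "-m", model_file]
    return argv


if __name__ == "__main__":
    command = build_train_command("data_file", "model_file", v=7, k="linear", o=1)
    print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) to start the training run.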

2.2 Predicting using a class I model

a. General command line template:
svrmhc -predict -i input_file_name -m model_file_name -s result_file_name

Note: The program reads the sequence encoding scheme (sparse or 11_factor) from the model file automatically.

b. Options:

-predict

-classi (default)
-i input file name: set the path for the input file (must be entered explicitly)
-m model file name: set the path for the model file (must be entered explicitly)
-s result file name: set the save path for the result file (must be entered explicitly)

2.3 Class II model training

a. General command line template:
svrmhc -train -classii [options] -i input_file_name -m model_file_name

b. Options:
-e encoding factor: set the encoding factor for each peptide
11_factor -- 11-factor encoding (default)
sparse -- sparse encoding

-v cross-validation mode: set the cross-validation mode used to find the optimal parameters (default value: 5)
LOO -- leave-one-out cross-validation
n -- n-fold cross-validation (n is an integer and should be less than the number of input sequences)

-k svm kernel type: set the type of kernel function
linear -- linear kernel: u'*v
polynomial -- polynomial kernel: (gamma*u'*v + coef0)^degree
rbf -- radial basis function: exp(-gamma*|u-v|^2) (default)

-r remove outliers or not
removal -- remove the outliers in the data (default)
no_removal -- do not remove outliers

-o outlier_threshold: set the residual threshold for outliers
n -- residual threshold (n is a real number; default value: 2)

-h parameter selection: specify how parameter selection is done (default: automatic)
automatic -- select parameters automatically; the parameter ranges are calculated by the package
manual -- select parameters manually; users specify the parameter ranges themselves

Note: If you choose manual selection, you must enter c_min, c_max, c_step, epsilon_min, epsilon_max, epsilon_step and so on explicitly, following the messages on the screen.

-n iteration number: set the number of iterations of the self-consistent iterative class II training (default value: 1)
n -- n rounds of iteration for class II training (n should be a natural number)

-a anchor position: specify which position on the peptide serves as the anchor site (default position: 1)
n -- the nth amino acid on the peptide is set as the anchor position (n should be an integer between 1 and 9)

-l character list: specify the amino acid characters allowed at the anchor site (default: YWFLIVM)
residue_string -- a string in which each character is one of the 20 canonical amino acid characters

Note: The default value "YWFLIVM" is set for DRB1*0401 only, so to get reliable results you should check this value for each run. In other words, after setting this value you must ensure that every polypeptide can be anchored; otherwise the run is meaningless and you may encounter runtime problems.

-i input file name: set the path for the input file (must be entered explicitly)
-m model file name: set the path for the model file (must be entered explicitly)
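
The requirement that every polypeptide can be anchored can be checked before training. The Python sketch below is one possible reading of that requirement and not the package's algorithm: it assumes a 9-residue core window (suggested by -a accepting values between 1 and 9) and treats a peptide as anchorable if at least one window has an allowed residue at the anchor position. Both function names are made up.

```python
CORE_LENGTH = 9  # ASSUMPTION: 9-residue core, suggested by -a accepting 1..9


def anchored_cores(peptide, anchor_position=1, anchor_residues="YWFLIVM"):
    """Return every core window whose anchor position holds an allowed residue."""
    if not 1 <= anchor_position <= CORE_LENGTH:
        raise ValueError("anchor_position must be between 1 and 9")
    cores = []
    for start in range(len(peptide) - CORE_LENGTH + 1):
        core = peptide[start:start + CORE_LENGTH]
        # anchor_position is 1-based, Python indexing is 0-based.
        if core[anchor_position - 1] in anchor_residues:
            cores.append(core)
    return cores


def unanchorable(peptides, anchor_position=1, anchor_residues="YWFLIVM"):
    """Return the peptides for which no core window can be anchored."""
    return [peptide for peptide in peptides
            if not anchored_cores(peptide, anchor_position, anchor_residues)]
```

If unanchorable(...) returns a non-empty list for a training set, the -a and -l values probably need to be reconsidered for that allele.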

c. Examples:

>svrmhc -classii -train -k polynomial -a 3 -l LIVM -i data_file -m model_file

Train a class II model with a polynomial kernel, select the optimal parameters automatically with 5-fold cross-validation, remove outliers whose residual exceeds 2, and set the anchor residues to LIVM and the anchor position to 3.

>svrmhc -classii -train -v LOO -h manual -n 5 -i data_file -m model_file

Train a class II model with an rbf kernel, select the optimal parameters manually with leave-one-out cross-validation, set the number of iterations to 5, and remove outliers whose residual exceeds 2.
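
The effect of the residual threshold set with -o can be pictured with a short Python sketch. How the package computes the residual is not stated here; the sketch assumes it is the absolute difference between the predicted and the measured pIC50, and the function name and the predict callable are hypothetical.

```python
def split_outliers(records, predict, threshold=2.0):
    """Split (peptide, pIC50) records into kept and removed lists.

    predict is any callable mapping a peptide to a predicted pIC50.
    ASSUMPTION: the residual is |predicted - measured|.
    """
    kept, removed = [], []
    for peptide, measured in records:
        residual = abs(predict(peptide) - measured)
        # Records whose residual exceeds the threshold count as outliers.
        if residual > threshold:
            removed.append((peptide, measured))
        else:
            kept.append((peptide, measured))
    return kept, removed
```

With the default threshold of 2, a peptide measured at 9.0 but predicted at 6.0 would be removed, while one measured at 7.0 would be kept.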

2.4 Predicting using a class II model

In this case, the program automatically reads the anchor position, anchor residue list and encoding factor from the model file.

a. General command line format:
svrmhc -predict -classii -i input_file_name -m model_file_name -s result_file_name
b. Options:

-p positioning of the sequence in the test set (default: combi)
combi -- combined strategy to select the value for the tested sequence
mean -- output the mean value over all the divided sequences
max -- output the maximum value over all the divided sequences

-i input file name:set the path for input file (should be indicated explicitly)
-m model file name: set the path for model file (should be indicated explicitly)
-s result file name:set the save path for result file (should be indicated explicitly)

c. Examples:

>svrmhc -predict -classii -p max -i input_file -m model_file -s result_file

Predict the sequences in the test file "input_file" with model_file and write the maximum value for each polypeptide to result_file.
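
The mean and max settings of -p reduce the values predicted for the divided sequences of one polypeptide to a single number. The Python sketch below illustrates just these two reductions; the function name is made up, and the combi strategy is left out because its rule is not described here.

```python
def combine_scores(scores, strategy="max"):
    """Reduce the scores of a peptide's divided sequences to one value."""
    if not scores:
        raise ValueError("at least one score is required")
    if strategy == "max":
        return max(scores)
    if strategy == "mean":
        return sum(scores) / len(scores)
    # combi is intentionally not implemented: its rule is not documented here.
    raise ValueError(f"unsupported strategy: {strategy!r}")
```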

3. Executables and installation

We provide two executables, one built with the GNU compiler and the other in the Microsoft Visual Studio environment.

If you have not modified the source code, you can use them directly. Otherwise, you will have to build the executable yourself.

On Unix systems with GNU g++ installed, you can simply type "make" to build the executable.

On other systems with other compilers, you may need to consult the Makefile to build it.

For example, provided that you have installed Visual C++, you can build it under Windows with the following steps:

a. Open a DOS box, change to the SVRMHC package directory, and type
"c:\program files\microsoft visual studio\vc98\bin\vcvars32.bat"

b. (Optional) Erase the out-of-date executable by typing "nmake /f Makefile.win clean".

c. Type "nmake /f Makefile.win" to build a new executable.

4. Additional information

Please direct questions and comments to SVRMHC@biolead.org.

Last updated: Tue, 12/08/2009 6:38 PM