Evaluation of IGH partitioning utilities

Help, FAQs and Further Information

Formatting of input

The evaluation utility requires that the results of partitioning be formatted so that each line contains a single sequence's IGHV, IGHD and IGHJ genes, listed as tab-separated values. Optionally, the sequence ID can be listed at the start of each line.
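
As an illustration of this layout, the following minimal Python sketch reads one line of input. It is not the evaluation utility's own code: the names PartitionRecord and parse_partition_line are hypothetical, and it assumes that a line with three fields has no sequence ID and a line with four fields starts with one.

```python
from typing import NamedTuple, Optional


class PartitionRecord(NamedTuple):
    """One sequence's partitioning result (hypothetical container)."""
    seq_id: Optional[str]
    ighv: str
    ighd: str
    ighj: str


def parse_partition_line(line: str) -> PartitionRecord:
    """Split one tab-separated line into ID (optional), IGHV, IGHD and IGHJ."""
    fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
    if len(fields) == 3:
        # Assumption: three fields means the optional sequence ID was omitted.
        return PartitionRecord(None, *fields)
    if len(fields) == 4:
        return PartitionRecord(*fields)
    raise ValueError(f"expected 3 or 4 tab-separated fields, got {len(fields)}")


record = parse_partition_line("seq1\tIGHV3-23*01\tIGHD3-10*01\tIGHJ4*02")
print(record.ighd)  # IGHD3-10*01
```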

Multiple germline genes

It is possible to list multiple alternatives for the IGHV, IGHD and IGHJ genes for a single sequence. This may be desirable if a utility reports a number of 'equal scoring' matches to the germline repertoire. Multiple gene names can be separated by a number of characters, including spaces, backslash, comma, underscore or the word 'or'. Tabs should not be used to separate multiple gene alternatives, as these are reserved for delimiting the results into IGHV, IGHD and IGHJ.
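
The splitting of one field into its alternatives might look like the sketch below. This is an assumption about how such a split could be done, not the utility's actual implementation. The function name split_alternatives is hypothetical, and the sketch assumes the word 'or' is lowercase and surrounded by whitespace.

```python
import re
from typing import List

# Separators named in the text: whitespace, backslash, comma, underscore,
# or the word 'or' (assumed here to be lowercase and set off by whitespace).
_SEPARATORS = re.compile(r"\s+or\s+|[\s\\,_]+")


def split_alternatives(field: str) -> List[str]:
    """Return the gene names listed in a single IGHV, IGHD or IGHJ field."""
    return [name for name in _SEPARATORS.split(field.strip()) if name]


print(split_alternatives("IGHD3-3*01 or IGHD3-3*02"))
# ['IGHD3-3*01', 'IGHD3-3*02']
print(split_alternatives("IGHV1-69*01, IGHV1-69*06"))
# ['IGHV1-69*01', 'IGHV1-69*06']
```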

Inverse germline genes

A number of IGH partitioning utilities allow the use of inverse IGHD genes. These may be indicated in the input by appending '/inv', 'inv' or 'R' to the IGHD gene name, or by prefixing the gene name with 'Inv_'. To avoid confusion with listings of multiple IGHD alternatives, the / and _ characters in these markers are automatically replaced by the evaluation tool before multiple inputs are identified.
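
A minimal sketch of recognising these markers on a single IGHD gene name is given below. The function name strip_inverse_marker is hypothetical, the matching is assumed to be case-sensitive, and the evaluation tool's own handling may differ.

```python
import re
from typing import Tuple

# Markers described in the text: the prefix 'Inv_' or the suffixes
# '/inv', 'inv' and 'R'. Case-sensitive matching is an assumption.
_INV_PREFIX = re.compile(r"^Inv_")
_INV_SUFFIX = re.compile(r"(/inv|inv|R)$")


def strip_inverse_marker(ighd: str) -> Tuple[str, bool]:
    """Return the bare IGHD name and whether it was marked as inverse."""
    name = ighd.strip()
    prefix = _INV_PREFIX.match(name)
    if prefix:
        return name[prefix.end():], True
    suffix = _INV_SUFFIX.search(name)
    if suffix:
        return name[:suffix.start()], True
    return name, False


print(strip_inverse_marker("IGHD3-3*01/inv"))  # ('IGHD3-3*01', True)
print(strip_inverse_marker("Inv_IGHD3-3"))     # ('IGHD3-3', True)
print(strip_inverse_marker("IGHD3-3*01"))      # ('IGHD3-3*01', False)
```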

Transformation of utility output to evaluation tool input

Perhaps the simplest way to produce suitable input is to collate the partitioning results in a spreadsheet and then export it as a tab-delimited text file, or simply to copy and paste from the spreadsheet into the text area of the evaluation utility. Please note that any hidden columns will be included in such actions, so please ensure that only the required columns (ID [optional], IGHV, IGHD and IGHJ) are present.
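
If the results are already held in a script rather than a spreadsheet, the same tab-delimited text can be produced with Python's standard csv module. The sketch below is only an illustration: the function name rows_to_tsv and the column labels are hypothetical, and each row is assumed to be a dictionary keyed by column label. Only the listed columns are written, so any extra columns are left out.

```python
import csv
import io


def rows_to_tsv(rows, columns=("ID", "IGHV", "IGHD", "IGHJ")) -> str:
    """Write only the required columns of each row as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Columns not named in `columns` (e.g. notes) are dropped here.
        writer.writerow([row[column] for column in columns])
    return buffer.getvalue()


results = [
    {"ID": "seq1", "IGHV": "IGHV3-23*01", "IGHD": "IGHD3-10*01",
     "IGHJ": "IGHJ4*02", "Notes": "not wanted in the output"},
]
print(rows_to_tsv(results), end="")
```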

Genotype input file

The Stanford_S22 genotype file can be loaded by selecting Stanford_S22 on the evaluation utility form, or can be viewed here. Once a genotype has been inferred, the input file format is simply a list of the names of the genes that comprise the genotype. These may be listed one gene per line or simply separated by whitespace. A valid gene name is expected to include V, D or J to denote the gene type, followed by a number denoting the gene family. Allele numbers are expected to be separated from the gene family and gene number by an '*'.

If a utility uses gene naming conventions that differ from those in the genotype file, a custom genotype file will need to be created (and selected via the 'custom' option).

The default genotype file includes a number of putative polymorphisms from the UNSWIg germline repertoire. These are accompanied by the closest matches to germline genes from the repertoires used by other utilities. A number of genes are listed multiple times in order to account for the variations in naming between repertoires. This does not impact the operation of the utility, but rather acts to ensure a gene is accounted for.

Please note that additional information contained within the file, such as noting whether or not a gene is functional or an ORF, will be ignored by the evaluation utility.
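
The genotype file format described above can be read along the lines of the following sketch. It is an assumption-based illustration rather than the utility's own parser: the function name parse_genotype is hypothetical, and any token that does not contain V, D or J followed by a digit is treated as additional information and skipped.

```python
import re
from typing import List, Optional, Tuple

# A gene name is assumed to contain V, D or J followed by a family number.
_GENE_PATTERN = re.compile(r"[VDJ]\d")


def parse_genotype(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (gene, allele) pairs; allele is None when no '*' is present."""
    genes = []
    for token in text.split():  # one gene per line or whitespace-separated
        if not _GENE_PATTERN.search(token):
            continue  # e.g. functionality notes such as 'F' or 'ORF'
        gene, _, allele = token.partition("*")
        genes.append((gene, allele or None))
    return genes


print(parse_genotype("IGHV3-23*01 F\nIGHD3-10*01 ORF\nIGHJ6"))
# [('IGHV3-23', '01'), ('IGHD3-10', '01'), ('IGHJ6', None)]
```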

Contact

For queries about iHMMune-align:
Andrew Collins
Katherine Jackson