Programs for analyzing MLEE data

Address questions to Thomas S. Whittam (whittam@msu.edu)

These programs were written to analyze genetic diversity and relationships among bacterial strains characterized by multilocus enzyme electrophoresis (Selander et al. 1986, App. Environ. Microbiol. 51:873-884). The programs can also be used with other types of binary-state or multi-state data with unordered categories. The electromorph data should be stored as integers, with 0 (null alleles) treated as missing data. The input data files must be saved as text files in the format described in the Readme file.

See an example input data file

Programs can be executed on a PC running Windows by clicking on the application in Windows Explorer or typing the name of the program at the Command prompt. The FORTRAN source code can be obtained by request.

ETDIV finds and lists the electrophoretic types (ETs) in a collection of bacterial isolates with multilocus enzyme profiles. The program writes the results to an output file and creates a file named ETLIST.DAT to be used as input for ETCLUS. The input file for ETDIV must have the format explained in the Readme file.
See example output file
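
As a rough illustration of what "finding the ETs" means, the following Python sketch groups isolates that share an identical multilocus profile. It is not the ETDIV source (which is FORTRAN); the function name `find_ets`, the isolate names, and the profiles are hypothetical, and the sketch assumes that a null allele (0) is compared like any other value when deciding whether two profiles are identical, which may differ from what ETDIV does.

```python
def find_ets(profiles):
    """Group isolates with identical multilocus profiles.

    profiles: dict mapping isolate name -> tuple of integer allele codes.
    Returns a list of (profile, [isolate names]) in order of first appearance.
    Assumption: 0 (null) is compared like any other allele code here.
    """
    groups = {}
    for name, profile in profiles.items():
        groups.setdefault(tuple(profile), []).append(name)
    return list(groups.items())


# Hypothetical data: four isolates scored at four enzyme loci.
isolates = {
    "A1": (1, 2, 3, 1),
    "A2": (1, 2, 3, 1),
    "B1": (2, 2, 3, 1),
    "C1": (1, 2, 0, 1),
}

ets = find_ets(isolates)
for number, (profile, members) in enumerate(ets, start=1):
    print(f"ET {number}: profile={profile} n={len(members)} isolates={members}")
```

Here A1 and A2 collapse into a single ET, so the four isolates yield three ETs.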

ETCLUS reads the file ETLIST.DAT created by ETDIV and builds a dendrogram using the average linkage algorithm (UPGMA). Distance is measured as the proportion of mismatched loci between pairs of ETs. Null alleles, which are scored as '0', are not used in the calculation of pairwise distances. To obtain dendrograms for publication, I use ETMEGA (see below) and the MEGA program.
See example output file
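
The distance measure and the clustering step can be sketched in Python as follows. This is a minimal illustration, not the ETCLUS code: the function names and the example ETs are hypothetical, the sketch assumes that every pair of ETs shares at least one non-null locus, and it reports the average distance at which clusters join rather than any particular branch-length convention.

```python
def mismatch_distance(et_a, et_b):
    """Proportion of mismatched loci, skipping loci where either ET is null (0)."""
    compared = 0
    mismatched = 0
    for a, b in zip(et_a, et_b):
        if a == 0 or b == 0:
            continue  # null alleles are treated as missing data
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no comparable loci for this pair")
    return mismatched / compared


def upgma(matrix):
    """Average-linkage clustering on a square distance matrix.

    Returns a list of (cluster_a, cluster_b, distance) merges. Original ETs
    are numbered 0..n-1; each new cluster gets the next unused number.
    """
    n = len(matrix)
    size = {i: 1 for i in range(n)}
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(size) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        merges.append((a, b, d_ab))
        for c in size:
            if c in (a, b):
                continue
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps the result an average over all ET pairs.
            dist[(c, next_id)] = (size[a] * d_ac + size[b] * d_bc) / (size[a] + size[b])
        del dist[(a, b)]
        size[next_id] = size.pop(a) + size.pop(b)
        next_id += 1
    return merges


# Hypothetical ETs scored at four loci; the third has a null at locus 3.
ets = [(1, 2, 3, 1), (2, 2, 3, 1), (1, 2, 0, 1), (2, 1, 1, 2)]
matrix = [[mismatch_distance(a, b) for b in ets] for a in ets]
merges = upgma(matrix)
for a, b, d in merges:
    print(f"join {a} and {b} at distance {d:.3f}")
```

In this example the first and third ETs differ at none of their three comparable loci, so they join at distance 0 even though they are distinct profiles.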

ETMEGA creates a distance matrix for input into the MEGA program (Kumar, Tamura, and Nei, 1994, CABIOS 10:189). The program uses the same input file format and has the same default parameter values as ETCLUS. It calculates the genetic distance between pairs of ETs and writes a file in the MEGA input format. Note that MEGA does not accept blank spaces within strain labels, so replace any blank spaces with some other symbol. The output file from ETMEGA is then used as the data file input (distance matrix option) in MEGA.
See annotated output file
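
One way to remove blank spaces from strain labels before running ETMEGA is sketched below in Python. The helper name `mega_safe_label` and the choice of an underscore as the replacement symbol are assumptions made for illustration; any symbol that MEGA accepts would do.

```python
import re


def mega_safe_label(label, filler="_"):
    """Trim the label and replace each run of whitespace with a filler symbol."""
    return re.sub(r"\s+", filler, label.strip())


# Hypothetical strain labels.
labels = ["E. coli K12", "  ECOR 37 ", "DEC5A"]
safe_labels = [mega_safe_label(label) for label in labels]
print(safe_labels)
```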

The MEGA computer program can be obtained from the authors by filling out an order form.

ETLINK calculates several measures of linkage disequilibrium, including the distribution of the standardized coefficient (D') between all pairs of alleles, the two-locus coefficient Q* for multiple alleles per locus, and indices of multilocus association based on the properties of the mismatch distribution. For information and references about these measures, see Whittam et al. (1983, Proc. Natl. Acad. Sci. USA 80:1751-1755) and Hedrick and Thomson (1986, Genetics 112:135-156).
See annotated output file
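
To make the standardized coefficient concrete, here is a small Python sketch of D' for a single pair of alleles at two loci, using the usual normalization of D by its maximum possible magnitude given the allele frequencies. It is not taken from ETLINK: the function name `d_prime` and the data are hypothetical, and the sketch assumes that profiles with a null (0) at either locus are dropped before frequencies are estimated.

```python
def d_prime(profiles, locus_i, allele_i, locus_j, allele_j):
    """Standardized disequilibrium D' between one allele at each of two loci.

    profiles: list of tuples of integer allele codes (haploid genotypes).
    Assumption: profiles with a null (0) at either locus are skipped.
    """
    pairs = [(p[locus_i], p[locus_j]) for p in profiles
             if p[locus_i] != 0 and p[locus_j] != 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no profiles scored at both loci")
    p_a = sum(1 for a, _ in pairs if a == allele_i) / n
    p_b = sum(1 for _, b in pairs if b == allele_j) / n
    p_ab = sum(1 for a, b in pairs if a == allele_i and b == allele_j) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


# Hypothetical two-locus data.
associated = [(1, 1), (1, 1), (2, 2), (2, 2)]    # alleles always co-occur
independent = [(1, 1), (1, 2), (2, 1), (2, 2)]   # every combination once

print(d_prime(associated, 0, 1, 1, 1))
print(d_prime(independent, 0, 1, 1, 1))
```

The first data set gives complete association (D' of 1) and the second gives none (D' of 0).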

ETBOOT is a bootstrap program that randomly selects loci, obtains a distance matrix, finds a tree (based on the average linkage or the neighbor-joining algorithm), and records the nodes of the tree. The process is repeated for a number of bootstrapped trees (input by the user). ETBOOT then tabulates the number and frequency of each observed node recovered among the randomly generated trees.
See annotated output file
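
The bootstrap procedure can be sketched in Python as below. This is an illustration under stated assumptions rather than the ETBOOT code: loci are resampled with replacement, only the average linkage option is shown (not neighbor joining), a node is recorded as the set of ETs beneath it, and a pair of ETs with no comparable loci in a replicate is arbitrarily given distance 1.0. All function names and the example data are hypothetical.

```python
import random
from collections import Counter


def distance(a, b):
    """Proportion of mismatched loci, ignoring loci with a null (0) in either ET."""
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    if not pairs:
        return 1.0  # arbitrary choice for this sketch
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_clades(profiles):
    """Return the internal nodes of an average-linkage tree as frozensets of ET indices."""
    n = len(profiles)
    leaf_d = [[distance(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [frozenset([i]) for i in range(n)]
    clades = set()
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                # Average linkage: mean distance over all ET pairs between clusters.
                avg = sum(leaf_d[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        if len(merged) < n:  # the root contains every ET, so it is not informative
            clades.add(merged)
    return clades


def bootstrap_support(profiles, replicates, seed=0):
    """Frequency with which each node appears among bootstrapped trees."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    counts = Counter()
    for _ in range(replicates):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]  # sample with replacement
        resampled = [tuple(p[k] for k in loci) for p in profiles]
        counts.update(upgma_clades(resampled))
    return {clade: count / replicates for clade, count in counts.items()}


# Hypothetical ETs forming two well-separated pairs.
ets = [
    (1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 2),
    (2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 3),
]
support = bootstrap_support(ets, replicates=200)
for clade, freq in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), f"{freq:.2f}")
```

Ties between equally close clusters are broken here by position in the list, which can slightly favor some nodes; a production implementation would need an explicit tie-handling rule.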