- Permute 3 was started from scratch - completely new project, everything written from the ground up again. It includes many new features and the entire workflow has been improved with many per-conversion customizations.
- Since 6. 5. 4. 3. 2. 1 appears on both the top and bottom, these terms can be cancelled out. So, this would leave (10. 9. 8. 7 ) / 1, which equals 5,040 / 1, which equals 5,040.
Permute! version 3.4 alpha 9
User's manual
= 4 x 3 x 2 x 1 4! = 24 Calculate our permutation value n P r for n = 7 and r = 3: 7 P 3 = 5,040: 24: 7 P 3 = 210.
Multiple regression over distance, ultrametric and additive matrices with permutation test
Last revision: Saturday, March 30, 2013
Version History
Download it hereImportant
Permute! 3.4 is a major update to the Permute! 3.2 program, despite the small change in version number. It has been made easier to use, faster and more error-tolerant. Hopefully you will find this program to be highly usable, much more than the previous version. This is alpha software, which means it is usable and the feature list is almost complete but not totally implemented.
What does Permute! do?
This program computes a multiple regression of one or more independent variables over one dependent variable, and assesses the probability (p-value) of the regression coefficients and the associated R2 (R-squared) using a permutational method described in Legendre, Lapointe and Casgrain (1994). It also allows the user to select the variables that contribute most to the variation through several procedures: backward elimination, forward selection and stepwise regression.The program is remarkable in its use of appropriate permutation methods. If the variables represent distance matrices, ultrametric or additive trees, the permutation method accounts for it by generating a permuted distance matrix or tree.
How to use the program
Keeping with the philosophy of Macintosh programs, the author assumed that users do not like to read manuals. Hence the program's interface is as self-explanatory as possible, and has balloon help throughout. However, it assumes that the user is familiar with the contents of the Legendre, Lapointe and Casgrain (1994) paper, or at least with the concepts of multiple regression. To start the program, double-click on the icon. From the 'File' menu, choose 'Open..' and select your input file (see below for the file's format). You can also drag your input file and drop it on the program's icon. After a short delay, your file should appear in its own scrollable and resizable window, to show you what the program has read. You can then be sure that the program read your data file correctly. NoteThe program has no way of knowing if your data file is corrupt. You should at least make a visual check on your data to ensure it has been read correctly.About threads..
Starting with version alpha 2, the program is multi-threaded, allowing the user to run multiple, concurrent analyses and background tasks. This requires the Thread Manager, part of System 7.5 and later. You can also get an extension for earlier systems. The 68k and PPC versions will be multi-threaded if the Thread Manager is present.When at least one data file has been read, the File menu's 'New analysis' item is enabled. Choosing it will cause the 'Multiple regression' window to come to the front of the screen. If you have opened more than one data window, the multiple regression will be carried out for the topmost data file. The multiple regression window looks like this: The names of the variables will be different for your own data file. The features of this window are as follows:
- A vector of values, with no matrix-like (distance) structure embedded: select 'Vector'. The values of the dependent variable will be permuted totally at random. This option may be used, for instance
- to test the behavior of the permutation tests of significance against standard multiple regression software, all possible permutations being equally likely, or
- to test multiple regression done on variables that do not meet the distributional assumptions of the standard parametric tests used in multiple regression, or else
- when your variables have no particular structure to them, e.g. they are simply a list of n values measured at n stations.
- A simple distance matrix (not a dendrogram or phylogenetic tree): select 'Matrix'. The rows and the columns of the original dependent distance matrix (Y) will be permuted at random in the manner of the Mantel test.
- A dendrogram (ultrametric tree): select 'Double' (Lapointe and Legendre, 1991).
- A phylogenetic (additive) tree: select 'Triple' (Lapointe and Legendre, 1992). In this case, you will need to indicate which object is the 'Rooting Object' in the popup menu. This is necessary in order for the program to extract the star tree that forms the additive component of the phylogenetic tree.
- The first one, dubbed 'Simple test', computes the regression coefficients (b's) of the selected variables (see below), permutes the data according to the permutation method you selected, and re-computes the regression coefficients k times, where k is the number of permutations.
- Backward elimination procedure
With this procedure, all the user-selected variables (see below) are initially included. The permutation test is conducted, and at each step the independent variable whose partial regression coefficient has the highest p-value is dropped, provided that this probability is also higher than a predetermined and Bonferroni-corrected p-to-remove value, which the user must select from the popup menu - Forward selection procedure
This procedure is based on the fact that a variable should be included in the model if (a) it gives the equation with the most significant R2 (Sokal and Rohlf, 1981) and (b) its b coefficient is significant at the Bonferroni-corrected p-to-enter level.
A forward selection procedure can be defined as follows: at each step, the independent matrix-variable is entered that produces the equation whose R2 coefficient has the lowest probability, provided that this probability is also smaller than or equal to a predetermined Bonferroni-corrected p-to-enter value. This variable's b coefficient must also be significant at the p-to-enter level, again pending approval through Bonferroni's correction. If a tie occurs in the permutational probabilities, the value of the increment in R2 is used as an additional selection criterion. Notice that we cannot use here the forward selection criterion based on the variable with the highest partial correlation, which is computationally simpler and is, in ordinary regression, equivalent to the one used here, because the relation between the values of the partial correlations and their associated probabilities may not hold, due to the special ways the permutations are performed in matrix regression.
The user must also select a p-to-enter value. - Stepwise regression
This procedure is a combination of the backward elimination and forward selection procedures. It consists in a forward selection where each forward step (at the p-to-enter level) is followed by a backward elimination (at the p-to-remove level), assessing the significance of all variables already entered in the model.
Unfortunately, stepwise regression is broken in the current version of Permute!, and the author does not have enough free time to go back and fix it. However, a forward selection followed by a backward elimination of the selected variables should provide a reasonable facsimile of a stepwise regression.
Test results
Test results are shown in a scrolling window. Currently, the program does not respond to user input while it is performing the computations. This will change in future revisions, but for now it means that exploratory analyses should be done with a limited number of permutations (e.g. 99 or 49, even 24 if the dataset is large). A good indication of the time needed to compute one permutation is to press the 'Quick test' button and see how much time is necessary for the results to appear. Multiply that time by the number of permutations and you have an indication of the total time needed before results start to appear. The 'Test results' window may be scrolled back and resized as desired. A run may have as many windows as the computer memory allows. You can copy the results to the clipboard (Edit menu.. Copy), save them to a file (File menu.. Save) and print them. There is a limit of 32,000 characters on the results window, the earlier characters being flushed as needed. In practice this limit is unlikely to be encountered, but the author is aware of the situation.Input file format
Although the computations are based on distance matrices, the program uses a clever scheme to simulate matrix-like behavior in vectors, in order to maximize memory usage, using some properties of symmetric matrices. If 'A' is a distance matrix, and i and j are integers, the following will be true: Therefore, the only unique part (i.e., that cannot be deduced from information found elsewhere) in the matrix is the upper-triangular part, excluding the diagonal. Of course, the lower-triangular could have been used instead; we selected the upper-triangular arbitrarily.Here is an illustration of the way the original distance matrices should be 'unfolded': The input file thus consist of the 'unfolded' upper-triangular matrices, read by rows, in a vertical format similar to the one found in many spreadsheet applications. There are as many columns to the file as necessary, each column representing an unfolded, upper-triangular matrix-variable. Columns should be separated by tabs. The first line of the file should be the name of the matrix-variables (separated by tabs); up to 10 characters per variable name will be read by the program. Following are a series of distance matrices, followed by the corresponding input files. The input files consist of the 'unfolded' upper triangular matrices (boldface values), excluding the diagonal. For each matrix, there is only one value per row in the input file.
File name: MyInputFile.txt File contents:
Technical notes
Limitations removed in version 3.4
The previous version of the program required two files (one for the Y matrix-variable and one for the X1.Xn matrix-variables), and the order in which the X appeared in the file determined their names in the program. For instance, the third column of the file containing the X matrix-variables was always named 'X3'. Furthermore, there were several limitations in the previous version of the program, such as:- Maximum of 6 independent matrix-variables
- The number of lines in each file had to be an integer that satisfied the n(n-1)/2 relationship: you could have 105 lines in your input file (meaning 15 objects in the matrix: 15*14/2 = 105) but not 104 or 106, as these numbers do not correspond to an integer number of objects and hence, the data file can't be a distance matrix..
- Maximum of 29 objects (406 lines) in all input files
Current limitations
- Maximum number of rows in a data file: 178,956,970
(CodeWarrior seems to accept a maximum size of MaxLongInt, or 2,147,483,647, but this is the highest Think Pascal allows me to go without complaining that a 'variable of this type would be too large'.. - Maximum number of columns (variables) in a data file: 178,956,970
- Note: due to limitations in the List Manager if your matrix has more than 16,000 cells or so it will show up as blank. It will be used in its entirety for the computations, though.
Requirements
Any Macintosh capable of running System 7.0 and up should be able to run Permute! 3.4, which has been tested on the following machines:Computer | Configuration | Processor | Time required |
Macintosh Plus | 4MB RAM, System 7.0.1 | 68000@16MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Macintosh II | 8MB RAM, System 7.5.5 | 68020+FPU@16MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Macintosh II | 8MB RAM, System 7.5.5 | 68020@16MHz (no FPU) | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Macintosh SE/30 | 20MB RAM, System 7.5.5 | 68030+FPU@16MHz | |
Macintosh IIci | 20MB RAM, System 7.5.5 | 68030+FPU@25MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
PowerBook 160 | 14MB RAM, System 7.5.3 | 68030@25MHz | |
Macintosh Classic II | 4MB RAM, System 7.5.5 | 68030@16MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Power Macintosh 7100/66 | 48MB RAM, System 7.5.5 | PPC 601@66MHz (256k L2 Cache) | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Power Macintosh 7200/90 | 64MB RAM, System 8.0 | PPC 601@90MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Motorola StarMax 180 | 80MB RAM, System 7.6.1 | PPC 603e@180MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Power Macintosh 9600/350 | 64MB RAM, System 7.6.1 | PPC 604e@350MHz | [Macro error: Can't call the script because the name 'elapsedTime' hasn't been defined.] |
Permute 3 1 9 Equals Meters
Times are for a forward selection (p-to-enter: 0.10) of 6 variables vs. Brain (Marsupial test dataset), 29 objects (406 lines in data file), Triple Permutation Test (root=29) and 999 permutations. The program is compiled as a 'fat binary', meaning it is PowerPC-native but will work fine on older, 680x0-based machines.References
Hope, A. C. A. 1968. A simplified Monte Carlo significance test procedure. J. Roy. Stat. Soc. Ser. B 30: 582-598. Lapointe, F.-J., and P. Legendre. 1991. The generation of random ultrametric matrices representing dendrograms. J. Class. 8: 177-200.
Lapointe, F.-J. & P. Legendre. 1992. A statistical framework to test the consensus among additive trees (cladograms). Syst. Biol.41: 158-171.
Legendre, P., F.-J. Lapointe & P. Casgrain. 1994. Modeling brain evolution from behavior: a permutational regression approach. Evolution48: 1487-1499.
Sokal, R.R. and F.J. Rohlf. 1981. Biometry, second edition. W.H. Freeman and Co., San Francisco. 859 pp.
Still to do
Sorted by prioritySuggestions? Bugs?
My electronic mail address is [email protected]Built with..
Thanks to..
Description
Utility functions to describe unrestricted and restricted permutationdesigns for time series, line transects, spatial grids and blockingfactors.
Usage
Arguments
within, plots, blocks | Permutation designs for samples within thelevels of plots (within ), permutation of plots themselves, or for the definition of blocking structures whichfurther restrict permutations (blocks ). within andplots each require a named list as produced by Within and Plots respectively. blocks takes a factor (or anobject coercible to a factor via as.factor ), the levels ofwhich define the blocking structure. |
nperm | numeric; the number of permutations. |
complete | logical; should complete enumeration of allpermutations be performed? |
type | character; the type of permutations required. One of 'free' , 'series' , 'grid' or 'none' . SeeDetails. |
maxperm | numeric; the maximum number of permutations toperform. Currently unused. |
minperm | numeric; the lower limit to the number of possiblepermutations at which complete enumeration is performed. When nperm is lower than minperm , sampling is performedfrom the set of complete permutations to avoid duplicatepermutations. See argument complete and Details, below. |
all.perms | an object of class allPerms , the result of acall to allPerms . |
make | logical; should check generate all possiblepermutations? Useful if want to check permutation design but notproduce the matrix of all permutations, or to circumvent theheuristics governing when complete enumeration is activated. |
observed | logical; should the observed permutation be returnedas part of the set of all permutations? Default is FALSE tofacilitate usage in higher level functions. |
constant | logical; should the same permutation be used withineach level of strata? If FALSE a separate, possibly restricted,permutation is produced for each level of strata . |
mirror | logical; should mirroring of sequences be allowed? |
ncol, nrow | numeric; the number of columns and rows of samplesin the spatial grid respectively. |
strata | A factor, or an object that can be coerced to a factorvia as.factor Pcalc for apple watch. , specifying the strata for permutation. |
Details
shuffle
can generate permutations for a wide range ofrestricted permutation schemes. A small selection of the availablecombinations of options is provided in the Examples section below.Argument type controls how samples are actually permuted;
'free'
indicates randomization, 'series'
indicatespermutation via cyclic shifts (suitable for evenly-spaced linetransect or time series data), 'grid'
indicates permutation viatoroidal shifts (suitable for samples on a regular grid), and'none'
indicates no permutation of samples. See the packagevignette (browseVignettes('permute')
) for additionalinformation on each of these types of permutation.Argument
mirror
determines whether grid or series permutationscan be mirrored. Consider the sequence 1,2,3,4. The relationshipbetween consecutive observations is preserved if we reverse thesequence to 4,3,2,1. If there is no inherent direction in yourexperimental design, mirrored permutations can be consideredpart of the Null model, and as such increase the number of possiblepermutations. The default is to not use mirroring so you mustexplicitly turn this on using mirror = TRUE
in how
.To permute plots rather than the observations within plots (thelevels of
strata
), use Within(type = 'none')
andPlots(type = foo)
, where foo
is how you want the plotsto be permuted. However, note that the number of observations withineach plot must be equal! For some experiments, such as BACI designs, one might wish to use thesame permutation within each plot. This is controlled byargument
constant
. If constant = TRUE
then the samepermutation will be generated for each level of strata
. Thedefault is constant = FALSE
.Value
For
how
a list with components for each of the possible arguments.Author(s)
Gavin Simpson
Permute 3 1 9 Equals =
References
shuffle()
is modelled after the permutation schemes of Canoco3.1 (ter Braak, 1990); see also Besag & Clifford (1989).Besag, J. and Clifford, P. (1989) Generalized Monte Carlo significancetests. Biometrika76; 633–642.
ter Braak, C. J. F. (1990). Update notes: CANOCO version3.1. Wageningen: Agricultural Mathematics Group. (UR).
See Also
shuffle
and shuffleSet
forpermuting from a design, and check
, a utility functionfor checking permutation design described by how
.