So it seems that if I want a fast Tukey test, I need a fast ANOVA, and that seems to be where the bottleneck is. If I had time, I would code the ANOVA myself and borrow a Tukey implementation in C. I think all of these programming languages are doing it wrong for what I need. In every one of them I have to load all of the numbers, perform an ANOVA, and then run the post hoc Tukey test. I think what eats so much RAM is that these implementations keep every number in memory. They could instead read each value, add it to a running total, and release it.

The statistics could be accumulated while the file holding the numbers is being read. A one-way ANOVA only needs a count, a sum, and a sum of squares for each group, so you would have at most 3n numbers in RAM (where n is the number of groups), which is tiny. The Tukey test reuses the same summaries: the group means, the group sizes, and the within-group mean square from the ANOVA. So the ANOVA really should be super fast. That would mean the Tukey test itself is the slow part… I don’t think so, though; the code for it looks pretty fast. It must just be poor memory management and large numbers that are tripping up Mathematica, R, and the rest, because this seems like it could be a super fast calculation.
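Here is a minimal sketch of that one-pass idea in Python (my choice of language). The function name `streaming_anova` and the `group,value` line format are hypothetical assumptions, not taken from any existing package. Values are parsed with `fractions.Fraction`, so even numbers too large for a double stay exact. Only a count, a sum, and a sum of squares per group are kept in memory.

```python
from fractions import Fraction

def streaming_anova(lines):
    """One-pass one-way ANOVA over lines of the form 'group,value'.

    Only a count, a sum and a sum of squares are kept per group, so memory
    grows with the number of groups, not the number of observations.
    Values are parsed as exact Fractions, so numbers too big for a double
    (e.g. '1e400') are handled without overflow or rounding.
    """
    stats = {}  # group -> [count, sum, sum of squares]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group, raw = line.split(",")
        x = Fraction(raw)
        s = stats.setdefault(group, [0, Fraction(0), Fraction(0)])
        s[0] += 1
        s[1] += x
        s[2] += x * x

    k = len(stats)
    n_total = sum(s[0] for s in stats.values())
    grand_sum = sum(s[1] for s in stats.values())
    # sum over groups of S_i^2 / n_i, shared by both sums of squares
    group_term = sum(s[1] * s[1] / s[0] for s in stats.values())
    ss_between = group_term - grand_sum * grand_sum / n_total
    ss_within = sum(s[2] for s in stats.values()) - group_term
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # the MSE that Tukey's test reuses
    return {"k": k, "N": n_total, "F": ms_between / ms_within,
            "MSE": ms_within, "stats": stats}

sample = ["a,1", "a,2", "a,3", "b,4", "b,5", "b,6", "c,7", "c,8", "c,9"]
print(streaming_anova(sample)["F"])  # 27
```

Passing an open file handle such as `open("data.csv")` instead of the list would stream the file line by line. The textbook sum-of-squares formula is only safe here because the arithmetic is exact. With floating point it can lose precision to cancellation, and a running update such as Welford's method would be the safer choice.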
http://www.graphpad.com/support/faqid/1517/ has a link to the C code that R uses for the Tukey test. It works with double values, so I don’t think I can just plug it in, since my values are too large to be represented as doubles.
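One possible way around the double limit, sketched here as an assumption rather than a tested recipe, relies on the Tukey statistic for a pair of groups. In the Tukey-Kramer form, which allows unequal group sizes, q = |m_i - m_j| / sqrt((MSE/2)(1/n_i + 1/n_j)). This value does not change when every observation is multiplied by the same constant. So q can be computed exactly from big-number summaries and only then converted to a double. The p-value depends only on q, the number of groups k, and the within-group degrees of freedom N - k. These are the arguments R's `ptukey(q, nmeans, df)` takes, and they are ordinary-sized numbers. The helper name `tukey_q_statistics` is hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations

def tukey_q_statistics(stats, mse):
    """Studentized range statistic q for every pair of groups.

    stats maps group -> (count, sum) as exact ints/Fractions; mse is the
    within-group mean square from the ANOVA. q^2 is computed exactly and
    only the final, modest-sized value is converted to a float.
    """
    out = {}
    for g1, g2 in combinations(sorted(stats), 2):
        n1, s1 = stats[g1]
        n2, s2 = stats[g2]
        diff = Fraction(s1) / n1 - Fraction(s2) / n2
        # Tukey-Kramer: q^2 = diff^2 / ((MSE / 2) * (1/n1 + 1/n2)), kept exact
        q_squared = diff * diff / (Fraction(mse) / 2 * (Fraction(1, n1) + Fraction(1, n2)))
        out[(g1, g2)] = math.sqrt(float(q_squared))
    return out

small = {"a": (3, 6), "b": (3, 15), "c": (3, 24)}
# Same data scaled by 10**400 (far beyond double range); the MSE scales by 10**800
huge = {g: (n, s * 10**400) for g, (n, s) in small.items()}
print(tukey_q_statistics(small, 1) == tukey_q_statistics(huge, 10**800))  # True
```

Each q could then be turned into a p-value with a double-based routine, for example `ptukey(q, nmeans = k, df = N - k, lower.tail = FALSE)` in R. I haven't checked how the C code linked above exposes that same calculation.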