large numbers

Well, I’ve been looking into using R to calculate the Tukey test.  I’m doing this because I have 10 files of 200,000 integers (all around 5000 and above).  That is about 200MB per file.  Right now the ANOVA with a post hoc Tukey test takes about an hour to run in Mathematica.  THIS IS SO SLOWWWW.

To do a Tukey test you first get the numbers from an ANOVA (http://web.mst.edu/~psyworld/anovaexample.htm).  Then you run them through the Tukey formula (http://web.mst.edu/~psyworld/tukeysexample.htm) for each pair.  It really doesn’t seem that hard, and it doesn’t seem like it should take an hour to run.  In Mathematica it takes forever to even load the files.
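The per-pair arithmetic really is tiny.  Here is a rough sketch of that formula in R (the variable names are mine): take the two group means, the mean square within from the ANOVA table, and the per-group sample size, then compare the resulting q against the studentized range critical value, which base R exposes as qtukey.

    # Sketch of the Tukey HSD q statistic for one pair of groups.
    # mean_i, mean_j: the two group means
    # MSw:            mean square within (error) from the ANOVA table
    # n:              number of observations per group
    tukey_q <- function(mean_i, mean_j, MSw, n) {
      abs(mean_i - mean_j) / sqrt(MSw / n)
    }

    # The pair differs significantly if the statistic exceeds the studentized
    # range critical value, e.g. qtukey(0.95, nmeans = number_of_groups, df = error_df)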

So, I’ve been looking into using R.  Loading the numbers into a list is super fast, takes no time at all.  However, running the ANOVA uses so much RAM that I can’t do it on my computer.  I’m starting to think the numbers get so big after squaring that that’s what eats the memory.  I don’t know…
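For the record, this is roughly the shape of what I’m trying, with made-up file names: scan() is the fast plain-text read, and aov() plus TukeyHSD() is the standard R route for the ANOVA and the post hoc test.

    # Hypothetical file names -- ten files of plain numbers.
    files  <- sprintf("data%02d.txt", 1:10)
    chunks <- lapply(files, scan)      # scan() reads the numbers very quickly

    dat <- data.frame(value = unlist(chunks),
                      group = factor(rep(files, lengths(chunks))))

    # This is the part that runs out of memory on my machine.
    fit <- aov(value ~ group, data = dat)
    TukeyHSD(fit)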

But the fun thing was that, in the process, I tried out Rmpfr, a library that is supposed to provide arbitrary-precision calculations.  However, when I try 1246863692^2 using Rmpfr I get 1554669066427870976, which is wrong!  It should be 1554669066427870864, which I triple-checked by hand, with Wolfram, and with a scientific calculator!  So, I’m not going to use that package.
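For what it’s worth, plain double-precision R gives the exact same number, so my guess (and it is only a guess) is that the square got computed as an ordinary double somewhere before the arbitrary-precision machinery was involved:

    # 1246863692^2 is larger than 2^53, so a 64-bit double can't hold it exactly;
    # the nearest representable double is the same "wrong" value.
    sprintf("%.0f", 1246863692^2)
    # "1554669066427870976"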

Also, I tried Brobdingnag (http://cran.r-project.org/web/packages/Brobdingnag/vignettes/brobpaper.pdf), which also doesn’t do what I want!

So, I found gmp, which works!!!
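To round it out, here is the same square done with gmp: as.bigz keeps everything as an exact big integer, so the squaring stays exact.

    library(gmp)

    as.bigz(1246863692)^2
    # 1554669066427870864  -- the exact answer

    # The same idea works for the ANOVA arithmetic, e.g. an exact sum of squares
    # over whatever numeric vector you loaded ("values" here is hypothetical):
    # sum(as.bigz(values)^2)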