Spatial Queuing Theory with Holes

So, consider a dynamic wireless sensor network.  We wish to minimize the average wait time for each node in the network to be serviced by new information.  We do not, however, want to increase the   By using the bounty hunting algorithm we can do this.  I might want to look into routing algorithms.

Consider Poisson point processes with holes.  When we have a single neighborhood we have a plain Poisson point process; when we have multiple neighborhoods we will have non-overlapping regions where no tasks are generated.  This is where the “holes” are.  Stochastic geometry is the area of mathematics that studies this.
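A toy way to picture this (my own sketch in R, not from any paper): simulate a homogeneous Poisson point process on the unit square and then delete every point that falls inside a rectangular hole.

# Homogeneous Poisson point process on [0,1]^2 with one rectangular hole.
lambda <- 200                                   # expected points per unit area
n <- rpois(1, lambda)                           # number of points in the square
x <- runif(n); y <- runif(n)                    # given n, the points are uniform
hole <- c(xmin = 0.3, xmax = 0.6, ymin = 0.3, ymax = 0.6)   # arbitrary hole
keep <- !(x > hole["xmin"] & x < hole["xmax"] &
          y > hole["ymin"] & y < hole["ymax"])
plot(x[keep], y[keep], pch = 20, asp = 1, main = "Poisson point process with a hole")
rect(hole["xmin"], hole["ymin"], hole["xmax"], hole["ymax"], border = "red")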

But I’ve not really been focused on wireless sensor networks, and it is a bit of a stretch to fit bounty hunting to them (at least as far as I can tell; my first papers might suggest otherwise).  With my current direction I am more interested in spatial queues: I have queues rather than wireless sensor networks.  So, there is spatial queuing theory, but there is not a spatial queuing theory with holes!  The paper “Risk and Reward in Spatial Queuing Theory” deals with spatial queuing theory for the dynamic traveling repairman problem.  All of these systems assume a region without holes, i.e. without space where no tasks will be generated.  This matters in the real world because there are generally spaces where there won’t actually be tasks.  Therefore, I think I need to incorporate the concept of Poisson point processes with holes, and then build from that what to expect based on the size and location of the holes.  The holes matter because the distance a server must travel to the next task depends on the size of these holes!
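A crude check of that last claim (all numbers made up): compare the average distance from a randomly placed server to its nearest task, with and without a square hole in the middle of the region.

# Mean distance from a uniform "server" location to the nearest task, where the
# tasks form a Poisson point process thinned by a central square hole.
nearest_task_dist <- function(hole_half_width, lambda = 200) {
  n <- rpois(1, lambda)
  x <- runif(n); y <- runif(n)
  keep <- abs(x - 0.5) > hole_half_width | abs(y - 0.5) > hole_half_width
  x <- x[keep]; y <- y[keep]
  sx <- runif(1); sy <- runif(1)                 # server position
  min(sqrt((x - sx)^2 + (y - sy)^2))
}
mean(replicate(5000, nearest_task_dist(0)))      # no hole
mean(replicate(5000, nearest_task_dist(0.25)))   # 0.5 x 0.5 hole in the middle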

So, I think this is important.  Actually I think that holes might not be general enough.  It would be better if I could generalize to any space.

Ethics and Incompleteness

So, I was talking to my wife 🙂 about ethics and the types of ethics.  She told me that there are forms called consequentialist ethics and deontological ethics, and I’m sure there are a bunch more.  But what I have discovered is that they each have a set of axioms and from these they base all of their conclusions.  However, I believe that we are encountering a problem here similar to Gödel’s incompleteness theorems.  Basically, a complete and consistent set of axioms for all of ethics is impossible.

http://philosophy.stackexchange.com/questions/31726/can-g%C3%B6dels-incompleteness-theorems-be-applied-to-ethics is a link I just found; the answer there references Tarski’s undefinability theorem.

Math problem of the day

Since I’m taking my last semester of classes, I realized that I might never take a math class again :(.  So, to keep my math skills sharp, I was thinking I might start a problem of the day (or in some other increment).  I really don’t want to lose my mathematics skills.  Also, I can’t find anything like this for more advanced math like differential equations and probability theory.  I’ll post the question along with my solution.  This might remind you of Project Euler, but I’d like to keep the problems (at least for me) to those that require fewer computers and more mathematical tools.  So, basically they should be solvable entirely with pen and paper, without a computer, in less than half an hour.  Also, I’d like to expand my own math skills, so I might have a day where I find a problem from some graduate-level math.  If I actually keep this up I might give it its own page.

Adaptive Data Analysis

Google Research recently wrote a piece, The reusable holdout: Preserving validity in adaptive data analysis (so I’m not going to write much).  It details the problem with statistics generated when machine learning methods are adapted to the data through data analysis and repeated trials on the same holdout data.  This is a problem that is easily recognizable in machine learning competition leaderboards like those on Kaggle (good article on that).  The solution they gave was to use their method, detailed here and here (going to be published in Science), which allows us to reuse the holdout data set many times without losing its validity.  So, that is awesome!  Hopefully Kaggle competitions and data science research will be greatly improved by having more meaningful feedback.
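For my own notes, here is a rough sketch in R of the Thresholdout mechanism as I understand it from the paper (budget bookkeeping omitted, Gaussian noise used for simplicity, and the parameter values are arbitrary), so treat it as a caricature of the idea rather than their actual algorithm.

# Only reveal information from the holdout set when the training and holdout
# estimates disagree by more than a noisy threshold, and even then add noise
# to the released value.
thresholdout <- function(phi_train, phi_holdout, threshold = 0.04, sigma = 0.01) {
  # phi_train / phi_holdout: a bounded statistic evaluated on each observation
  # of the training and holdout sets (e.g. per-row correctness of a classifier).
  a <- mean(phi_train)
  b <- mean(phi_holdout)
  eta <- rnorm(1, sd = 2 * sigma)      # noise on the threshold
  if (abs(a - b) > threshold + eta) {
    b + rnorm(1, sd = sigma)           # release a noisy holdout estimate
  } else {
    a                                  # otherwise just echo the training estimate
  }
}

The point is that an analyst who keeps querying the holdout only learns something new about it when the training set has genuinely stopped being representative, which is what keeps the holdout reusable.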

Comparison Continued

I took a look at the R source code for the Tukey test (HSD test) from the agricolae library http://cran.r-project.org/web/packages/agricolae/index.html.  The interesting thing is that they purposefully call round:

round(1-ptukey(abs(dif[k])*sqrt(2)/sdtdif,ntr,DFerror),6)
# i.e. the p-value for pair k from the studentized range distribution, rounded to 6 decimal places

Also, they assume an lm or aov fit.  So, we know that they don’t like big numbers.  However, the nice thing is that the R file shows how to use the ptukey function, which is where I was a bit iffy.  So, I think I could get away with writing it in R myself: instead of requiring lm or aov, I would just take the data and perform the requisite calculations.
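Something like this is what I have in mind: a minimal sketch of my own (not agricolae’s code) that takes the raw group data, computes the ANOVA pieces itself, and hands them to ptukey.  It uses the Tukey-Kramer form, so unequal group sizes are fine.

# Tukey HSD p-values straight from the data, without lm() or aov().
tukey_from_data <- function(groups) {            # groups: a list of numeric vectors
  k <- length(groups)
  n <- sapply(groups, length)
  means <- sapply(groups, mean)
  df_error <- sum(n) - k
  mse <- sum(sapply(groups, function(g) sum((g - mean(g))^2))) / df_error
  pairs <- combn(k, 2)
  t(apply(pairs, 2, function(p) {
    i <- p[1]; j <- p[2]
    q <- abs(means[i] - means[j]) / sqrt(mse / 2 * (1 / n[i] + 1 / n[j]))
    c(i = i, j = j, p.value = ptukey(q, nmeans = k, df = df_error, lower.tail = FALSE))
  }))
}

# e.g. tukey_from_data(list(rnorm(30, 5), rnorm(30, 6), rnorm(30, 5.5)))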

I think I’ve convinced myself that I’ll trust Mathematica and I’ll deal with how long it takes to get the data.  I don’t like how inefficient it is, but I can sacrifice time if it means that I get accurate results.  I don’t think that I’m qualified or have the time to create such a library.  It is interesting, though, that we don’t have many papers on how many significant digits are necessary for accurate tests, or that the R libraries don’t report the errors in their calculations.  This must be why people use Mathematica or Matlab: we trust them to compute with enough significant figures that the error doesn’t matter because it is insignificant.

So, I just need a nice desktop.

Handcoded Tukey

So, it seems that if I want a fast Tukey test I need a fast ANOVA, which seems to be where the bottleneck is.  If I had time I would code the ANOVA and borrow a Tukey implementation in C.  I think that all of these programming languages are doing it wrong for what I need to do.  In all of the languages I have to load all of the numbers and then perform an ANOVA, then do the post hoc Tukey test.  The thing that takes a lot of RAM, I think, is that these implementations keep all of the numbers in RAM rather than loading, summing, and then releasing the memory while keeping only the sums.  What should be done is that the sums could be accumulated while the file the numbers are stored in is being read.  Then you would really have at most n running totals (where n is the number of groups) in RAM, which is tiny.  So, the ANOVA really should be super fast.  Which means that maybe it’s the Tukey test itself that takes a while…  I don’t think so, though; the code for it looks pretty fast.  So, it must just be poor memory management and large numbers that are tripping Mathematica and R etc. up, because it seems like it could be a super fast calculation.
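A sketch of what I mean, in R (the file names are hypothetical; each file is assumed to hold one group’s numbers as plain text):

# One-pass ANOVA: accumulate count, sum, and sum of squares per group while
# reading each file in chunks, so only a few running totals stay in memory.
files <- c("group1.txt", "group2.txt", "group3.txt")   # hypothetical paths

stats <- t(sapply(files, function(f) {
  con <- file(f, open = "r"); on.exit(close(con))
  n <- 0; s <- 0; ss <- 0
  repeat {
    chunk <- scan(con, what = numeric(), n = 1e5, quiet = TRUE)
    if (length(chunk) == 0) break
    n <- n + length(chunk); s <- s + sum(chunk); ss <- ss + sum(chunk^2)
  }
  c(n = n, sum = s, sumsq = ss)
}))

n <- stats[, "n"]; s <- stats[, "sum"]; ss <- stats[, "sumsq"]
N <- sum(n); G <- sum(s); k <- length(files)
ssb <- sum(s^2 / n) - G^2 / N       # between-group sum of squares
ssw <- sum(ss) - sum(s^2 / n)       # within-group (error) sum of squares
msb <- ssb / (k - 1); msw <- ssw / (N - k)
Fstat <- msb / msw                  # the usual one-way ANOVA F statistic
# Caveat: these shortcut formulas subtract huge, nearly equal numbers, so with
# values this big a running Welford-style update would be numerically safer.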

http://www.graphpad.com/support/faqid/1517/ has a link to the C code that R uses for the Tukey test.  It uses double values, so I don’t think I would be able to just plug and play with it, since the values I have are bigger than what a double can represent exactly.

Large Numbers

Well, I’ve been looking into using R to calculate the Tukey test.  I’m doing this because I have 10 files of 200,000 integers each (all around 5000 and above).  That is about 200MB per file.  So, right now the ANOVA with a post hoc Tukey test takes about 1 hr to run in Mathematica.  THIS IS SO SLOWWWW.

To do a Tukey test you first get the numbers from an ANOVA (http://web.mst.edu/~psyworld/anovaexample.htm).  Then run each pair of group means through the Tukey formula (http://web.mst.edu/~psyworld/tukeysexample.htm), q = (M_i - M_j) / sqrt(MS_within / n).  It really doesn’t seem that hard, and it doesn’t seem like it should take an hour to run.  In Mathematica it takes forever to even load the files.

So, I’ve been looking into using R.  Loading the numbers into a list is super fast, takes no time at all.  However, running the ANOVA uses so much RAM that I can’t do it on my computer.  I’m starting to think it might be that the numbers are so big after squaring them that it causes it to use so much memory.  I don’t know…

But the fun thing was that in the process I tried out Rmpfr, a library that is supposed to provide arbitrary-precision calculations.  However, when I try doing 1246863692^2 using Rmpfr I get 1554669066427870976, which is wrong!  It should be 1554669066427870864, which I triple-checked by hand, with Wolfram, and with a scientific calculator!  So, I’m not going to use that package.
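My guess (just a hunch, I have not dug into the package) is that mpfr() inherited the 53-bit precision of the double I passed in: 1554669066427870976 is exactly what you get if you round the true square to 53 significant bits.  Asking for more bits up front should give the exact answer.

library(Rmpfr)
# 1246863692 is below 2^53, so it converts to a double exactly, but its square
# needs about 61 bits; at 53-bit precision it gets rounded.
x <- mpfr(1246863692, precBits = 128)   # ask for plenty of bits explicitly
x^2                                     # should print 1554669066427870864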

Also, I tried http://cran.r-project.org/web/packages/Brobdingnag/vignettes/brobpaper.pdf.  Which also doesn’t do what I want!

So, I found gmp which works!!!
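For the record, gmp’s exact integers (bigz) give the right answer for the example above:

library(gmp)
as.bigz(1246863692)^2   # 1554669066427870864, exactly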

N-Queens

The n-queens problem is a very interesting problem, especially when you start imagining more than 2 dimensions; in d dimensions it becomes an even more interesting and challenging combinatorics problem.  The link also points to some applications, including traffic control, which makes sense.
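For reference, here is a tiny backtracking counter for the ordinary 2-D version (my own throwaway sketch, nothing to do with the d-dimensional generalization):

# Count the solutions to the n-queens problem by backtracking row by row.
count_queens <- function(n) {
  place <- function(cols) {            # cols[r] = column of the queen in row r
    row <- length(cols) + 1
    if (row > n) return(1)             # all n queens placed: one solution
    total <- 0
    for (col in seq_len(n)) {
      safe <- all(cols != col & abs(cols - col) != row - seq_along(cols))
      if (safe) total <- total + place(c(cols, col))
    }
    total
  }
  place(integer(0))
}

count_queens(8)   # 92, the classic answer for the 8x8 board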

Roots of Branching Processes

In branching processes you can find the probability of the system going extinct by finding the roots of G(s) - s, where G(s) is the polynomial whose coefficients are the offspring probabilities and whose exponents are the number of offspring generated (the probability generating function).  The smallest root between 0 and 1 will be the answer.  What do the other roots mean?  Basically the relevant root corresponds to the system hitting zero (going extinct).  We can have all kinds of other roots: complex, negative, or positive roots greater than 1.  So, what do they mean?  Do they mean anything?  Well, Wikipedia has a page on exotic probability, so I guess they might have some meaning… http://en.wikipedia.org/wiki/Exotic_probability.  Can’t find anything on the subject, so who knows…
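A quick concrete example with a made-up offspring distribution, just to see the roots:

# Offspring distribution: P(0) = 0.2, P(1) = 0.3, P(2) = 0.5, so the pgf is
# G(s) = 0.2 + 0.3 s + 0.5 s^2 and extinction solves G(s) = s, i.e. G(s) - s = 0.
# polyroot() takes coefficients in increasing order of the exponent.
p <- c(0.2, 0.3, 0.5)
coefs <- p; coefs[2] <- coefs[2] - 1    # subtract the s term
polyroot(coefs)                         # roots: 0.4+0i and 1.0+0i
# The smallest root in [0, 1] is 0.4, so this process goes extinct with probability 0.4.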