I was reading about a book called Factfulness: Ten Reasons We’re Wrong About the World—and Why Things Are Better Than You Think by Hans Rosling, which Bill Gates recommended. Through it I found Gapminder (a spin-off from Rosling’s work) and their tool:
which lets you explore a dizzying number of statistics in order to get a better idea of the world from a macro perspective.
Open Numbers is a cool organization that has a lot of data, and it is where Gapminder gets the data for its tool, particularly this dataset:
As I am into multiagent systems and agent-based modeling, this seems like an amazing resource for providing real-world data to back up simulations. There are so many interesting things to try: model this data, and then use those models to code “what if” scenarios. For example, what if we taxed all the millionaires and billionaires 1% every year and somehow redistributed it to the poorest 6 billion people? With this data we could see how nations might change and populations grow. We might even find that the people we tax grow even richer, because more people would be able to buy things. There are so many other things we could study with this sort of simulation: trade tariffs, natural disasters, famines… We would see what happens globally, not just locally, and not just to one sector but across a variety of variables. Clearly this would require a massive amount of research and more data than is currently available. Still, even just modeling the behavior of these datasets would help us understand how the world works, and it could aid decision making by revealing outcomes no one had previously considered.
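A toy version of the tax-and-redistribute question might look like the sketch below. Everything in it is hypothetical: the Pareto-distributed wealth, the 1e6 threshold, and the 80% `poor_fraction` (standing in for “the poorest 6 billion”) are assumptions, not Gapminder or Open Numbers data, and the names `simulate` and `gini` are mine. It does not model growth or spending, so it cannot test whether the taxed people get richer; it only shows the redistribution mechanics and tracks inequality with the Gini coefficient.

```python
import random


def gini(wealth):
    """Gini coefficient of a list of non-negative wealths (0 = perfect equality)."""
    w = sorted(wealth)
    n = len(w)
    weighted = sum((i + 1) * x for i, x in enumerate(w))  # 1-based ranks
    return 2 * weighted / (n * sum(w)) - (n + 1) / n


def simulate(wealth, years, rate=0.01, threshold=1e6, poor_fraction=0.8):
    """Toy model (hypothetical parameters): every year, agents above
    `threshold` pay `rate` of their wealth, and the revenue is split equally
    among the poorest `poor_fraction` of agents. No growth or spending."""
    w = list(wealth)
    n_poor = int(len(w) * poor_fraction)
    for _ in range(years):
        revenue = 0.0
        for i, x in enumerate(w):
            if x > threshold:
                w[i] = x - rate * x
                revenue += rate * x
        # Identify the poorest agents after this year's tax is collected.
        poorest = sorted(range(len(w)), key=w.__getitem__)[:n_poor]
        for i in poorest:
            w[i] += revenue / n_poor
    return w


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical heavy-tailed wealth distribution (not real data).
    population = [rng.paretovariate(1.2) * 1e4 for _ in range(10_000)]
    after = simulate(population, years=10)
    print(f"Gini before: {gini(population):.3f}, after 10 years: {gini(after):.3f}")
```

Total wealth is conserved by construction, so any change in the Gini coefficient comes purely from the transfer; a real model would replace the fixed population with Open Numbers series and add behavior for each agent.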
Some interesting projects:
Google has its own modified Jupyter notebook that integrates with Google Drive:
And there is Binder (beta), which creates an executable Jupyter environment from a GitHub repo containing Jupyter notebooks, so anyone can easily run your code.
So, consider a dynamic wireless sensor network. We wish to minimize the average time each node in the network waits to be serviced with new information. We do not, however, want to increase the … By using the bounty hunting algorithm we can do this. I might want to look into routing algorithms.
Consider Poisson point processes with holes. With a single neighborhood we have an ordinary Poisson point process; with multiple neighborhoods we get non-overlapping regions where no tasks are generated. These regions are the “holes”. Stochastic geometry is the area of mathematics concerned with this.
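To make the idea concrete, here is a minimal sketch that samples a homogeneous Poisson point process on a rectangle and deletes every point that falls inside a hole. Restricting a Poisson point process to a subregion gives a Poisson point process on that subregion, so the surviving points are Poisson with the same intensity on the rectangle minus the holes. The disc-shaped holes, the function names, and the stdlib-only Poisson sampler are assumptions for illustration.

```python
import math
import random


def poisson_count(rng, mean):
    """Poisson(mean) sample: count unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def ppp_with_holes(intensity, width, height, holes, seed=None):
    """Homogeneous Poisson point process on [0, width] x [0, height] with every
    point inside a hole deleted. `holes` is a list of (cx, cy, radius) discs."""
    rng = random.Random(seed)
    n = poisson_count(rng, intensity * width * height)
    points = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
    return [(x, y) for x, y in points
            if all(math.hypot(x - cx, y - cy) > r for cx, cy, r in holes)]


if __name__ == "__main__":
    pts = ppp_with_holes(intensity=100, width=10, height=10, holes=[(5, 5, 2)], seed=3)
    expected = 100 * (10 * 10 - math.pi * 2 ** 2)
    print(f"{len(pts)} points (expected about {expected:.0f})")
```

When the holes lie fully inside the rectangle and do not overlap, the expected number of tasks is the intensity times the area that remains, which is what the usage line compares against.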
But I’ve not really been focused on wireless sensor networks, and it is a bit of a stretch to fit bounty hunting to them (at least as far as I can tell; my first papers might suggest otherwise). My current direction is more about spatial queues than wireless sensor networks. There is a spatial queuing theory, but there is not a spatial queuing theory with holes! The paper “Risk and Reward in Spatial Queuing Theory” deals with spatial queuing theory for the dynamic traveling repairman problem. All of these systems assume a region without holes, that is, without any space where no tasks will be generated. That matters in the real world, because there are generally spaces where tasks won’t actually appear. Therefore, I think I need to incorporate the concept of Poisson point processes with holes, and then build from that what to expect based on the size and location of the holes. The holes matter because the distance a server must travel to its next task depends on the size of these holes!
So, I think this is important. Actually, holes might not be general enough; it would be better if I could generalize to any region.
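A quick way to see why holes change travel distance is a Monte Carlo sketch, under these assumptions: a single server handles tasks first-come-first-served, task locations are uniform on the region outside the holes, and the server travels in straight lines (holes only mean no tasks appear there, so it may cross them). Under FCFS consecutive task locations are independent, so one travel leg is the distance between two independent task locations. The names here are hypothetical.

```python
import math
import random


def sample_task(rng, width, height, holes):
    """Uniform task location on the rectangle minus circular holes (rejection)."""
    while True:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if all(math.hypot(x - cx, y - cy) > r for cx, cy, r in holes):
            return x, y


def mean_travel_distance(width, height, holes, samples=100_000, seed=0):
    """Monte Carlo estimate of the straight-line distance between two
    independent task locations, i.e. one FCFS travel leg of a single server."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x1, y1 = sample_task(rng, width, height, holes)
        x2, y2 = sample_task(rng, width, height, holes)
        total += math.hypot(x1 - x2, y1 - y2)
    return total / samples


if __name__ == "__main__":
    print(mean_travel_distance(1, 1, []))                 # unit square, no holes
    print(mean_travel_distance(1, 1, [(0.5, 0.5, 0.3)]))  # central hole
```

Without holes the estimate should land near the known mean distance between two random points in the unit square, about 0.5214; a central hole pushes tasks toward the edges and raises the mean leg length. Replacing the disc test with an arbitrary membership function would be one way to move from holes toward a general region.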
Wow, it has been a long time. I’ve recently been looking at stocks again, and just two days ago I found a stock and thought, I should buy that. Then I didn’t. But I really, really should have, because it then went up 20% in two days. So, this made me look into algorithmic trading again. I found a couple of really good resources:
Quantopian lets you design your own trading algorithms, test them on historical data, and also run them through robinhood.io or Interactive Brokers (IB). I think I like IB better, but I should start with robinhood.io since it has no fees. This is awesome!
Google Research recently published a piece, “The reusable holdout: Preserving validity in adaptive data analysis” (so I’m not going to write much). It describes how statistics become unreliable when machine learning methods are adapted to the data through data analysis and repeated trials on the same holdout data. This problem is easy to recognize on machine learning competition leaderboards like those on Kaggle (good article on that). Their solution is a method, detailed here and here (to be published in Science), that lets us reuse the holdout set many times without losing its validity. So, that is awesome! Hopefully Kaggle competitions and data science research will be greatly improved by having more meaningful feedback.
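The core mechanism, called Thresholdout in that work, can be sketched as follows, based on my reading of it: answer each query from the training set, and only when the training answer disagrees with the holdout answer by more than a noisy threshold, return a noisy holdout answer and spend one unit of budget. The Laplace noise scales (2σ for the threshold, 4σ for the comparison, σ for the answer) are as I recall them and should be checked against the paper; the class and parameter names are my own.

```python
import random


def laplace(rng, scale):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Thresholdout:
    """Answer statistical queries from the training set unless it disagrees
    with the holdout by more than a noisy threshold; only then reveal a noisy
    holdout answer and spend one unit of budget."""

    def __init__(self, train, holdout, threshold, sigma, budget, seed=None):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + laplace(self.rng, 2 * sigma)

    def query(self, phi):
        """Estimate the mean of phi (a function from a sample to [0, 1])."""
        if self.budget < 1:
            raise RuntimeError("holdout budget exhausted")
        train_mean = sum(map(phi, self.train)) / len(self.train)
        hold_mean = sum(map(phi, self.holdout)) / len(self.holdout)
        if abs(hold_mean - train_mean) > self.noisy_threshold + laplace(self.rng, 4 * self.sigma):
            # Training answer looks overfit: spend budget, refresh threshold noise.
            self.budget -= 1
            self.noisy_threshold = self.threshold + laplace(self.rng, 2 * self.sigma)
            return hold_mean + laplace(self.rng, self.sigma)
        return train_mean
```

Setting `sigma=0` switches the noise off, which is handy for checking the logic but gives up the protection the noise is there to provide.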
David had an idea for a problem:
Given N individuals, each individual must, say, play a game with M other individuals. The length of a game is stochastic and depends on who is playing. Describe an algorithm that optimally groups the individuals so that each individual waits the minimum amount of time to play with all of the other combinations of individuals.
Originally the problem was stated with M=2.
An extension: what if N changes with time, meaning people come and go? Can the algorithm be robust to this?
So, this seems like it could be approached through a combination of methods:
Stochastic processes (for the arrival and finish times)
Graph Theory / Combinatorics (for the pairing)
Optimization (for the scheduling)
I’m sure that there are other ways, but I think that this problem needs to be solvable pretty quickly in order to be useful…
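As a fast baseline to compare smarter methods against, here is an event-driven greedy simulation. It assumes two-player games in which every pair must meet once, and exponentially distributed game lengths whose mean depends on the pair through a hypothetical `mean_duration(a, b)` function. It is a heuristic, not an optimal schedule: whenever two idle individuals still owe each other a game, it starts that game immediately.

```python
import heapq
import itertools
import random


def greedy_schedule(n, mean_duration, seed=0):
    """Greedy pairing for two-player games (heuristic, not optimal).

    Every pair of the n individuals must play once. Whenever two idle
    individuals still owe each other a game, start it; its length is
    exponential with mean mean_duration(a, b). Returns the time the last
    game ends and the games in the order they started.
    """
    rng = random.Random(seed)
    remaining = {frozenset(p) for p in itertools.combinations(range(n), 2)}
    idle = set(range(n))
    events = []  # min-heap of (finish_time, a, b)
    started = []
    now = 0.0

    def start_games():
        for a in sorted(idle):
            if a not in idle:
                continue  # already matched earlier in this pass
            for b in sorted(idle):
                if b != a and frozenset((a, b)) in remaining:
                    idle.difference_update((a, b))
                    remaining.discard(frozenset((a, b)))
                    duration = rng.expovariate(1.0 / mean_duration(a, b))
                    heapq.heappush(events, (now + duration, a, b))
                    started.append((a, b))
                    break

    start_games()
    while events:
        now, a, b = heapq.heappop(events)  # next game to finish
        idle.update((a, b))
        start_games()
    return now, started


if __name__ == "__main__":
    makespan, games = greedy_schedule(6, lambda a, b: 1.0 + 0.1 * abs(a - b))
    print(f"{len(games)} games, last one ends at t = {makespan:.2f}")
```

The event loop is also a natural place to handle people coming and going: an arrival could be added to `idle` along with its new pairs in `remaining` before calling `start_games()` again, though that extension is not implemented here.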
I think that fractal clustering could be used to cluster data streams of electricity data.
What about distributed clustering algorithms? We have massive data streams from different parts of the grid coming into their respective hubs. How do you make sense of the whole? I think fractal clustering would work here, since I believe the data would be self-similar enough to warrant such an algorithm.
Using the Fractal Dimension to Cluster Datasets
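The building block behind fractal clustering is an estimate of a point set’s fractal dimension. Below is a minimal box-counting sketch for points scaled into the unit cube, plus an incremental step that, as I understand the paper, places each new point in the cluster whose fractal dimension changes least when the point is added. The grid levels, function names, and test clusters are my assumptions, not the paper’s code.

```python
import math
import random


def box_counting_dimension(points, levels=range(1, 6)):
    """Estimate the box-counting dimension D0 of points in [0, 1]^d: the
    least-squares slope of log2 N(r) against log2(1/r), where N(r) is the
    number of occupied grid cells of side r = 1 / 2**k."""
    xs, ys = [], []
    for k in levels:
        m = 2 ** k
        occupied = {tuple(min(int(v * m), m - 1) for v in p) for p in points}
        xs.append(k)
        ys.append(math.log2(len(occupied)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def assign_to_cluster(point, clusters):
    """Incremental step (my reading of the paper): return the index of the
    cluster whose fractal dimension changes least when `point` joins it."""
    def impact(cluster):
        return abs(box_counting_dimension(cluster + [point])
                   - box_counting_dimension(cluster))
    return min(range(len(clusters)), key=lambda i: impact(clusters[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    line = [(0.5 * i / 500, 0.5 * i / 500) for i in range(500)]  # 1-D segment
    square = [(0.5 + 0.5 * rng.random(), 0.5 + 0.5 * rng.random()) for _ in range(500)]
    print(box_counting_dimension(line))                   # 1.0
    print(assign_to_cluster((0.25, 0.25), [line, square]))  # 0
```

For streaming grid data, each hub could run the assignment step locally, since it only needs the current members of each cluster; keeping per-level cell counts instead of raw points would make that cheaper.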
Fast Feature Selection using Fractal Dimension (another interesting paper). This one made me think that maybe Horde could automatically select the features it uses to make its decisions. This would be extremely useful, but it is highly unlikely to work with the few examples Horde is currently classifying on. I think automatic feature selection would need a live stream of the features and would “spike” the features on transitions between behaviors. I could then continue running Horde as normal but base my decision tree on the new feature set.
Would this also be useful in creating the hierarchies? I think so. I will have to think a bit more about this aspect.
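The feature-selection idea can be sketched as backward elimination driven by the correlation fractal dimension D2: estimate D2 for the full feature set, then repeatedly drop the attribute whose removal changes D2 the least, on the reasoning that a redundant attribute adds no intrinsic dimension. This follows my understanding of the paper rather than its exact procedure; the box-counting D2 estimator, grid levels, and names are assumptions, and the data in the usage example is synthetic.

```python
import math
import random
from collections import Counter


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def correlation_dimension(points, levels=range(1, 6)):
    """Box-counting estimate of the correlation fractal dimension D2 for
    points in [0, 1]^d: slope of log2(sum of squared cell counts) vs log2(r)."""
    log_r, log_s = [], []
    for k in levels:
        m = 2 ** k  # grid cells have side r = 1 / m
        counts = Counter(tuple(min(int(v * m), m - 1) for v in p) for p in points)
        log_r.append(-k)
        log_s.append(math.log2(sum(c * c for c in counts.values())))
    return _slope(log_r, log_s)


def _project(points, cols):
    return [tuple(p[c] for c in cols) for p in points]


def backward_eliminate(points, keep=1):
    """Repeatedly drop the attribute whose removal changes D2 the least.
    Returns (removed attributes in order, remaining attributes)."""
    features = list(range(len(points[0])))
    removed = []
    while len(features) > keep:
        full = correlation_dimension(_project(points, features))
        best = min(
            features,
            key=lambda f: abs(full - correlation_dimension(
                _project(points, [g for g in features if g != f]))),
        )
        features.remove(best)
        removed.append(best)
    return removed, features


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: x and y are independent, the third column copies x.
    data = []
    for _ in range(2000):
        x, y = rng.random(), rng.random()
        data.append((x, y, x))
    print(backward_eliminate(data, keep=2))
```

In the example, one of the two copies of x is dropped first, because removing it leaves D2 unchanged, while dropping y would collapse the data onto a line.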
This algorithm looks useful.