I am using Bounties as a distributed approach to assigning tasks and developing roles for robots in a multi-robot system. I am trying to figure out a good learning algorithm that will get good results for this system that will scale and be general. Therefore, since I’m not that smart I have to learn about some of the theory to do with bounties and how they operate so that I can program the bounty hunters to learn how to do well. One game theory tool I think I should apply is the concept of the price of anarchy.
The price of anarchy is the cost of allowing the agents to act selfishly rather than being controlled by a central authority. It is usually used in traffic routing or networks. One interesting result is Braess’ Paradox which describes the self evident fact that low latency routes when freed for everyone to use will result in a lower social welfare because when everyone uses that route it becomes the Nash equilibrium.
I believe that in the robot case we have that auctions are a centralized approach to robot task allocation (essentially solving the graph coloring problem) whereas bounties are the selfish distributed approach. Therefore, what is the social welfare cost of using this selfish approach. (Social welfare is what me as the designer is interested in since I don’t want my robots congesting the route to performing a task such that the other tasks don’t get done.)
So, can I learn the “congestedness” of particular tasks sort of like the idea that I will longer route if it is faster. Meaning I will take a bounty that will cause me to explore the environment even though there is a bounty that is close by but I know that the last time I went that way the route was congested with other bounty hunters trying to get to it. The bounty is essentially then the motivation for me to try to get the bounty even if I have encountered congestion in the past. The bounty is acting as a flag to be like hey, I’m not congested come get me. However, the bounty hunters have to know that all of the hunters also know that info. So, how do they react? Nash equilibrium probably becomes that they all take that bounty?.. The robots have an estimate of the cost to travel to the task location and deliver that box. They also integrate in the cost of congestion expected to task.
On a side note I can make the system heterogeneous by using the flockbots to grab the cans with the barcodes since the darwin’s can’t pick up cans, but flockbots can. Darwins can pickup blocks and flockbots can’t pick them up.
Some links:
http://theory.stanford.edu/~
http://math.mit.edu/~olver/msc_thesis.pdf good explanation of the concept
https://www.mpi-sb.mpg.de/~gchristo/papers/ck05a.pdf — The price of anarchy in finite congestion games seems like this would be relevant to my problem since we have finite number of robots?
http://www.cs.cmu.edu/~avrim/Papers/ics_bbm.pdf — might be inspiration for my learning algorithm.