More ideas

So the idea is that we have these beacons.  They have a broadcast radius, and within that radius they state what they want to have done to them, e.g., "move me to location (x, y)."  As robots explore the environment they discover these tasks.  They can decide to take on a task, or remember it and continue doing what they were doing.  When robots encounter other robots they can exchange information about the tasks they have encountered and when, thereby updating their beliefs about the state of the tasks.  Maps are not shared between the robots — only the coordinates of a task relative to the current location, and possibly any waypoints that may help the robot find it (like if you have to head in the opposite direction to get to the task).  This way the robots assume intelligence in one another and act more like humans.  It is closer to how humans give directions than to sharing maps.
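A minimal sketch of that information exchange, assuming each robot keeps a timestamped belief per task and, on meeting another robot, keeps whichever report is newer (all names here are hypothetical, not from any implementation):

```python
from dataclasses import dataclass


@dataclass
class TaskBelief:
    task_id: str
    rel_position: tuple   # task coordinates relative to where it was observed
    observed_at: float    # timestamp of the observation


def merge_beliefs(mine: dict, theirs: dict) -> dict:
    """On an encounter, keep the most recently observed report per task."""
    merged = dict(mine)
    for tid, belief in theirs.items():
        if tid not in merged or belief.observed_at > merged[tid].observed_at:
            merged[tid] = belief
    return merged
```

Each robot would run `merge_beliefs` against every robot it meets, so stale task reports get overwritten by fresher ones without any map ever changing hands.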

The interesting part is when we have more tasks than robots.  The robots must cooperatively decide which tasks to take.  I think the moment robots encounter each other is when they have to decide how to coordinate their actions and when to do cooperative tasks.

The other idea I had was essentially behavioral bootstrapping, only in the reinforcement learning area.  I know there was a paper on different levels of agents, and I know that at GMU they have looked at behavior bootstrapping in the case of multi-robot learning from demonstration.  The closest related thing would be cooperative coevolution.

Another idea is that we really don’t want to have to publish the current bounty price all the time.  We initially broadcast the starting price and then stop.  Any new agents on the field must ask other robots.  Whenever a new task becomes available, the bondsman decides when to announce it.
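A sketch of that announcement scheme, assuming the bondsman pushes the starting price exactly once and newcomers pull prices from peers instead (class and method names are made up for illustration):

```python
class Bondsman:
    """Announces a bounty once; never re-broadcasts the price."""

    def __init__(self):
        self.bounties = {}

    def announce(self, task_id, price, robots_in_range):
        self.bounties[task_id] = price
        for robot in robots_in_range:
            robot.hear(task_id, price)


class Robot:
    def __init__(self):
        self.known = {}

    def hear(self, task_id, price):
        self.known[task_id] = price

    def ask(self, peer):
        # a newcomer pulls the current bounty list from a peer,
        # not from the bondsman
        self.known.update(peer.known)
```

The point of the sketch is just the information flow: the bondsman only talks at announcement time, and afterwards prices spread robot-to-robot.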

The other idea is inspired by this book I am reading.  They make a good point that we as humans answer simpler questions when solving a problem we don’t know how to solve.  For example, when picking stocks, if you don’t know how, you might think “I like Porsches,” so I am going to buy stock in Porsche.  Deciding what stock to buy based on what you like is answering a simpler question: you know you want stock, and you know you like Porsche, but you don’t know anything about picking stocks.  The nice thing with humans is that we can adjust what questions we ask ourselves as we learn new things.  We not only learn the rewards for states and actions, we also learn how to ask better questions and how to answer them.

 

Google Glass idea

I was sitting at a counter watching the cook make the orders.  For each item he had to look back at the slip of paper to see what to make next.  This seems like the perfect situation for Google Glass.  The orders would appear on the Glass in an optimized order, so the cook wouldn’t have to keep glancing at the slips of paper.  Another cool thing would be to let the customers get a first-person view from the Glass camera showing the cook make their meal.  This could be a form of advertising other meals as well.

I wonder if that is how the cooks at Chili’s take orders, from slips of paper.

Dynamic lane reversal and Braess’ Paradox

http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ITSC11-hausknecht.pdf   Dynamic Lane Reversal in Traffic Management Peter Stone et al.

This paper is very interesting.  Based on my recent learning about the Price of Anarchy and Braess’ paradox, I wonder why their method works.  Certainly it is an immediate solution.  However, I would imagine that as people learn the system it would degrade and Braess’ paradox would manifest.  I’m sure there is some critical mass in the number of cars that would have to be reached before this happens.  Have to think about this…  I’m hoping to sit in on an urban transport design class; maybe I can ask the teacher his opinion…

https://www.oasys-software.com/blog/2012/05/braess%E2%80%99-paradox-or-why-improving-something-can-make-it-worse/ is a nice post about the Braess’ paradox.

So, it would be interesting to be able to predict when Braess’ paradox will affect the system and, as the above link suggests, to automatically find instances of the paradox.  I think this would be useful in route planning for commuters and possibly in designing road systems.  It might be framed as a multiagent systems and distributed optimization problem: multiagent in the sense of modeling the uncertainty of the other drivers and coordinating with the drivers using the app.
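One way to detect an instance of the paradox is simply to compare the equilibrium travel time with and without the candidate link.  A sketch using the classic textbook Braess network (4000 drivers, two routes that each have one fixed 45-minute leg and one variable n/100 leg; the numbers are the standard example, not from the paper above):

```python
N = 4000.0  # total drivers going from A to B

# Without the shortcut there are two symmetric routes, each with cost
# n/100 (variable leg) + 45 (fixed leg); equilibrium is an even split.
eq_without = (N / 2) / 100 + 45          # each driver's travel time

# With a zero-cost shortcut joining the two variable legs, the route
# "variable leg -> shortcut -> variable leg" dominates both originals
# for every individual driver, so at equilibrium all N take it.
eq_with = N / 100 + 0 + N / 100

# Braess' paradox: adding the link made everyone worse off.
paradox = eq_with > eq_without
```

An automatic detector would do this comparison for each candidate link in a road network: recompute the user equilibrium with the link removed and flag the link if equilibrium travel time drops.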

Adaptive Mechanism Design

I found a Peter Stone paper that addresses part of my idea on adaptive auctions and mechanism design.  However, they only took it part of the way: they only looked at adapting the parameters of a single type of auction.  My idea was that not only would that happen, but the type of auction itself would change as well, to adapt to the bidders.  The nice thing is that they have discussed the application areas.

So, my original ideas are basically what would happen if we combined the two following papers.

http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/INFORMS10-pardoe.pdf

http://edoc.sub.uni-hamburg.de/hsu/volltexte/2014/3041/pdf/2014_02_Lang.pdf

I would want to apply it not only to auctions but to negotiation as well.

Reply with BCCs

I think you should be able to have BCCs where, when the person the email was addressed TO replies, they also unknowingly reply to the rest.  This could be accomplished if the person replies to the original sender.  The original sender will receive the mail, and that sender’s account knows whom they BCC’ed, so if the original sender had enabled this feature, the server would also send the reply to the BCCs.
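A minimal sketch of the server-side rule, assuming the sender’s server remembers the BCC list per outgoing message and mirrors any reply to it (every name here is hypothetical; this is not how any real mail server works today):

```python
# message_id -> list of BCC addresses the sender's server remembered
bcc_registry = {}


def send(message_id, to_addr, bcc_addrs, mirror_replies=True):
    """On outgoing mail, remember the BCC list if the feature is enabled."""
    if mirror_replies:
        bcc_registry[message_id] = list(bcc_addrs)


def on_reply(in_reply_to, reply_body):
    """When a reply to a remembered message arrives at the sender's
    server, produce the extra deliveries to the original BCCs."""
    return [(addr, reply_body) for addr in bcc_registry.get(in_reply_to, [])]
```

The key design point is that only the original sender’s server needs to change; the TO recipient’s client replies normally and never learns the BCC list.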

Price of Anarchy

I am using bounties as a distributed approach to assigning tasks and developing roles for robots in a multi-robot system.  I am trying to figure out a learning algorithm that gets good results for this system while scaling and staying general.  Therefore, since I’m not that smart, I have to learn some of the theory of how bounties operate so that I can program the bounty hunters to learn to do well.  One game theory tool I think I should apply is the concept of the price of anarchy.

The price of anarchy is the cost of allowing the agents to act selfishly rather than being controlled by a central authority.  It is usually applied to traffic routing or networks.  One interesting result is Braess’ paradox, which describes the counterintuitive fact that opening a low-latency route for everyone to use can lower social welfare, because the Nash equilibrium shifts to everyone using that route.
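As a concrete worked example of the price of anarchy (Pigou’s standard example, not anything specific to my system): a unit of traffic chooses between a link with cost c(x) = x, where x is the fraction using it, and a link with fixed cost 1.  Selfish routing sends everyone onto the variable link, while the social optimum splits the traffic:

```python
def total_cost(x):
    # x = fraction of a unit of traffic on the variable link c(x) = x;
    # the remaining 1 - x takes the fixed-cost-1 link
    return x * x + (1 - x) * 1


# Equilibrium: the variable link never costs more than 1, so every
# selfish driver takes it and x = 1.
selfish = total_cost(1.0)

# Optimum: minimize x^2 + (1 - x); derivative 2x - 1 = 0 gives x = 1/2.
optimal = total_cost(0.5)

poa = selfish / optimal   # price of anarchy = 4/3 for this instance
```

That 4/3 ratio is the kind of quantity I would want to bound for the bounty system: how much total task-completion cost the selfish hunters pay versus a centrally assigned allocation.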

I believe that in the robot case, auctions are a centralized approach to robot task allocation (essentially solving the graph coloring problem), whereas bounties are the selfish, distributed approach.  Therefore: what is the social welfare cost of using this selfish approach?  (Social welfare is what I, as the designer, am interested in, since I don’t want my robots congesting the route to one task such that the other tasks don’t get done.)

So, can I learn the “congestedness” of particular tasks, analogous to taking the longer route because it is faster?  Meaning: I will take a bounty that causes me to explore the environment even though another bounty is close by, because the last time I went that way the route was congested with other bounty hunters trying to get to it.  The bounty is then the motivation to try for it even if I have encountered congestion in the past — it acts as a flag saying “hey, I’m not congested, come get me.”  However, each bounty hunter has to account for the fact that all the other hunters know that same information.  So how do they react?  Does the Nash equilibrium become that they all take that bounty?  The robots have an estimate of the cost to travel to the task location and deliver the box, and they also integrate in the expected cost of congestion on the way to the task.
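A sketch of what that congestion-aware choice could look like, with a simple exponential moving average for the learned congestion estimate (the structure and the smoothing constant are assumptions, not a worked-out algorithm):

```python
def expected_utility(bounty, travel_cost, congestion_est):
    # net value of pursuing a bounty, discounted by learned congestion
    return bounty - travel_cost - congestion_est


def pick_task(tasks):
    # tasks: list of (task_id, bounty, travel_cost, congestion_est) tuples
    best = max(tasks, key=lambda t: expected_utility(t[1], t[2], t[3]))
    return best[0]


def update_congestion(old_est, observed, alpha=0.3):
    # exponential moving average of congestion observed en route:
    # recent encounters with other hunters count more than old ones
    return (1 - alpha) * old_est + alpha * observed
```

With this utility, a nearby bounty with a congestion history can lose to a farther bounty with a clear route, which is exactly the “longer route because it is faster” behavior described above.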

On a side note, I can make the system heterogeneous by using the FlockBots to grab the cans with the barcodes, since the Darwins can’t pick up cans but FlockBots can.  Conversely, Darwins can pick up blocks and FlockBots can’t.

Some links:

http://theory.stanford.edu/~tim/papers/robust.pdf why is it useful how robust?

http://math.mit.edu/~olver/msc_thesis.pdf good explanation of the concept

https://www.mpi-sb.mpg.de/~gchristo/papers/ck05a.pdf — The price of anarchy in finite congestion games seems like this would be relevant to my problem since we have finite number of robots?

http://www.cs.cmu.edu/~avrim/Papers/ics_bbm.pdf — might be inspiration for my learning algorithm.

Bounties are a Prisoner’s Dilemma

For my CS880 project I’m making a bounty system that will hopefully enable robots to learn what roles they should take.  I’m currently going to try providing the robots with tit-for-tat (or some variation of it) to learn which tasks to do.  I think this method might be good since in a real-life scenario we don’t have time to do reinforcement learning.  Anyway, it seems like in the simplest case, robots going after bounties are essentially in a Prisoner’s Dilemma.

So, the idea is that if both robots cooperate (neither goes after the same bounty), they both “go to jail,” just not for long, since each can always go after a different bounty (this models the expected loss from passing up that bounty).  If both defect (both go for the same bounty), one of them will get it and the other won’t; since they are fighting over the bounty, we assume it costs them both more than it is worth, even though one of them collects the reward.  If one defects and the other cooperates, they don’t compete and have thus decided their roles in the system.
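A possible payoff table consistent with that story — the numbers are hypothetical, chosen only to satisfy the Prisoner’s Dilemma ordering T > R > P > S — plus plain tit-for-tat as a starting strategy:

```python
# C = yield the contested bounty, D = go after it.
# (row payoff, column payoff); hypothetical values:
#   T = 5  sole defector collects the bounty
#   R = -1 both yield and take the small loss of a different bounty
#   P = -3 both fight over the same bounty
#   S = -4 yielding while the other collects
PAYOFFS = {
    ("D", "C"): (5, -4),
    ("C", "D"): (-4, 5),
    ("C", "C"): (-1, -1),
    ("D", "D"): (-3, -3),
}


def tit_for_tat(my_history, their_history):
    # cooperate first, then mirror the opponent's previous move
    return "C" if not their_history else their_history[-1]
```

Whether the bounty setting really satisfies this ordering depends on how the reward and fighting costs compare, which is exactly the weighting question below.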

Does the bail bondsman tell the winner who else was trying to get the bounty?  He would have to in order for the defect-defect case to actually be relevant.  Otherwise, from the point of view of the winner he was participating in a defect-cooperate game, and from the loser’s point of view a defect-defect game.

I also need to make it a weighted tit-for-tat based on the changing bounties.  I might decide to go for a bounty anyway if it is large enough.

Adaptive Checkout Lights

[written in October 2013 but decided to publish now]

I think it would be awesome if checkout lights could be more useful.  For example, the light could change color depending on how full the line was: red if it was really full, yellow if a little full, and green if empty or practically empty.  There wouldn’t even need to be any software installed to control the light.  The cashier could just flip a switch to change the light color.  Also, there could be lines on the floor to let the cashier know when to change it.  Of course I would prefer the software solution, with learning etc.; that would mean putting up sensors like those that open the automatic doors at each of the lines on the floor.  But the switch would be cheap and a good way to signal customers.

Also, (in my world 😉 ) in order to make checking people out fun, there would be a friendly competition between the cashiers.  The computers could keep track of stats for each cashier, like throughput, number of errors, most saved, or most items, etc.  Then you could