Directed Reading

I’m planning my directed reading class this coming semester.  So, basically I have/get to come up with an entire semester’s worth of material.  I might be able to make a class out of it by the time I’m done 🙂 haha.  My subjects are focusing on the areas I want to explore with the bounty hunting task allocation model.

  1. Petri-net models and applications to MAS and task allocation.  Basically I’m thinking that petri-nets can be used to create task allocation problems abstractly.  By doing so I can better gauge the usefulness of different task allocation methods.
  2. Contest Theory and tournament theory.  They are two areas which seem like they have crucial information for helping the bounty hunting method mature.  Explore how aspects such as effort can be incorporated into the task allocation problem to make it more realistic.
  3. Coalitional Game theory and the El Farol Bar problem.  The game theory aspect will help me create better learning algorithms for the bounty hunters and the El Farol problem is very similar to the bounty hunting problem.  Also, look into collectives, a complex systems term that describes a system of selfish agents that have a well defined system level performance measure.
  4. Specific problems in Multiagent Task Allocation.  Namely cooperative, resource dependent, heterogeneous tasks and tasks that have sub-tasks.
  5. Automated Task decomposition for MAS.  This is mainly so that I can explore the idea that behaviors (and possibly tasks) may be automatically reduced to states with single feature transitions hierarchically using only state action pair sequences.  Doing so will allow complex behaviors to be learned without so much work required by the human to decompose the problem.  I’d also like to apply it to tasks.
  6. Learning coalition formation, or automatic organization formation for MAS mainly with coordination in mind.  Main applications focusing on large scale systems like autonomous vehicles, cloud robotics and crisis management.
  7. Large scale MAL/MAS and applications.  autonomous vehicles, cloud robotics and crisis management
  8. Current topics in reinforcement learning and its use in MAS.

I’m working on fleshing out the details before the semester starts so that I might already have the list of papers I want to read.

Price Surging and Bounty Hunting

So, it seems like uber might have a bounty type pricing model.  They have this system of price surging.  This we also found doesn’t work for adjusting the bounty.  I thought that maybe it would provide a method to get agents to go after the right tasks, it doesn’t.  However, it does give me hope that maybe my bounty stuff will be useful in something like taxis.

I found this article here.

Neural Nets and Agent Hierarchy

So, the idea is that what if we have a hierarchy where some of the agents in the hierarchy have multiple supperiors. Then the underlying gets tasks from both its superiors.  How does it learn how much weight to put on each of the tasks.  How does it know which to do first?  I think that maybe if the hierarchy backpropagated the error (sort of like in a neural network) that the agents could learn to weight the various (inputs) commands.

Another idea is to use HyperNEAT to evolve the hierarchy.  The substrate is defined as the fully connected hierarchy, y-axis is rank and the x-axis is the id of the agent.  These coordinates get plugged into the CPPN which will then output the weight for that connection?  This would then allow a malleable hierarchy to be constructed.  Allowing agents to move in the substrate persists the weights!

It would be awesome if HyperNEAT wasn’t a centralized approach to the process.  That should be the next step.  How to do a similar procedure where the agents have more autonomy…?


Multiagent Learning

So, I really want my PhD thesis topic to be on large scale (>8 agents) heterogeneous multi agent learning.  The two problems in MAL are speed and generality.  The main way I see solving the speed problem is by providing the learning agent information (or background knowledge or reasonable prior probabilities).  This has been the study of transfer learning.  The other problem that we have is application.  I don’t know of any applications that would necessitate many learning agents.  However, I believe that there is an application, I just haven’t found it yet.  Most application areas are really just system development.  Solving the problem of how do I engineer this system so that given these agents with their behaviors can solve a problem or do a task etc.  That is designing systems.  The other problem is that when the number of agents increases the system may no longer be manageable in terms of actually interpreting the system.  This is another area that would be useful.  However, as you get >100 agents you end up studying an economics problem rather than a MAL problem.  Which is not what I want to do.  I intend on creating some MAS in MASON and implement a few of the current learning algorithms and see how they fair on problems requiring ~8 agents.  So, I will write some more on how I plan on evaluating them and what sort of test problems I will develop.

Decision Theory Paradox is a decision theory paradox.  One solution to the paradox is to use this Choquet integral.  I don’t have time to read about it right now, but it seems like something that would be interesting to observe in multi-agent learning algorithms.  What decision do they learn to take, what would make them learn the Choquet integral method?  Is it stable, meaning do they always chose a particular action in line with a particular method or not?

A paper on the subject:

Seems like it is useful in multi-objective combinatorial optimization:

Maybe I was asking the wrong questions as no one is looking at it from a MAL perspective.

Large MAS app. Parking lots

So the idea is that parking lots and local walmarts are not going to be able to afford remaking their parking lots so that when we have autonomous cars we can get dropped off at the door and the robot park somewhere and then pick us up at the entrance.  This is because structurally there is not enough room for a line of cars to wait for the customers to get dropped off.  Also, since most walmarts the entrances and exits are the same door we would also   The line would become like the one when picking up kids from school.  So, it is probably logistically best if the walmarts that are already built have the customers walk from the parking spot of the car.  The problem then becomes one of optimization.  One, I don’t want to walk far going into the store and coming out.  So, the idea is that the emmergent behavior that I want out of the bounties is one of a cycle where initially I want to park as close to the entrance as possible.  Then as time goes on the cars must move to spaces further from the entrance so that those just coming in can park and those leaving can move closer to the exit to pick up the leaving customers.  We want to have the robots learn patterns of behavior and adapt them rather than specific

Want to create behavior patterns and learn the patterns of the particular environment to adapt and choose which pattern of behavior should be used.  The dynamics change as you move through the parking lot since it all depends on the number of cars in the parking lot, the layout of the parking lot, the heterogeneity of the cars (trucks, tractor trailers, RVs, cars, motor cycles etc) and the uncertainty about how long each car is expecting its passengers to be in the store.  How long you are going to be in the store is not something you want to share with the other cars.  That is private information.  We only want our car to know this.  And this number is only an estimate since it is doesn’t know if you are going to stop and talk with a friend you happen to spot while in the store.  Of course to improve the accuracy there would be need to be an interface via your smart phone to be able to tell it your progression.  However, this seems like a nice area for those wearable sensors to be able to predict your progress in your trip through the store and make adjustments based on you seeing something you like that wasn’t on your list.

So the idea is that with the bounties is that the cars would have a distributed mechanism solving the constraint satisfaction problem of everyone needs a parking spot and the optimization problem of specific spots wanted.

So, I think that coalitions of agents will emerge due to common exit times of the customers.  Also I think we want to minimize the number of movements that the cars make.  So, essentially if you know that your passenger is going to be in the store for a while make room closer to the entrance.  Essentially you place a bounty out for a parking spot closer to the entrance.

To pay the bounties would your client have to collect tokens for parking far away and then you would be able to use them in order to park closer.

A case where a lot of vehicles are trying to park at the same time is at distribution centers such as Utz.  Many drivers leave the center and come back to the center to park all around the same time.  This might be interesting as well…  I don’t know…

This would also be useful in the case where we have a system where there are both autonomous and human driven cars.

Another problem is the size of the vehicles and finding parking spots that take up multiples.

That is why I think it would be awesome if the cars were sort of like the bike share program then there would always be a car waiting at the front.  The part that would yours could be stored in a locker…

Dynamic lane reversal and Braess’ Paradox   Dynamic Lane Reversal in Traffic Management Peter Stone et al.

This paper is very interesting.  Based on my recent learning about the Price of Anarchy and Braess’ paradox I wonder why their method works.  Certainly it is an immediate solution.  However, I would imagine that as people learn the system it would degrade and Braess’ paradox would manifest.  I’m sure there is probably some sort of critical mass that would have to be reached in the number of cars before this happens.  Have to think about this…  I’m hoping to sit in on a urban transport design class maybe I can ask the teacher his opinion… is a nice post about the Braess’ paradox.

So, it would be interesting to be able to predict when Braess’ paradox will affect the system and as the above link suggests to automatically find instances of the paradox.  I think this is something that would be useful in route planning for commuters and possibly designing road systems.  This might be framed as a multiagent systems and distributed optimization problem.   Multiagent in the case of the modeling the uncertainty of the other drivers and coordinating with the other drivers using the app.

Price of Anarchy

I am using Bounties as a distributed approach to assigning tasks and developing roles for robots in a multi-robot system.  I am trying to figure out a good learning algorithm that will get good results for this system that will scale and be general.  Therefore, since I’m not that smart I have to learn about some of the theory to do with bounties and how they operate so that I can program the bounty hunters to learn how to do well.  One game theory tool I think I should apply is the concept of the price of anarchy.

The price of anarchy is the cost of allowing the agents to act selfishly rather than being controlled by a central authority.  It is usually used in traffic routing or networks.  One interesting result is Braess’ Paradox which describes the self evident fact that low latency routes when freed for everyone to use will result in a lower social welfare because when everyone uses that route it becomes the Nash equilibrium.

I believe that in the robot case we have that auctions are a centralized approach to robot task allocation (essentially solving the graph coloring problem) whereas bounties are the selfish distributed approach.  Therefore, what is the social welfare cost of using this selfish approach.  (Social welfare is what me as the designer is interested in since I don’t want my robots congesting the route to performing a task such that the other tasks don’t get done.)

So, can I learn the “congestedness” of particular tasks sort of like the idea that I will longer route if it is faster.  Meaning I will take a bounty that will cause me to explore the environment even though there is a bounty that is close by but I know that the last time I went that way the route was congested with other bounty hunters trying to get to it.  The bounty is essentially then the motivation for me to try to get the bounty even if I have encountered congestion in the past.  The bounty is acting as a flag to be like hey, I’m not congested come get me.  However, the bounty hunters have to know that all of the hunters also know that info.  So, how do they react?  Nash equilibrium probably becomes that they all take that bounty?..  The robots have an estimate of the cost to travel to the task location and deliver that box.  They also integrate in the cost of congestion expected to task.

On a side note I can make the system heterogeneous by using the flockbots to grab the cans with the barcodes since the darwin’s can’t pick up cans, but flockbots can.  Darwins can pickup blocks and flockbots can’t pick them up.

Some links: why is it useful how robust? good explanation of the concept — The price of anarchy in finite congestion games seems like this would be relevant to my problem since we have finite number of robots? — might be inspiration for my learning algorithm.

Bounties are Prisoners Dillema

For my CS880 project I’m making a bounty system that will hopefully enable robots to learn what roles they should do.  I’m currently going to try providing robots a tit-for-tat (or some variation of it) to learn what tasks to do.  I think this method might be good since in a real life scenario we don’t have time to do reinforcement learning.  Anyway it seems like in the simplest case robots going after bounties are essentially in a “Prisoners Dilemma” scenario.

So, the idea is that the robots if they both cooperate (neither go after the same bounty) they end up both going to jail just not for too long since they can always go after a different bounty (this is modeling the expected loss they take by not going after that bounty).  If both defect (they both go for the same bounty) then one of them is going to get it and the other won’t (since they will both be fighting over the bounty we are going to assume that it will cost them more even though one of them is going to get the reward, we are assuming that the reward was not great enough).  If one defects and the other cooperates then they don’t compete and have thus decided their roles in the system.

Does the bails bondsman tell the winner about who else was trying to get the bounty?  We would have to in order for the the defect-defect case to actually be relevant.  Otherwise from the point of view of the winner he was participating in a Defect-Cooperate outcome game and from the losers they were participating in a Defect-Defect game.

I also need to make it a weighted tit-for-tat based off of the changing bounties.  I might decided to go for the bounty if the bounty is large enough.