Neural Nets and Agent Hierarchy

So, the idea is: what if we have a hierarchy where some of the agents have multiple superiors? Then a subordinate gets tasks from both of its superiors.  How does it learn how much weight to put on each task?  How does it know which to do first?  I think that if the hierarchy backpropagated the error (sort of like in a neural network), the agents could learn to weight the various commands (inputs).
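
A minimal sketch of what that could look like, assuming a scalar error signal flows back to the subordinate after each task (the `Subordinate` class and the form of the error vector are hypothetical illustrations, not a worked-out design):

```python
import numpy as np

class Subordinate:
    """An agent with several superiors; learns how much weight to give each one's commands."""

    def __init__(self, n_superiors, lr=0.1):
        self.w = np.ones(n_superiors) / n_superiors  # start by trusting all superiors equally
        self.lr = lr

    def prioritize(self, commands):
        # Order the incoming commands by the learned weight of their superior.
        order = np.argsort(-self.w)
        return [commands[i] for i in order]

    def backpropagate(self, errors):
        # errors[i]: how badly following superior i's last command turned out.
        # Gradient-style update: shrink the weight on superiors whose commands caused error.
        self.w -= self.lr * np.asarray(errors)
        self.w = np.clip(self.w, 1e-6, None)
        self.w /= self.w.sum()  # keep the weights a distribution over superiors
```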

Another idea is to use HyperNEAT to evolve the hierarchy.  The substrate is defined as the fully connected hierarchy: the y-axis is rank and the x-axis is the id of the agent.  These coordinates get plugged into the CPPN, which then outputs the weight for that connection.  This would allow a malleable hierarchy to be constructed.  And because weights are a function of substrate coordinates, allowing agents to move in the substrate persists the weights!
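
A rough sketch of the substrate query (in real HyperNEAT the CPPN is itself an evolved network; the `cppn` stand-in and the smaller-rank-is-higher convention below are placeholder assumptions):

```python
import math

def cppn(x1, y1, x2, y2):
    # Stand-in for an evolved CPPN: any smooth function of the two substrate points.
    return math.sin(x1 * x2) * math.exp(-abs(y1 - y2))

def hierarchy_weights(agents):
    """agents: list of (agent_id, rank) pairs placed on the substrate.
    x-axis = agent id, y-axis = rank; query the CPPN for every superior->subordinate pair."""
    weights = {}
    for (id_a, rank_a) in agents:
        for (id_b, rank_b) in agents:
            if rank_a < rank_b:  # assume smaller rank value = higher in the hierarchy
                weights[(id_a, id_b)] = cppn(id_a, rank_a, id_b, rank_b)
    return weights
```

Since every weight is just a function of coordinates, moving an agent and re-querying the CPPN is exactly what lets the hierarchy reorganize while keeping a coherent weight structure.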

It would be awesome if HyperNEAT wasn’t a centralized approach to the process.  That should be the next step.  How to do a similar procedure where the agents have more autonomy…?


Multiagent Learning

So, I really want my PhD thesis topic to be large-scale (>8 agents) heterogeneous multiagent learning.  The two problems in MAL are speed and generality.  The main way I see to solve the speed problem is to provide the learning agent with information (background knowledge, or reasonable prior probabilities); this has been the focus of transfer learning.  The other problem is application: I don't know of any applications that would necessitate many learning agents.  However, I believe there is one, I just haven't found it yet.  Most application areas are really just system development, solving the problem of how to engineer a system so that given agents, with their behaviors, can solve a problem or do a task.  That is designing systems.  Another issue is that as the number of agents increases, the system may no longer be manageable in terms of actually interpreting it; that would be another useful area.  However, once you get past ~100 agents you end up studying an economics problem rather than a MAL problem, which is not what I want to do.

I intend to create some MAS in MASON, implement a few of the current learning algorithms, and see how they fare on problems requiring ~8 agents.  I will write more on how I plan to evaluate them and what sort of test problems I will develop.

Prestige in Bounty hunting

What about prestige points? What if, besides the bounty, there were a fixed number of prestige points associated with each task? Sort of like going after a local town crook vs. going after an international thief. This could be used to determine which tasks were available to the agent: the more prestige, the harder the task and the more bounty available. It would also help to stratify the agents, to better break them up. So, there could be a task that required at least 4 stars of prestige.

We could also make it so that an agent could not commit to bounties that were lower than some threshold relative to its prestige. So, if it has a 3.4 prestige level, it would be able to take on tasks at the 3-4 level.

So, this would mean that prestige is a continuum: you start at prestige level 0 and can take on tasks between 0 and 1 prestige. The formula for your prestige level could be a cubic polynomial: the number of points necessary to reach level n is n^3, so your current (fractional) level is the cube root of your accumulated prestige points.
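
Concretely, that gives level(p) = p^(1/3). A quick sketch (the task-eligibility band is my reading of the 0-1 and 3-4 ranges above):

```python
def points_for_level(n):
    return n ** 3  # points needed to reach whole level n

def level(points):
    return points ** (1 / 3)  # fractional prestige level

def eligible_band(points):
    # Assumption: an agent takes tasks rated within its current whole-level band,
    # e.g. level 3.4 -> tasks rated between 3 and 4.
    lvl = int(level(points))
    return (lvl, lvl + 1)

assert points_for_level(3) == 27
assert eligible_band(40) == (3, 4)  # 40 points -> level ~3.42
```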

There might be a max level that an agent could reach no matter what, meaning this particular type of agent would only ever be a level 2 agent, whereas a more advanced agent could become a level 8 agent.

Or, instead of limiting the max level, you could make it so that only the first x agents to reach level i can proceed to the next level; the rest are stuck at level i-1. This case would be interesting because it raises the question of whether agents could lose prestige points. I'm not sure how that would work yet.

I think that the bounty alone could accomplish this, but possibly very slowly. I think the prestige points would speed things along when there are a large number of task classes and some classes do not appear very often, and also when the agents are heterogeneous, meaning some agents are just better (faster, or better at something else) and hog all of the lower-priority tasks simply because they can get to them first. In that scenario the lower-level agents might learn to do the harder tasks, since they would not be competing against the faster agents.

There might even be attribute levels in addition to prestige levels. These would help predict how good an agent is at certain things in its environment. For example, a solar array with 5 panels in the sunlight could accomplish a tougher task than an array with 1 panel in the sunlight, like providing 5 units of energy vs. 1 in an hour (I don't know reasonable units). But the 1-panel agent might also have a bigger battery, so it has a higher storage level. Might not be the best example…

The better the description of the problem and the better their cost functions become, the more chance the agents have of doing things optimally. I think that having these levels would provide this in a bounty-esque fashion, and that within a particular stratum, bounties would still be useful in aiding exploration.

Note to self

Ohh how I wish I had read through my posts before going to RoboCup.  I had the chance to talk with Peter Stone!!!  I could have discussed some of my ideas with him and maybe coauthored a paper with him on this.  Oh well, maybe I can do it at GMU.

Well, I just remembered that I wrote about this topic a couple of months ago: https://drewsblog.herokuapp.com/?p=1471 (Dynamic lane reversal and Braess' Paradox).  The paper referenced there even has a nice name for what I have been talking about.

WOW…

So, at least I finally remembered.  I also took a more multiagent perspective on the problem.  I am now intrigued to find out whether my method would scale compared to their ILP approach.

I also found the first author's notes!  http://www.cs.utexas.edu/~mhauskn/projects/aim/notebook.html.  They are very detailed and even mention Braess' Paradox!  He apparently encountered it.  It seems like the project just stopped; many interesting questions about the problem are still open.  It is certainly not solved.  Also, in their experiments a road never became fully one-directional: there was always at least one lane going in the other direction, and that lane was always on the side it "should" be on.  A lane could never change sides or sit in the middle or something really strange, which would be possible in my system.

It seems like bounties might be able to work in this environment.  I have to think about it a bit.  Peter has been doing more recent work on the subject, using auctions to allow drivers to bid on things.  He has also created larger-scale simulations, with a nice simulator for Austin, Texas.  http://www.cs.utexas.edu/~mhauskn/publications.html

Cool RL reading group at UT https://www.cs.utexas.edu/~eladlieb/RLRG.html

Large MAS app: parking lots

So, the idea is that parking lots at local Walmarts and the like are not going to be remade for autonomous cars; the stores can't afford it.  With autonomous cars we would want to get dropped off at the door, have the robot park somewhere, and then get picked up at the entrance.  But structurally there is not enough room for a line of cars waiting to drop off customers, and since at most Walmarts the entrances and exits are the same door, the line would become like the one when picking up kids from school.  So it is probably logistically best if, at the Walmarts already built, customers walk from the car's parking spot.  The problem then becomes one of optimization: I don't want to walk far going into the store or coming out.  The emergent behavior I want out of the bounties is a cycle where I initially park as close to the entrance as possible; then, as time goes on, cars move to spaces further from the entrance so that those just arriving can park, and those about to leave move closer to the exit to pick up departing customers.  We want the robots to learn patterns of behavior and adapt them rather than follow fixed, specific behaviors.

We want to create behavior patterns, learn the patterns of the particular environment, and adapt and choose which pattern of behavior should be used.  The dynamics change as you move through the parking lot, since everything depends on the number of cars in the lot, the layout of the lot, the heterogeneity of the cars (trucks, tractor trailers, RVs, cars, motorcycles, etc.), and the uncertainty about how long each car expects its passengers to be in the store.  How long you are going to be in the store is not something you want to share with the other cars; that is private information, and only your own car should know it.  Even then the number is only an estimate, since the car doesn't know whether you will stop and talk with a friend you happen to spot while in the store.  To improve the accuracy there would need to be an interface via your smartphone to report your progress.  This also seems like a nice application for wearable sensors: predicting your progress through the store and adjusting when you see something you like that wasn't on your list.

So the idea is that, with bounties, the cars would have a distributed mechanism for solving the constraint satisfaction problem (everyone needs a parking spot) and the optimization problem (specific spots are wanted).

So, I think that coalitions of agents will emerge due to common exit times of the customers.  I also think we want to minimize the number of movements the cars make.  So essentially, if you know that your passenger is going to be in the store for a while, make room closer to the entrance; and to get a spot closer to the entrance, you place a bounty out for it.
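
A toy sketch of one possible clearing round, assuming bounties proportional to the distance gained and only clearly beneficial swaps (the names, thresholds, and prices below are all invented; this version also clears centrally, whereas a real mechanism would have cars reveal only bids, not their private time estimates):

```python
class Car:
    def __init__(self, name, spot_dist, est_remaining):
        self.name = name
        self.spot_dist = spot_dist          # distance of current spot from the entrance
        self.est_remaining = est_remaining  # private estimate of minutes until pickup

def clearing_round(cars, bounty_per_meter=0.10):
    """One round of swaps: cars leaving soon post bounties for closer spots,
    and long-stay cars near the entrance collect a bounty by moving outward."""
    leaving_soon = sorted((c for c in cars if c.est_remaining < 10),
                          key=lambda c: -c.spot_dist)   # farthest departing cars first
    staying = sorted((c for c in cars if c.est_remaining > 60),
                     key=lambda c: c.spot_dist)         # closest long-stay cars first
    swaps = []
    for soon, stay in zip(leaving_soon, staying):
        if stay.spot_dist < soon.spot_dist:             # swap only if it helps the departing car
            bounty = bounty_per_meter * (soon.spot_dist - stay.spot_dist)
            swaps.append((soon.name, stay.name, round(bounty, 2)))
            soon.spot_dist, stay.spot_dist = stay.spot_dist, soon.spot_dist
    return swaps
```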

To pay the bounties, would your client have to collect tokens for parking far away, which could then be spent in order to park closer?

A case where a lot of vehicles try to park at the same time is at distribution centers such as Utz: many drivers leave the center and come back to park all around the same time.  This might be interesting as well…  I don't know…

This would also be useful in a system with both autonomous and human-driven cars.

Another problem is the size of the vehicles: finding parking for vehicles that take up multiple spots.

That is why I think it would be awesome if the cars were sort of like a bike-share program; then there would always be a car waiting at the front.  The part that would be yours could be stored in a locker…

Dynamic lane reversal and Braess’ Paradox

http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ITSC11-hausknecht.pdf   Dynamic Lane Reversal in Traffic Management Peter Stone et al.

This paper is very interesting.  Based on my recent learning about the Price of Anarchy and Braess' paradox, I wonder why their method works.  Certainly it is an immediate solution; however, I would imagine that as people learn the system it would degrade and Braess' paradox would manifest.  I'm sure there is some critical mass in the number of cars that would have to be reached before this happens.  Have to think about this…  I'm hoping to sit in on an urban transport design class; maybe I can ask the teacher his opinion…

https://www.oasys-software.com/blog/2012/05/braess%E2%80%99-paradox-or-why-improving-something-can-make-it-worse/ is a nice post about Braess' paradox.
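
The textbook instance is easy to compute.  Two routes from S to E, each a fixed 45-minute road plus a congestible road taking x/100 minutes when x cars use it; adding a free bridge between the midpoints makes everyone worse off:

```python
N = 4000  # cars traveling from S to E

# Route via A: congestible road S->A (x/100 minutes for x cars), then fixed 45-minute road A->E.
# Route via B: fixed 45-minute road S->B, then congestible road B->E (x/100 minutes).

# Without the bridge, symmetry splits traffic evenly at equilibrium:
half = N / 2
time_without = half / 100 + 45      # 20 + 45 = 65 minutes

# Add a free bridge A->B. Since x/100 <= 40 < 45 for any split, the congestible
# roads always beat the fixed ones, so every driver takes S->A->B->E:
time_with = N / 100 + 0 + N / 100   # 40 + 0 + 40 = 80 minutes

print(time_without, time_with)      # 65.0 80.0: the "improvement" slows everyone down
```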

So, it would be interesting to be able to predict when Braess' paradox will affect a system and, as the above link suggests, to automatically find instances of the paradox.  I think this would be useful in route planning for commuters and possibly in designing road systems.  It might be framed as a multiagent systems and distributed optimization problem: multiagent in the sense of modeling the uncertainty of the other drivers and coordinating with them through the app.

Bounties are a Prisoner's Dilemma

For my CS880 project I'm making a bounty system that will hopefully enable robots to learn which roles they should take.  I'm currently going to try providing the robots tit-for-tat (or some variation of it) to learn which tasks to do.  I think this method might be good since in a real-life scenario we don't have time to do reinforcement learning.  Anyway, it seems like in the simplest case, robots going after bounties are essentially in a Prisoner's Dilemma.

So, the idea is: if both robots cooperate (neither goes after the same bounty), they both "go to jail" but not for long, since each can always go after a different bounty (this models the expected loss taken by passing up that bounty).  If both defect (they both go for the same bounty), one of them gets it and the other won't; since they are both fighting over the bounty, we assume the contest costs them more than the reward is worth, even though one of them wins it.  If one defects and the other cooperates, they don't compete and have thereby decided their roles in the system.
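
As a sketch, the payoff table under that story might look like this (the numbers are invented, chosen only to satisfy the standard Prisoner's Dilemma conditions T > R > P > S and 2R > T + S):

```python
# Row player's payoff first; C = avoid the contested bounty, D = go for it.
T, R, P, S = 5, 3, 1, 0
payoffs = {
    ("C", "C"): (R, R),  # both take different bounties: modest, sure rewards
    ("C", "D"): (S, T),  # you cede the bounty; the other collects it uncontested
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),  # both fight over it; the fight eats most of the reward
}
```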

Does the bail bondsman tell the winner who else was trying to get the bounty?  He would have to, in order for the defect-defect case to actually be relevant.  Otherwise, from the winner's point of view the game had a defect-cooperate outcome, while from the loser's it was defect-defect.

I also need to make it a weighted tit-for-tat based on the changing bounties: I might decide to go for a bounty anyway if it is large enough.
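
One hypothetical way to weight it (the threshold rule below is just a first guess at what "large enough" means):

```python
def weighted_tit_for_tat(opponent_last_move, bounty, greed_threshold=10.0):
    """Tit-for-tat, except a large enough bounty tempts the agent to defect anyway."""
    if bounty >= greed_threshold:
        return "D"                        # the bounty is too big to pass up
    return opponent_last_move or "C"      # otherwise mirror the opponent; cooperate first
```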

Human Trafficking Simulation

We need a game that people play that simulates human trafficking without actually being human trafficking (since that would just be wrong).  It could be an MMO strategy game that requires players to distribute mail in a war zone.  You can choose to be the good guy, the mail distributor (who corresponds to the human trafficker, with the mail standing in for the humans), or the bad guy, who tries to stop the mailmen.

We can then use machine learning to identify trends in the strategies, social networks, and roles that form as the game progresses.  Using this data along with real-world human trafficking data, we can create a multiagent simulation that can predict the effects of various decisions.

Builder Broker

I would imagine that building a house from scratch is pretty difficult for the regular layperson. It would be for me: making the blueprints, figuring out the permits, which contractors I need and when. Especially if I want to change something, or the money changes, there are many dynamic, unpredictable things that can upend the whole plan. Just think of what happened in Spain when they increased the number of floors during the building stage and didn't include a powerful enough elevator to go up the whole way (http://tinyurl.com/kx4jbyh)! I want to create a planner and scheduler that connects directly to the broker who deals with the contractors.
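
As a first cut, the scheduling core could be little more than dependency-aware ordering of contractor tasks. A toy sketch (the task names and dependencies are invented):

```python
from graphlib import TopologicalSorter

# Each task lists the tasks that must finish before it can start.
tasks = {
    "permits": [],
    "foundation": ["permits"],
    "framing": ["foundation"],
    "electrical": ["framing"],
    "plumbing": ["framing"],
    "drywall": ["electrical", "plumbing"],
}

print(list(TopologicalSorter(tasks).static_order()))
```

When the plan changes mid-build (an extra floor, a budget cut), you edit the dependency table and recompute the order, instead of replanning by hand.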

One really has no hope of doing it on one's own without knowing how a house is put together.

To tackle this problem I would need someone who is familiar with dealing with contractors and building houses.
At first I would most likely have to market this to larger projects. I think this is necessary for the future of smart cities; eventually the system would be able to plan how to put together a whole city.

http://www.esri.com/ products would be useful for figuring out the layout of the city. I would want to work with them to make it more collaborative. The whole city-planning problem needs to deal with a ton of data, which would be an awesome place to incorporate multiagent learning: various entities will need to cooperate to decide on things, and if we have a system that can ingest the preferences and distribute them across the agents in the system, then the agents can learn to cooperate better.

This guy came up with my idea for the instant city in early January 2013!!!! http://video.esri.com/watch/2116/the-instant-citygeodesign-and-urban-planning around 19:32. I am sooo glad it's not just me. He thinks it is possible. I want not only what he wants, the automatic creation of plans for how the city will look, but also for the actual plans and schedules for carrying out the "Master Plan" to be automated. So I had his idea and took it a step farther. Not only that: I think we could then simulate the entire city with the people in it and make improvements to the designs.

Smart Logistics and the Knowledge Genesis Group

These are two pretty awesome companies/contractors.  They provide a multiagent systems approach to solving real-world problems, implementing real-time schedulers and optimization algorithms in the real world and not just as academics.  Pretty cool; too bad they aren't based in the US (though they do have a branch in Florida).


http://www.knowledgegenesis.co.uk/home/what-we-do

http://smartlogisticsinc.com/partnerships/

http://goaleurope.com/2012/02/10/russian-science-put-to-a-commercial-use-real-time-multi-agent-optimization-know-how/

Russia and the rest of Europe are not only doing cutting-edge research, they are so much more advanced in their logistics and preparedness than the US!  Seems like the US is losing in the multiagent systems race ;).