The IROS paper was submitted 10 minutes before the March 1st deadline. The paper was okay, but I’m not thrilled with it because it only covers the beginning stages of the work, so it doesn’t really use the bounty hunting stuff to its fullest. Future papers will hopefully provide that. We also had to create a video. Ermo and I got footage of the Pioneer doing visual servoing on the ball both indoors and semi-successfully outdoors (whether due to lighting or to the massively long ping times, it was not always able to stop in time before running into the ball).
Now that that is over, I’m taking a bit of a break from working with robots and am working with Ermo on some research in continuous action multiagent learning (CAMAL is our acronym, don’t steal it!). It has been a ton of fun! I’ve been doing derivations and learning new math (and re-learning some old math!). Working out how the equations and convergence proofs come together will, I believe, aid me later when I do proofs of convergence for bounty hunting. So it is benefiting Ermo and me at the same time, and we are having fun in the process :)! We are working toward a NIPS paper. If our algorithm concept is successful and we can show some convergence, we should have a very strong NIPS paper. David is also trying to join in, and we could greatly use his help as it is a very challenging problem. The main issue is that he is still taking classes. Can’t wait for his classes to be done!
Dan wants us to write up a paper for the RoboCup Symposium on how we were able to adjust the behaviors, using HiTAB (our learning from demonstration software), to work with a robot that could not turn that well. It is due March 25! He just told us about it this Monday. Thankfully, though, we won’t have to produce any experiments or results; we just need to write about what we did. So we are actually pushing that off until the 22nd, because Ermo and I need to make a poster and a slide for GMU’s first CS symposium. Just what we need, not! The really frustrating part is that it is mandatory for all CS PhD students to attend, and those selected to present are required to present! So, it is not optional. I guess they figured no one would show up otherwise.
So, I’m working with Ermo on applying reinforcement learning to text-based games. I was wondering: if our method eventually works, could we do text-based learning from demonstration with reinforcement learning? Basically, instead of pressing buttons, the user would describe what they wanted the system to do using English sentences. The user could then say yes or no to what the agents are doing. Using natural language to train a multiagent system seems like it would be better, especially since once it works for text, it could naturally be extended to speech! Telling the robots what to do and what to pay attention to would be even better.
I’m planning my directed reading class this coming semester. So, basically I have/get to come up with an entire semester’s worth of material. I might be able to make a class out of it by the time I’m done 🙂 haha. My subjects are focusing on the areas I want to explore with the bounty hunting task allocation model.
- Petri net models and applications to MAS and task allocation. Basically, I’m thinking that Petri nets can be used to create task allocation problems abstractly. By doing so, I can better gauge the usefulness of different task allocation methods.
- Contest theory and tournament theory. These two areas seem to contain crucial information for helping the bounty hunting method mature. I’ll explore how aspects such as effort can be incorporated into the task allocation problem to make it more realistic.
- Coalitional game theory and the El Farol Bar problem. The game theory aspect will help me create better learning algorithms for the bounty hunters, and the El Farol problem is very similar to the bounty hunting problem. Also, look into collectives, a complex-systems term describing a system of selfish agents that has a well-defined system-level performance measure.
- Specific problems in Multiagent Task Allocation. Namely cooperative, resource dependent, heterogeneous tasks and tasks that have sub-tasks.
- Automated task decomposition for MAS. This is mainly so that I can explore the idea that behaviors (and possibly tasks) may be automatically reduced, hierarchically, to states with single-feature transitions using only state-action pair sequences. Doing so would allow complex behaviors to be learned without so much work required from the human to decompose the problem. I’d also like to apply it to tasks.
- Learning coalition formation, or automatic organization formation for MAS mainly with coordination in mind. Main applications focusing on large scale systems like autonomous vehicles, cloud robotics and crisis management.
- Large scale MAL/MAS and applications: autonomous vehicles, cloud robotics, and crisis management.
- Current topics in reinforcement learning and its use in MAS.
I’m working on fleshing out the details before the semester starts so that I might already have the list of papers I want to read.
I’m sure someone has thought of this, but I didn’t look.
So, what if we delayed the update of the Q-value and the policy for x time steps? We would keep track of the history and compute an average reward. Initialize the policy for each action to 1/|A| so that we can start with a large x. Then we update the Q-value and the policies for each action based on that average reward. Finally, we can modify x based on a damped sine wave as time goes on.
This is similar to a lenient learner, but it may work against competitive players as well. The averaging window acts as a sort of learning period to see what type of opponent I am playing against; that is the part that resembles the lenient learner.
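A minimal sketch of the idea, with the caveat that all the names, the epsilon-greedy policy update, and the exact damped-sine schedule for x are my own placeholder choices, not a worked-out design:

```python
import math
import random

class DelayedAveragingLearner:
    """Sketch: delay Q/policy updates for x steps, averaging rewards meanwhile."""

    def __init__(self, actions, alpha=0.1, x0=50, decay=0.01, period=200.0):
        self.actions = list(actions)
        self.q = {a: 0.0 for a in self.actions}
        # Initialize the policy uniformly: 1/|A| per action.
        self.policy = {a: 1.0 / len(self.actions) for a in self.actions}
        self.alpha = alpha
        self.x = x0          # current delay (averaging window length)
        self.x0 = x0
        self.decay = decay   # damping rate for the sine schedule
        self.period = period
        self.t = 0           # total steps taken
        self.history = []    # (action, reward) pairs in the current window

    def act(self):
        # Sample an action from the current (frozen) policy.
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.policy[a]
            if r <= acc:
                return a
        return self.actions[-1]

    def observe(self, action, reward):
        self.history.append((action, reward))
        self.t += 1
        if len(self.history) >= self.x:
            self._update()

    def _update(self):
        # Average reward per action over the window, then a Q-style step.
        for a in self.actions:
            rs = [r for (act, r) in self.history if act == a]
            if rs:
                avg = sum(rs) / len(rs)
                self.q[a] += self.alpha * (avg - self.q[a])
        # Epsilon-greedy policy toward the best Q (placeholder choice).
        best = max(self.q, key=self.q.get)
        eps = 0.1
        for a in self.actions:
            self.policy[a] = (1 - eps + eps / len(self.actions)) if a == best \
                             else eps / len(self.actions)
        self.history.clear()
        # Shrink/oscillate the window with a damped sine wave over time.
        osc = math.exp(-self.decay * self.t) * \
              math.sin(2 * math.pi * self.t / self.period)
        self.x = max(1, int(round(self.x0 * (0.5 + 0.5 * osc))))
```

Since the policy is frozen during each window, the averaged rewards reflect the opponent’s response to a fixed strategy rather than a moving target, which is where the “learning period” intuition comes from.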
I found a really cool paper that basically benchmarks a lot of MARL algorithms using MGS, a stochastic game simulator written in Java. This simulator is like my MALSIM, except that I only implemented features for repeated games, not stochastic games.
Here is a nice, informative PhD thesis on reinforcement learning. It does a really nice job in Section 2.2.3 of explaining the use of the Boltzmann distribution with Q-values.
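For reference, Boltzmann (softmax) action selection over Q-values, the technique that section covers, looks roughly like this (the function name and default temperature are my own choices):

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0):
    """Pick an action with probability proportional to exp(Q(a)/T)."""
    # Subtract the max Q before exponentiating, for numerical stability;
    # this does not change the resulting distribution.
    qmax = max(q_values.values())
    weights = {a: math.exp((q - qmax) / temperature)
               for a, q in q_values.items()}
    total = sum(weights.values())
    # Roulette-wheel sample proportionally to the weights.
    r, acc = random.random() * total, 0.0
    for a, w in weights.items():
        acc += w
        if r <= acc:
            return a
    return a  # fallback for floating-point edge cases
```

A high temperature makes the choice nearly uniform (exploration); as the temperature drops, the selection concentrates on the highest-Q action (exploitation).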