So the idea is that we have these beacons, each with a broadcast radius. Within that radius a beacon states what it wants done to it, e.g., "move me to location (x, y)." As robots explore the environment they discover these tasks; a robot can decide to take a task on, or remember it and continue with what it was doing. When robots encounter other robots, they can exchange information about the tasks they have encountered and when, thereby updating their beliefs about the state of those tasks. Maps are not shared between the robots; only the coordinates of a task relative to the current location are exchanged, plus possibly any waypoints that may help a robot find it (for instance, if you first have to head in the opposite direction to get to the task). In this way each robot assumes the other robots are intelligent and acts more like a human: this is how humans give directions, rather than sharing maps.
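A minimal sketch of what such an exchange might look like, assuming each robot keeps a timestamped record per task. The names (TaskReport, merge_reports) and the keep-the-fresher-report rule are my own illustration, not a fixed design:

```python
from dataclasses import dataclass

@dataclass
class TaskReport:
    task_id: int
    rel_position: tuple[float, float]  # task location relative to the exchange point;
                                       # since the two robots are co-located when they
                                       # meet, their relative coordinates are comparable
    waypoints: list[tuple[float, float]]  # optional hints, e.g. "head away from here first"
    observed_at: float                 # time of the last direct observation
    completed: bool = False

def merge_reports(mine: dict[int, TaskReport],
                  theirs: dict[int, TaskReport]) -> dict[int, TaskReport]:
    """When two robots meet, keep whichever report of each task is fresher."""
    merged = dict(mine)
    for task_id, report in theirs.items():
        known = merged.get(task_id)
        if known is None or report.observed_at > known.observed_at:
            merged[task_id] = report
    return merged
```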
The interesting part is when we have more tasks than robots. The robots must cooperatively decide which tasks to take. I think the moment when robots encounter each other is when they have to decide how they will coordinate their actions and when to take on cooperative tasks.
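For instance, one very simple pairwise rule for such an encounter would be to swap committed tasks whenever that lowers the pair's combined travel cost. This greedy sketch (the function names and the straight-line cost stand-in are my own assumptions, not a worked-out allocation scheme) just shows the kind of local decision I have in mind:

```python
import math

def travel_cost(robot_pos, task_pos):
    """Straight-line distance as a crude stand-in for true path cost."""
    return math.dist(robot_pos, task_pos)

def swap_if_cheaper(pos_a, task_a, pos_b, task_b):
    """When two robots meet, swap their committed tasks if that lowers
    the pair's total travel cost; otherwise keep the current assignment."""
    keep = travel_cost(pos_a, task_a) + travel_cost(pos_b, task_b)
    swap = travel_cost(pos_a, task_b) + travel_cost(pos_b, task_a)
    return (task_b, task_a) if swap < keep else (task_a, task_b)
```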
The other idea I had was essentially behavioral bootstrapping, only in the reinforcement learning setting. I know there was a paper on different levels of agents, and I know that at GMU they have looked at behavior bootstrapping for multi-robot learning from demonstration. The closest existing work is probably cooperative coevolution.
Another idea is that we really don't want to have to continually publish the current bounty price. We do an initial broadcast of the starting price and then stop. Any new agents arriving on the field must ask other robots. Whenever a new task becomes available, the bondsman decides when to announce it.
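Roughly, each robot would keep a small record of the prices it has heard and pass them on when asked. This BountyBoard sketch is a hypothetical illustration of the announce-once-then-query idea; the class and method names are mine:

```python
class BountyBoard:
    """Each robot's local view of bounty prices under the announce-once
    scheme: the bondsman broadcasts a starting price a single time, and
    late arrivals must query peers instead of relying on rebroadcasts."""

    def __init__(self):
        self.prices = {}  # task_id -> last price this robot heard

    def hear_broadcast(self, task_id, price):
        """Called once, when the bondsman first announces a task."""
        self.prices[task_id] = price

    def answer_query(self, task_id):
        """A newly arrived robot asks us; pass on what we heard, if anything."""
        return self.prices.get(task_id)

    def learn_from_peer(self, task_id, price):
        """Adopt a peer's answer if we had no record of the task."""
        if task_id not in self.prices and price is not None:
            self.prices[task_id] = price
```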
The other idea is inspired by a book I am reading. It makes the good point that we humans answer simpler questions when solving a problem we don't know how to solve. For example, if you don't know how to pick stocks, you might think "I like Porsches" and so buy stock in Porsche. Deciding which stock to buy based on what you like is answering a simpler question: you know you want stock, and you know you like Porsche, but you don't know anything about picking stocks. The nice thing about humans is that we can learn to adjust which questions we ask ourselves as we learn new things. We not only learn the rewards for states and actions; we also learn to ask better questions and learn how to answer them.
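To pin down the "learn the rewards for states and actions" part: that is just standard tabular Q-learning, sketched below. The "learning better questions" part would have to live on top of something like this, and I don't have an algorithm for it yet. The parameter values here are arbitrary placeholders:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # placeholder learning parameters
Q = defaultdict(float)                   # (state, action) -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy: mostly exploit the current estimates, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One-step Q-learning backup toward reward plus discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```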