Bounties are a Prisoner's Dilemma

For my CS880 project I'm making a bounty system that will hopefully enable robots to learn what roles they should take on. I'm currently planning to give the robots a tit-for-tat strategy (or some variation of it) to learn which tasks to do. I think this approach might be a good fit since in a real-life scenario we don't have time to do reinforcement learning. Anyway, it seems like in the simplest case, robots going after bounties are essentially in a "Prisoner's Dilemma" scenario.
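To make that concrete, here's a minimal per-opponent tit-for-tat sketch in Python (the class and method names are mine, just for illustration):

```python
class TitForTat:
    """Cooperate on first contact, then mirror each opponent's last move."""

    def __init__(self):
        self.last_seen = {}  # other robot's id -> its last action ("C" or "D")

    def choose(self, other_id):
        # Cooperate ("C") with any robot we have no history with yet.
        return self.last_seen.get(other_id, "C")

    def observe(self, other_id, action):
        # Record what the other robot just did so we can mirror it next round.
        self.last_seen[other_id] = action
```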

So, the idea is that if both robots cooperate (neither goes after the bounty), they both end up "going to jail," just not for very long, since they can always go after a different bounty (this models the expected loss they take by passing on that bounty). If both defect (both go for the same bounty), one of them is going to get it and the other won't; since they'll be fighting over the bounty, we assume it costs both of them more than it's worth, even for the one that collects the reward. If one defects and the other cooperates, they don't compete, and they have thus decided their roles in the system.
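Written as a payoff table, it looks something like this. The numbers are placeholders I picked to satisfy the standard dilemma ordering T > R > P > S (plus 2R > T + S for the iterated game); real values would come from bounty rewards minus travel and fight costs:

```python
PAYOFFS = {
    # (my action, their action): my expected payoff
    ("D", "C"):  5,  # T: I chase the bounty uncontested and collect it
    ("C", "C"): -1,  # R: we both pass; small loss, other bounties remain
    ("D", "D"): -3,  # P: we both chase and fight; costs exceed the reward
    ("C", "D"): -8,  # S: I pass while the other robot collects
}
```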

Does the bail bondsman tell the winner about who else was trying to get the bounty? We would have to, in order for the defect-defect case to actually be relevant. Otherwise, from the point of view of the winner, it was playing a defect-cooperate game, while from the losers' point of view it was defect-defect.
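One way to see the asymmetry is in how each robot's tit-for-tat history would get updated. This is a hypothetical sketch (award_bounty and the strategy/id attributes are names I made up), building on the TitForTat class above:

```python
def award_bounty(winner, losers, report_contention=True):
    # Each loser saw the contested chase first-hand, so its tit-for-tat
    # records the winner's defection either way.
    for robot in losers:
        robot.strategy.observe(winner.id, "D")
    if report_contention:
        # Only if the bondsman names the losers can the winner's
        # tit-for-tat record the round as defect-defect too.
        for robot in losers:
            winner.strategy.observe(robot.id, "D")
    # Otherwise the winner's history is unchanged, and from its point of
    # view the round looked like defect-cooperate.
```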

I also need to make it a weighted tit-for-tat based on the changing bounties. A robot might decide to go for a bounty regardless of history if the bounty is large enough.
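One possible weighting, assuming a simple value threshold (the threshold and function name are my own placeholders; strategy is a TitForTat instance from the sketch above):

```python
def choose_weighted(strategy, other_id, bounty_value, threshold=10.0):
    if bounty_value >= threshold:
        return "D"  # a big enough bounty is worth a potential fight
    return strategy.choose(other_id)  # otherwise mirror the history
```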