My original bounty hunting paper could actually be considered a market implementation of Highest Response Ratio Next (HRRN) scheduling.
The bounty assigned to a task is some base bounty plus a bounty rate $latex r$ times the time the task has waited; in the first bounty hunting paper these were set to 100 and 1 respectively. So, as each task was left undone, the bounty on it would rise. Tasks belong to particular “task classes,” which basically means that their locations are drawn from the same Gaussian distribution. In the paper we used 20 task classes and four agents, with one agent located at each of the four corners of a 40×60 rectangular grid. The agents decide which task to go after based on which task has the highest bounty per time step, which works out to be:
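(The equation itself did not survive in this copy. A hedged reconstruction from the definitions above, writing $latex B$ for the base bounty, $latex w_i$ for how long task $latex i$ has waited, and $latex t_i$ for the agent's estimated time to reach and complete task $latex i$, would be:)

$latex \displaystyle i^{*} = \arg\max_i \; \frac{B + r\, w_i}{t_i}$

With $latex r = 1$ this has the same shape as the HRRN response ratio $latex (w + s)/s$, with travel/service time in the denominator.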
This is for the case when agents commit to tasks and are not allowed to abandon them, i.e. the non-preemptive case. When the agents are allowed to abandon tasks, we then have:
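(Again the equation was lost; a plausible reconstruction, consistent with the committed-case ratio above, is the same quantity re-evaluated at every time step $latex \tau$ from the agent's current position, so an agent may switch to a better task mid-route:)

$latex \displaystyle i^{*}(\tau) = \arg\max_i \; \frac{B + r\, w_i(\tau)}{t_i(\tau)}$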
Both of these equations state that the agents go after tasks in an HRRN order. Now, the key part that bounty hunting added was making this work in a multiagent setting: each agent learned a probability of successfully collecting the bounty when going after a particular task class. The paper also experimentally demonstrated some other nice properties of bounty-hunting-based task allocation in a dynamic setting.
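As a rough sketch of the selection rule (my own reconstruction, not code from the paper; the helper names, task representation, and the exact way the learned per-class success probability `p_success` enters the score are all assumptions), the greedy choice might look like:

```python
import math

def expected_bounty_rate(base_bounty, rate, waited, travel_time, p_success):
    """Expected bounty per time step for one task: success probability
    times accumulated bounty, divided by time to reach the task."""
    return p_success * (base_bounty + rate * waited) / travel_time

def choose_task(agent_pos, tasks, p_success_by_class, base_bounty=100, rate=1):
    """Pick the task maximizing expected bounty per time step.
    tasks: list of dicts with 'pos' (x, y), 'waited' (time steps), 'cls'."""
    best, best_score = None, -math.inf
    for task in tasks:
        dx = task["pos"][0] - agent_pos[0]
        dy = task["pos"][1] - agent_pos[1]
        travel = math.hypot(dx, dy) or 1e-9  # guard against zero distance
        score = expected_bounty_rate(base_bounty, rate, task["waited"],
                                     travel, p_success_by_class[task["cls"]])
        if score > best_score:
            best, best_score = task, score
    return best
```

Run repeatedly (non-preemptive: only when idle; preemptive: every step) this reproduces the HRRN-style behavior above: a far task that has waited long enough eventually outbids a nearby fresh one.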
Presently I’m taking this approach and moving it to a dynamic vehicle routing setting, where I use it to minimize the average waiting time of tasks and the agent is not teleported back to a home base after each task completion: namely, the dynamic multiagent traveling repairman problem. This is another setting where Shortest Job Next (Nearest Neighbor in Euclidean space) is a decent heuristic, and because the agents are not reset, a non-zero bounty rate causes interesting behavior.