What about prestige points. What if besides the bounty there was a fixed number of prestege points associated with . Sort of like going after a local town crook vs going after an international thief. This could be used to determine which tasks were available to the agent. The more prestege the harder the task and the more bounty that would be available. It would also help to stratefy the agents. Better break them up. So, there could be a task that required at least 4 stars of prestege.
We could also make it so that the agent could not commit to bounties that were lower than some threshold compared to its prestege. So, if he has a 3.4 prestege level it would be able to take on tasks at the 3-4 level.
So, this would mean that maybe prestege is a continuum and you start at prestege level 0 and can take on tasks between 0 and 1 prestege. The formula for your prestege level could be cubic a cubic polynomial. So the number of points necessary to be at level n is n^3. Then your current level is current number of prestege points /n^3.
There might be a max level that the agent could get to no matter what. Meaning this particular type of agent would only ever be a level 2 agent. Where is this more advanced agent could become a level 8 agent.
Or instead of limiting the max level you could make it so that the first x to get to level i can proceed to the next level. The rest are stuck at the i-1 level. This case would be interesting because it would allow the question of whether the agents could loose prestege points. I’m not sure how that would work yet.
I think that the bounty could do this, but possibly very slowly. I think the bounty points would speed things along when there are a large number task classes and some classes would not appear very often. Also, when the agents are heterogeneous. Meaning that some of the agents would just be better (faster or can do something else better) and would just be hogging all of the tasks that are not of as a high priority just because it can get to them faster. In that scenario the lower level agents might learn to do the harder task since it would not be competing against the faster agent.
There might even be attribute levels in addition to prestige levels. This would help to predict how good you are at certain things in your environment. For example, if you are a solar array with 5 panels that are in the sunlight vs a panel with 1 that is in the sunlight the 5 panel would be able to accomplish a tougher task like provide 5 energy vs 1 energy in one hour (don’t know reasonable units). But, the 1 panel agent might also have a bigger battery so it has a higher storage level. Might not be the best example…
The better the description of the problem and the better their cost functions become the more chance they have to doing things optimally. I think that having these levels would provide this in a bonty-esq fashion. And that within the particular strata that bounties would still be useful in helping exploration.