Hi,
You could add some debug output to confirm the bonus is actually being applied, e.g. print("Fast reward bonus given").
However, looking at your code, I think the issue is that progress is a value from 0 to 100, whereas steps / TOTAL_NUM_STEPS gives a fraction from 0 to 1. So progress will nearly always be greater than that, which means your bonus is probably being given regardless of actual performance.
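Here is a minimal sketch of the fix, assuming your check runs every 50 steps and compares progress against steps / TOTAL_NUM_STEPS. The value of TOTAL_NUM_STEPS and the exact bonus logic are hypothetical placeholders, since I don't have your full function. The key change is dividing progress by 100 so both sides of the comparison are on the same 0 to 1 scale. The print line is the debug output mentioned above.

```python
# Hypothetical: the number of steps you'd expect a fast lap to take.
TOTAL_NUM_STEPS = 300


def reward_function(params):
    progress = params["progress"]  # DeepRacer reports progress as 0-100
    steps = params["steps"]

    reward = 1.0

    # Put both values on a 0-1 scale before comparing them.
    # Comparing raw progress (e.g. 10.0) with steps / TOTAL_NUM_STEPS (e.g. 0.17)
    # would almost always be True, which is the bug described above.
    progress_fraction = progress / 100.0
    expected_fraction = steps / TOTAL_NUM_STEPS

    # Assumed bonus rule: every 50 steps, reward being ahead of the expected pace.
    if steps > 0 and steps % 50 == 0 and progress_fraction > expected_fraction:
        reward += 10.0
        print("Fast reward bonus given")  # debug output to confirm it fires

    return float(reward)
```

With this version, a car at 10% progress after 50 steps (expected pace about 16.7%) gets no bonus, while one at 20% does.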
I'm also not sure that a bonus of 10 every 50 steps would be enough to encourage the model to favour those actions. It would also depend on your discount rate: e.g. if you're using the default discount rate of 0.9, that only equates to about 10 steps of look-ahead, so your bonus would be invisible to most policy training updates.
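To put rough numbers on that, a common rule of thumb is that a discount factor γ gives an effective horizon of about 1 / (1 - γ) steps, and a reward k steps in the future is weighted by γ^k. The sketch below just does that arithmetic; it isn't DeepRacer code, and 0.99 is included only as a comparison value.

```python
def effective_horizon(gamma):
    """Rule-of-thumb look-ahead in steps for discount factor gamma."""
    return 1.0 / (1.0 - gamma)


def discounted_bonus(gamma, steps_ahead, bonus):
    """Present value of a bonus received `steps_ahead` steps in the future."""
    return bonus * gamma ** steps_ahead


for gamma in (0.9, 0.99):
    print(
        f"gamma={gamma}: horizon ~{effective_horizon(gamma):.0f} steps, "
        f"bonus of 10 in 50 steps is worth ~{discounted_bonus(gamma, 50, 10.0):.2f} now"
    )
```

With γ = 0.9, a bonus of 10 that is 50 steps away is worth only about 10 × 0.9^50 ≈ 0.05 from the current step's point of view, which is smaller than a typical per-step reward of 1.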
Finally, you need to be careful not to overfit your model to the track. When that happens, lap times can get worse as the model learns safer actions and track positions to ensure more stability. That might explain why your model got slower, even if your reward wasn't being applied correctly.
Lyndon