Updated reward function to speed up the car, but it is not working as expected


I updated my reward_function to make the car faster using params['steps'], but it is not working as expected.

The reward_function is this:

```python
def reward_function(params):
    '''
    Example of rewarding the agent to follow the center line
    '''

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering = params['steering_angle']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']
    steps = params['steps']
    progress = params['progress']

    # Total num of steps we want the car to take to finish the lap;
    # it will vary depending on the track length
    TOTAL_NUM_STEPS = 300

    reward = 1e-3  # small default so reward is always defined
    if 0.0 <= distance_from_center <= 0.03:
        reward = 1.0
    if not all_wheels_on_track:
        reward = reward - 1
        reward = reward + progress

    # Add speed penalty
    if speed < 1.0:
        reward *= 0.80
        reward += speed

    # Give additional reward if the car passes every 100 steps faster than expected
    # (this block is the part I added)
    if (steps % 50) == 0 and progress > (steps / TOTAL_NUM_STEPS):
        reward += 10.0

    return float(reward)
```


The step-bonus block at the end was not in my previous model.

I trained for 5 hours, but the result is slower than before.

With the previous model I could run the Empire City track in 2124 seconds, but with the new model, after training for 5 hours, I am getting 2427 seconds.

The action space and hyperparameters are otherwise the same.

Does anybody know why the result is slower than before?

asked 4 years ago · 247 views
1 Answer


You could insert some debug code to confirm the bonus is actually being applied, e.g. `print("Fast reward bonus given")`.

However, looking at your code, I think the issue is that `progress` is a value from 0-100, whereas your `steps / TOTAL_NUM_STEPS` gives a fraction from 0-1. So `progress` will nearly always be greater than that, and your bonus is probably being given regardless of actual performance.
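To illustrate the unit mismatch, here is a minimal sketch (the `TOTAL_NUM_STEPS = 300` value is an assumption; use whatever target you picked for your track). Converting the step fraction to a percentage makes the comparison meaningful:

```python
TOTAL_NUM_STEPS = 300  # assumed target step count for the lap

def bonus_condition_buggy(steps, progress):
    # progress is 0-100 but steps/TOTAL_NUM_STEPS is 0-1,
    # so this is true for almost any positive progress
    return (steps % 50) == 0 and progress > (steps / TOTAL_NUM_STEPS)

def bonus_condition_fixed(steps, progress):
    # compare like with like: convert the step fraction to a percentage
    expected_progress = (steps / TOTAL_NUM_STEPS) * 100
    return (steps % 50) == 0 and progress > expected_progress

# At step 150 the car should be ~50% around the track (150/300).
print(bonus_condition_buggy(150, 10.0))  # True  - bonus given despite slow progress
print(bonus_condition_fixed(150, 10.0))  # False - slow car gets no bonus
print(bonus_condition_fixed(150, 60.0))  # True  - ahead of schedule, bonus given
```

With the fixed condition, the bonus only fires when the car is genuinely ahead of the expected pace.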

I'm also not sure that a bonus of 10 every 50 steps would be enough to encourage the model to favour those actions. It also depends on your discount rate: if you're using the default discount rate of 0.9, that only equates to roughly 10 steps of look-ahead, so your bonus would be invisible to most policy training updates.
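You can see why by computing the discounted present value of the bonus. This is a generic back-of-the-envelope calculation, assuming the 0.9 discount rate mentioned above:

```python
# Present value of a bonus received k steps in the future,
# under discount factor gamma, is bonus * gamma**k.
gamma = 0.9   # assumed default DeepRacer discount rate
bonus = 10.0

for k in (10, 25, 50):
    value = bonus * gamma ** k
    print(f"bonus seen from {k} steps away is worth {value:.4f}")
```

From 50 steps away the 10-point bonus is worth about 0.05, so for most of the interval between bonuses it contributes almost nothing to the policy update.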

Finally, you need to be careful of overfitting your model to the track. When that happens, lap times can get worse as the model learns the safer actions/track position to ensure more stability. So that might explain why your model got slower, even if your reward wasn't correctly being applied.


answered 4 years ago
