Updated reward function to increase speed, but it is not working as expected

Hi,
I updated my reward_function to encourage more speed using params['steps'], but it is not working as expected.

This is the reward_function:

=====================================
def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    reward = 1e-3

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering = params['steering_angle']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']
    steps = params['steps']
    progress = params['progress']

    # Total number of steps in which we want the car to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    if distance_from_center >= 0.0 and distance_from_center <= 0.03:
        reward = 1.0

    if not all_wheels_on_track:
        reward = reward - 1
    else:
        reward = reward + progress

    # Add speed penalty
    if speed < 1.0:
        reward *= 0.80
    else:
        reward += speed

    # NEW: Give additional reward if the car passes every 50 steps faster than expected
    if (steps % 50) == 0 and progress > (steps / TOTAL_NUM_STEPS):  # NEW
        reward += 10.0  # NEW

    return float(reward)

==================================================

The lines marked NEW were not in the previous model.

I trained for 5 hours, but the result is slower than before.

With the previous model, I could complete the Empire City track in 2124 seconds, but with the new model, after 5 hours of training, I am getting 2427 seconds.

The action space and the hyperparameters are the same as before.

Does anybody know why the result is slower than before?

asked 5 years ago, 963 views
1 Answer
Hi,

You could insert some debug code to confirm that the bonus is actually being applied, e.g. print("Fast reward bonus given").

However, looking at your code, I think the issue is that progress is a value from 0 to 100, whereas steps / TOTAL_NUM_STEPS gives a fraction from 0 to 1. So progress will nearly always be greater than that fraction, and your bonus is probably being given regardless of actual performance.
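
To make the scale mismatch concrete, here is a minimal sketch of the pace check with both sides on the same 0-100 scale. The helper name `ahead_of_pace` is hypothetical (it is not part of the DeepRacer API), and the sample values for `steps` and `progress` are made up for illustration.

```python
TOTAL_NUM_STEPS = 300  # target step count for a full lap, taken from the question


def ahead_of_pace(steps, progress, total_steps=TOTAL_NUM_STEPS):
    """Return True if progress (0-100) exceeds the expected percentage at this step."""
    expected_progress = (steps / total_steps) * 100.0
    return progress > expected_progress


# A slow car: only 5% of the lap done after 50 steps (illustrative values)
steps, progress = 50, 5.0

# Check as written in the question: compares a percentage with a fraction
original = progress > (steps / TOTAL_NUM_STEPS)   # 5.0 > 0.1667 -> True
# Same check with both sides as percentages
corrected = ahead_of_pace(steps, progress)        # 5.0 > 16.67 -> False

print(original, corrected)
```

With the original comparison the slow car still collects the bonus; with the scaled comparison it does not.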

I'm also not sure that a bonus of 10 every 50 steps would be enough to encourage the model to favour those actions. It would also depend on your discount rate: for example, if you're using a discount rate of 0.9, that only equates to roughly 10 steps of look-ahead, so your bonus would be invisible to most policy training updates.
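
As a rough illustration of the look-ahead point, a common rule of thumb is that a discount factor gamma gives an effective horizon of about 1 / (1 - gamma) steps, and a reward k steps in the future is weighted by gamma ** k. The sketch below only does that arithmetic; the function names are hypothetical and the numbers are not measurements from any training run.

```python
def effective_horizon(discount):
    """Rule-of-thumb look-ahead, in steps, for a discount factor in [0, 1)."""
    return 1.0 / (1.0 - discount)


def discounted_value(reward, steps_ahead, discount):
    """Present value of a reward received steps_ahead steps in the future."""
    return reward * discount ** steps_ahead


# Compare a few discount factors for a bonus of 10 that is 49 steps away
for gamma in (0.9, 0.99, 0.999):
    horizon = effective_horizon(gamma)
    bonus_now = discounted_value(10.0, 49, gamma)
    print(f"gamma={gamma}: horizon ~{horizon:.0f} steps, "
          f"bonus of 10 seen from 49 steps away ~{bonus_now:.3f}")
```

At a discount factor of 0.9, a bonus of 10 that is 49 steps away is worth less than 0.1 to the current step, which is why it barely influences most updates.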

Finally, you need to be careful of overfitting your model to the track. When that happens, lap times can get worse as the model learns the safer actions/track position to ensure more stability. So that might explain why your model got slower, even if your reward wasn't correctly being applied.

Lyndon

answered 5 years ago
