Updated reward function to encourage speed, but it is not working as expected


Hi,

I updated my reward_function to encourage higher speed using params['steps'], but it is not working as expected.

The reward_function is this:

=====================================
def reward_function(params):
    '''
    Example of rewarding the agent to follow the center line
    '''
    reward = 1e-3

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering = params['steering_angle']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']
    steps = params['steps']
    progress = params['progress']

    # Total number of steps we want the car to take to finish the lap;
    # it will vary depending on the track length
    TOTAL_NUM_STEPS = 300

    if 0.0 <= distance_from_center <= 0.03:
        reward = 1.0

    if not all_wheels_on_track:
        reward = reward - 1
    else:
        reward = reward + progress

    # add speed penalty
    if speed < 1.0:
        reward *= 0.80
    else:
        reward += speed

    # NEW: give an additional reward every 50 steps if the car is faster than expected
    if (steps % 50) == 0 and progress > (steps / TOTAL_NUM_STEPS):
        reward += 10.0

    return float(reward)

==================================================

Only the final block (the extra reward every 50 steps) is new; the previous model did not have those lines.

I trained for 5 hours, but the result is slower than before.

With the previous model I could run the Empire City track in 2124 seconds, but with the new model, after 5 hours of training, I am getting 2427 seconds.

The action space and the other hyperparameters are the same.

Does anybody know why the result is slower than before?

Asked 5 years ago · 811 views

1 Answer

Hi,

You could insert some debug code to confirm the bonus is actually being applied, e.g. print("Fast reward bonus given").

However, looking at your code, I think the issue is that progress is a value from 0-100, whereas your steps / TOTAL_NUM_STEPS gives a fraction from 0-1. So progress will nearly always be greater than that, and your bonus is probably being given regardless of actual performance.
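To make that condition compare like-for-like, you can convert progress to a 0-1 fraction before comparing it with the step fraction. A minimal sketch of the corrected check (the helper name and the sample values are mine, just for illustration; TOTAL_NUM_STEPS = 300 is taken from your snippet):

```python
def faster_than_expected(steps, progress, total_num_steps=300):
    """Return True only when lap progress (a 0-100 percentage) is
    ahead of the fraction of the step budget used so far."""
    # progress / 100.0 converts the percentage to a 0-1 fraction,
    # matching the scale of steps / total_num_steps.
    return (steps % 50) == 0 and (progress / 100.0) > (steps / total_num_steps)

# With the original comparison, progress=10 (only 10% of the lap done)
# at step 100 would wrongly earn the bonus, since 10 > 100/300.
print(faster_than_expected(steps=100, progress=10))  # → False (behind schedule)
print(faster_than_expected(steps=100, progress=50))  # → True  (ahead of schedule)
```

The same fix works the other way around, multiplying the step fraction by 100 instead of dividing progress.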

I'm also not sure that a bonus of 10 every 50 steps would be enough to encourage the model to favour those actions. It would also depend on your discount rate, e.g. if you're using the default discount rate of 0.9 then that only equates to 10 steps look ahead, so your bonus would be invisible to most policy training updates.
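A rough rule of thumb (my own heuristic, not anything from the DeepRacer docs) is that the effective look-ahead horizon of a discount factor gamma is about 1 / (1 - gamma), because the geometric series of discounted weights sums to that value:

```python
def effective_horizon(gamma):
    """Approximate number of future steps a discount factor weights
    meaningfully: sum of gamma**k over k converges to 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)

print(effective_horizon(0.9))    # roughly 10 steps: a bonus 50 steps away is mostly invisible
print(effective_horizon(0.999))  # roughly 1000 steps: far-off bonuses still influence updates
```

So if the bonus matters to you, either raise the discount rate or grant the bonus more frequently.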

Finally, you need to be careful of overfitting your model to the track. When that happens, lap times can get worse as the model learns the safer actions/track position to ensure more stability. So that might explain why your model got slower, even if your reward wasn't correctly being applied.

Lyndon

ETAGGEL
Answered 5 years ago
