Difference between revisions of "強化学習" (Reinforcement learning)

Copied from en:State–action–reward–state–action, revision of 22:44, 3 December 2019
=== SARSA ===
SARSA ([[:en:state–action–reward–state–action|state–action–reward–state–action]]) is an on-policy temporal-difference (TD) learning method.
:<math>Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha [r_{t} + \gamma Q(s_{t+1}, a_{t+1})-Q(s_t,a_t)]</math>
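The update rule above can be sketched in code as follows. This is a minimal illustration rather than a reference implementation: the function name <code>sarsa_update</code>, the dictionary-based Q-table, the state and action labels, and the values chosen for <code>alpha</code> and <code>gamma</code> are all assumptions made for the example and do not come from the article.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA update to Q in place and return the new Q(s, a).

    Q is a mapping from (state, action) pairs to estimated values.
    All names and default values here are illustrative assumptions.
    """
    # TD target: observed reward plus the discounted value of the
    # state-action pair that the current policy actually selected next.
    td_target = r + gamma * Q[(s_next, a_next)]
    # TD error: the bracketed term in the update rule.
    td_error = td_target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


# Hypothetical two-state example; unseen pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 1.0
new_value = sarsa_update(
    Q, "s0", "right", r=0.5, s_next="s1", a_next="right", alpha=0.5, gamma=0.9
)
```

With these assumed numbers, the TD target is 0.5 + 0.9 × 1.0 = 1.4, so the estimate for <code>("s0", "right")</code> moves from 0 to roughly 0.7. The update uses the action <code>a_next</code> that the policy actually takes in the next state, which is what makes the method on-policy.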
 
=== Q-learning ===