Property Value
nif:beginIndex
  • 0 (xsd:integer)
nif:broaderContext
nif:endIndex
  • 269 (xsd:integer)
nif:isString
  • The neural network, parametrized by θ, is trained to minimize the loss function: $L(\theta) = \mathbb{E}\big[\big(\underbrace{r + \gamma \max_{a'} Q(s', a'; \theta')}_{\text{target}} - \underbrace{Q(s, a; \theta)}_{\text{prediction}}\big)^2\big]$ (3). Notice that the formula closely resembles the iterative update rule of the Bellman equation mentioned above (Eq. 2).
rdf:type