Property Value
nif:beginIndex
  • 0 (xsd:integer)
nif:broaderContext
nif:endIndex
  • 205 (xsd:integer)
nif:isString
  • The goal of reinforcement learning is to find the policy π—a set of rules to select an action in each possible state—that would maximize the agent’s accumulated long term reward in a dynamical environment.
rdf:type