sentence7 at SoMeSci
http://data.gesis.org/somesci/PMC5381785/sentence7
Property            Value
nif:beginIndex      0 (xsd:integer)
nif:broaderContext  sms:PMC5381785
nif:endIndex        190 (xsd:integer)
nif:isString        Q-learning is based on estimating the expected total discounted future rewards (the quality) of each state-action pair under a policy π: Qπ(st, at) = E[rt+1 + γrt+2 + γ2rt+2 + … + γT-trT|π].
rdf:type            nif:Context, nif:OffsetBasedString, nif:Sentence
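The expectation quoted in nif:isString can be made concrete with a short sketch. Reading the first terms of the recorded formula, reward r_{t+k+1} is weighted by γ^k, so the discounted return from step t is G_t = Σ_k γ^k r_{t+k+1}, and Qπ(s_t, a_t) is the expected value of G_t under π. The Python below is an illustration only, not code from PMC5381785 or SoMeSci. It computes G_t and averages it over sampled episodes, which is a plain Monte Carlo estimate of Qπ(s_t, a_t). It is not the bootstrapped Q-learning update rule itself. The function names and the input format (one list of rewards per episode, starting at r_{t+1}) are hypothetical choices made for the example.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute r_{t+1} + gamma*r_{t+2} + gamma**2*r_{t+3} + ...

    `rewards` is assumed (hypothetical format) to be [r_{t+1}, ..., r_T].
    """
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    # This equals the forward sum of gamma**k * r_{t+k+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_q(episodes_rewards: Sequence[Sequence[float]], gamma: float) -> float:
    """Estimate Q^pi(s_t, a_t) as the sample mean of discounted returns.

    Each inner sequence (hypothetical input format) holds the rewards observed
    after taking a_t in s_t and then following pi until the episode ends.
    This is a Monte Carlo average, not the Q-learning update.
    """
    if not episodes_rewards:
        raise ValueError("need at least one sampled episode")
    returns = [discounted_return(rs, gamma) for rs in episodes_rewards]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # 1 + 0.5*0 + 0.25*2 = 1.5
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # prints 1.5
    # Returns 1.5 and 0.5 average to 1.0
    print(monte_carlo_q([[1.0, 0.0, 2.0], [0.0, 1.0]], gamma=0.5))  # prints 1.0
```

Summing backwards avoids computing powers of γ explicitly. Q-learning replaces the full sampled return with a one-step bootstrapped target, but the quantity it estimates is the same expectation.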