sentence2 | SoMeSci

Property	Value
nif:beginIndex	0 (xsd:integer)
nif:broaderContext	sms:PMC5381785
nif:endIndex	190 (xsd:integer)
nif:isString	The reinforcement learning problem is usually modeled as a Markov decision process (MDP) giving rise to a sequence of observed, states, actions and rewards—s0, a0, r1, s1, a1, r2, s2, …, sT.
rdf:type	nif:Context nif:OffsetBasedString nif:Sentence