sentence25 | SoMeSci

Property	Value
nif:beginIndex	0 (xsd:integer)
nif:broaderContext	sms:PMC5381785
nif:endIndex	250 (xsd:integer)
nif:isString	To balance the exploitation of the current best known Q-values with the exploration of even better options, DQN uses a simple ϵ-greedy policy that samples a random action with probability ϵ (instead of always picking the action with maximal Q-value).
rdf:type	nif:Context nif:OffsetBasedString nif:Sentence