The neural network, parametrized by θ, is trained to minimize the loss function:

$$L(\theta) = \mathbb{E}\Big[\big(\underbrace{r + \gamma \max_{a'} Q(s', a'; \theta')}_{\text{target}} - \underbrace{Q(s, a; \theta)}_{\text{prediction}}\big)^2\Big] \tag{3}$$

Notice that the formula closely resembles the iterative update rule of the Bellman equation mentioned above (Eq. 2).
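To make the two terms of Eq. 3 concrete, the following is a minimal sketch of how the loss could be estimated over a batch of transitions. It rests on several assumptions that are not part of the original text: the expectation is approximated by a plain average over the batch, the function name `dqn_loss` and its arguments are hypothetical, the two Q-functions are stood in for by small lookup tables rather than neural networks, and terminal states are not treated specially.

```python
def dqn_loss(batch, q, q_target, gamma):
    """Mean squared difference between target and prediction (Eq. 3).

    batch:    list of (s, a, r, s_next) transitions
    q:        callable s -> list of Q(s, a; theta) for every action a
    q_target: callable s -> list of Q(s, a; theta') for every action a
    gamma:    discount factor
    """
    total = 0.0
    for s, a, r, s_next in batch:
        # Target: r + gamma * max_a' Q(s', a'; theta')
        target = r + gamma * max(q_target(s_next))
        # Prediction: Q(s, a; theta)
        prediction = q(s)[a]
        total += (target - prediction) ** 2
    # Batch average as an estimate of the expectation
    return total / len(batch)


# Hypothetical toy tables with two states and two actions,
# standing in for the networks with parameters theta and theta'.
q_table = {0: [1.0, 2.0], 1: [0.5, 0.0]}
target_table = {0: [0.0, 3.0], 1: [1.0, 1.0]}

batch = [(0, 1, 1.0, 0), (1, 0, 0.0, 1)]
loss = dqn_loss(batch, q_table.__getitem__, target_table.__getitem__, gamma=0.5)
print(loss)  # 0.125
```

In the first transition the target is 1.0 + 0.5 · 3.0 = 2.5 and the prediction is 2.0, giving a squared error of 0.25; in the second, target and prediction are both 0.5, so the batch average is 0.125. In an actual training setup the prediction term would be differentiated with respect to θ, while the target is computed with the separate parameters θ′.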