The statistics are collected after each training epoch by letting the DQNs (in their current state) play against each other for 10 games, each game initialized with a different random seed (In Pong one game consists of multiple exchanges and lasts until one of the agents reaches 21 points.).