Average Score on Ten Games: 36.5
This autonomous agent was trained for two hundred iterations using deep Q-learning with experience replay. Observations of human expert gameplay were then used to fine-tune the agent's performance.
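As a rough illustration of experience replay, the sketch below stores transitions in a fixed-size buffer and applies a Q-learning update to a randomly sampled batch. It is not the code behind this agent: the names `ReplayBuffer` and `q_update` are hypothetical, the hyperparameters are placeholders, and a lookup table stands in for the deep network to keep the example short.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive frames.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def q_update(q, batch, n_actions, alpha=0.1, gamma=0.99):
    """One Q-learning step per transition (a table stands in for the network)."""
    for state, action, reward, next_state, done in batch:
        # Terminal transitions have no future value to bootstrap from.
        best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
    return q


# Usage: states and actions here are placeholder integers.
buffer = ReplayBuffer(capacity=1000)
buffer.push(0, 1, 1.0, 1, False)
buffer.push(1, 0, 0.0, 2, True)
q_table = q_update(defaultdict(float), buffer.sample(2), n_actions=2)
```

In a deep Q-learning setup the table would be replaced by a neural network, and the same target would drive a gradient step on that network's weights.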
Average Score on Ten Games: 0
This autonomous agent was trained to mimic the original human player. Human state-action observations were used as training data for a policy network that predicts the next action given the current state.
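Training a policy on human state-action pairs amounts to supervised classification. The sketch below is a minimal example under stated assumptions: a single linear layer with a softmax output stands in for the policy network, states are plain feature lists, and `train_policy` and `predict_action` are hypothetical names rather than this project's code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(states, actions, n_actions, epochs=200, lr=0.5):
    """Fit a linear softmax policy to (state, action) pairs via cross-entropy SGD."""
    n_features = len(states[0])
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for x, a in zip(states, actions):
            logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(logits)
            for k in range(n_actions):
                # Cross-entropy gradient w.r.t. logit k: p_k - 1[k == human action].
                grad = probs[k] - (1.0 if k == a else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
    return weights


def predict_action(weights, state):
    """Return the action with the highest score for the given state."""
    logits = [sum(w * xi for w, xi in zip(row, state)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Usage: two toy states, each paired with the action the human took.
demo_states = [[1.0, 0.0], [0.0, 1.0]]
demo_actions = [0, 1]
policy = train_policy(demo_states, demo_actions, n_actions=2)
```

A policy trained this way only sees states the human visited, so it has no signal for recovering once it drifts into unfamiliar situations.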
Average Score on Ten Games: 0
This agent uses inverse Q-learning to estimate the Q-values for a given state. All training is done entirely offline using 499 observations of human gameplay.