Expected policy gradients for reinforcement learning
- Submitting institution
- University of Oxford
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 10414
- Type
- D - Journal article
- DOI
- -
- Title of journal
- Journal of Machine Learning Research
- Article number
- -
- First page
- 1
- Volume
- 21
- Issue
- 2020
- ISSN
- 1532-4435
- Open access status
- Compliant
- Month of publication
- February
- Year of publication
- 2020
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 1
- Research group(s)
- -
- Citation count
- 0
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- This paper extends a conference version presented at AAAI’18. It proposes a new family of policy gradient methods for reinforcement learning, called expected policy gradients, which reduce gradient variance by performing analytical or numerical quadrature over the agent’s action distribution rather than relying on sampled actions. The paper proves a new general policy gradient theorem that subsumes the existing deterministic and stochastic policy gradient theorems and allows the well-known off-policy deterministic policy gradient method to be reinterpreted as an on-policy expected policy gradient method. The framework also yields new exploration strategies for policy gradient methods, and empirical results demonstrate that these substantially improve performance over existing exploration strategies.
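- As a hypothetical illustration of the quadrature idea described above (a minimal sketch with an assumed softmax policy and toy critic values, not the paper's code or experiments): for a discrete-action policy, the standard single-sample policy gradient estimate at a state can be replaced by an explicit sum over all actions, the discrete analogue of integrating over the action distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

theta = np.array([0.1, -0.3, 0.2])     # policy logits for a single toy state (assumed)
q_values = np.array([1.0, 0.5, -0.2])  # assumed critic estimates Q(s, a) for that state

pi = softmax(theta)

# Standard single-sample (likelihood-ratio) estimate:
# grad_theta log pi(a|s) * Q(s, a) for one sampled action a.
a = rng.choice(len(pi), p=pi)
grad_log_pi_a = -pi.copy()
grad_log_pi_a[a] += 1.0                # d log softmax / d logits = e_a - pi
single_sample_estimate = grad_log_pi_a * q_values[a]

# Expected-gradient-style estimate: sum over all actions,
# sum_a pi(a|s) * grad_theta log pi(a|s) * Q(s, a),
# i.e. exact quadrature over the action distribution at this state.
expected_estimate = np.zeros_like(theta)
for b in range(len(pi)):
    grad_log_pi_b = -pi.copy()
    grad_log_pi_b[b] += 1.0
    expected_estimate += pi[b] * grad_log_pi_b * q_values[b]

print("single-sample estimate:", single_sample_estimate)
print("expected (quadrature) estimate:", expected_estimate)
```

- The summed estimate is the exact expectation of the single-sample estimate over the action choice at this state, which is why it has no variance from action sampling; for continuous actions the paper's methods replace the sum with analytical or numerical integration.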
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -