Series Foreword   xiii

Preface   xv

I  The Problem   1
1  Introduction   3
   1.1  Reinforcement Learning   3
   1.2  Examples   6
   1.3  Elements of Reinforcement Learning   7
   1.4  An Extended Example: Tic-Tac-Toe   10
   1.5  Summary   15
   1.6  History of Reinforcement Learning   16
   1.7  Bibliographical Remarks   23
2  Evaluative Feedback   25
   2.1  An n-Armed Bandit Problem   26
   2.2  Action-Value Methods   27
   2.3  Softmax Action Selection   30
   2.4  Evaluation Versus Instruction   31
   2.5  Incremental Implementation   36
   2.6  Tracking a Nonstationary Problem   38
   2.7  Optimistic Initial Values   39
   2.8  Reinforcement Comparison   41
   2.9  Pursuit Methods   43
   2.10 Associative Search   45
   2.11 Conclusions   46
   2.12 Bibliographical and Historical Remarks   48
|
3  The Reinforcement Learning Problem   51
   3.1  The Agent-Environment Interface   51
   3.2  Goals and Rewards   56
   3.3  Returns   57
   3.4  Unified Notation for Episodic and Continuing Tasks   60
   3.5  The Markov Property   61
   3.6  Markov Decision Processes   66
   3.7  Value Functions   68
   3.8  Optimal Value Functions   75
   3.9  Optimality and Approximation   80
   3.10 Summary   81
   3.11 Bibliographical and Historical Remarks   83
II  Elementary Solution Methods   87

4  Dynamic Programming   89
   4.1  Policy Evaluation   90
   4.2  Policy Improvement   93
   4.3  Policy Iteration   97
   4.4  Value Iteration   100
   4.5  Asynchronous Dynamic Programming   103
   4.6  Generalized Policy Iteration   105
   4.7  Efficiency of Dynamic Programming   107
   4.8  Summary   108
   4.9  Bibliographical and Historical Remarks   109
5  Monte Carlo Methods   111
   5.1  Monte Carlo Policy Evaluation   112
   5.2  Monte Carlo Estimation of Action Values   116
   5.3  Monte Carlo Control   118
   5.4  On-Policy Monte Carlo Control   122
   5.5  Evaluating One Policy While Following Another   124
   5.6  Off-Policy Monte Carlo Control   126
   5.7  Incremental Implementation   128
   5.8  Summary   129
   5.9  Bibliographical and Historical Remarks   131
|
6  Temporal-Difference Learning   133
   6.1  TD Prediction   133
   6.2  Advantages of TD Prediction Methods   138
   6.3  Optimality of TD(0)   141
   6.4  Sarsa: On-Policy TD Control   145
   6.5  Q-Learning: Off-Policy TD Control   148
   6.6  Actor-Critic Methods   151
   6.7  R-Learning for Undiscounted Continuing Tasks   153
   6.8  Games, Afterstates, and Other Special Cases   156
   6.9  Summary   157
   6.10 Bibliographical and Historical Remarks   158
III  A Unified View   161

7  Eligibility Traces   163
   7.1  n-Step TD Prediction   164
   7.2  The Forward View of TD(λ)   169
   7.3  The Backward View of TD(λ)   173
   7.4  Equivalence of Forward and Backward Views   176
   7.5  Sarsa(λ)   179
   7.6  Q(λ)   182
   7.7  Eligibility Traces for Actor-Critic Methods   185
   7.8  Replacing Traces   186
   7.9  Implementation Issues   189
   7.10 Variable λ   189
   7.11 Conclusions   190
   7.12 Bibliographical and Historical Remarks   191
|
8  Generalization and Function Approximation   193
   8.1  Value Prediction with Function Approximation   194
   8.2  Gradient-Descent Methods   197
   8.3  Linear Methods   200
   8.4  Control with Function Approximation   210
   8.5  Off-Policy Bootstrapping   216
   8.6  Should We Bootstrap?   220
   8.7  Summary   222
   8.8  Bibliographical and Historical Remarks   223
9  Planning and Learning   227
   9.1  Models and Planning   227
   9.2  Integrating Planning, Acting, and Learning   230
   9.3  When the Model Is Wrong   235
   9.4  Prioritized Sweeping   238
   9.5  Full vs. Sample Backups   242
   9.6  Trajectory Sampling   246
   9.7  Heuristic Search   250
   9.8  Summary   252
   9.9  Bibliographical and Historical Remarks   254
|
10  Dimensions of Reinforcement Learning   255
   10.1 The Unified View   255
   10.2 Other Frontier Dimensions   258
11  Case Studies   261
   11.1 TD-Gammon   261
   11.2 Samuel's Checkers Player   267
   11.3 The Acrobot   270
   11.4 Elevator Dispatching   274
   11.5 Dynamic Channel Allocation   279
   11.6 Job-Shop Scheduling   283
References   291

Summary of Notation   313

Index   315