.. _intro: RL Quickstart ============== We explain how to initialize training a RL agent in Train a RL agent ---------------- Here is a quick start of reinforcement leanring algorithm using `soft actor-critic (SAC) `_. .. code-block:: console python solve.py --algo sac --env pendulum-v1 Then we delve deeper into how to initialze the actor-critic algorithm. Create the environement ^^^^^^^^^^^^^^^^^^^^^^^ Environement is where the dynamics is and agent get samples from the environement. .. literalinclude:: ../../repr_control/solve.py :language: python :lines: 88 :linenos: Create the replay buffer ^^^^^^^^^^^^^^^^^^^^^^^^ Replay buffer is where the transition data is stored. Only off-policy algorithm needs replay buffer. Here off-policy means the policy used in sampling might be different from policy we are optimizing. .. literalinclude:: ../../repr_control/solve.py :language: python :lines: 94 :linenos: Create the agent ^^^^^^^^^^^^^^^^ .. literalinclude:: ../../repr_control/solve.py :language: python :lines: 104-109 :linenos: Start training ^^^^^^^^^^^^^^ .. literalinclude:: ../../repr_control/solve.py :language: python :lines: 135-156 :linenos: