RL Quickstart

This page explains how to set up and start training an RL agent.

Train an RL agent

The following command is a quick start that trains a reinforcement learning agent with soft actor-critic (SAC):

python solve.py --algo sac --env pendulum-v1
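The code excerpts on this page read their settings from an `args` object (`args.env`, `args.alg`, `args.device`, `args.start_timesteps`, `args.max_timesteps`, `args.batch_size`). As a rough sketch only, a parser that produces such an object might look like the following. The function name `build_parser`, the default values, and the use of `dest="alg"` to map the `--algo` flag onto `args.alg` are all assumptions made for illustration; the actual `solve.py` may define its flags differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; attribute names mirror the args.* fields used on this page."""
    parser = argparse.ArgumentParser(description="Train an RL agent (illustrative sketch).")
    # Assumption: dest="alg" lets the --algo flag populate args.alg.
    parser.add_argument("--algo", dest="alg", default="sac", choices=["sac", "rfsac"])
    parser.add_argument("--env", default="pendulum-v1")
    parser.add_argument("--device", default="cpu")
    # The defaults below are placeholders, not the script's real defaults.
    parser.add_argument("--start_timesteps", type=float, default=1e3)
    parser.add_argument("--max_timesteps", type=float, default=1e4)
    parser.add_argument("--batch_size", type=int, default=256)
    return parser


# Parse the same flags as the quick start command above.
args = build_parser().parse_args(["--algo", "sac", "--env", "pendulum-v1"])
total_steps = int(args.max_timesteps + args.start_timesteps)
print(args.alg, args.env, total_steps)
```

Passing an explicit list to `parse_args` keeps the sketch runnable outside a shell; a real script would call `parse_args()` with no arguments so that it reads `sys.argv`.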

The following sections look more closely at how the actor-critic algorithm is initialized.

Create the environment

The environment holds the system dynamics; the agent collects samples by interacting with it.

    env = gymnasium.make(args.env)
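To make the interface concrete, here is a small stand-in environment that returns values in the same shapes the training loop expects from a Gymnasium environment: `reset()` gives `(observation, info)` and `step(action)` gives `(observation, reward, terminated, truncated, info)`. The class `ToyPendulumEnv`, its constants, and its simplified dynamics are hypothetical and only loosely modeled on a torque-driven pendulum; it is not the environment that `gymnasium.make` returns.

```python
import math
import random


class ToyPendulumEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step return shapes."""

    def __init__(self, max_steps=200, dt=0.05, seed=0):
        self.max_steps = max_steps
        self.dt = dt
        self.rng = random.Random(seed)
        self.theta = 0.0   # angle in radians
        self.omega = 0.0   # angular velocity
        self.steps = 0

    def _obs(self):
        return [math.cos(self.theta), math.sin(self.theta), self.omega]

    def reset(self):
        self.theta = self.rng.uniform(-math.pi, math.pi)
        self.omega = 0.0
        self.steps = 0
        return self._obs(), {}  # (observation, info)

    def step(self, action):
        torque = max(-2.0, min(2.0, float(action)))  # clip the action to [-2, 2]
        # Simplified dynamics, integrated with one explicit Euler step.
        self.omega += (15.0 * math.sin(self.theta) + 3.0 * torque) * self.dt
        self.theta += self.omega * self.dt
        self.steps += 1
        reward = -(self.theta ** 2 + 0.1 * self.omega ** 2 + 0.001 * torque ** 2)
        terminated = False                         # no terminal state in this task
        truncated = self.steps >= self.max_steps   # time limit reached
        return self._obs(), reward, terminated, truncated, {}


toy_env = ToyPendulumEnv(max_steps=3)
state, info = toy_env.reset()
for _ in range(3):
    state, reward, terminated, truncated, info = toy_env.step(0.0)
state_dim = len(state)
```

The distinction between `terminated` (the task itself ended) and `truncated` (a time limit cut the episode short) is what the training loop later combines into a single `done` flag.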

Create the replay buffer

The replay buffer stores transition data. Only off-policy algorithms need a replay buffer. Here, off-policy means that the policy used to collect samples may differ from the policy being optimized.

    replay_buffer = buffer.ReplayBuffer(state_dim, action_dim, device=args.device)
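The idea of a replay buffer can be sketched in a few lines of plain Python. The class below is an illustrative fixed-capacity buffer, not the `buffer.ReplayBuffer` class used above: the name `SimpleReplayBuffer`, the `max_size` and `seed` parameters, and the `sample` method are assumptions. Only the argument order of `add` follows the call that appears in the training loop on this page.

```python
import random


class SimpleReplayBuffer:
    """Illustrative fixed-capacity buffer; oldest transitions are overwritten first."""

    def __init__(self, max_size=1_000_000, seed=None):
        self.max_size = max_size
        self.storage = []
        self.pos = 0  # index of the next slot to write
        self.rng = random.Random(seed)

    def add(self, state, action, next_state, reward, done):
        transition = (state, action, next_state, reward, float(done))
        if len(self.storage) < self.max_size:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition  # overwrite the oldest entry
        self.pos = (self.pos + 1) % self.max_size

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        batch = self.rng.sample(self.storage, batch_size)
        # Transpose the list of transitions into one tuple per field.
        states, actions, next_states, rewards, dones = zip(*batch)
        return states, actions, next_states, rewards, dones

    def __len__(self):
        return len(self.storage)


buf = SimpleReplayBuffer(max_size=3, seed=0)
for i in range(5):
    buf.add([i], [0.0], [i + 1], -1.0, False)
states, actions, next_states, rewards, dones = buf.sample(2)
```

Because sampled transitions may have been collected by an older version of the policy, or by random actions, an algorithm that learns from this buffer has to be off-policy.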

Create the agent

    if args.alg == "sac":
        agent = sac_agent.SACAgent(**kwargs)
    elif args.alg == "rfsac":
        agent = rfsac_agent.CustomModelRFSACAgent(dynamics_fn=dynamics, rewards_fn=rewards, **kwargs)
    else:
        raise NotImplementedError("Algorithm not implemented.")
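Whichever agent class is selected, the training loop on this page only calls two methods on it: `select_action(state, explore=True)` and `train(replay_buffer, batch_size=...)`. The sketch below shows a placeholder agent with that interface, plus a dictionary-based alternative to the `if`/`elif` dispatch. `RandomAgent`, `AGENTS`, and `make_agent` are hypothetical names introduced for illustration and are not part of the codebase.

```python
import random


class RandomAgent:
    """Hypothetical placeholder exposing the two methods the training loop calls."""

    def __init__(self, action_low=-2.0, action_high=2.0, seed=0):
        self.low, self.high = action_low, action_high
        self.rng = random.Random(seed)
        self.updates = 0

    def select_action(self, state, explore=False):
        # A learning agent would query its policy here; this one ignores the state.
        return self.rng.uniform(self.low, self.high)

    def train(self, replay_buffer, batch_size=256):
        # A learning agent would sample a batch and update its networks here.
        self.updates += 1
        return {"updates": self.updates}


# Registry mapping algorithm names to constructors (illustrative names only).
AGENTS = {"random": RandomAgent}


def make_agent(alg, **kwargs):
    if alg not in AGENTS:
        raise NotImplementedError(f"Algorithm '{alg}' not implemented.")
    return AGENTS[alg](**kwargs)


agent = make_agent("random", seed=1)
action = agent.select_action([1.0, 0.0, 0.0], explore=True)
info = agent.train(replay_buffer=None, batch_size=4)
```

A registry keeps the dispatch in one place as more algorithms are added, at the cost of requiring every constructor to accept its settings as keyword arguments.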

Start training

    # Reset the environment and the per-episode counters before the loop
    state, _ = env.reset()
    episode_reward, episode_timesteps = 0.0, 0

    for t in range(int(args.max_timesteps + args.start_timesteps)):
        episode_timesteps += 1

        # Select action randomly or according to policy
        if t < args.start_timesteps:
            action = env.action_space.sample()
        else:
            action = agent.select_action(state, explore=True)

        # Perform action
        next_state, reward, terminated, truncated, rollout_info = env.step(action)
        done = terminated or truncated
        replay_buffer.add(state, action, next_state, reward, done)

        prev_state = np.copy(state)
        state = next_state
        episode_reward += reward
        info = {}

        # Update the agent once the random warm-up phase is over
        if t >= args.start_timesteps:
            info = agent.train(replay_buffer, batch_size=args.batch_size)

        # Start a new episode when the current one ends
        if done:
            state, _ = env.reset()
            episode_reward, episode_timesteps = 0.0, 0
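The loop above has two phases: a warm-up of `start_timesteps` steps with random actions that only fills the replay buffer, followed by `max_timesteps` steps in which the agent picks the action and is updated once per step. The self-contained sketch below reproduces that structure with stub components so the phase boundaries can be checked by counting calls. `CountingEnv`, `StubAgent`, `run`, and the small step counts are hypothetical, and a plain list stands in for the replay buffer.

```python
import random


class CountingEnv:
    """Hypothetical 1-D environment: the state is a step counter, episodes last 5 steps."""

    def reset(self):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        reward = -abs(action)
        truncated = self.t >= 5  # fixed episode length
        return [float(self.t)], reward, False, truncated, {}


class StubAgent:
    """Hypothetical agent that only counts how often it is called."""

    def __init__(self):
        self.policy_actions = 0
        self.updates = 0

    def select_action(self, state, explore=False):
        self.policy_actions += 1
        return 0.0

    def train(self, replay_buffer, batch_size):
        self.updates += 1
        return {"buffer_size": len(replay_buffer)}


def run(start_timesteps=10, max_timesteps=20, batch_size=4, seed=0):
    rng = random.Random(seed)
    env, agent, replay_buffer = CountingEnv(), StubAgent(), []
    state, _ = env.reset()
    episodes = 0
    for t in range(start_timesteps + max_timesteps):
        if t < start_timesteps:
            action = rng.uniform(-1.0, 1.0)  # warm-up: random exploration
        else:
            action = agent.select_action(state, explore=True)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        replay_buffer.append((state, action, next_state, reward, done))
        state = next_state
        if t >= start_timesteps:
            agent.train(replay_buffer, batch_size=batch_size)  # one update per step
        if done:
            state, _ = env.reset()
            episodes += 1
    return agent, replay_buffer, episodes


stub_agent, transitions, episodes = run()
print(stub_agent.policy_actions, stub_agent.updates, len(transitions), episodes)
```

With the default arguments, the buffer already holds the 10 warm-up transitions when the first update runs, which is the purpose of the random phase: the agent never trains on an empty buffer.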