RL Quickstart

We explain how to initialize the training of an RL agent in

Train an RL agent

Here is a quick start for running a reinforcement learning algorithm, using soft actor-critic (SAC) as the example.

    python solve.py --algo sac --env pendulum-v1
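
The parser inside solve.py is not shown in this guide, so the following is only a sketch of how these flags could be read with the standard `argparse` module. The function name `parse_args`, the mapping of `--algo` to `args.alg`, the algorithm choices, and all default values are assumptions made for illustration. They are chosen so that the attribute names match the `args.*` fields used later in this guide.

```python
import argparse


def parse_args(argv=None):
    """Parse the quickstart flags (hypothetical sketch, not the real solve.py)."""
    parser = argparse.ArgumentParser(description="Train an RL agent")
    # Assumption: `--algo` on the command line is stored as `args.alg`
    parser.add_argument("--algo", dest="alg", default="sac",
                        choices=["sac", "rfsac"])
    parser.add_argument("--env", default="pendulum-v1")
    parser.add_argument("--device", default="cpu")
    # Placeholder defaults, not the project's actual values
    parser.add_argument("--start_timesteps", type=float, default=1e3)
    parser.add_argument("--max_timesteps", type=float, default=1e5)
    parser.add_argument("--batch_size", type=int, default=256)
    return parser.parse_args(argv)


# Equivalent to: python solve.py --algo sac --env pendulum-v1
args = parse_args(["--algo", "sac", "--env", "pendulum-v1"])
print(args.alg, args.env)
```

Passing an explicit list to `parse_args` makes the parser easy to exercise without touching `sys.argv`.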

The following sections look more closely at how the actor-critic algorithm is initialized.

Create the environment

The environment defines the dynamics. The agent collects samples by interacting with it.

    env = gymnasium.make(args.env)
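
To make the interaction contract concrete, here is a toy stand-in written in plain Python. It is not a Gymnasium environment. The class `ToyPointEnv` and its dynamics are hypothetical, and it only mimics the `reset()` and `step()` return shapes that the training loop relies on: `reset()` returns an observation and an info dict, and `step()` returns a five-element tuple.

```python
import random


class ToyPointEnv:
    """Hypothetical 1-D environment mimicking the reset/step return shapes."""

    def __init__(self, max_steps=50, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.position = 0.0
        self.steps = 0

    def reset(self):
        # Start each episode at a random position; return (observation, info)
        self.position = self.rng.uniform(-1.0, 1.0)
        self.steps = 0
        return [self.position], {}

    def step(self, action):
        # Dynamics: the action shifts the position by a small amount
        self.position += 0.1 * float(action[0])
        self.steps += 1
        reward = -abs(self.position)               # closer to the origin is better
        terminated = abs(self.position) > 2.0      # left the valid region
        truncated = self.steps >= self.max_steps   # time limit reached
        return [self.position], reward, terminated, truncated, {}


env = ToyPointEnv()
state, _ = env.reset()
next_state, reward, terminated, truncated, info = env.step([1.0])
```

Each call to `step` yields one transition `(state, action, next_state, reward, done)`, which is the unit of data the agent learns from.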

Create the replay buffer

The replay buffer stores the transition data. Only off-policy algorithms need a replay buffer. Here, off-policy means that the policy used to collect samples may differ from the policy being optimized.

    replay_buffer = buffer.ReplayBuffer(state_dim, action_dim, device=args.device)
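
The role of the buffer can be illustrated with a minimal sketch. This is not the project's `buffer.ReplayBuffer`, whose internals are not shown here. The class `SimpleReplayBuffer`, its `capacity` parameter, and its `sample` method are hypothetical. Only the argument order of `add` follows the call used in the training loop of this guide.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical minimal buffer; not the project's buffer.ReplayBuffer."""

    def __init__(self, capacity=100_000):
        # deque drops the oldest transition once capacity is reached
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, next_state, reward, done):
        self.storage.append((state, action, next_state, reward, float(done)))

    def sample(self, batch_size):
        # Sample uniformly, regardless of which policy produced each transition
        batch = random.sample(list(self.storage), batch_size)
        # Regroup into (states, actions, next_states, rewards, dones)
        return tuple(list(column) for column in zip(*batch))

    def __len__(self):
        return len(self.storage)


buf = SimpleReplayBuffer(capacity=3)
for i in range(5):
    buf.add([i], [0.0], [i + 1], 1.0, False)
states, actions, next_states, rewards, dones = buf.sample(2)
```

Because sampled transitions may come from older versions of the policy, or from the random warm-up phase, training on them is off-policy in the sense described above.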

Create the agent

    # Build the agent that matches the requested algorithm
    if args.alg == "sac":
        agent = sac_agent.SACAgent(**kwargs)
    elif args.alg == "rfsac":
        agent = rfsac_agent.CustomModelRFSACAgent(dynamics_fn=dynamics, rewards_fn=rewards, **kwargs)
    else:
        raise NotImplementedError("Algorithm not implemented.")

Start training

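    import numpy as np

    # Reset the environment and the per-episode counters before the loop
    state, _ = env.reset()
    episode_reward, episode_timesteps = 0.0, 0

    for t in range(int(args.max_timesteps + args.start_timesteps)):
        episode_timesteps += 1

        # Select action randomly during warm-up, then according to the policy
        if t < args.start_timesteps:
            action = env.action_space.sample()
        else:
            action = agent.select_action(state, explore=True)

        # Perform action
        next_state, reward, terminated, truncated, rollout_info = env.step(action)
        done = terminated or truncated
        replay_buffer.add(state, action, next_state, reward, done)

        prev_state = np.copy(state)
        state = next_state
        episode_reward += reward
        info = {}

        # Update the agent once the warm-up phase is over
        if t >= args.start_timesteps:
            info = agent.train(replay_buffer, batch_size=args.batch_size)

        # Start a new episode when the current one ends
        if done:
            state, _ = env.reset()
            episode_reward, episode_timesteps = 0.0, 0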