RL Quickstart
This page explains how to set up and start training an RL agent.
Train an RL agent
The following command is a quick start that trains a reinforcement learning agent with soft actor-critic (SAC):
python solve.py --algo sac --env pendulum-v1
The following sections look more closely at how the actor-critic agent is initialized.
Create the environment
The environment encapsulates the dynamics. The agent collects samples by interacting with it.
import gymnasium
env = gymnasium.make(args.env)
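To make the interaction pattern concrete, here is a minimal sketch of a toy environment that follows the same `reset`/`step` convention the training loop relies on: `reset` returns an observation and an info dictionary, and `step` returns a five-element tuple. `ToyPointEnv` and its dynamics are hypothetical and are not part of this project. In the quickstart, the real environment comes from `gymnasium.make`.

```python
import random


class ToyPointEnv:
    """Hypothetical 1-D environment: move a point toward the origin.

    Mirrors the Gymnasium convention: reset() returns (observation, info) and
    step() returns (observation, reward, terminated, truncated, info).
    """

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.position = 0.0
        self.steps = 0

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.position = rng.uniform(-1.0, 1.0)
        self.steps = 0
        return [self.position], {}

    def step(self, action):
        # Dynamics: the action is a velocity, clipped to [-0.1, 0.1].
        velocity = max(-0.1, min(0.1, float(action)))
        self.position += velocity
        self.steps += 1
        reward = -abs(self.position)              # closer to 0 is better
        terminated = abs(self.position) < 0.05    # goal reached
        truncated = self.steps >= self.max_steps  # time limit hit
        return [self.position], reward, terminated, truncated, {}


env = ToyPointEnv()
state, info = env.reset(seed=0)
# One interaction: the agent sends an action and receives a sample.
action = -0.1 if state[0] > 0 else 0.1
next_state, reward, terminated, truncated, info = env.step(action)
```

Each call to `step` yields one transition `(state, action, next_state, reward, done)`, which is the unit of data the rest of the pipeline consumes.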
Create the replay buffer
The replay buffer stores transition data. Only off-policy algorithms need a replay buffer. Here, off-policy means that the policy used to collect samples may differ from the policy being optimized.
replay_buffer = buffer.ReplayBuffer(state_dim, action_dim, device=args.device)
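As an illustration of what such a buffer does internally, the following sketch implements a fixed-size circular buffer with uniform sampling. `SimpleReplayBuffer`, its `max_size` and `seed` parameters, and the NumPy storage are assumptions made for this example. The project's own `buffer.ReplayBuffer` may be organized differently (for instance, it takes a `device` argument, which this sketch omits).

```python
import numpy as np


class SimpleReplayBuffer:
    """Hypothetical fixed-size circular buffer of transitions."""

    def __init__(self, state_dim, action_dim, max_size=100_000, seed=0):
        self.max_size = max_size
        self.ptr = 0    # next write position
        self.size = 0   # number of valid transitions stored
        self.state = np.zeros((max_size, state_dim), dtype=np.float32)
        self.action = np.zeros((max_size, action_dim), dtype=np.float32)
        self.next_state = np.zeros((max_size, state_dim), dtype=np.float32)
        self.reward = np.zeros((max_size, 1), dtype=np.float32)
        self.done = np.zeros((max_size, 1), dtype=np.float32)
        self.rng = np.random.default_rng(seed)

    def add(self, state, action, next_state, reward, done):
        # Overwrite the oldest entry once the buffer is full.
        i = self.ptr
        self.state[i] = state
        self.action[i] = action
        self.next_state[i] = next_state
        self.reward[i] = reward
        self.done[i] = float(done)
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size):
        # Uniform sampling over stored transitions, regardless of which
        # policy generated them (this is what makes the data off-policy).
        idx = self.rng.integers(0, self.size, size=batch_size)
        return (self.state[idx], self.action[idx], self.next_state[idx],
                self.reward[idx], self.done[idx])


buf = SimpleReplayBuffer(state_dim=3, action_dim=1, max_size=4)
for t in range(6):  # 6 writes into 4 slots: the 2 oldest are overwritten
    buf.add(np.full(3, t), [0.5], np.full(3, t + 1), reward=-1.0, done=False)
states, actions, next_states, rewards, dones = buf.sample(batch_size=8)
```

Because old transitions stay in the buffer until they are overwritten, a sampled batch typically mixes data gathered under several earlier versions of the policy.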
Create the agent
if args.alg == "sac":
    agent = sac_agent.SACAgent(**kwargs)
elif args.alg == "rfsac":
    agent = rfsac_agent.CustomModelRFSACAgent(dynamics_fn=dynamics, rewards_fn=rewards, **kwargs)
else:
    raise NotImplementedError("Algorithm not implemented.")
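The training loop only calls two methods on the agent: `select_action` and `train`. The sketch below shows a stand-in that satisfies this interface, together with a small factory that mirrors the selection logic above. `RandomAgent`, `make_agent`, the `"random"` key, and the assumption that `train` returns a dictionary of metrics are all hypothetical. The real `SACAgent` and `CustomModelRFSACAgent` classes implement actual policy and critic updates.

```python
import numpy as np


class RandomAgent:
    """Hypothetical stand-in exposing the two methods the training loop calls."""

    def __init__(self, action_dim, max_action=1.0, seed=0):
        self.action_dim = action_dim
        self.max_action = max_action
        self.rng = np.random.default_rng(seed)
        self.updates = 0

    def select_action(self, state, explore=False):
        # A real agent would evaluate its policy network on `state`.
        # Here: zero action when exploiting, uniform noise when exploring.
        if not explore:
            return np.zeros(self.action_dim, dtype=np.float32)
        return self.rng.uniform(-self.max_action, self.max_action,
                                size=self.action_dim).astype(np.float32)

    def train(self, replay_buffer, batch_size=256):
        # A real agent would sample a batch and take a gradient step.
        # This stub only counts calls and returns a metrics dictionary.
        self.updates += 1
        return {"updates": self.updates}


def make_agent(alg, **kwargs):
    # Same pattern as the if/elif chain: pick a class by name, fail loudly otherwise.
    if alg == "random":
        return RandomAgent(**kwargs)
    raise NotImplementedError(f"Algorithm '{alg}' is not implemented.")


agent = make_agent("random", action_dim=2)
action = agent.select_action(state=None, explore=True)
metrics = agent.train(replay_buffer=None)
```

A stub like this can be useful for checking the rest of the pipeline (environment, buffer, logging) before plugging in a learning agent.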
Start training
# state, episode_reward, and episode_timesteps are assumed to be initialized before the loop.
for t in range(int(args.max_timesteps + args.start_timesteps)):

    episode_timesteps += 1

    # Select action randomly or according to policy
    if t < args.start_timesteps:
        action = env.action_space.sample()
    else:
        action = agent.select_action(state, explore=True)

    # Perform action
    next_state, reward, terminated, truncated, rollout_info = env.step(action)
    done = terminated or truncated
    replay_buffer.add(state, action, next_state, reward, done)

    prev_state = np.copy(state)
    state = next_state
    episode_reward += reward
    info = {}

    # Update the agent only after the random warm-up phase
    if t >= args.start_timesteps:
        info = agent.train(replay_buffer, batch_size=args.batch_size)
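The loop above does not show how the per-episode variables are initialized or what happens when an episode ends. The following self-contained sketch fills in one plausible version of that bookkeeping. `ToyEnv`, `StubAgent`, the list-based buffer, and the reset-on-`done` block are assumptions for illustration. The project's actual `solve.py` may handle episode ends differently.

```python
import random


class ToyEnv:
    """Hypothetical environment: episodes are cut off after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        next_state = float(self.t)
        reward = -abs(action)
        terminated = False
        truncated = self.t >= self.horizon
        return next_state, reward, terminated, truncated, {}


class StubAgent:
    """Hypothetical agent with the two methods the loop needs."""

    def select_action(self, state, explore=False):
        return random.uniform(-1.0, 1.0) if explore else 0.0

    def train(self, replay_buffer, batch_size):
        batch = random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
        return {"batch_size": len(batch)}


env, agent, replay_buffer = ToyEnv(horizon=5), StubAgent(), []
start_timesteps, max_timesteps, batch_size = 10, 20, 4

# Initialize the per-episode bookkeeping before entering the loop.
state, _ = env.reset()
episode_reward, episode_timesteps, episode_num = 0.0, 0, 0
episode_returns = []

for t in range(start_timesteps + max_timesteps):
    episode_timesteps += 1
    if t < start_timesteps:
        action = random.uniform(-1.0, 1.0)  # warm-up: random actions
    else:
        action = agent.select_action(state, explore=True)

    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    replay_buffer.append((state, action, next_state, reward, done))
    state = next_state
    episode_reward += reward

    if t >= start_timesteps:
        info = agent.train(replay_buffer, batch_size=batch_size)

    if done:
        # Episode finished: record the return and start a fresh episode.
        episode_returns.append(episode_reward)
        state, _ = env.reset()
        episode_reward, episode_timesteps = 0.0, 0
        episode_num += 1
```

With a horizon of 5 and 30 total steps, this sketch runs through six complete episodes. The first two consist entirely of random warm-up actions and trigger no agent updates.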