Usage
=====

.. _run_samples:

1. Defining nonlinear dynamics
------------------------------

The dynamics is defined in ``repr_control/define_problem.py``. 
The following items should be defined:

- Dynamics
- Reward function
- Initial distributions
- State and action bounds
- Maximum rollout steps
- Noise level
  
The following is a detailed instruction on how to define the stochastic inverted pendulum dynamics. 

The pendulum dynamics is:

.. math::

   \ddot \theta = \frac{3g}{2l}\sin\theta + \frac{3}{ml^2} T

where :math:`\theta` is the angle, :math:`g` is the gravity constant, :math:`m` is the pendulum mass, :math:`l` is the pendulum length, and :math:`T` is the input torque.
To deal with the unbounded :math:`\theta`, The observation is defined as :math:`[\cos\theta,\sin\theta, \dot \theta]`.

We use euler discretization, combined with the stochastic dynamics,

.. math::

   x' = f(x, u)\Delta t + \epsilon

where :math:`f` is the continuous time nonlinear dynamics, and :math:`\epsilon\sim \mathcal N(0, \sigma^2 I_n)`.


Define problem related constants
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   .. literalinclude:: define_problem.py
      :language: python
      :lines: 5-18
      :linenos:

   The following constants are defined:

   +--------------+--------------------+----------------------------------------------------------------------------+
   | variable     | format             | meaning                                                                    |
   +==============+====================+============================================================================+
   | state_dim    | int                | state dimension                                                            |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | action_dim   | int                | action dimension                                                           |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | state_range  | [list, list]       | state upper and lower bounds. Sampling will be reset if bound is achieved. |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | action_range | [list, list]       | action upper and lower bounds.                                             |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | max_step     | int                | maximum step per episode.                                                  |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | sigma        | float              | Gaussian noise variance :math:`\sigma^2`.                                  |
   +--------------+--------------------+----------------------------------------------------------------------------+
   | env_name     | str                | Name of the dynamics                                                       |
   +--------------+--------------------+----------------------------------------------------------------------------+

Define dynamics and reward functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   
   Note that the dynamics must be written in ``pyTorch`` and all the inputs should be ``torch.Tensor``. 
   The dynamics must support **batch operations**, which means 
   the input ``torch.Tensor`` should be in shape ``[batch_size, state_dim]`` and ``[batch_size, action_dim]``.

   Define dynamics:

   .. literalinclude:: define_problem.py
      :language: python
      :lines: 23, 44-59
      :linenos:

   Define rewards:

   .. literalinclude:: define_problem.py
      :language: python
      :lines: 61, 75-79
      :linenos:

2. Start training
-----------------
The training can be started with a single line

.. code-block:: console

      $ python solve.py

Define training hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The hyper parameters can be set through command line arguments, for example 
  
.. code-block:: console

   $ python solve.py --max_timesteps 2e5 --rf_num 1024


The ``--max_timesteps 2e5`` means the total number of iterations is set to ``2e5``, and ``--rf_num 1024`` means the 
truncated finite dimension of random features are 1024. 

For all the hyperparameters can be tuned, run

.. code-block:: console

   $ python solve.py --help

- Experimental: vectorized solution:

.. code-block:: console

   $ python solve_vec.py --max_timesteps 2e5 --device cuda


use vectorized rollout and evaluation to speed up the training.
   

3. Monitoring and evaluating the training results
----------------------------------

After training starts, the results will look like

.. code-block:: console
   
   repr-control/
   ├── repr-control/
   │   ├── log/ 
   │   │   ├── rfsac/ 
   │   │   │   ├── seed_SEED_DATE-TIME          # folder title
   │   │   │   │   ├── summary/                 # save tensorboard summaries
   │   │   │   │   ├── best_actor.pth           # actor with the best evaluations
   │   │   │   │   ├── best_critic.pth          # critic with the best evaluations
   │   │   │   │   ├── last_actor.pth           # actor after all training steps
   │   │   │   │   ├── last_critic.pth          # critic after all training steps
   └── └── └── └── └── train_params.yaml        # training parameters

Run the follwoing script to evaluate the trained results,

.. code-block:: console

   $ python scripts/eval.py $LOG_PATH

where `$LOG_PATH` is the path of folder title ``seed_SEED_DATE-TIME``.

Monitoring the training process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   $ tensorboard --logdir $LOG_PATH

You can inspect the training process via tensorboard. 

.. note::

   Monitoring the training process is very helpful for tuning the hyperparameters. 
   Some rules of thumb if you don't have experience playing with the RL hyper parameteters:
   
   - If the value loss is too large, try to scale the rewards to be smaller (or increase the learning rate).
   - If the agent always get stuck, try to adapt the initial distriution to cover more of the state space.

Evaluating the training results: 

.. code-block:: console

   $ python scripts/eval.py $LOG_PATH

I placed a example results in the `examples` folder, you can run the following to see the results,

.. code-block:: console

   $ tensorboard --logdir ./examples/example_results/rfsac/Pendulum/seed_0_2024-07-18-14-50-35

.. code-block:: console

   $ python scripts/eval.py ./examples/example_results/rfsac/Pendulum/seed_0_2024-07-18-14-50-35


4. Use controller elsewhere
----------------------------

   Add the following line to your python code to load training results as a controller,

   .. code-block:: python

      import numpy as np
      from repr_control.scripts.eval import get_controller
      log_path = '$LOG_PATH'
      agent = get_controller(log_path)
   
   To generate control command from states,

   .. code-block:: python

      state = np.zeros([3]) # a sample state with all zero.
      action = agent.select_action(state, explore=False)