Guide To TensorForce: A TensorFlow-based Reinforcement Learning Framework
TensorForce is an open-source reinforcement learning library built on top of TensorFlow. Python 3 is required to use this deep RL framework. It is currently maintained by Alexander Kuhnle, while its 0.4.2 and earlier versions were jointly developed by Alexander Kuhnle, Michael Schaarschmidt and Kai Fricke.
A brief introduction to Tensorforce and several such RL frameworks can be found in this article.
Highlighting Features of TensorForce
- It supports TensorBoard.
- It supports a wide range of neural network layers such as 1D and 2D convolutions, fully connected (dense) layers, pooling, embeddings and so on.
- It enables usage of various optimization algorithms such as Adam, RMSProp, AdaDelta and natural-gradient-based optimizers (a sample layer and optimizer specification is sketched after this list).
- It also supports L2 and entropy regularization.
- It allows parallel execution of multiple RL environments.
- It supports random replay memory and batch buffer memory.
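To make the layer and optimizer support above concrete, such components are typically described in Tensorforce as plain specification dictionaries. The sketch below is only an illustrative, made-up example and is not used in the tutorial that follows; the layer types and arguments follow Tensorforce's layer specifications.

# Hypothetical network specification: a small convolutional network
# followed by a dense layer, expressed as a list of layer dicts
network_spec = [
    dict(type='conv2d', size=32, window=3, activation='relu'),
    dict(type='flatten'),
    dict(type='dense', size=64, activation='tanh'),
]

# Hypothetical optimizer specification (Adam with a fixed learning rate)
optimizer_spec = dict(optimizer='adam', learning_rate=1e-3)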
What distinguishes TensorForce from similar RL libraries?
- The whole RL logic of TensorForce is implemented in TensorFlow, which enables deployment of TensorFlow-based models and the use of portable computation graphs independent of the application programming language.
- The library's modular design is meant to be as straightforward as possible to apply and configure for general applications.
- RL algorithms applied using the library are independent of the virtual agent’s interaction with the environment as well as the nature of input states and output actions.
Practical implementation
Here’s a demonstration of creating an RL environment and agent for a temperature controller using TensorForce. The thermostat environment comprises a room with a heater. When the heater is switched on, the room temperature approaches 1.0; when it is turned off, the temperature drops towards 0.0. The exponential heat-decay constant ‘tau’ determines how fast the room temperature approaches 1.0 or 0.0. The change in temperature is computed as:
temp[i + 1] = h[i] + (temp[i] - h[i]) * exp(-1 / tau)    ...(i)
where,
temp[i] denotes the temperature (between 0 and 1) at the i-th timestep, and
h[i] represents the applied heater state (0 or 1).
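As a quick worked example of equation (i): if the room starts cold (temp[i] = 0.0), the heater is switched on (h[i] = 1) and tau = 2.0, then temp[i + 1] = 1 + (0.0 - 1) * exp(-1/2) ≈ 1 - 0.607 ≈ 0.39, i.e. the room covers roughly 39% of the gap between its current temperature and the heater state in a single timestep.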
The code has been implemented on Google Colab with Python 3.7.10 and Tensorforce 0.6.3. A step-wise explanation of the code follows:
- Install tensorforce
!pip install tensorforce
- Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
from tensorforce.environments import Environment
from tensorforce.agents import Agent
- Calculate response for current temperature and given action
def respond(ac, curr_temp, tau):
    return ac + (curr_temp - ac) * math.exp(-1.0 / tau)
- Define a series of actions (1:on, 0:off)
act = pd.Series(np.array([1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0]))
- Initialize array for responses with zeros
resp = np.zeros(act.size)
- Update this array with the response to each action
for i in range(act.size):
    # for the 1st action, the last response will be 0 ('off')
    if i == 0:
        lastResp = 0
    # for subsequent steps, record the previous response
    else:
        lastResp = resp[i - 1]
    # update the latest response by calling respond() with the current
    # action, the last response and the tau value as parameters
    resp[i] = respond(act[i], lastResp, 2.0)
- Create dataframe of actions and corresponding responses
df = pd.DataFrame(list(zip(act, resp)), columns=['Action', 'Response'])
Sample condensed data frame:
- Plot the actions and responses.
df.plot()
- Create a reward function with which the agent tries to keep the temperature in the [0.4, 0.6] range.
def reward(temperature):
    delta = abs(temperature - 0.5)
    # if the temperature is in the [0.4, 0.6] range, set the reward to 0
    if delta < 0.1:
        return 0.0
    # Otherwise the reward is the negative distance of the temperature from
    # the nearest end of the range: e.g. if the temperature is 0.7, it is
    # nearer to 0.6 than 0.4; the difference between 0.7 and 0.6 is 0.1, so
    # the reward is -0.1. If the temperature is 0.35, it is nearer to 0.4;
    # the difference between 0.4 and 0.35 is 0.05, so the reward is -0.05.
    else:
        return -delta + 0.1
- Create a list of temperatures from 0.0 to 1.0 (in steps of 0.01)
tmp = [t * 0.01 for t in range(100)]
- Compute the reward for each temperature value
rew = [reward(t) for t in tmp]
- Plot the temperature vs. reward graph
fig = plt.figure(figsize=(12, 4))
plt.scatter(tmp, rew)
plt.xlabel('Temp')
plt.ylabel('Reward')
plt.title('Reward vs. Temp')
Output:
- Create a class defining thermostat environment
class TSEnv(Environment):
    def __init__(self):
        # Initialize tau and the current temperature
        self.tau = 3.0
        self.curr_temp = np.random.random(size=(1,))
        super().__init__()

    # State of the heater: a single float temperature with minimum and
    # maximum values specified as 0.0 and 1.0 respectively
    def states(self):
        return dict(type='float', shape=(1,), min_value=0.0, max_value=1.0)

    # Action specification (0: off, 1: on)
    def actions(self):
        return dict(type='int', num_values=2)

    # Reset the environment to a random initial temperature
    def reset(self):
        self.timestep = 0
        self.curr_temp = np.random.random(size=(1,))
        return self.curr_temp

    # Environment's response to the action, following equation (i)
    def response(self, action):
        return action + (self.curr_temp - action) * math.exp(-1.0 / self.tau)

    # Compute the reward using the same logic as the reward() function above
    def reward_compute(self):
        delta = abs(self.curr_temp - 0.5)
        if delta < 0.1:
            return 0.0
        else:
            return -delta[0] + 0.1

    # Execute the given action
    def execute(self, act):
        # Check the action (the heater is either off or on)
        assert act == 0 or act == 1
        # Advance the environment by one step
        self.timestep += 1
        # Update the current temperature according to the response
        self.curr_temp = self.response(act)
        # Calculate the reward
        reward = self.reward_compute()
        terminal = False  # the episode is not over
        # Return the current temperature, terminal flag and reward
        return self.curr_temp, terminal, reward
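As an optional check (not part of the original notebook), the raw environment class can be instantiated directly and stepped once to verify that reset() and execute() behave as expected; the printed values are only indicative, since the initial temperature is random.

# Hypothetical sanity check of the environment defined above
env_check = TSEnv()
print(env_check.reset())         # random initial temperature, e.g. [0.42]
print(env_check.execute(act=1))  # (new temperature, terminal=False, reward)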
- Create the environment by specifying the thermostat environment class defined above and the maximum number of timesteps per episode
environment = Environment.create(environment=TSEnv, max_episode_timesteps=150)
- Configure an agent to learn responding in the thermostat environment
agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    optimizer=dict(optimizer='adam', learning_rate=1e-3),
    objective='policy_gradient', reward_estimation=dict(horizon=1)
)
- Train the agent for 150 episodes
for _ in range(150):
    # Reset the environment first
    states = environment.reset()
    terminal = False
    # While the episode is not over
    while not terminal:
        # Record the agent's action on the heater's current state
        act = agent.act(states=states)
        # Execute the agent's action
        states, terminal, rew = environment.execute(actions=act)
        # act() should be followed by observe(), which observes the computed
        # reward and whether the temperature has reached a terminal state
        agent.observe(terminal=terminal, reward=rew)
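As an aside, Tensorforce also provides a Runner utility that wraps this act/execute/observe loop. A minimal sketch of an equivalent training run, assuming the agent and environment objects created above, would look roughly like this:

from tensorforce.execution import Runner

# Run 150 training episodes using the built-in runner instead of the manual loop
runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=150)
runner.close()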
- Check the trained agent’s performance
# Reset the environment
environment.reset()
# Initialize the current temperature, state and terminal flag
environment.curr_temp = np.array([1.0])
states = environment.curr_temp
intr = agent.initial_internals()
terminal = False

# Run one episode
temperature = [environment.curr_temp[0]]
# Until the episode is over
while not terminal:
    # Let the agent act on the current state (independent evaluation mode)
    ac, intr = agent.act(states=states, internals=intr, independent=True)
    # Execute the agent's action and record the resulting temperature
    states, terminal, reward = environment.execute(actions=ac)
    temperature += [states[0]]

# Plot the agent's response
plt.figure(figsize=(12, 4))
ax = plt.subplot()
# Limits of the temperature axis
ax.set_ylim([0.0, 1.0])
# Plot the temperature
plt.plot(range(len(temperature)), temperature)
# Draw red lines at temperatures 0.4 and 0.6 to see whether the temperature
# remains in the [0.4, 0.6] range
plt.hlines(y=0.4, xmin=0, xmax=149, color='r')
plt.hlines(y=0.6, xmin=0, xmax=149, color='r')
plt.xlabel('Timestep')       # X-axis label
plt.ylabel('Temperature')    # Y-axis label
plt.title('Temperature vs. Timestep')
plt.show()                   # Display the plot
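Once the performance looks acceptable, the trained agent can be persisted and the resources released. A minimal sketch, assuming an arbitrary local directory name 'thermostat-agent' chosen here for illustration:

# Save the trained agent to disk (the directory name is an arbitrary example)
agent.save(directory='thermostat-agent')

# Release the TensorFlow resources held by the agent and environment
agent.close()
environment.close()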
Output:
The output plot shows that the agent keeps the temperature in the [0.4,0.6] range (shown in blue).
- Code source
- Google Colab notebook of the above implementation