# Presentation

Reinforcement Learning in Digital System Simulation

Time: Monday, December 6th, 10:30am - 12:00pm PST

Location: 3018

Event Type: Tutorial

Presented In-Person

Description: This tutorial discusses Reinforcement Learning (RL) and its application in the Digital Simulation domain. We will demonstrate a working case study of an agent deriving optimal stimulus using RL principles for a simple digital system.

We start by introducing the fundamentals of a Reinforcement Learning (RL) framework, agnostic of any particular domain. This includes the definition of the Agent, Environment, Actions, and Cumulative Reward. We define the Markov property of states, which allows us to use Markov Decision Process (MDP) techniques to arrive at an optimal solution. We explore in detail one such method, Q-Learning, which is used in our case study.
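The core of tabular Q-Learning can be sketched in a few lines of Python. The toy states, action labels, and hyperparameter values below are illustrative choices, not the tutorial's actual settings:

```python
import random
from collections import defaultdict

# Typical hyperparameter choices (illustrative, not the tutorial's):
# learning rate, discount factor, and exploration probability.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [0, 1]
Q = defaultdict(float)  # maps (state, action) -> learned value estimate

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Repeating `choose_action` and `update` over many episodes drives the table toward the optimal action-value function under the Markov assumption.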

We rigorously show how Digital Simulation can be treated as an RL episodic task with a discrete state space and a finite set of actions. The stimulus RL agent starts with no knowledge of the environment, and the desired goal of the stimulus is expressed as the cumulative reward in the RL framework. We implement Q-Learning to derive a table mapping each state of the digital system to an optimal action.
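Once the Q table has converged, the optimal-action table is just a greedy argmax over the learned values. A minimal sketch, with an invented two-state Q table for illustration:

```python
# Extract the optimal-action table from a learned Q function.
# Q is assumed to be a dict mapping (state, action) -> value,
# as produced by a tabular Q-learning run.
def greedy_policy(Q, states, actions):
    """For each state, pick the action with the highest learned Q value."""
    return {s: max(actions, key=lambda a: Q.get((s, a), 0.0)) for s in states}

# Illustrative learned values for a two-state system:
Q = {(0, "push"): 0.8, (0, "pop"): 0.2,
     (1, "push"): 0.1, (1, "pop"): 0.9}
policy = greedy_policy(Q, states=[0, 1], actions=["push", "pop"])
# policy -> {0: "push", 1: "pop"}
```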

For our hands-on case study we will use an open-source environment based on Verilator and cocotb. Verilator is an open-source SystemVerilog-to-C++ cycle-based simulator. Cocotb is a coroutine-based cosimulation testbench environment implemented in Python for verifying VHDL and SystemVerilog RTL. All the code runs in a Docker container that can be downloaded from Docker Hub.

Our example is an N-deep FIFO, and the goal of the stimulus agent is to alternate between filling and emptying the FIFO as many times as possible within a 200-cycle interval. We derive a reward function that captures this goal. The set of actions for this design is any combination of a FIFO push or pop on each clock cycle. The simulation will be run live to demonstrate how the agent learns optimal stimulus.
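One plausible shaping of this goal as a reward function rewards the agent each time the FIFO transitions to completely full or completely empty; the tutorial's actual reward function may differ. A minimal sketch:

```python
# Hypothetical reward for the fill/empty alternation goal:
# pay out 1.0 whenever the FIFO occupancy crosses into a
# "just became full" or "just became empty" state, else 0.0.
def reward(prev_occupancy, occupancy, depth):
    if occupancy == depth and prev_occupancy < depth:
        return 1.0   # FIFO just became full
    if occupancy == 0 and prev_occupancy > 0:
        return 1.0   # FIFO just became empty
    return 0.0
```

Summing this reward over a 200-cycle episode counts complete fill/empty alternations, so maximizing cumulative reward matches the stated stimulus goal.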

Lastly, we will tackle approaches to solving real-world problems with large state spaces on the order of 10^200. We demonstrate the use of a Deep Q Network (DQN) to approximate this lookup. The advantage of having a testbench in Python becomes apparent, since we leverage the PyTorch machine learning framework in building our DQN. We show how the DQN arrives at the same optimal policy as Q-Learning. We discuss possibilities for future work in this area, such as applying policy-based methods and relaxing the strict Markov property requirement.
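A DQN replaces the Q table with a small neural network that maps a state vector to one Q value per action. A minimal PyTorch sketch, with illustrative layer sizes and dimensions that are not necessarily the tutorial's:

```python
import torch
import torch.nn as nn

# Minimal DQN: an MLP approximating Q(s, .) when the state space
# is too large to enumerate in a lookup table.
class DQN(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q value per action
        )

    def forward(self, state):
        return self.net(state)

# Greedy action selection mirrors the tabular argmax over Q(s, a):
net = DQN(state_dim=4, num_actions=2)
q_values = net(torch.zeros(1, 4))        # batch of one state vector
action = q_values.argmax(dim=1).item()
```

Training follows the same update rule as tabular Q-Learning, with the squared temporal-difference error used as the loss for gradient descent.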
