Description: The recent generation of high-end FPGAs provides high computational capability and energy efficiency, which, together with their reconfigurability, makes them attractive for high-performance computing and architecture research. Implementing a high-bandwidth accelerator on these large FPGAs presents a unique challenge: how to best utilize the variety of available computing resources to achieve maximum performance. In this work, we implemented Vortex, a full-scale PCIe-based GPGPU accelerator, on modern high-end Intel FPGAs. Vortex implements the RISC-V ISA with an extension that supports the Single-Instruction, Multiple-Thread (SIMT) execution model. It uses a multi-core architecture with high-bandwidth, fully pipelined, non-blocking caches and a scratchpad shared memory to achieve maximum throughput. We leveraged the hardened integer arithmetic and floating-point DSPs available on the FPGA to achieve maximum computational efficiency. The Vortex platform is highly customizable and scalable, with a complete open-source compiler, driver, and runtime software stack with OpenCL support to enable research in GPU architectures. We fit a 16-core processor configuration with high-bandwidth caches on an Intel Arria 10 FPGA, clocking at 203-234 MHz, making Vortex a practical framework for GPU hardware research.