Aneo works with a variety of clients whose computing performance needs are significant, especially in finance and industry. To meet their requirements, we pay attention to both functional and technological aspects.
The search for performance in scientific and parallel computing is one of our main themes, and we try to stay as up to date as possible on the many technologies involved. At Aneo, we have a team of about ten people with extensive experience in commercial off-the-shelf (COTS) parallel architectures: Intel and AMD x86 multicore processors, the IBM POWER architecture, low-power ARM chips, NVIDIA and AMD GPUs, and Intel Xeon Phi. Our expertise in compilation, vectorization, and Simultaneous Multithreading (SMT) has exposed us to a variety of languages and tools. With the aim of expanding this knowledge, we decided to explore FPGAs.
In a few words, what is an FPGA?
An FPGA (Field-Programmable Gate Array), or programmable logic circuit, is a circuit that can be reprogrammed after manufacturing. Unlike fixed hardware such as COTS architectures, an FPGA is a flexible fabric composed of millions of logic gates, lookup tables, registers, arithmetic units, and an interconnect linking them. All these resources can be reconfigured to implement the logical and arithmetic operations and memory spaces that define a custom architecture for an application, just before it runs on the FPGA. Traditionally, reconfiguring an FPGA required substantial design work and in-depth knowledge of hardware description languages such as VHDL or Verilog. Below is a schematic view of the three main components of this architecture.
Three reasons to talk about FPGAs today
FPGAs have been used in industry for a long time, so why the recent interest, and in what context?
Several reasons led us to study this. First, in 2014, Altera released an SDK that generates an FPGA configuration, called a design, from kernels written in OpenCL. OpenCL is a programming standard for developing applications on heterogeneous architectures, with a C-based kernel language and a C/C++ host API. On GPUs, it is known as one of the few alternatives to NVIDIA's CUDA, which only targets that manufacturer's products. FPGAs thus become more attractive to the parallel and scientific computing community, thanks to a more general and familiar language already in use on several common architectures.
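As a simple illustration (a minimal sketch, not code from our study), here is what an OpenCL kernel looks like: a C-based function executed by many work-items in parallel. The same kernel source can, in principle, be compiled for a GPU by a standard OpenCL runtime or turned into an FPGA design by Altera's offline compiler.

```c
// Minimal OpenCL C kernel (illustrative sketch): each work-item
// computes one element of the sum of two vectors.
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *c,
                         const unsigned int n)
{
    size_t i = get_global_id(0);  // global index of this work-item
    if (i < n)
        c[i] = a[i] + b[i];
}
```

On a GPU, such a kernel is launched over an NDRange of work-items by the host program; with Altera's SDK, the same source is compiled ahead of time into a design loaded onto the FPGA.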
The second reason is that FPGA floating-point performance began to reach a promising level in 2016; until then, it was well below what a GPU could provide. FPGA use could therefore spread among the many professionals looking for performance, provided the architecture suits their applications.
Finally, FPGA power consumption is much lower than that of the other well-known components. At a time when the trend is to use ever more multiprocessors and parallel accelerators in the search for performance (both industry and research centers run systems with up to millions of computing cores), cutting the electricity bill and reducing energy dependence can only be beneficial.
In short, FPGAs are a bit trendy right now. Between Intel's acquisition of Altera in late 2015, IBM's collaboration with Xilinx around the POWER architecture, Microsoft and Amazon adding FPGAs to their Azure and EC2 clouds, and the Chinese company Baidu using them for machine learning in its (large) data centers, there is no shortage of reasons to be interested in the subject.
All these arguments could, on their own, motivate a study of FPGAs. In our case, however, they are mainly triggers: the real driver behind this study is to strengthen our ability to support our clients on HPC-related topics. One of the natural questions Aneo asks is therefore the trade-off between the performance gained and the cost of porting an application to this architecture.
Study scope
In this context, we partnered with Bittware, EMG2, and Altera. This gave us temporary access to a machine with a Stratix V S5PH-Q D8 FPGA card and the full OpenCL SDK toolchain (version 14.1 at the time of the study). The main objective of the partnership was to get to know the architecture through a high-level language and to evaluate our ability to assist a client wishing to industrialize such a solution, or to support their skills development.
To do this, we selected a well-known application: AES 128-bit encryption. We started from a sequential C implementation, ported it to GPU with CUDA, then to OpenCL, and finally ported that OpenCL implementation to the FPGA. We deliberately started from a straightforward implementation rather than one already optimized for a particular architecture.
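To give an idea of the kind of integer, table-driven work this involves, here is a hedged sketch (a hypothetical kernel showing only the SubBytes step, not our actual implementation) of how one AES round step might be expressed in OpenCL C:

```c
// Illustrative sketch of one AES step (SubBytes) in OpenCL C.
// `sbox` is the standard 256-entry AES substitution table supplied by the host;
// each work-item processes one 16-byte state block.
__kernel void aes_sub_bytes(__global uchar *states,   // all 16-byte state blocks
                            __constant uchar *sbox)   // 256-byte S-box
{
    size_t block = get_global_id(0);
    __global uchar *state = states + block * 16;

    for (int i = 0; i < 16; ++i)
        state[i] = sbox[state[i]];  // byte-wise table lookup
}
```

The handling of lookup tables such as the S-box in __constant memory is precisely one of the points discussed later in the post on constant memory.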
Our study was conducted during the summer of 2015. Since then, a new generation of FPGAs has arrived, and floating-point capabilities keep improving. We nevertheless preferred to focus on an application able to take best advantage of the FPGA's strengths at the time, which is why we chose an application based on integer computations. Moreover, since we are not FPGA experts, the study focuses more on understanding the SDK and the user experience, with its difficulties, prospects, and methodology, than on raw performance.
Our study is presented in a series of posts addressing the following points:
- FPGA architecture and integration
- GPU/CUDA to FPGA/OpenCL conversion
- Introduction to the Altera OpenCL SDK
- AES presentation and CPU to GPU porting
- Single Work-Item FPGA Design
- Vectorized FPGA Design
- Constant Memory
- Perspectives
We hope this article has sparked your interest in FPGAs; we look forward to discussing their architecture and the various integration possibilities in more detail soon.