Building a 40G MACsec FPGA IP core
Source: Paul Dillien
The deliverables for an encryption core are unlike other IP cores in some important respects.
I was writing a press release about a new intellectual property (IP) core from Algotronix when it occurred to me that many FPGA designers may not fully realize what is involved in creating the cores they use every day.
You might think that the obvious starting point would be to download the specification of the function. My answer is both "yes" and "no": "yes" because clearly the core needs to comply with the standards if it is to be of any use, but "no" because this is not the real starting point, as I will explain.
Media Access Control Security, more commonly shortened to MACsec, is an extension of the Ethernet suite of standards that adds security. The MACsec specification runs to over 150 pages but, like most specifications, it references other documents, for example the AES-GCM standard used to secure and authenticate the messages.
This gives an indication as to why I think the real starting point is a deep understanding of FPGA architectures from all vendors, combined with expertise in creating efficient encryption implementations. For example, there are significant elements in the encryption algorithm that can use either FPGA block memory or lookup tables (LUTs) in the programmable fabric. By careful design of the core, engineers are given the choice to use either resource to squeeze the core into a tight layout. However, to hit the 40G number, it's necessary to fine-tune the implementation. Memory access in some FPGA architectures is faster than in others, so the generic core needs to have personalization options to support different target technologies. The current encryption core design incorporates many subtle tweaks that have evolved over years of refining the performance and pipelining.
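As a rough illustration of why the 40G figure demands fine-tuning, the clock rate a pipelined datapath must sustain can be estimated from the line rate and the datapath width. This is a back-of-the-envelope sketch only: it assumes the core consumes one full datapath word on every clock with no stalls and ignores framing overhead, and the function name and the widths tried are illustrative rather than taken from the Algotronix design.

```python
def required_clock_mhz(line_rate_gbps: float, datapath_bits: int) -> float:
    """Clock (MHz) needed to sustain line_rate_gbps when datapath_bits
    are consumed on every cycle (assumed: fully pipelined, no stalls)."""
    if datapath_bits <= 0:
        raise ValueError("datapath width must be positive")
    return line_rate_gbps * 1e9 / datapath_bits / 1e6


if __name__ == "__main__":
    # AES works on 128-bit blocks; wider datapaths process several blocks in parallel.
    for width in (128, 256, 512):
        print(f"{width:4d}-bit datapath: {required_clock_mhz(40, width):7.2f} MHz")
```

Under these assumptions, a single 128-bit block per clock at 40 Gbit/s works out to 312.5 MHz, which suggests why memory access time and pipelining depth matter so much on a given FPGA family.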
In addition, the core designer must add the logic to complete the MACsec function. Originally, MACsec was conceived as a point-to-point technology for Metropolitan Area Networks. This meant that the transmitting station always used its unique encryption key, so the receiving system had confirmation of the message origin. This point-to-point limitation curtailed its usefulness in many applications, and a subsequent addition allowed the transmitting station to use any one of multiple keys. (This is called Multi-SecY in the specification.) This significantly complicates the core design, but enhances the practicality of MACsec by allowing logical partitioning of the outputs. As an example, in a data center, each customer can be allocated a unique key so that their data is secure from every other customer. This means that any rogue packets that arrive, either from a routing error or from a malicious attempt to intercept or alter data, are detected and rejected.
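The per-customer key selection described above can be sketched as a lookup on the receive side. This is a simplified software model, not the core's logic: the `KeyTable` class, the integer channel identifiers, and the 16-byte tag length are assumptions, and HMAC-SHA-256 stands in for the integrity check purely so the example runs with the Python standard library. MACsec itself uses AES-GCM, which also encrypts the payload.

```python
import hashlib
import hmac
from typing import Dict, Optional

TAG_LEN = 16  # bytes of authentication data appended to each frame (assumed)


def protect(key: bytes, payload: bytes) -> bytes:
    """Append an integrity tag. HMAC stands in for AES-GCM here (no encryption)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return payload + tag


class KeyTable:
    """Hypothetical table mapping a secure-channel identifier to its key."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}

    def install(self, channel_id: int, key: bytes) -> None:
        self._keys[channel_id] = key

    def receive(self, channel_id: int, frame: bytes) -> Optional[bytes]:
        """Return the payload if the frame verifies, else None (frame rejected)."""
        key = self._keys.get(channel_id)
        if key is None or len(frame) < TAG_LEN:
            return None  # unknown channel or truncated frame
        payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
        expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
        return payload if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    table = KeyTable()
    table.install(1, b"customer-A-key..")
    table.install(2, b"customer-B-key..")
    frame = protect(b"customer-A-key..", b"hello")
    print(table.receive(1, frame))  # verifies under customer A's key
    print(table.receive(2, frame))  # rejected: wrong customer's key
```

In this model a frame protected with one customer's key fails verification on any other channel, which is the property that lets misrouted or tampered packets be detected and dropped.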
A further advantage is that the IP has been designed to offer an easy future upgrade to 100G. The core has been designed for the minimum change to the interfaces so that customers using the 40G core can smoothly migrate their equipment to the faster speed. The enhancement will be achieved by replacing the 40G AES-GCM encryption core with a 100G version.
While it would, no doubt, be theoretically possible to describe the MACsec core using C code and then take this through to an RTL design, the difficulty would be achieving the 40G throughput while also making the core efficient in terms of both resources used and power consumption. An additional complication is to ensure that the system is robust against tampering and attempts to glean information about the encryption key. This is partly addressed by features built into MACsec, but it also relies on all sensitive elements of the encryption core remaining isolated from access via device pins or JTAG. Remember that the two prime reasons for choosing a hardware-based solution are faster speed and higher security.
The most difficult and time-consuming element of creating an IP core is the simulation and verification of the design. This relies on a comprehensive testbench, which needs to confirm that the design works correctly for all types of valid inputs and provides the expected encrypted output along with the modified Ethernet header and the appended authentication data. The operation for invalid inputs is equally important for MACsec, as the core collates statistics on aspects such as the number of packets that arrive out of order or that use the incorrect key. Clearly the core must reject any invalid inputs, but it must also correctly categorize and record the reason for dropping the packet. This data is used by a System Administrator to ensure the integrity of the network.
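The requirement to categorize and count every dropped packet can be illustrated with a small reference model of the kind a testbench might compare the core against. This is a sketch under stated assumptions: the `Frame` and `RxStats` types and the counter names are hypothetical rather than the names used in the MACsec specification, the authentication result is supplied as a flag instead of being computed, and ordering is checked strictly, with no tolerance for reordering.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Frame:
    channel_id: int      # which secure channel (key) the sender claims
    packet_number: int   # expected to increase per channel
    tag_valid: bool      # result of the authentication check, computed elsewhere


@dataclass
class RxStats:
    known_channels: Set[int]
    counters: Counter = field(default_factory=Counter)
    last_pn: Dict[int, int] = field(default_factory=dict)

    def classify(self, frame: Frame) -> str:
        """Return the outcome for one frame and bump the matching counter."""
        if frame.channel_id not in self.known_channels:
            outcome = "unknown_key"
        elif not frame.tag_valid:
            outcome = "bad_tag"
        elif frame.packet_number <= self.last_pn.get(frame.channel_id, 0):
            outcome = "out_of_order"
        else:
            outcome = "ok"
            # Only accepted frames advance the per-channel packet number.
            self.last_pn[frame.channel_id] = frame.packet_number
        self.counters[outcome] += 1
        return outcome
```

A testbench built along these lines would feed the same stimulus to the model and to the simulated core, then check that both the accept/reject decisions and the final counter values agree.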
Once the IP core has been designed and verified by the testbench, the next task is to create a data sheet. The most frequently asked question is: "What FPGA resources are consumed?" This is a very sensible question. The difficulty is that the answer depends on several variables. The key length can be either 128 or 256 bits, for example, and this can make around 30% difference to resource consumption. Some customers may want the point-to-multipoint option (Multi-SecY), which affects the storage and logic requirements. Some users will want to store more, or perhaps fewer, statistics, which also affects the core's size. Another variable is the target technology and design software. Realistically, achieving 40G requires a high-end FPGA from either Xilinx or Altera, but slower MACsec versions are also available that fit a large number of devices. The design software (i.e., place-and-route) and effort settings have an influence on the final result, but this requires a fully populated design to evaluate because the MACsec core would not be the only logic on the chip. So the data sheet provides a guide, but "your mileage may vary," as they say (just ask your local VW dealer...).
The deliverables for an encryption core are unlike other IP cores in some important respects. For a start, not all customers are experts in the application of complex security products, so a lot of time can be spent by the core vendor discussing the minutiae of the application to ensure that optimum security is achieved. However, users also need to ensure that the core has no malware or operating modes that might jeopardize security. As a result, Algotronix always ships source code. This gives customers full visibility (and a warm fuzzy feeling). Finally, once the customer has licensed the core, designers will need the same sort of support as for any IP in achieving timing closure.
I hope this insight has shed some light on what's involved in building a complex intellectual property core. Customers leverage many years of expertise when they take a license and benefit from an efficient and optimized design.