

# C-RAN Optimized 100 Gb/s Polar Code Decoder FPGA Demonstrator



Polaran Ltd.

## Introduction

The rising demand for network capacity and the increasing energy consumption of the fronthaul link have led to centralized Radio Access Network (RAN) architectures, named Cloud-RAN (C-RAN), in which centralized baseband unit (BBU) pools are connected to remote radio head (RRH) units over fiber links. C-RAN aims to improve the energy efficiency of the fronthaul link and to reduce network management costs.

To provide both high throughput and reliable communication between the RRH units and the BBU pool, we propose a low-complexity, energy-efficient forward error correction (FEC) solution based on polar coding. For a target SNR, systematic polar codes are finely tuned by a Bhattacharyya-parameter-based code design. Using the optimized code design, polar encoders and decoders are developed with various block lengths and code rates to meet the requirements of C-RAN. The performance of the polar code chain is demonstrated on FPGA with real-time end-to-end tests for benchmarking and measurements. The measurements indicate that the 2048-length polar code decoder reaches 100 Gb/s throughput and approaches 10 dB coding gain.
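A Bhattacharyya-parameter-based design can be sketched as follows. This is a minimal illustration, not the tuned design used for the demonstrator: it assumes the design SNR is given as Es/N0 in dB, uses the recursion that is exact for the binary erasure channel (and only an upper bound for other channels), and indexes the synthesized channels in natural (non-bit-reversed) order. The function name and parameters are hypothetical.

```python
import math

def bhattacharyya_design(n_levels, k, design_snr_db):
    """Return (info_indices, z) for a length-2**n_levels polar code.

    design_snr_db is interpreted as Es/N0 in dB (an assumption of this sketch).
    """
    es_n0 = 10 ** (design_snr_db / 10)
    z = [math.exp(-es_n0)]  # Bhattacharyya parameter of BPSK over AWGN
    for _ in range(n_levels):
        # Each channel splits into a degraded ("minus") and an upgraded ("plus") one.
        # 2a - a^2 is exact for the BEC and an upper bound for other channels.
        z = [v for a in z for v in (2 * a - a * a, a * a)]
    most_reliable_first = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(most_reliable_first[:k]), z

# Example: pick 8 information positions for an N = 16 code at 0 dB design SNR.
info, z = bhattacharyya_design(n_levels=4, k=8, design_snr_db=0.0)
print(info)
```

The positions not returned in `info` are frozen. Changing `design_snr_db` changes the ranking, which is what makes the code tunable for a target SNR.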



Figure 1. C-RAN architecture

## Requirements

The FEC requirements for the fronthaul link of the C-RAN architecture are [1], [2]:

- Throughput ≥ 100 Gb/s
- Coding Gain ≥ 10 dB at 10<sup>-15</sup> BER
- Coding Overhead ≤ 20 %
- Energy Efficiency ≤ 40 pJ/b
- Power Dissipation ≤ 4 W
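Two of these requirements are linked by simple arithmetic, sketched below. The sketch assumes coding overhead is defined as redundancy over information bits, (N − K)/K, and that power is energy per bit times throughput; both helper functions are hypothetical names introduced for illustration.

```python
def coding_overhead(n, k):
    """Redundant bits relative to information bits, (N - K) / K."""
    return (n - k) / k

def power_w(throughput_gbps, energy_pj_per_bit):
    """Power in watts: (Gb/s * 1e9) * (pJ/b * 1e-12) = Gb/s * pJ/b * 1e-3."""
    return throughput_gbps * energy_pj_per_bit * 1e-3

# A rate-5/6 code sits exactly at the overhead limit; rate 15/16 is well below it.
print(f"rate 5/6 overhead:   {coding_overhead(6, 5):.1%}")
print(f"rate 15/16 overhead: {coding_overhead(16, 15):.1%}")
# 40 pJ/b at 100 Gb/s corresponds to the 4 W power limit.
print(f"power at the limits: {power_w(100, 40):.2f} W")
```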

## Decoder Algorithm and Architecture

The polar decoder employs a modified Successive Cancellation (SC) algorithm aided by Majority-Logic (MJL) decoding. Although SC is a low-complexity sequential decoding algorithm [3], the data dependencies between its intermediate calculation steps create throughput and latency bottlenecks. To relax these dependencies and make hard decisions in parallel, we propose a more complex but fully parallel MJL algorithm. Integrating the SC and MJL algorithms establishes a favorable tradeoff between complexity and throughput when an unrolled and pipelined architecture is used. Because of the pipelined architecture, the decoder can accept a new codeword every clock cycle, so the throughput does not depend on the long latency of the SC decoder. In addition, each unrolled processing module performs only a limited number of operations, which restricts the complexity of the decoder.
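The effect of pipelining on throughput can be illustrated with a simple cycle count. The sketch below compares a fully pipelined decoder against a hypothetical non-pipelined baseline that holds one codeword for the whole decoding latency; the baseline is an assumption made only for comparison.

```python
def cycles_pipelined(latency, n_codewords):
    """A new codeword enters every cycle; the last one exits after the pipeline drains."""
    return latency + n_codewords - 1

def cycles_sequential(latency, n_codewords):
    """Hypothetical baseline: the decoder is busy for `latency` cycles per codeword."""
    return latency * n_codewords

latency = 256        # clock cycles, e.g. a deep unrolled pipeline
n_codewords = 1000
print(cycles_pipelined(latency, n_codewords), cycles_sequential(latency, n_codewords))
```

For long streams the pipelined count approaches one cycle per codeword, so the latency affects only the initial delay.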

### N = 16 Decoder Architecture

To illustrate the decoder architecture, Fig. 2 shows the data flow of an unrolled and pipelined (N=16) polar decoder. The F and G forward processing functions split the polar code into two smaller constituent codes. When a constituent code has a length of 8 bits, the MJL block is activated to decode the 8 bits in parallel. The constituent codes are encoded by a partial sum update logic (PSUL) module, and the systematic output bits are ready at the 6th clock cycle. Because the structure is recursive, larger decoders can be designed using the same architecture.
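The recursive role of the F and G functions can be sketched in software. The code below is a plain SC decoder, not the hardware architecture described here: it assumes the min-sum form of F, recurses down to single bits instead of switching to an MJL block at length 8, and uses an illustrative frozen set. The returned codeword estimate plays the role of the partial sums that the PSUL module produces.

```python
import random

def f(a, b):
    """Min-sum approximation of the F update."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))

def g(a, b, bit):
    """G update, using the partial sum `bit` from the first constituent code."""
    return b - a if bit else b + a

def encode(u):
    """Polar transform in natural (non-bit-reversed) order."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    left, right = encode(u[:h]), encode(u[h:])
    return [l ^ r for l, r in zip(left, right)] + right

def sc_decode(llr, frozen):
    """Return (u_hat, x_hat); a positive LLR favors bit 0."""
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    h = n // 2
    # F step: LLRs of the first constituent code.
    u1, x1 = sc_decode([f(llr[i], llr[i + h]) for i in range(h)], frozen[:h])
    # G step: LLRs of the second constituent code, given the partial sums x1.
    u2, x2 = sc_decode([g(llr[i], llr[i + h], x1[i]) for i in range(h)], frozen[h:])
    return u1 + u2, [a ^ b for a, b in zip(x1, x2)] + x2

# Noiseless N = 16 round trip with an illustrative (hypothetical) information set.
rng = random.Random(1)
info = [7, 9, 10, 11, 12, 13, 14, 15]
frozen = [i not in info for i in range(16)]
u = [0 if frozen[i] else rng.randint(0, 1) for i in range(16)]
x = encode(u)
u_hat, x_hat = sc_decode([4.0 * (1 - 2 * b) for b in x], frozen)
print(u_hat == u, x_hat == x)
```

In an unrolled design each recursion level becomes its own block of hardware, which is why the same structure extends to larger block lengths.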



Figure 2. Unrolled and pipelined data flow of the N=16 polar decoder

## Performance Results

The polar encoder and decoder modules are embedded in a software and hardware simulation chain that includes BPSK modulation and AWGN channel modules. Fig. 3 shows the software simulation and FPGA implementation performance of the polar decoders. The implementation loss is negligible when the LLR values are represented with 6 bits. The theoretical performance of the polar decoders is shown in Fig. 4. With respect to the BEC upper bound, a 10 dB coding gain is achieved only for the (65536, 54614) decoder configuration, which reaches 10<sup>-15</sup> BER below 5 dB Eb/N0.
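The link between the 5 dB figure and a 10 dB coding gain can be checked numerically. The sketch below assumes that coding gain is measured against uncoded BPSK over AWGN at the same BER, and finds the Eb/N0 that uncoded BPSK needs by bisection; the function names are hypothetical.

```python
import math

def uncoded_bpsk_ber(ebn0_db):
    """BER of uncoded BPSK over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def required_ebn0_db(target_ber, lo=0.0, hi=20.0):
    """Bisection on Eb/N0 (dB); BER decreases monotonically with Eb/N0."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if uncoded_bpsk_ber(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return hi

uncoded = required_ebn0_db(1e-15)
print(f"uncoded BPSK needs {uncoded:.2f} dB; gain of a code operating at 5 dB: {uncoded - 5:.2f} dB")
```

Uncoded BPSK needs roughly 15 dB at this BER, so a code that reaches 10<sup>-15</sup> BER at 5 dB provides about 10 dB of coding gain.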



Figure 3. Simulation performance



Figure 4. Theoretical performance

## Implementation Results

The polar decoder is implemented on a Xilinx Kintex-7 FPGA for the N=1024 block length. For N=2048, a Xilinx Virtex-7 Ultrascale+ FPGA is used. The LLR precision is 6 bits for both configurations. Because of the pipelined architecture, the memory utilization of the decoder is high. To reduce the memory footprint, the decoder uses the dedicated BRAM resources of the FPGA. The implementation results show that all decoder configurations exceed 100 Gb/s throughput, with energy efficiency between 4.28 and 20.83 pJ/b.
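A 6-bit LLR representation can be sketched as a uniform, symmetric quantizer applied to the channel LLRs. The step size and the saturation at ±31 are assumptions of this sketch, not the demonstrator's documented fixed-point format.

```python
import math
import random

def quantize_llr(llr, bits=6, step=0.5):
    """Round to a signed integer level and saturate symmetrically (hypothetical format)."""
    max_level = 2 ** (bits - 1) - 1  # 31 for 6 bits
    level = int(round(llr / step))
    return max(-max_level, min(max_level, level))

def channel_llrs(code_bits, ebn0_db, rate, rng):
    """BPSK over AWGN: bit 0 -> +1, bit 1 -> -1, LLR = 2y / sigma^2."""
    sigma2 = 1.0 / (2.0 * rate * 10 ** (ebn0_db / 10))
    sigma = math.sqrt(sigma2)
    return [2.0 * ((1 - 2 * b) + rng.gauss(0.0, sigma)) / sigma2 for b in code_bits]

rng = random.Random(0)
llrs = channel_llrs([0, 1, 1, 0, 1, 0, 0, 1], ebn0_db=5.0, rate=5 / 6, rng=rng)
print([quantize_llr(v) for v in llrs])
```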

 Table 1. Polar decoder latency and resource usage on FPGA

| Block Length | Code Rate | Latency (Clock Cycles) | LUT     | FF      | BRAM |
|--------------|-----------|------------------------|---------|---------|------|
| 1024         | 5/6       | 168                    | 95,858  | 56,761  | 172  |
| 1024         | 15/16     | 76                     | 61,067  | 37,937  | 129  |
| 2048         | 5/6       | 256                    | 213,214 | 115,449 | 471  |
| 2048         | 15/16     | 118                    | 165,571 | 86,355  | 279  |

 Table 2. Polar decoder implementation results

| Block Length | Code Rate | Power (W)* | Energy Efficiency (pJ/b)* | Clock Frequency (MHz) | Net Throughput (Gb/s) |
|--------------|-----------|------------|---------------------------|-----------------------|-----------------------|
| 1024         | 5/6       | 0.71       | 7.06                      | 120                   | 102                   |
| 1024         | 15/16     | 0.48       | 4.28                      | 120                   | 115                   |
| 2048         | 5/6       | 2.08       | 20.83                     | 60                    | 102                   |
| 2048         | 15/16     | 1.06       | 9.41                      | 60                    | 115                   |

<sup>\*</sup> Power consumption is calculated using the gate-level netlist and scaled to 28nm technology. Power consumption of the clock tree is excluded.
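The net throughput figures follow from the pipelined architecture. Assuming one codeword of N bits is completed per clock cycle, the net (information) throughput is N × clock frequency × code rate, as the short sketch below shows for the four configurations. The function name is hypothetical.

```python
def net_throughput_gbps(block_length, clock_mhz, code_rate):
    """Information bits per second when one codeword completes every clock cycle."""
    return block_length * clock_mhz * 1e6 * code_rate / 1e9

configs = [
    (1024, 120, 5 / 6),
    (1024, 120, 15 / 16),
    (2048, 60, 5 / 6),
    (2048, 60, 15 / 16),
]
for n, f_mhz, rate in configs:
    print(n, f_mhz, f"{net_throughput_gbps(n, f_mhz, rate):.1f} Gb/s")
```

Doubling the block length while halving the clock frequency leaves the throughput unchanged, which is why the N=1024 and N=2048 rows report the same values of about 102 and 115 Gb/s.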

## Conclusion

We proposed a family of polar decoders based on the SC algorithm aided by MJL decoding. The decoders are implemented on FPGA for demonstration. The FPGA implementation results show that the decoders exceed 100 Gb/s throughput while meeting the 40 pJ/b energy efficiency and 20 % coding overhead constraints. Moreover, the theoretical performance analysis of the decoders shows that the coding gain at 10<sup>-15</sup> BER is expected to be between 7 dB and 10 dB.

#### Contact

Altuğ Süral
Polaran Ltd.
Cyberpark, A-201
Bilkent, Ankara, 06800
Turkey
Email: altug.sural@polaran.com
Website: www.polaran.com
Phone: +90 312 265 02 24

#### References

1. G. Tzimpragos, C. Kachris, I. B. Djordjevic, M. Cvijetic, D. Soudris, and I. Tomkos, "A Survey on FEC Codes for 100G and Beyond Optical Networks," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 209–221, 2014.
2. A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S. Berger, and L. Dittman, "Cloud RAN for Mobile Networks A Technology Overview," IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 405–426, 2015.
3. E. Arikan, "Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input