# Low-Power Bus Transform Coding for Multilevel Signals 

Fakhrul Zaman Rokhani and Gerald E. Sobelman<br>Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA<br>Email: {rokh0001, sobelman}@umn.edu


#### Abstract

In this paper, we propose a novel extension of bus-invert coding to handle 4-level pulse amplitude modulated (PAM-4) signals. A generalized mathematical model of energy consumption and energy dissipation for PAM-4 signals is presented, and a family of coding schemes is developed that reduces average power consumption and dissipation by up to $54 \%$ compared with uncoded PAM-4 buses. The technique is attractive for high-speed data transmission systems that employ PAM-4 signaling on high-capacitance buses such as global wires, off-chip buses, I/O, and backplanes.


## I. Introduction

The need for high communication bandwidth and high speed on wires has led to the widespread use of multilevel signals on point-to-point parallel links in a variety of applications [2, 3, 4, 5, 6]. In [5], PAM-5 together with differential signal lines was adopted in Gigabit Ethernet (1000BASE-T). The work in [2, 3, 4] demonstrated PAM-4 signaling in backplane systems. In [6], PAM-8 was used to build a high-speed parallel interface for future hard disk drive channel ICs. In the multi-valued logic discipline, data are sent over a bus using a quaternary (radix-4) number system, which is equivalent to a PAM-4 signal, to provide high-speed communication between modules. This approach has been applied in a DSP processor [7], to data and address buses in an SoC application [8], and to on-chip data buses [9].

An important component of power consumption in microprocessors is the transmission of data over high-capacitance buses. Many power-saving techniques have been published, including low-swing signaling [10], charge recycling [11], frequency scaling, and data coding [12-18]. From the general dynamic power equation, reducing the clock frequency reduces power linearly, but at the expense of throughput. Lowering the voltage has a quadratic effect and is thus an efficient way of reducing dynamic power. To decrease switching power even further, lowering the activity factor with low-power coding schemes, in combination with low-voltage signaling, is a useful avenue of investigation.
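For reference, the general dynamic power equation is the standard expression for CMOS switching power, where $\alpha$ is the activity factor, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, and $f_{clk}$ the clock frequency:

$$
P_{\text{dyn}} = \alpha \, C_{L} \, V_{DD}^{2} \, f_{\text{clk}}
$$

Halving $f_{clk}$ halves $P_{\text{dyn}}$, whereas halving $V_{DD}$ reduces it by a factor of four; coding schemes such as the ones in this paper act on the remaining factor, $\alpha$.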

Coding has emerged as a promising solution to power, delay, and reliability problems in buses. Previous work in this area includes coding for low-power buses through self-transition [12] and coupling-transition [13, 14] activity reduction, delay reduction $[15,16]$, and improved reliability in low-swing buses [17, 18]. References [19-23] investigated methods to improve the bit error rate (BER) by coding PAM-4 signals on parallel buses, while the work in [24] focused on serial links.

In this paper, we introduce novel low-power coding schemes for multilevel signals that reduce the transitions on high-capacitance bus lines at no cost in communication throughput. The required encoder and decoder structures are also presented. Our techniques may be applied to chip-to-chip and multi-valued logic communication where power is a major constraint. To the best of our knowledge, there has been no previous work on reducing dynamic power by minimizing transitions on multilevel coded buses.

## II. PAM-4 Transmitter

The PAM-4 transmitter and receiver circuits used in this paper are based on the proposal in [1]. Table 1 summarizes the relation between digital data, voltage levels, and the corresponding PAM-4 symbols for these circuits.

Table 1. Digital Data, Voltage Level and Corresponding PAM-4 Symbol (Assuming $V_{DD}=1.5 \mathrm{~V}$)

| Digital data <br> (bit$_{1}$ bit$_{2}$) | Voltage <br> level | PAM-4 <br> symbol |
| :---: | :---: | :---: |
| 00 | 1.5 V | +3 |
| 01 | 1.0 V | +1 |
| 10 | 0.5 V | -1 |
| 11 | 0 V | -3 |

Given this mapping, we can derive the dynamic power during a transition. The power drawn from the supply (power consumption) and the power dissipated in the circuit as a result of charging and discharging the line capacitance are given by equations (1) and (2), respectively.

$$
\begin{gather*}
P_{\text{drawn}}= \begin{cases}C_{L} V_{s}\left[V_{f}-V_{i}\right] f_{\text{clk}}, & \text{if } V_{f}>V_{i} \\
0, & \text{if } V_{f} \le V_{i}\end{cases} \tag{1}\\
P_{\text{dissipated}}= \begin{cases}C_{L}\left(V_{s}\left[V_{f}-V_{i}\right]-\frac{V_{f}^{2}-V_{i}^{2}}{2}\right) f_{\text{clk}}, & \text{if } V_{f}>V_{i} \\
-\frac{C_{L}}{2}\left[V_{f}^{2}-V_{i}^{2}\right] f_{\text{clk}}, & \text{if } V_{f} \le V_{i}\end{cases} \tag{2}
\end{gather*}
$$

In both equations, power is a function of the load capacitance ($C_{L}$), the supply voltage ($V_{s}$), the voltage before the transition ($V_{i}$), the voltage after the transition ($V_{f}$), and the clock frequency ($f_{\text{clk}}$). We define the activity factor $\alpha_{j,k}$ as the average fraction of clock cycles in which a given transition (from symbol $j$ at time $t$ to symbol $k$ at time $t+1$) occurs. Since a PAM-4 signal has 4 different symbols, there are $4^{2}=16$ possible transitions and hence 16 different activity factors.

Based on equations (1) and (2), we constructed $4 \times 4$ matrices of generalized energy consumption values (Table 2) and energy dissipation values (Table 3) for the 16 possible transitions between PAM-4 symbols. The tabulated values correspond to dividing (1) and (2) by $f_{\text{clk}}$, setting $V_{s}=V_{f}$, and using the levels of Table 1 normalized to $V_{DD}$. For example, a transition from symbol -3 ($V_{i}=0$ V) to symbol +3 ($V_{f}=V_{DD}$) consumes an energy of $1 \times C_{L} V_{DD}^{2}$ and dissipates $\frac{1}{2} \times C_{L} V_{DD}^{2}$.

To calculate the total power of the communication, we sum the products of each activity factor and the corresponding energy constant $E_{j,k}$ (from Table 2 for consumption or Table 3 for dissipation), multiplied by the clock frequency:

$$
\begin{equation*}
P=\sum_{\text {for all } \mathrm{j}, \mathrm{k}} \alpha_{\mathrm{j}, \mathrm{k}} E_{j, k} \cdot f_{\text {clk }} \tag{3}
\end{equation*}
$$
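As a sketch of how equation (3) can be evaluated, the Python snippet below estimates the activity factors $\alpha_{j,k}$ of a single wire from a sequence of PAM-4 symbols and weights them by the per-transition consumption energy. The function names are our own, and the energy model assumes $V_{s}=V_{f}$ in equation (1), the choice that reproduces the entries of Table 2. It is an illustration, not the authors' simulator.

```python
from collections import Counter
from fractions import Fraction

# PAM-4 levels of Table 1, normalized to V_DD.
LEVEL = {-3: Fraction(0), -1: Fraction(1, 3), 1: Fraction(2, 3), 3: Fraction(1)}
SYMBOLS = tuple(LEVEL)

def consumption_energy(j, k):
    """Energy drawn for j -> k in units of C_L * V_DD**2 (equation (1) / f_clk).

    Assumes V_s = V_f, which reproduces Table 2.
    """
    vi, vf = LEVEL[j], LEVEL[k]
    return vf * (vf - vi) if vf > vi else Fraction(0)

def activity_factors(seq):
    """alpha[(j, k)]: fraction of clock cycles in which j -> k occurs on one wire."""
    counts = Counter(zip(seq, seq[1:]))
    cycles = len(seq) - 1
    return {(j, k): Fraction(counts[(j, k)], cycles) for j in SYMBOLS for k in SYMBOLS}

def average_power(seq, c_load, v_dd, f_clk, energy=consumption_energy):
    """Equation (3): P = sum over (j, k) of alpha[j, k] * E[j, k] * f_clk, in watts."""
    alpha = activity_factors(seq)
    e_per_cycle = sum(a * energy(j, k) for (j, k), a in alpha.items())
    return float(e_per_cycle) * c_load * v_dd**2 * f_clk

# Example: a wire toggling between -3 and +3, C_L = 1 pF, V_DD = 1.5 V, 1 GHz clock.
print(average_power([-3, 3, -3, 3, -3], 1e-12, 1.5, 1e9))
```

In the example, half of the cycles are $-3 \rightarrow +3$ (energy $1$) and half are $+3 \rightarrow -3$ (energy $0$), so the average energy per cycle is $\frac{1}{2} C_{L} V_{DD}^{2}$. Passing a dissipation function as `energy` would evaluate the Table 3 variant of equation (3) instead.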

TABLE 2. ENERGY CONSUMPTION MATRIX ($\times C_{L} V_{DD}^{2}$)

| Symbol $j$ at time $t$ / $k$ at time $(t+1)$ | -3 | -1 | +1 | +3 |
| :---: | :---: | :---: | :---: | :---: |
| -3 | 0 | 1/9 | 4/9 | 1 |
| -1 | 0 | 0 | 2/9 | 6/9 |
| +1 | 0 | 0 | 0 | 3/9 |
| +3 | 0 | 0 | 0 | 0 |

TABLE 3. ENERGY DISSIPATION MATRIX ($\times C_{L} V_{DD}^{2}$)

| Symbol $j$ at time $t$ / $k$ at time $(t+1)$ | -3 | -1 | +1 | +3 |
| :---: | :---: | :---: | :---: | :---: |
| -3 | 0 | 1/18 | 2/9 | 1/2 |
| -1 | 1/18 | 0 | 1/18 | 2/9 |
| +1 | 2/9 | 3/18 | 0 | 1/18 |
| +3 | 1/2 | 4/9 | 5/18 | 0 |
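Tables 2 and 3 can be regenerated directly from equations (1) and (2). In this Python sketch (helper names are our own), energies are exact fractions in units of $C_{L} V_{DD}^{2}$ and $V_{s}$ defaults to $V_{f}$, the assumption under which the formulas reproduce every tabulated entry. Passing `vs=Fraction(1)` instead models a single $V_{DD}$ rail and gives different consumption values (for example $1/3$ rather than $1/9$ for $-3 \rightarrow -1$).

```python
from fractions import Fraction

# Table 1 levels, normalized to V_DD.
LEVEL = {-3: Fraction(0), -1: Fraction(1, 3), 1: Fraction(2, 3), 3: Fraction(1)}
SYMBOLS = (-3, -1, 1, 3)

def consumed(j, k, vs=None):
    """Equation (1) divided by f_clk, in units of C_L * V_DD**2."""
    vi, vf = LEVEL[j], LEVEL[k]
    vs = vf if vs is None else vs   # assumption: V_s = V_f reproduces Table 2
    return vs * (vf - vi) if vf > vi else Fraction(0)

def dissipated(j, k, vs=None):
    """Equation (2) divided by f_clk, in units of C_L * V_DD**2."""
    vi, vf = LEVEL[j], LEVEL[k]
    vs = vf if vs is None else vs   # same assumption, reproduces Table 3
    if vf > vi:
        return vs * (vf - vi) - (vf**2 - vi**2) / 2
    return -(vf**2 - vi**2) / 2

def matrix(energy):
    """Rows: symbol j at time t; columns: symbol k at time t+1."""
    return [[energy(j, k) for k in SYMBOLS] for j in SYMBOLS]

for name, fn in (("Table 2", consumed), ("Table 3", dissipated)):
    print(name)
    for j, row in zip(SYMBOLS, matrix(fn)):
        print(f"{j:+d}", [str(e) for e in row])
```

`Fraction` prints values in lowest terms, so entries such as 6/9 and 3/18 appear as 2/3 and 1/6.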

## III. Bus Transform Coding Method

Recall that bus-invert coding [12] is a transition-count-based method. Consider an $L$-wire non-multiplexed bus. Bus-invert coding computes $H_{d}$, the Hamming distance between the next data value and the present bus value (not including the invert line). If $H_{d}$ is larger than $L / 2$, the one's complement of the next data value is sent on the bus and the invert line is set to 1. Otherwise, the next data value is placed on the bus in its original form and the invert line is set to 0. One redundant bit, the invert line, is thus needed to distinguish between original and inverted patterns on the bus, and the receiver recovers the original data value by taking the one's complement of the received data lines whenever the invert line is 1.
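For comparison with the multilevel case, here is a minimal Python sketch of binary bus-invert coding as just described, with each data word represented as an integer of `width` bits. The function names and the all-zero initial bus state are our own assumptions.

```python
def bus_invert_encode(words, width):
    """Return (bus_value, invert_bit) for each data word (bus-invert coding)."""
    mask = (1 << width) - 1
    bus = 0                      # assumed initial state of the L data wires
    coded = []
    for data in words:
        h_d = bin((data ^ bus) & mask).count("1")   # Hamming distance to the bus
        if h_d > width / 2:
            bus, invert = ~data & mask, 1           # send the one's complement
        else:
            bus, invert = data & mask, 0            # send the data unchanged
        coded.append((bus, invert))
    return coded

def bus_invert_decode(coded, width):
    """Undo the inversion wherever the invert line is 1."""
    mask = (1 << width) - 1
    return [(~bus & mask) if invert else bus for bus, invert in coded]

coded = bus_invert_encode([0b1111, 0b0001], 4)
print(coded, bus_invert_decode(coded, 4))
```

Here `0b1111` differs from the all-zero bus in 4 > 2 positions, so it is sent inverted as `0b0000` with the invert line set to 1; the next word `0b0001` is then only one bit away and is sent unchanged.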

For multilevel signals, the set of possible transformations is much richer than for bus-invert coding; for PAM-4, for example, there are many possible mappings of the 4 voltage levels onto the same set. To keep track of which transformation is applied in any given time slot, the invert line of bus-invert coding is generalized to a signal line that carries its own multilevel value. For PAM-4, a PAM-4 signal line is appended to the bus to tell the receiver which transformation has been applied to the data lines in each time slot. We arbitrarily initialize the signal line to -3; its subsequent values depend on the sequence of transformations applied to the data lines.

The objective of bus transform coding is to find a beneficial mapping of the values on the bus, i.e., one that minimizes the total voltage swing, and thus the energy-weighted activity factors $\alpha_{j,k}$, between two adjacent time slots. For PAM-4 signals, 16 transition pairs can occur. Four of these are non-transition lines, on which the symbol does not change; we say that a non-transition line has a distance of 0, since the absolute difference between the symbols in the two slots is 0. The other 12 cases are transition lines, on which the symbol changes from one of the 4 symbols to one of the 3 others. These transitions have a distance of 2 (e.g., from +1 to -1), 4 (e.g., from -3 to +1), or 6 (from +3 to -3 or vice versa).
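The classification of the 16 transition pairs by distance can be enumerated mechanically. This small Python sketch (variable names are ours) groups every pair $j \rightarrow k$ by $|k - j|$.

```python
from collections import Counter
from itertools import product

SYMBOLS = (-3, -1, 1, 3)

# Group all 16 (j -> k) pairs by their distance |k - j|.
by_distance = {}
for j, k in product(SYMBOLS, repeat=2):
    by_distance.setdefault(abs(k - j), []).append((j, k))

distance_counts = Counter({d: len(pairs) for d, pairs in by_distance.items()})
for d in sorted(by_distance):
    print(d, by_distance[d])
```

The 4 non-transition lines have distance 0, and the 12 transition lines split into 6, 4, and 2 cases of distance 2, 4, and 6, respectively.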

We consider and evaluate three specific configurations (i.e., mappings) of bus transform coding for PAM-4 signals; the corresponding transformations are specified in Table 4. Configuration A encodes transitions of distance 2 ($+1 \rightarrow +3$), 4 ($-1 \rightarrow +3$), and 6 ($-3 \rightarrow +3$), together with the corresponding distance-0 non-transitions. Configuration B encodes only transitions of distance 4, together with distance-0 non-transitions. Configuration C encodes the two transitions of distance 6 and two transitions of distance 4 ($-1 \rightarrow +3$ and $+1 \rightarrow -3$), again together with distance-0 non-transitions. These configurations were chosen, based on the energy matrices in Tables 2 and 3, to target the transitions whose removal yields the largest energy savings.

Table 4. Example Mappings for Bus Transform Coding

| Conf. | Transition line, before coding | Transition line, after coding | Non-transition line, before coding | Non-transition line, after coding | Signal line |
| :---: | :---: | :---: | :---: | :---: | :---: |
| A | $-3 \rightarrow +3$ | $-3 \rightarrow -3$ | $-3 \rightarrow -3$ | $-3 \rightarrow +3$ | $-3 \rightarrow -1$ |
|  | $-1 \rightarrow +3$ | $-1 \rightarrow -1$ | $-1 \rightarrow -1$ | $-1 \rightarrow +3$ |  |
|  | $+1 \rightarrow +3$ | $+1 \rightarrow +1$ | $+1 \rightarrow +1$ | $+1 \rightarrow +3$ |  |
| B | $-3 \rightarrow +1$ | $-3 \rightarrow -3$ | $-3 \rightarrow -3$ | $-3 \rightarrow +1$ | $-3 \rightarrow -1$ |
|  | $-1 \rightarrow +3$ | $-1 \rightarrow -1$ | $-1 \rightarrow -1$ | $-1 \rightarrow +3$ |  |
|  | $+1 \rightarrow -3$ | $+1 \rightarrow +1$ | $+1 \rightarrow +1$ | $+1 \rightarrow -3$ |  |
|  | $+3 \rightarrow -1$ | $+3 \rightarrow +3$ | $+3 \rightarrow +3$ | $+3 \rightarrow -1$ |  |
| C | $-3 \rightarrow +3$ | $-3 \rightarrow -3$ | $-3 \rightarrow -3$ | $-3 \rightarrow +3$ | $-3 \rightarrow -1$ |
|  | $-1 \rightarrow +3$ | $-1 \rightarrow -1$ | $-1 \rightarrow -1$ | $-1 \rightarrow +3$ |  |
|  | $+1 \rightarrow -3$ | $+1 \rightarrow +1$ | $+1 \rightarrow +1$ | $+1 \rightarrow -3$ |  |
|  | $+3 \rightarrow -3$ | $+3 \rightarrow +3$ | $+3 \rightarrow +3$ | $+3 \rightarrow -3$ |  |
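One way to read Table 4, adopted here as an assumption rather than the authors' stated formulation, is that for each previous bus symbol $p$ on a wire the transform exchanges two symbols: $p$ itself and a configuration-specific partner. The encoded transition $p \rightarrow$ partner becomes a non-transition, the non-transition $p \rightarrow p$ becomes $p \rightarrow$ partner, and every other transition passes through unchanged. Because each such mapping is a swap, it is its own inverse. The Python sketch encodes this reading; `PARTNER` and `transform` are hypothetical names.

```python
# Hypothetical encoding of Table 4: PARTNER[conf][p] is the symbol exchanged
# with p when the transform is applied and the wire's previous bus symbol was p.
# None means the wire is left alone (Configuration A has no entry for +3).
PARTNER = {
    "A": {-3: 3, -1: 3, 1: 3, 3: None},
    "B": {-3: 1, -1: 3, 1: -3, 3: -1},
    "C": {-3: 3, -1: 3, 1: -3, 3: -3},
}

def transform(conf, prev, sym):
    """Apply the Table 4 mapping to one wire: swap prev <-> PARTNER[conf][prev]."""
    q = PARTNER[conf][prev]
    if q is None:
        return sym
    if sym == q:
        return prev   # encoded transition becomes a non-transition
    if sym == prev:
        return q      # non-transition becomes the encoded transition
    return sym        # all other transitions are unchanged

# Every mapping is an involution, so applying it twice restores the symbol.
for conf in PARTNER:
    for p in (-3, -1, 1, 3):
        for s in (-3, -1, 1, 3):
            assert transform(conf, p, transform(conf, p, s)) == s
```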

The encoder and decoder schemes are shown in Figures 1 and 2. The input and output of the encoder/decoder are binary data going to or coming from the PAM-4 transmitter and receiver proposed in [1]. The encoder/decoder is an $L$-wire non-partitioned circuit consisting of $L$ internal circuits, one per wire, each containing a delay element (D).


Figure 1. General block diagram of the encoder circuit.

In the encoder, the transition/non-transition detector simply flags, on each wire, whether the incoming symbol forms one of the transitions or non-transitions mapped in Table 4. For example, in Configuration C, if a transition $-3 \rightarrow +3$, $-1 \rightarrow +3$, $+1 \rightarrow -3$, or $+3 \rightarrow -3$ occurs, the transition detector notifies the majority voter and decision logic, which then selects the correct binary output through the 4-to-1 multiplexer; the signal line is driven in the same way. The decoder simply applies the inverse transformation of Table 4, with the signal-line value controlling its decision logic.
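To make the encoder/decoder behavior concrete, the following Python sketch uses the same swap reading of Table 4 (our interpretation). Several details are not specified in the text and are labeled as assumptions in the code: all bus wires start at $-3$; the signal line sits at $-3$ when no transform is applied and at $-1$ when it is (inferred from the $-3 \rightarrow -1$ entries of Table 4); and the hypothetical majority voter applies the transform when the number of wires whose encoded transition would be removed exceeds the number of non-transition wires that would gain a transition. The decoder keeps the previously received bus symbols, mirroring the delay elements (D), and reapplies the swap whenever the signal line indicates coding.

```python
import random

# Table 4 read as per-wire swaps (our interpretation); None = never remapped.
PARTNER = {
    "A": {-3: 3, -1: 3, 1: 3, 3: None},
    "B": {-3: 1, -1: 3, 1: -3, 3: -1},
    "C": {-3: 3, -1: 3, 1: -3, 3: -3},
}
SIGNAL_IDLE, SIGNAL_CODED = -3, -1   # assumed meaning of the signal-line levels
INITIAL = -3                         # assumed reset value of every bus wire

def swap(conf, prev, sym):
    """Per-wire mapping; it is its own inverse, so the decoder reuses it."""
    q = PARTNER[conf][prev]
    if q is None:
        return sym
    if sym == q:
        return prev   # encoded transition -> non-transition
    if sym == prev:
        return q      # non-transition -> encoded transition
    return sym        # other transitions pass through unchanged

def encode(conf, words):
    """Encode equal-length tuples of PAM-4 symbols (one symbol per wire)."""
    prev = [INITIAL] * len(words[0])
    out = []
    for word in words:
        pairs = list(zip(prev, word))
        # Hypothetical majority vote: wires that would lose an encoded
        # transition versus non-transition wires that would gain one.
        gain = sum(PARTNER[conf][p] == s for p, s in pairs)
        loss = sum(p == s and PARTNER[conf][p] is not None for p, s in pairs)
        coded = gain > loss
        bus = [swap(conf, p, s) if coded else s for p, s in pairs]
        out.append((tuple(bus), SIGNAL_CODED if coded else SIGNAL_IDLE))
        prev = bus
    return out

def decode(conf, coded_words):
    """Invert encode() using the received bus history and the signal line."""
    prev = [INITIAL] * len(coded_words[0][0])
    out = []
    for bus, signal in coded_words:
        data = [swap(conf, p, s) if signal == SIGNAL_CODED else s
                for p, s in zip(prev, bus)]
        out.append(tuple(data))
        prev = list(bus)   # the decoder's delay elements hold the received bus
    return out

random.seed(0)
words = [tuple(random.choice((-3, -1, 1, 3)) for _ in range(4)) for _ in range(1000)]
for conf in PARTNER:
    assert decode(conf, encode(conf, words)) == words
```

Decoding works because the decoder observes the same previous bus symbols that the encoder used to pick each swap, so the signal line alone is enough to undo the mapping.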


## IV. Simulation Results

The encoder-decoder scheme with the transformations described in Section III was simulated in C, and the results were compared with uncoded PAM-4 signals. Ten different files (JPEG, MPEG, EXE, WAV, and TXT), each 2 Mbits in size, were used to estimate the achievable power savings, with simulations run for 1-, 2-, 4-, and 8-wire buses. Figure 3 shows the percentage power savings obtained with coding, defined as

$$
\begin{equation*}
100\left(1-\frac{P_{c}}{P_{o}}\right) \% \tag{4}
\end{equation*}
$$

Here, $P_{c}$ is the total power consumed (or dissipated) with coding and $P_{o}$ is the total power consumed (or dissipated) without coding, both computed using equations (1) and (2). In the simulations, the power consumption turned out to equal the power dissipation. This is expected: over any symbol sequence, the difference between the energy drawn from the supply and the energy dissipated equals the net change in energy stored on the line capacitance, which is at most $\frac{1}{2} C_{L} V_{DD}^{2}$ per wire and is therefore negligible over long sequences. In this initial study, the power consumed by the encoder and decoder circuits was ignored. This approximation is reasonable for very high-capacitance buses and for buses with a small number of wires, where the encoder and decoder circuits are simple. The results presented here should therefore be read as an upper bound on the achievable energy savings.
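The equality of consumption and dissipation can be checked numerically. This Python sketch, under the same $V_{s}=V_{f}$ assumption that reproduces Tables 2 and 3, sums equations (1) and (2) over a random PAM-4 symbol stream (placeholder data of our own, not the paper's benchmark files) and confirms that the two totals differ only by the change in stored energy between the first and last symbols. Helper names are hypothetical.

```python
import random
from fractions import Fraction

# Table 1 levels normalized to V_DD; energies in units of C_L * V_DD**2.
LEVEL = {-3: Fraction(0), -1: Fraction(1, 3), 1: Fraction(2, 3), 3: Fraction(1)}

def consumed(j, k, vs=None):
    """Equation (1) divided by f_clk; V_s defaults to V_f (as in Table 2)."""
    vi, vf = LEVEL[j], LEVEL[k]
    vs = vf if vs is None else vs
    return vs * (vf - vi) if vf > vi else Fraction(0)

def dissipated(j, k, vs=None):
    """Equation (2) divided by f_clk; V_s defaults to V_f (as in Table 3)."""
    vi, vf = LEVEL[j], LEVEL[k]
    vs = vf if vs is None else vs
    if vf > vi:
        return vs * (vf - vi) - (vf**2 - vi**2) / 2
    return -(vf**2 - vi**2) / 2

def stored(sym):
    """Energy held on the line capacitance at a given level: V**2 / 2."""
    return LEVEL[sym] ** 2 / 2

random.seed(1)
seq = [random.choice(list(LEVEL)) for _ in range(20_000)]
steps = list(zip(seq, seq[1:]))
total_c = sum(consumed(j, k) for j, k in steps)
total_d = sum(dissipated(j, k) for j, k in steps)

# Energy balance: drawn - dissipated == net change in stored energy.
assert total_c - total_d == stored(seq[-1]) - stored(seq[0])

# The same balance holds for a single supply rail (V_s = V_DD = 1).
one = Fraction(1)
assert (sum(consumed(j, k, one) for j, k in steps)
        - sum(dissipated(j, k, one) for j, k in steps)) == stored(seq[-1]) - stored(seq[0])

print(float((total_c - total_d) / total_c))   # relative mismatch, tends to 0
```

For every rising transition the difference between (1) and (2) is $\frac{1}{2}(V_{f}^{2}-V_{i}^{2})$ regardless of $V_{s}$, and for every falling transition it is the same expression, so the per-cycle differences telescope to a single boundary term.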

The improvement from bus transform coding decreases as the bus width increases. To address this, a wider bus can be partitioned into several narrower sub-buses, each with its own signal line. Figure 4 shows the impact on power savings of using different partition sizes on an 8-wire bus.
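The partitioning idea and the savings metric of equation (4) can be combined in one sketch. Under the same assumptions as before (swap reading of Table 4, here Configuration C; signal line at $-3$ when idle and $-1$ when coded; a hypothetical majority vote; all wires initialized to $-3$; consumption energies from Table 2), the Python code splits an $L$-wire bus into sub-buses of `part` wires, each with its own signal line, and compares the total energy, including the signal lines, against an uncoded bus. Because both runs span the same number of clock cycles, the ratio of total energies equals $P_{c}/P_{o}$ in equation (4). The uniformly random data is only a placeholder, so the printed percentages are not expected to match Figures 3 or 4.

```python
import random
from fractions import Fraction

# Table 1 levels normalized to V_DD; energies in units of C_L * V_DD**2.
LEVEL = {-3: Fraction(0), -1: Fraction(1, 3), 1: Fraction(2, 3), 3: Fraction(1)}
SYMBOLS = tuple(LEVEL)
PARTNER_C = {-3: 3, -1: 3, 1: -3, 3: -3}  # Configuration C as swaps (assumption)
IDLE, CODED = -3, -1                       # assumed signal-line levels

def energy(j, k):
    """Table 2 consumption energy for j -> k (equation (1) with V_s = V_f)."""
    vi, vf = LEVEL[j], LEVEL[k]
    return vf * (vf - vi) if vf > vi else Fraction(0)

def swap(prev, sym):
    """Configuration C mapping for one wire, given its previous bus symbol."""
    q = PARTNER_C[prev]
    if sym == q:
        return prev
    if sym == prev:
        return q
    return sym

def uncoded_energy(words):
    prev = [-3] * len(words[0])
    total = Fraction(0)
    for word in words:
        total += sum(energy(p, s) for p, s in zip(prev, word))
        prev = list(word)
    return total

def encoded_energy(words, part):
    """Total energy with the bus split into sub-buses of `part` wires."""
    width = len(words[0])
    assert width % part == 0, "partition size must divide the bus width"
    prev = [-3] * width
    signal = [IDLE] * (width // part)   # one signal line per sub-bus
    total = Fraction(0)
    for word in words:
        bus = []
        for g in range(width // part):
            lo, hi = g * part, (g + 1) * part
            pairs = list(zip(prev[lo:hi], word[lo:hi]))
            gain = sum(PARTNER_C[p] == s for p, s in pairs)   # hypothetical vote
            loss = sum(p == s for p, s in pairs)
            new_sig = CODED if gain > loss else IDLE
            bus += [swap(p, s) if new_sig == CODED else s for p, s in pairs]
            total += energy(signal[g], new_sig)               # signal-line cost
            signal[g] = new_sig
        total += sum(energy(p, b) for p, b in zip(prev, bus))
        prev = bus
    return total

def savings_percent(p_coded, p_uncoded):
    """Equation (4): 100 * (1 - P_c / P_o)."""
    return 100 * (1 - p_coded / p_uncoded)

if __name__ == "__main__":
    random.seed(0)
    words = [tuple(random.choice(SYMBOLS) for _ in range(8)) for _ in range(1000)]
    p_o = uncoded_energy(words)
    for part in (1, 2, 4, 8):
        print(part, float(savings_percent(encoded_energy(words, part), p_o)))
```

Smaller partitions let each sub-bus make its own coding decision, at the price of one extra signal line (and its transitions) per partition; the sketch charges that cost explicitly so the trade-off shows up in the computed savings.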

## V. Conclusions

In this paper, the general concept of bus transform coding for multilevel signals has been introduced and applied to the particular case of PAM-4. We developed generalized dynamic power equations that capture the impact of the different possible transitions (activity factors) in PAM-4 signals, and then presented transformations chosen on the basis of these activity factors. Encoder and decoder implementations were also proposed. Simulation results show that the technique can reduce power consumption and dissipation by up to $54 \%$, and it can be applied to other types of multilevel signals. In future work, we plan to investigate other coding configurations and other types of multilevel signals.


Figure 3. Power savings from using different transformations.


Figure 4. Effect of partitioning on power savings.

## References

[1] U. Avci and S. Tiwari, "A novel compact circuit for 4-PAM energy-efficient high speed interconnect data transmission and reception," Microelectronics Journal, Vol. 36, pp. 67-75, Jan. 2005.
[2] J.L. Zerbe et al., "A 2 Gb/s/pin 4-PAM parallel bus interface with transmit crosstalk cancellation, equalization, and integrating receivers," IEEE International Solid-State Circuits Conference, pp. 66-67, 5-7 Feb. 2001.
[3] H.-J. Liaw, G.-J. Yeh, P.S. Chau, and G. Pitner, "A 1.6 Gbit/s/pin Multilevel Parallel Interconnection," 2001 High-Performance System Design Conference, http://www.rambus.com/downloads/DC2001_qrsl.pdf.
[4] J.L. Zerbe, P.S. Chau, C.W. Werner, T.P. Thrush, H.J. Liaw, B.W. Garlepp, and K.S. Donnelly, "1.6 Gb/s/pin 4-PAM signaling and circuits for a multidrop bus," IEEE Journal of Solid-State Circuits, Vol. 36, Issue 5, pp. 752-760, May 2001.
[5] IEEE P802.3ab, http://www.ieee802.org/3/ab/index.html.
[6] J. Park, R. Sun, L.R. Carley, and C.P. Yue, "A 10-Gbps, 8-PAM parallel interface with crosstalk cancellation for future hard disk drive channel ICs," IEEE International Symposium on Circuits and Systems, pp. 1162-1165, 23-26 May 2005.
[7] I.B. Dhaou, E. Dubrova, and H. Tenhunen, "Power efficient intermodule communication for digit-serial DSP architectures in deep-submicron technology," Proceedings of the 31st IEEE International Symposium on Multiple-Valued Logic, pp. 61-66, 22-24 May 2001.
[8] E. Ozer, R. Sendag, and D. Gregg, "Multiple-Valued Logic Buses for Reducing Bus Size, Transitions and Power in Deep Submicron Technologies," Advanced Networking and Communications Hardware Workshop (ANCHOR), June 2005.
[9] A. Mochizuki, T. Takeuchi, and T. Hanyu, "Intrachip Address-Presetting Data-Transfer Scheme Using Four-Valued Encoding," Proceedings of the 34th International Symposium on Multiple-Valued Logic, pp. 192-197, 19-22 May 2004.
[10] H. Zhang, V. George, and J.M. Rabaey, "Low-swing on-chip signaling techniques: effectiveness and robustness," IEEE Trans. on VLSI Systems, Vol. 8, Issue 3, pp. 264-272, June 2000.
[11] H. Yamauchi, H. Akamatsu, and T. Fujita, "An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSI's," IEEE Journal of Solid-State Circuits, pp. 423-431, April 1995.
[12] M.R. Stan and W.P. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Trans. on VLSI Systems, Vol. 3, No. 1, pp. 49-57, March 1995.
[13] Y. Zhang, J. Lach, K. Skadron, and M.R. Stan, "Odd/Even bus invert with two-phase transfer for buses with coupling," International Symposium on Low Power Electronics and Design, pp. 80-83, 2002.
[14] P.P. Sotiriadis and A.P. Chandrakasan, "Bus energy reduction by transition pattern coding using a detailed deep submicrometer bus model," IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, Vol. 50, Issue 10, pp. 1280-1295, Oct. 2003.
[15] T.K. Konstantakopoulos, "Implementation of Delay and Power Reduction in Deep Sub-Micron Buses using Coding," Master's thesis, MIT, May 2002.
[16] S.R. Sridhara, A. Ahmed, and N.R. Shanbhag, "Area and energy-efficient crosstalk avoidance codes for on-chip buses," Proceedings of the IEEE International Conference on Computer Design, pp. 12-17, 11-13 Oct. 2004.
[17] D. Bertozzi, L. Benini, and G. De Micheli, "Error Control Schemes for On-Chip Communication Links: The Energy-Reliability Tradeoff," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 24, No. 6, June 2005.
[18] R. Hegde and N.R. Shanbhag, "Toward achieving energy efficiency in the presence of deep submicron noise," IEEE Trans. on VLSI Systems, Vol. 8, pp. 379-391, Aug. 2000.
[19] K. Farzan and D.A. Johns, "A power-efficient 4-PAM signaling scheme with convolutional encoder in space for chip-to-chip communication," Proceedings of the 30th European Solid-State Circuits Conference, pp. 315-318, 21-23 Sept. 2004.
[20] K. Farzan and D.A. Johns, "A low-complexity power-efficient signaling scheme for chip-to-chip communication," International Symposium on Circuits and Systems, Vol. 5, pp. 77-80, 25-28 May 2003.
[21] K. Farzan and D.A. Johns, "Power efficient chip-to-chip signaling schemes," IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 560-563, 26-29 May 2002.
[22] K. Farzan and D.A. Johns, "A low-power crosstalk-insensitive signaling scheme for chip-to-chip communication," International Symposium on Circuits and Systems, Vol. 4, pp. 441-444, 23-26 May 2004.
[23] K. Farzan, "Space Coding Applied to High-Speed Chip-to-Chip Interconnects," PhD thesis, University of Toronto, 2004.
[24] A.G. Bessios, W.F. Stonecypher, A. Agarwal, and J.L. Zerbe, "Transition-Limiting Codes for 4-PAM Signaling in High Speed Serial Links," IEEE GLOBECOM, pp. 3747-3751, Dec. 2003.

