Uncovered Compiler Bugs

September 9, 2024

Uncovered Compiler Bugs: How Certora Identified a Critical Error in LLVM

Author:

John Toman

Editors:

During the activation of Aave V3 on ZKsync, Aave Labs, BGD Labs, Matter Labs, and Certora uncovered a critical optimization bug within the widely-used LLVM compiler – a vulnerability with the potential to drain funds from Aave's instance on ZKsync. Compiler bugs are notoriously tricky to identify, as many tools operate on higher-level programming code. In this blog, we’ll share the story of how we traced the issue back to its root cause.

Background

The problem was hidden in the code managing the UserConfigurationMap.. A user’s UserConfigurationMap records, for each reserve, whether the user borrows from that reserve or uses it as collateral. The actual “map” (self.data in the snippet) is actually a uint256 bitmask where sequential pairs of bits indicate the borrowing/collateral status for each reserve. Thus, the two least significant bits of the map record the collateral/borrowing status of the reserve with index 0, the next two significant bits the status of the reserve with index 1, and so on. Within each pair, the least significant bit represents the borrowing status, and the most significant bit represents the collateral status. Graphically then, the user configuration map is laid out like so, with the least significant bits to the right of the diagram:

Each color group represents the status of a single reserve; the c bit within each group represents the collateral status for that reserve, and the b bit represents the borrowing status.

The function setBorrowing updates the user’s UserConfigurationMap. A simplified version of its code is presented below.

function setBorrowing(
DataTypes.UserConfigurationMap storage self,
uint256 reserveIndex,
bool borrowing
  ) internal {
   uint256 bit = 1 << (reserveIndex << 1);
   if (borrowing) {
     self.data |= bit;
   } else {
     self.data &= ~bit;
   }
}

We can see that we first compute the bit for the borrowing status of the reserve with index reserveIndex. This is accomplished by shifting the binary digit 1 left two spaces for each reserve index (recall that reserveIndex << 1 is equivalent to reserveIndex * 2). After the shift operations, we have a bitmask whose sole set bit is at the location of the borrowing flag for the indicated reserve index. If we are requested to set the borrowing status (the borrowing argument) then we simply bitwise-or this bit with the existing configuration map; in other words we set the borrowing bit in self.data to 1, and leave all other bits untouched.

The case for clearing the borrowing status applies bitwise and to the configuration map (aka, self.data) and the bitwise not of the bit. Recall that bit is a bitmask with all 0s except at the index of the borrowing flag of the indicated reserve. Applying bitwise not will yield the opposite, a bitwise mask with all 1s except for a zero at the borrowing bit for the reserve. Applying this mask via bitwise and to the user map has the effect of leaving all status bits unchanged, with the exception of the relevant borrowing flag which is set to 0.

This process is visualized below using an example with a reserveIndex of 3 (the orange group in the above example):

The anomaly appeared during this clear operation: after clearing the borrowing bit for a reserve, it was observed that all status flags (both borrowing and collateral) in lower bits (i.e., those less significant) would actually be cleared. The implications are alarming: if a reserve is no longer marked as collateral for a user, their position could be deemed unhealthy, leading to liquidation. Even more concerning, if assets are not properly flagged as borrowed or collateral, the system might erroneously allow the withdrawal of collateral, potentially compromising the solvency of the entire system.

How to Find a Compiler Bug via Manual Inspection

The first order of business was to find a human-readable textual representation of the low-level bytecode that was being executed and causing this anomaly. Luckily, the ZKsolc compiler produces an assembly representation of the compiled bytecode as a normal part of the compilation.

Once we knew how to read the output of the compiler, and with the textual description of the assembly format, we could begin to search for the bytecode responsible. However, we did not immediately start reading the assembly of the affected Aave contract. Even finding the assembly code responsible for the setting/clearing of the borrowing bit would be like trying to find a needle in a haystack. We instead extracted the core of the code affected into a smaller, self-contained test contract:

contract Test {
	mapping(address => DataTypes.UserConfigurationMap) s;
	mapping(address => uint) loanAmt;
	function doIt(uint res, uint loanValue) external {
    	DataTypes.UserConfigurationMap storage ref = s[msg.sender];
    	UserConfiguration.setBorrowing(ref, res, loanAmt[msg.sender] == loanValue);
    	loanAmt[msg.sender] -= loanValue;
	}

	function borrow(uint res, uint loanAmount) external {
    	bool isFirstBorrowing = loanAmt[msg.sender] == 0;
    	loanAmt[msg.sender] += loanAmount;
    	if(isFirstBorrowing) {
        	UserConfiguration.setBorrowing(s[msg.sender], res, true);
    	}
	}

	function supply(uint res, uint amount) external {
    	UserConfiguration.setUsingAsCollateral(s[msg.sender], res, amount == 0);
	}
}

The actual logic for each function is meaningless; it is simply to exercise the UserConfigurationMap library in different ways.

After reading through the relatively short assembly, we found the following snippet:

Translating this into pseudocode, we have:

r3 := r2 << 1
r2 := 1 << r3
r3 = storage[r6]
r4 := 0 - 1
r2 := r4 ^ r2
r2 := r3 & r2
storage[r6] := r2

The first two operations look very much like the 1 << (reserveIndex << 1) we expect to see. The next few operations are somewhat tricky to interpret but ultimately perform a bitwise not. The 0 - 1 subtraction underflows to be a bitmask of all 1s, which is the xor’d with r2, which is the bitmask with a single one at the borrowing bit for reserveIndex. The result is thus an inversion of r2: all ones except for a zero at the borrowing index. Finally, this bitmask is applied to a value read out of storage (r3) and then written back into storage at the same location (r6).

This gives us a fingerprint to identify the “clear borrow flag” operation in the assembly of the original Aave contract. In particular, we then looked for shl.s 1, rX, rY as this likely corresponded to (reserveIndex << 1). To narrow the search down further, we also checked if this computation was roughly followed by this assembly pattern:

sload rA, rV
// compute bitmask rM with rY
rV := rM & rV
ssstore rA, rV

That is, the value that “looks like” computing the borrowing bit for a reserve was used to compute another bitmask that was combined via bitwise and to a value read out of storage.

This search yielded exactly one hit:

shl.s 1, r1, r1
rol @CPI0_60[0], r1, r1
sload r3, r2
and r2, r1, r1
sstore r3, r1

rol is a “rotating left shift”; unlike a regular shift left, bits that are shifted off the left end of the bitmask are shifted onto the right end of the bitmask, preserving order. For example, with a bitwidth of 8: rol 0b0010111, 1 would yield 0b00101110 (here and later 0b will denote a literal number in binary).

Bitshift

The operand being rotated @CPI0_60[0] is a reference to a constant 18446744073709551614. In hexadecimal, this is 0xFFFFFFFFFFFFFFFE which is 63 one bits, followed by a single 0 bit. We can now start to see how this bug took shape. If we were working with 64 registers, rotate left shifting this constant by the index of the borrowing bit for the reserve (i.e., reserveIndex << 1) has the same effect as left shifting the constant 1 and applying a bitwise not. In other words, instead of shifting a single one into place and then flipping all bits to create a “hole” for the bit we want to clear, we have pre-created the hole and then rotated it into the correct position. This process is visualized in our example from earlier (using a hypothetical bit-width of 10) and reserve index 3:

Here the six dark red bits were shifted off the left of the bitmask and placed on the right. When masked with the original flag, we can see that only the borrow flag for reserve index 3 is cleared.

Unfortunately, the ZkEVM does not operate on 10-bit or 64-bit registers, it uses 256-bit registers. This rotating left shift trick only works if, in the constant being rotate left shifted, all 256 bits are set to 1 except the least significant bit. In other words, we want 0b11111…11110. However, the constant is only 64-bits wide, so the remaining 192 upper bits are all implicitly filled with 0. So instead, the constant used in the bytecode is 0b00…0011…110. When this mask is rotate left shifted, the bits below the borrowing index that is being cleared will also be 0; the upper 0 bits are shifted off the left end and shifted onto the right end. Thus, clearing the borrowing status for one reserve clears all status flags for reserves with smaller indices.

To confirm our findings, we mutated the original Aave code to perform an equivalent computation but in an extremely roundabout way so as to avoid this optimization[1]:

uint256 orig = self.data;
uint lowerMask = bit - 1;
uint lowerData = orig & lowerMask;
uint toClear = orig >> (reserveIndex << 1);
uint clearLowerBitMask = 1;
clearLowerBitMask = ~clearLowerBitMask;
uint cleared = toClear & clearLowerBitMask;
uint shiftedBack = cleared << (reserveIndex << 1);
uint newValue = shiftedBack | lowerData;
self.data = newValue;

Rerunning the compiler with this new implementation gave assembly code without the `rol` instruction in a similar position as without this alternative implementation. This confirmed multiple working hypotheses, namely:

The original `rol` assembly snippet indeed corresponds to the `setBorrowing` implementation in question
The problematic behavior was due to a compiler optimization

We shared our findings with Aave Labs, BGD Labs and Matter Labs. Matter Labs were able to then pinpoint a bug in LLVM that was recently solved; an LLVM optimization pass did not properly account for the architecture bit-width (hence the use of a constant that would work for a 64-bit architecture but not a 256-bit one).

Takeaway

Compilers are often overlooked by developers and security researchers when analyzing code correctness, as logical errors within the code itself are typically the primary focus. However, the case we’ve just presented demonstrates that compiler bugs, while less common, cannot be dismissed. Even widely trusted compilers, such as LLVM, are not immune to introducing critical errors.

When verifying the correctness of your code, it is crucial to ensure that the analysis is performed on the actual executable code, rather than on an abstract representation (like the high-level source code). Whether you use testing, fuzzing, or formal verification, the tool you choose must operate directly on the bytecode to provide reliable results. This ensures that the validation process reflects the true behavior of the code in production.

Moreover, it is essential to test the bytecode compiled with the exact parameters intended for production, including optimization levels and linked libraries. Each of these parameters can significantly alter the resulting bytecode, potentially introducing unforeseen issues. We must approach even minor changes with caution, always questioning the integrity of our code to safeguard against hidden errors.

We would like to thank the Aave, BGD Labs, and ZKsync teams for their collaboration in responding to this issue.

[1] A reasonable question is why the rol pattern didn’t appear in the minimized test. While we can’t provide an exact answer in this particular case, it is common for compilers to apply these sorts of optimizations (so called “peephole” optimizations) depending on the surrounding context which our test contract may not have had.

September 9, 2024

Background

How to Find a Compiler Bug via Manual Inspection

Takeaway

Get every blog post delivered