Ensure the security of your smart contracts

ChatGPT for security audits

Author: Konstantin Nekrasov
Security researcher at MixBytes
Introduction

ChatGPT has brought about a considerable transformation in programming and research. This article will delve into various prompts that can be immensely beneficial for a security auditor.
Before You Start
To ensure OpenAI doesn’t utilize your conversation for model training while auditing, remember to disable the checkbox:

Beware of the Problem with Input Size
As you may have anticipated, GPT has its limitations when it comes to input size:

We also encountered an unexpected bug in GPT-4, where it silently disregarded the lower part of the input without any indication of an error.
For instance, when we fed a big contract into GPT-3.5 and asked it to disregard everything above and just answer “1+1=?”, it impressively provided the correct answer:

However, GPT-4 simply summarized the top part of the input, completely overlooking the question:

This can create a deceptive situation where GPT-4 appears to offer analysis but has actually cut off your request, missing crucial information at the bottom. Beware!
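A cheap way to protect yourself is to estimate the input size before pasting it into the chat. The sketch below uses a rough 4-characters-per-token heuristic; both the ratio and the 8,000-token limit are illustrative assumptions, not exact model parameters:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 characters per token
    for English prose and Solidity source (a heuristic, not exact)."""
    return len(text) // 4


def fits_in_context(prompt: str, contract_code: str,
                    limit_tokens: int = 8000) -> bool:
    """True if the combined input is likely within the model's context.
    The 8,000-token default is an illustrative assumption."""
    return estimate_tokens(prompt) + estimate_tokens(contract_code) < limit_tokens
```

If the check fails, split the contract and send it in parts, keeping in mind the caveats discussed below.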
1. “Find vulnerabilities!”
The most obvious way to use ChatGPT is to ask it to discover vulnerabilities in the code.

Let’s try doing this for a small staking contract [→ see code].

The contract has several issues that we would like to identify via GPT:

  1. A DoS vulnerability in unstake() when dealing with a large stake.
  2. The confusion between `_reward` and `_amount` in unstake().
  3. Calling stake() twice overwrites the user’s previous stake.
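The third issue is easy to see in a minimal model. The Python sketch below is a hypothetical illustration of the buggy bookkeeping (the real contract is Solidity, and the class and method names are ours):

```python
# Hypothetical Python model of issue 3 (the real contract is Solidity).

class BuggyStaking:
    """Models the bug: stake() overwrites instead of accumulating."""

    def __init__(self):
        self.stakes: dict[str, int] = {}

    def stake(self, user: str, amount: int) -> None:
        # BUG: plain assignment silently discards the user's previous stake
        self.stakes[user] = amount


class FixedStaking(BuggyStaking):
    def stake(self, user: str, amount: int) -> None:
        # Fix: accumulate instead of overwrite
        self.stakes[user] = self.stakes.get(user, 0) + amount
```

With the buggy version, staking 100 and then 50 leaves the user with a recorded stake of 50; the first 100 tokens are effectively lost.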

We used the following prompt: “Find vulnerabilities” → “More” → “More”:

The result:
Both GPT-3.5 and GPT-4 raised several false alarms, with GPT-3.5 performing the worst. However, GPT-4 impressively identified two out of the three vulnerabilities! Great job!

In summary, GPT can be effectively used as a scanner for small, isolated contracts or functions, thereby enhancing the quality of audits. However, brace yourself for a considerable number of false alarms and don’t expect it to uncover all vulnerabilities.

Keep in mind that applying this method to large contracts poses challenges:

  • GPT cannot handle large inputs, necessitating the division of the contract into parts. But be cautious: GPT starts to forget details from the beginning of the conversation as you send successive parts, which ultimately affects the result.
  • Additionally, if the contract you’re auditing involves external integrations or inherits from other contracts, ensure you provide GPT with documentation for these dependencies. Without understanding the context, GPT may yield subpar results. So, proceed with care!
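If you do have to split a contract, keep each chunk aligned with function boundaries so GPT sees complete units. A naive splitter might look like this (a sketch only; a real tool should walk the compiler’s AST rather than use a regex):

```python
import re

def split_contract(source: str, max_chars: int = 12000) -> list[str]:
    """Naively split Solidity source at top-level `function` keywords,
    then pack the pieces into chunks of at most max_chars characters.
    A sketch only: a real splitter should use the compiler's AST."""
    parts = re.split(r"(?=\n\s*function\s)", source)  # zero-width split keeps all text
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) > max_chars:
            chunks.append(current)
            current = ""
        current += part
    if current:
        chunks.append(current)
    return chunks
```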
2. “Simplify the code…”
At the outset of an audit, your primary mission is to grasp the inner workings of the protocol at a high level. GPT gives you a powerful tool for this: it can simplify complex code, clarify variable names, remove unnecessary comments or events, and shorten the code by substituting fixed argument values. All of this accelerates your immersion into the protocol.

Let’s embark on this with a practical example – simplifying the liquidate_extended() and _liquidate() functions from crvUSD [→ see code].

The code of these functions is fed into GPT-4, and prompts are employed one by one, watching the result:

1. Merge _liquidate() into liquidate_extended()
2. Simplify the code knowing that the argument `user` is always equal to `msg.sender`
3. Simplify the code knowing that the argument `use_eth` is always equal to `true`
4. Simplify the code knowing that the argument `frac` is always equal to `10**18`
5. Simplify the code knowing that the argument `min_x` is always equal to `0`
6. Strip types
7. Replace `xy = ...` to `stablecoin_amount, collateral_amount = ...`
8. Simplify the code by adding a `require(debt > stablecoin_amount)`
9. Simplify the code knowing that `callbacker` is always `empty(address)`
10. Replace unsafe_ by respective math ops
This is what we get in the end:

def liquidate_extended():
    """
    Perform a bad self-liquidation if health is not good
    """
    debt, rate_mul = self._debt(msg.sender)

    stablecoin_amount, collateral_amount = AMM.withdraw(msg.sender, self._get_f_remove(10**18, 0))

    require(debt > stablecoin_amount)
    STABLECOIN.transferFrom(AMM.address, self, stablecoin_amount)

    to_repay = debt - stablecoin_amount

    self._withdraw_collateral(msg.sender, collateral_amount)
    STABLECOIN.transferFrom(msg.sender, self, to_repay)

    self.redeemed += debt
    self.loan[msg.sender] = Loan({initial_debt: 0, rate_mul: rate_mul})
    self._remove_from_list(msg.sender)

    d = self._total_debt.initial_debt * rate_mul / self._total_debt.rate_mul
    self._total_debt.initial_debt = max(d, debt) - debt
    self._total_debt.rate_mul = rate_mul
The resulting code is now remarkably more comprehensible than the original version.

Now we can scroll up our dialogue with GPT-4 and modify certain prompts. For example, we could ask it to simplify the code for the case callbacker==msg.sender, to see a different code flow for this function.

These simplifications can be highly beneficial during the initial stages of an audit especially when you need to quickly understand the high-level workings of the protocol.

Important notes:

  • GPT-4 performed great!
  • GPT-3.5 failed and produced wrong code.
  • GPT-4 also fails when we ask it to do everything at once with a single combined prompt instead of 10 consecutive requests.
3. Checking Invariants
GPT can also help you check invariants, which is especially valuable when you face large functions and need to trace code paths with significant consequences.

Let’s consider adjustTrove() and _adjustTrove() functions from Ethos (clone of LUSD) as an example [→ see code].

Suppose we want to find scenarios in which the caller avoids paying the fee. We use the prompt:

GPT-4 accurately suggests that the fee is only levied under specific conditions: _isDebtIncrease=true && isRecoveryMode=false.
Great!

With the ability to automatically identify logical paths with specific consequences, you can quickly and effortlessly check your suspicions about various vulnerabilities.
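Once GPT has pointed you at the condition, the check is easy to confirm mechanically. The sketch below models the fee condition as a pure function (a simplification of ours, not the actual Ethos code) and enumerates all flag combinations:

```python
from itertools import product

def fee_is_charged(is_debt_increase: bool, is_recovery_mode: bool) -> bool:
    """Simplified model of the fee condition GPT-4 identified in
    _adjustTrove(); not the actual Ethos code."""
    return is_debt_increase and not is_recovery_mode

# Enumerate every combination of flags and keep the ones where a fee is levied
charged_paths = [flags for flags in product([False, True], repeat=2)
                 if fee_is_charged(*flags)]
# charged_paths == [(True, False)]: fee only when
# _isDebtIncrease=true && isRecoveryMode=false
```

The enumeration confirms that the fee is charged on exactly one path, matching GPT-4’s answer.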
4. Question Mining
Every challenging question an auditor poses and answers improves the quality of the audit. Why not harness GPT to generate questions for us? We’re not after mundane checklists; we want astute questions that are relevant and easily verifiable.

Here’s the template we came up with:

Imagine you are a security researcher and you are auditing [DESCRIBE THE PROJECT]. [DESCRIBE TECHNICAL DETAILS]. To find vulnerabilities in the project, you must read these functions and ask three of the most important edge-case questions about them. This will help you identify bugs or vulnerabilities. What would those three questions be? Ask very specific questions and provide suspicious arguments and code path you want to check.
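To reuse the template across projects, you can parameterize it; here’s a minimal sketch (the function and variable names are ours, not part of any API):

```python
QUESTION_MINING_TEMPLATE = (
    "Imagine you are a security researcher and you are auditing {project}. "
    "{details}. To find vulnerabilities in the project, you must read these "
    "functions and ask {n} of the most important edge-case questions about "
    "them. This will help you identify bugs or vulnerabilities. What would "
    "those {n} questions be? Ask very specific questions and provide "
    "suspicious arguments and code path you want to check."
)

def build_prompt(project: str, details: str, n: str = "three") -> str:
    """Fill the question-mining template for a concrete project."""
    return QUESTION_MINING_TEMPLATE.format(project=project, details=details, n=n)
```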
Yet, heed the delicate balance: the quality of GPT’s responses depends heavily on the prompt.

Take for instance some code from crvUSD. First, we merged several of its functions [1,2] into one [→ see code]. And then prompted:

To assess the quality of generation, we divided the resulting questions into four categories:

  1. Nonsense. For example, when GPT suggests checking negative values of uint256.
  2. Lazy. For example, when it asks what happens if you pass `collateral == 0`, even though there’s a check in the code, `assert collateral * X / Y > 100`, which forbids zero values.
  3. Vague. For example, when it suggests checking all unsafe math operations in the code. This is a typical static-checklist question, and we don’t need that.
  4. Good, specific question. For example, when GPT states that create_loan() can accept a collateral value that doesn’t match msg.value, and there’s a call to an unknown part of the code, _deposit_collateral(collateral, msg.value). GPT then wonders: does this unknown function correctly handle these two values possibly not matching?

We then asked GPT-4 to generate 15 questions, and here’s what we got:
Great result!
Conclusion
We’ve covered several techniques that can accelerate your understanding of a protocol and raise the quality of your audits. AI is a rapidly evolving field, and mastering this tool will pay off. So embrace the power of GPT, and keep exploring!
Who is MixBytes?
MixBytes is a team of expert blockchain auditors and security researchers specializing in providing comprehensive smart contract audits and technical advisory services for EVM-compatible and Substrate-based projects. Join us on Twitter to stay up-to-date with the latest industry trends and insights.
Disclaimer
The information contained in this Website is for educational and informational purposes only and shall not be understood or construed as financial or investment advice.