Storage for Upgradable Ethereum Smart Contracts

Author: MixBytes team
This is the first article in the series "Upgradable Smart Contracts: Storage Highlights and Challenges".
We're going to take a closer look at upgradable smart contracts, their functions and storage options available for a developer.
Upgradable Ethereum Smart Contracts
Smart contracts on the Ethereum blockchain are immutable. Once a smart contract is deployed, it is impossible to change the code at the contract address. You can remove a contract altogether or, more precisely, a smart contract can destroy itself if such a function was originally written into the code. On the one hand, this solves the matter of trust: users can be sure that everything is fully controlled by algorithms. On the other hand, bug fixing is out of the question.

This is where upgradable Ethereum smart contracts come to the rescue. Wait, what? We have just said there are no such contracts in Ethereum (unlike EOS, for instance). However, upgradable smart contracts can be emulated. The idea is that the address and code of the main smart contract never change; its code simply forwards execution to another contract and then returns the results. The main smart contract in this scheme is called a proxy. Because the address of the other contract is saved in a variable, it can be changed as easily as any other contract state, whereas the proxy code remains immutable. In the end there can be multiple smart contract versions; migration is carried out by recording the address of the new version.
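The forwarding idea can be sketched as follows. This is a minimal illustration rather than a production proxy: the names `ITokenVersion`, `TokenProxy`, `admin` and `upgradeTo` are hypothetical, only a single function is forwarded, and a plain external call is assumed as the forwarding mechanism.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical interface that every token version is assumed to implement.
interface ITokenVersion {
    function balanceOf(address owner) external view returns (uint256);
}

// Illustrative proxy: its address and code never change,
// only the stored address of the current version does.
contract TokenProxy {
    address public admin;          // account allowed to switch versions (assumption)
    ITokenVersion public current;  // address of the version currently in use

    constructor(ITokenVersion initial) {
        admin = msg.sender;
        current = initial;
    }

    // "Migration" is nothing more than recording the new version address.
    function upgradeTo(ITokenVersion next) external {
        require(msg.sender == admin, "not admin");
        current = next;
    }

    // Forward execution to the current version and return its result.
    function balanceOf(address owner) external view returns (uint256) {
        return current.balanceOf(owner);
    }
}
```

Users keep talking to the `TokenProxy` address; after `upgradeTo` is called, the same `balanceOf` request is answered by the new version.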
Storing Upgradable Smart Contract State
As with any other software, developers have to address data migration whenever a new version is released. With a proxy, where exactly is the smart contract state supposed to be stored? There are three quite different approaches.
Separate storages for each version
The first approach implies that each version stores its state separately, in its own storage. This ensures maximum isolation and control and rules out conflicts, yet it increases complexity and the gas costs of migrating individual records into the new storage. Let's assume a basic token contract is being developed. In this case, the core data is the balances:

mapping (address => uint256) private _balances;
The new version cannot access the previous version's _balances directly; the data first has to be migrated from the previous version. Note that the migration of each balance must be performed only once.

mapping (address => uint256) private _balances;

// previous version of a token smart contract
ERC20 private _previous;

// flag indicating that the balance of a given user has already been migrated
mapping (address => bool) private _migrated;

function balanceOf(address owner) public view returns (uint256) {
    return _migrated[owner] ? _balances[owner] : _previous.balanceOf(owner);
}

function setBalance(address owner, uint256 new_balance) private {
    _balances[owner] = new_balance;
    if (!_migrated[owner])
        _migrated[owner] = true;
}
At this point additional issues arise. Migration cannot be carried out on the spot for every request, because it may require writing to storage, which is not available in view-only functions. As a result, every access to balances, even an internal one, must go through the balanceOf and setBalance functions. This adds boilerplate code, not to mention the increased gas consumption.
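To illustrate what routing every access through balanceOf and setBalance means in practice, here is a hedged sketch of a transfer written on top of the two helpers. The contract name `TokenV2`, the interface `IPreviousToken` and the `transfer` body are assumptions made for the example; restricting callers to the proxy is left out for brevity.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical minimal view of the previous token version.
interface IPreviousToken {
    function balanceOf(address owner) external view returns (uint256);
}

contract TokenV2 {
    mapping(address => uint256) private _balances;
    mapping(address => bool) private _migrated;
    IPreviousToken private _previous;

    constructor(IPreviousToken previous) {
        _previous = previous;
    }

    // Read path: fall back to the previous version until the balance is migrated.
    function balanceOf(address owner) public view returns (uint256) {
        return _migrated[owner] ? _balances[owner] : _previous.balanceOf(owner);
    }

    // Write path: the first write for an owner also marks the balance as migrated.
    function setBalance(address owner, uint256 newBalance) private {
        _balances[owner] = newBalance;
        if (!_migrated[owner]) {
            _migrated[owner] = true;
        }
    }

    // Never touches _balances directly, only the two helpers above.
    function transfer(address to, uint256 amount) external returns (bool) {
        uint256 fromBalance = balanceOf(msg.sender);
        require(fromBalance >= amount, "insufficient balance");
        setBalance(msg.sender, fromBalance - amount);
        setBalance(to, balanceOf(to) + amount);
        return true;
    }
}
```

Both the sender and the recipient are migrated as a side effect of the first transfer that involves them, so later reads for these accounts no longer reach into the previous version.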

In the worst case, calls to view-only functions traverse the whole chain of token versions to collect the data and cannot record the result in the latest version, as they have no right to modify state. Calling these functions on versions other than the latest is possible but makes little sense.

Migrating the data and recording the result of an operation for the current user in the latest version of the token code at the same time requires calling functions that can change the state of the latest version. After such a call, further calls to any other function for that user no longer have to pass through the whole token version chain. Only the proxy contract should be allowed to call functions that change the state of the latest version; for previous versions, access to these functions must be denied completely.
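One possible way to express this access rule is sketched below. The names `GuardedVersion`, `proxy`, `retired`, `retire` and `setBalanceFor` are hypothetical, and the sketch assumes the proxy retires a version when it switches to the next one.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative token version whose state can be changed only by the proxy,
// and only while this version is the latest one.
contract GuardedVersion {
    address public immutable proxy;  // the only account allowed to change state
    bool public retired;             // set once a newer version takes over

    mapping(address => uint256) private _balances;

    constructor(address proxy_) {
        proxy = proxy_;
    }

    modifier onlyProxy() {
        require(msg.sender == proxy, "caller is not the proxy");
        require(!retired, "version is retired");
        _;
    }

    // Assumed to be called by the proxy when it switches to a newer version.
    function retire() external {
        require(msg.sender == proxy, "caller is not the proxy");
        retired = true;
    }

    // State-changing function: denied to everyone but the proxy,
    // and denied to the proxy as well after retirement.
    function setBalanceFor(address owner, uint256 newBalance) external onlyProxy {
        _balances[owner] = newBalance;
    }

    // View functions stay open so that newer versions can still read old data.
    function balanceOf(address owner) external view returns (uint256) {
        return _balances[owner];
    }
}
```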
Contract as a Database
Another storage option can be suggested. Let's see how the issue is tackled in conventional programs: data is separated from the code! Moreover, in complex programs and systems, data is stored in SQL or NoSQL databases.

A smart contract written specifically for this purpose can be used as a storage. The data is then always kept in the storage of this contract, regardless of the current version of the token code. The code of this contract can be moved to a library, but that is beyond the scope of this article. There is no need to migrate data from storage to storage; instead, storage access rights are transferred from version to version. Yet this type of storage is not without issues. It requires defining an interface that is available to any version of the token smart contract, e.g. an SQL-like or document-oriented one. For an example of this type of storage, have a look at EOS tables.

Let's call the combination of structure, field names and data types the data schema. The code of a storage smart contract can consist of a static part (code that does not change regardless of the current data schema) and a dynamic part (schema-dependent code). It is the dynamic part that contains a lot of boilerplate code, so it makes sense to generate it automatically, as is done in Protocol Buffers or Apache Thrift. I happened to handle a similar task while developing a prototype of columnar data storage for Ethereum at the ETHBerlin hackathon.

The data item is described by the following structure:

struct Cafe {
    string name;
    uint32 latitude;
    uint32 longitude;
    address owner;
}
For this structure we generate a "driver" (see GitHub). The driver calls the static code (also on GitHub), for instance `CDF.writeString`, `CDF.chunkDataPosition` and other functions.

As I already mentioned, that solution addresses other issues as well and serves here only as an example of how an external storage operates. I am currently not aware of any working implementation of SQL/NoSQL storage on top of Ethereum smart contract storage. It is an interesting topic that may turn out to be a promising solution to the problem of storing data in changeable smart contracts.
To sum up, the state is stored in the contract used as a DB, which is invoked via the call instruction rather than delegatecall. Write calls should be protected and available only to the proxy contract. The common code of this DB contract can be moved to a library.
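A deliberately tiny sketch of such a DB contract is shown below. It is not the SQL-like or document-oriented interface discussed above, just a single key-value table. The names `BalanceStore`, `writer` and `transferWriter` are hypothetical; `writer` stands for whichever contract currently holds write access, for example the proxy.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative storage contract: holds the data for every token version.
contract BalanceStore {
    address public writer;  // contract currently allowed to write (assumption)
    mapping(address => uint256) private _balances;

    event WriterChanged(address indexed previousWriter, address indexed newWriter);

    constructor(address initialWriter) {
        writer = initialWriter;
    }

    modifier onlyWriter() {
        require(msg.sender == writer, "caller is not the writer");
        _;
    }

    // Instead of migrating data, write access is handed over on upgrade.
    function transferWriter(address newWriter) external onlyWriter {
        emit WriterChanged(writer, newWriter);
        writer = newWriter;
    }

    // Reads are open to everyone.
    function getBalance(address owner) external view returns (uint256) {
        return _balances[owner];
    }

    // Writes are protected. They arrive via a regular call,
    // so the data lives in this contract's own storage.
    function setBalance(address owner, uint256 value) external onlyWriter {
        _balances[owner] = value;
    }
}
```

Because the store is reached through a regular call, `msg.sender` inside it is the calling contract, which is what makes the `onlyWriter` check meaningful.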
Delegatecall and storing data in a proxy contract
Finally, the third option is storing data in the storage of the proxy contract itself. How does a particular code version access the data if the proxy is a stand-alone smart contract? The EVM delegatecall instruction makes it possible. It executes the code at the target address but uses the storage of the contract that executed the delegatecall instruction (see the Solidity documentation for more).
Calling functions of previous contract versions directly makes little sense, as these are no more than "pieces of code" and all state is stored in the proxy contract. Delegatecall is also used for invoking library contracts: the library code locates the necessary data via a storage pointer. However, the instruction can be a threat to proxy contracts. Unfortunately, the official Solidity documentation barely warns us, with just the note: "If state variables are accessed via a low-level delegatecall, the storage layout of the two contracts must align in order for the called contract to correctly access the storage variables of the calling contract by name."
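A minimal sketch of a delegatecall-based proxy, together with one version whose storage layout is aligned with it, might look like this. The names `DelegatingProxy`, `CounterV1` and `upgradeTo` are hypothetical, and reserving the first two slots in the version contract is just one simple way to keep the layouts aligned, chosen here for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative proxy: all state lives here, the code lives in the version contract.
contract DelegatingProxy {
    address private _implementation;  // slot 0
    address private _admin;           // slot 1

    constructor(address implementation) {
        _implementation = implementation;
        _admin = msg.sender;
    }

    function upgradeTo(address next) external {
        require(msg.sender == _admin, "not admin");
        _implementation = next;
    }

    // Any other call is executed by the version's code against the proxy's storage.
    fallback() external payable {
        address impl = _implementation;
        assembly {
            calldatacopy(0, 0, calldatasize())
            let result := delegatecall(gas(), impl, 0, calldatasize(), 0, 0)
            returndatacopy(0, 0, returndatasize())
            switch result
            case 0 { revert(0, returndatasize()) }
            default { return(0, returndatasize()) }
        }
    }
}

// Illustrative version contract. Its first two variables mirror the proxy's
// slots 0 and 1, so that `counter` lands in slot 2 instead of overwriting
// the implementation address.
contract CounterV1 {
    address private _reservedImplementation;  // slot 0, owned by the proxy
    address private _reservedAdmin;           // slot 1, owned by the proxy
    uint256 public counter;                   // slot 2

    function increment() external {
        counter += 1;
    }
}
```

If `CounterV1` declared `counter` as its first variable instead, `increment()` called through the proxy would modify slot 0, that is, the stored implementation address. This is the kind of misalignment the quoted note warns about.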
Conclusion
We looked into upgradable smart contract development and examined three data storage approaches. Next time we will dive deep into delegatecall and the issues that may arise along the way. Safe contracts to you!