New journal publication (TSE) on code cloning practices on the Ethereum blockchain platform

Our paper "Code Cloning in Smart Contracts on the Ethereum Platform: An Extended Replication Study", written in collaboration with Faizan Khan, Shane McIntosh, and Daniel Varro, has been accepted for publication in the reputable IEEE Transactions on Software Engineering (TSE) journal.

Pre-print available.
Official article: 10.1109/TSE.2022.3207428.

Smart contracts are programs deployed on blockchains that run upon meeting predetermined conditions. Once deployed, smart contracts are immutable; thus, defects in the deployed code cannot be fixed. As a consequence, software engineering anti-patterns, such as code cloning, pose a threat to code quality and security if unnoticed before deployment. In this paper, we report on the cloning practices of the Ethereum blockchain platform by analyzing 33,073 smart contracts amounting to over 4 MLOC. Prior work reported an unusually high 79.2% of code clones in Ethereum smart contracts. We replicate this study at the conceptual level, i.e., we answer the same research questions by employing different methods. In particular, we analyze clones at the granularity of functions instead of code files, thereby providing a more fine-grained estimate of the clone ratio. Furthermore, we analyze more complex clone types, allowing for a richer analysis of cloning cases. To achieve this finer granularity of cloning analysis, we rely on the NiCad clone detection tool and extend it with support for Solidity, the programming language of the Ethereum platform. Our analysis shows that most findings of the original study hold at the finer granularity of our study as well, but it also sheds light on some differences and contributes new findings. Most notably, we report a 30.13% overall clone ratio, out of which 27.03% are exact duplicates.
Our findings motivate improving the reuse mechanisms of Solidity, and in a broader context, of programming languages used for the development of smart contracts. Tool builders and language engineers can use this paper in the design and development of such reuse mechanisms. Business stakeholders can use this paper to better assess the security risks and technical outlooks of blockchain platforms.
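
To give a rough feel for what a function-level clone ratio measures, here is a minimal Python sketch. It is not the NiCad-based pipeline used in the paper: it only catches exact duplicates after stripping comments and whitespace, and it assumes the clone ratio is the share of functions that have at least one identical counterpart. The function names and Solidity snippets are hypothetical and serve only as illustration.

```python
import hashlib
import re
from collections import defaultdict


def normalize(body: str) -> str:
    """Strip comments and collapse whitespace so layout differences are ignored."""
    body = re.sub(r"/\*.*?\*/", " ", body, flags=re.DOTALL)  # block comments
    body = re.sub(r"//[^\n]*", " ", body)  # line comments (naive: ignores string literals)
    return " ".join(body.split())


def clone_ratio(functions: dict[str, str]) -> float:
    """Fraction of functions whose normalized text matches at least one other function."""
    groups = defaultdict(list)
    for name, body in functions.items():
        digest = hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Count every member of a group that holds more than one function.
    cloned = sum(len(names) for names in groups.values() if len(names) > 1)
    return cloned / len(functions) if functions else 0.0


# Hypothetical function bodies, keyed by "Contract.function".
functions = {
    "TokenA.add": "function add(uint a, uint b) returns (uint) { return a + b; }",
    "TokenB.add": "function add(uint a, uint b) returns (uint) {\n  // sum\n  return a + b;\n}",
    "TokenB.sub": "function sub(uint a, uint b) returns (uint) { return a - b; }",
    "TokenC.mul": "function mul(uint a, uint b) returns (uint) { return a * b; }",
}

print(clone_ratio(functions))
```

In this toy input, the two `add` functions differ only in layout and a comment, so they fall into the same group. Detecting the more complex clone types mentioned above (for example, renamed identifiers or modified statements) requires a dedicated clone detector rather than plain hashing.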