Towards Auto Contract Generation and Ensemble-based Smart Contract Vulnerability Detection
Keywords: Blockchain, Smart Contract Vulnerabilities, Ethereum, Machine Learning, Ensemble Model, BPMN
Smart contracts (SCs) are computer programs that form a major component of blockchain platforms. A smart contract encodes the rules agreed upon by the parties concerned; a transaction initiated by those parties is completed, without the involvement of a third party, only when it obeys these rules. Because of the simplicity and succinctness of the Solidity language, most smart contracts are written in it. Smart contracts have two limitations: they may contain vulnerabilities, and, because they are written in a programming language, they cannot be understood by all stakeholders, especially the non-technical people involved in the business. This paper therefore uses an XGBoost model to address the first limitation and the BPMN (Business Process Modeling Notation) tool to address the second. The popularity and fragility of Solidity attract attackers. Once a smart contract has been deployed, it cannot be changed, so if it is vulnerable, attackers may exploit it for financial gain. BPMN represents business rules or contracts in graphical notation, so that everyone involved in the business can understand them. The resulting BPMN diagram can be converted into a smart contract template with the BPMN-SOL tool. A few publications and tools address smart contract vulnerability detection, but they require more time to produce predictions, and the causes of the vulnerabilities they report are difficult to interpret. The proposed work therefore experimented with several deep learning approaches and, using an ensemble-based XGBoost model, improved F1 scores by an average of 2% in detecting four SC vulnerabilities: Denial of Service (DoS), Unchecked External Call, Re-entrancy, and Origin of Transaction. The paper also combines two important features, code snippets and n-grams, to construct the data set.
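To illustrate how n-gram features might be derived from a code snippet, the following is a minimal sketch and not the paper's actual pipeline. The tokenizer, the function names `tokenize` and `ngram_counts`, the choice of n = 2, and the sample Solidity line are all assumptions made for illustration; the abstract does not specify the tokenization or the value of n used.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's): identifiers,
# digit runs, and single punctuation characters become separate tokens.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def tokenize(snippet):
    """Split a source-code snippet into a flat list of tokens."""
    return TOKEN_RE.findall(snippet)


def ngram_counts(tokens, n=2):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Illustrative Solidity line that relies on tx.origin for authorization.
snippet = "require(tx.origin == owner);"
tokens = tokenize(snippet)
bigrams = ngram_counts(tokens, n=2)

# Show the most frequent bigrams for this snippet.
for gram, count in bigrams.most_common(3):
    print(gram, count)
```

Counts of this kind, computed per contract and aligned to a shared vocabulary, could serve as one input representation for a classifier such as XGBoost.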