At GBC.AI we focus on bringing the best technology to our products. We started applying AI to the blockchain industry long before the current hype, and today we’d like to share our experience: what it took to implement machine learning (ML) in our Wallet Guardian product to detect scams.
We decided to use ML to build a tool that detects scams quickly and reliably. It may sound easy, but in practice it took a great deal of work and many hours of research.
The power of ML is that, instead of relying on prewritten rules, you train models on a data set. A model can combine many factors in ways that are hard to predict, including factors that seem unimportant at first glance.
Let’s start with the basics. First, we need to understand the problems with applying ML here and why it is not an easy job. We’ll also discuss what can make it easier to apply ML to smart contracts.
In order to train ML algorithms to detect scam smart contracts, a training sample must be created. However, there is no single database that contains all scam contracts, and there are only a few databases that provide information about such contracts. Moreover, most of these databases provide data through APIs, which makes it difficult to collect a sample. Additionally, it should be noted that there is no universal concept or methodology for identifying scam contracts. For instance, should a smart contract token that has lost liquidity due to project and team issues be considered a scam? Some tokens do not require liquidity based on their unique characteristics. Furthermore, how strongly do high sales commissions correlate with scams? Thus, when forming a database of scam contracts, it is crucial to establish clear criteria for identifying which contracts are considered scams.
In order to train the model to differentiate between legitimate and scam contracts, it is crucial to establish the criteria for “normality” and prepare a sample of such contracts. However, this task is not straightforward, as we must ensure that scam contracts are not accidentally included in the sample. Additionally, there is the challenge of handling contracts that are not overtly fraudulent, yet have certain issues that render them less than trustworthy.
The first and foremost task is to create a database of smart contract addresses, with each address classified into a specific type such as trusted, suspicious, or scam. This labeling process can be challenging and mostly requires manual or semi-manual input, but the resulting database has significant long-term research value. The larger and more diverse the sample, the more accurately the algorithm can learn.
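As a rough illustration, such a labeled address database could be sketched as below. This is a minimal in-memory stand-in, not the actual Wallet Guardian storage: the class names, the `chain` and `source` fields, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Label(Enum):
    TRUSTED = "trusted"
    SUSPICIOUS = "suspicious"
    SCAM = "scam"


@dataclass(frozen=True)
class LabeledContract:
    address: str   # contract address, normalized to lowercase
    chain: str     # hypothetical chain identifier, e.g. "ethereum"
    label: Label
    source: str    # how the label was assigned, e.g. "manual" (assumed field)


def normalize(address: str) -> str:
    """Lowercase the address so one contract is never stored twice."""
    return address.strip().lower()


class LabelStore:
    """In-memory stand-in for the labeled address database."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], LabeledContract] = {}

    def add(self, address: str, chain: str, label: Label, source: str) -> None:
        key = (chain, normalize(address))
        self._rows[key] = LabeledContract(key[1], chain, label, source)

    def get(self, address: str, chain: str) -> Optional[LabeledContract]:
        return self._rows.get((chain, normalize(address)))

    def counts(self) -> Dict[Label, int]:
        """Class balance of the sample, useful before training."""
        out = {label: 0 for label in Label}
        for row in self._rows.values():
            out[row.label] += 1
        return out


# Made-up addresses, for illustration only.
store = LabelStore()
store.add("0xAbC0000000000000000000000000000000000001", "ethereum", Label.TRUSTED, "manual")
store.add("0xabc0000000000000000000000000000000000002", "ethereum", Label.SCAM, "semi-manual")
print(store.counts())
```

Recording where each label came from makes it easier to re-check semi-manual labels later, when the criteria for what counts as a scam are refined.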
The next important task is to create a feature description for each smart contract. This means presenting each contract as a set of features, such as verification on Etherscan, liquidity, tax value, and more. The more extensive the feature description, the more chances there are for the ML model to learn and accurately detect the level of risk. However, there are difficulties in collecting indicators, as not all indicators can be quickly and correctly collected. Most existing services only provide information for a small sample of contracts. Additionally, tokens can be placed on different decentralized exchanges, making it challenging to collect data from a large number of exchanges. It is necessary to strive to collect most of the contract data independently, without relying on existing APIs, to avoid being caught off guard by the discontinuation of a data source. Technical audit indicators of contract code are also important features, and provide a vast space for research and feature generation to help more accurately determine the level of risk. Off-chain data such as the presence of a website and social media activity are also among the most common indicators.
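To make the idea of a feature description concrete, here is a small sketch that turns a handful of the indicators mentioned above into a fixed-length numeric vector. The field names, the log scaling, and the "missing" flags are illustrative assumptions, not the feature set Wallet Guardian actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContractFeatures:
    verified_source: bool            # e.g. source code verified on Etherscan
    liquidity_usd: Optional[float]   # None when the indicator could not be collected
    buy_tax_pct: Optional[float]
    sell_tax_pct: Optional[float]
    has_website: bool                # off-chain indicator
    social_followers: Optional[int]  # off-chain indicator


def to_vector(f: ContractFeatures) -> List[float]:
    """Turn one feature description into a fixed-length numeric vector.

    Each optional indicator yields two numbers: the value (0.0 if missing)
    and a 0/1 "missing" flag, so a model can tell "zero" from "unknown".
    """
    def opt(value, transform=lambda v: v):
        if value is None:
            return [0.0, 1.0]
        return [float(transform(value)), 0.0]

    vec = [1.0 if f.verified_source else 0.0]
    vec += opt(f.liquidity_usd, math.log1p)          # compress the wide range of liquidity
    vec += opt(f.buy_tax_pct, lambda v: v / 100.0)   # percent -> fraction
    vec += opt(f.sell_tax_pct, lambda v: v / 100.0)
    vec += [1.0 if f.has_website else 0.0]
    vec += opt(f.social_followers, math.log1p)
    return vec


# Made-up contract: verified, liquidity unknown, 5% buy tax, 25% sell tax.
example = ContractFeatures(True, None, 5.0, 25.0, False, 0)
vector = to_vector(example)
print(len(vector), vector)
```

Keeping an explicit flag for indicators that could not be collected matters here, because, as noted above, not every indicator is available for every contract.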
When tokens are swapped on an exchange, Wallet Guardian analyzes the token and assigns it a risk level. To confirm the transaction, users must approve it through their crypto wallet. The extension needs to work quickly because users don’t want to wait long for a response, yet assigning a risk level requires collecting data about the contract, which takes time. One way to avoid long waiting times is to continuously collect data from contracts, including historical contracts, to form a database of all contract indicators and keep it updated.
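The precollection idea can be sketched as a lookup table that a background collector keeps fresh, with a slow on-demand fetch only as a fallback. Everything here is an assumption made for illustration: the `IndicatorCache` class, the `max_age_s` staleness limit, and the fake `slow_fetch` function standing in for real on-chain data collection.

```python
import time
from typing import Callable, Dict, List, Tuple

Indicators = Dict[str, float]


class IndicatorCache:
    """Precollected indicators, refreshed in the background and on a miss."""

    def __init__(self, fetch: Callable[[str], Indicators], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: Dict[str, Tuple[float, Indicators]] = {}

    def refresh(self, address: str) -> Indicators:
        """Collect indicators now and store them with a timestamp."""
        indicators = self._fetch(address)
        self._rows[address.lower()] = (self._clock(), indicators)
        return indicators

    def get(self, address: str) -> Tuple[Indicators, bool]:
        """Return (indicators, served_from_cache)."""
        row = self._rows.get(address.lower())
        if row is not None and self._clock() - row[0] <= self._max_age_s:
            return row[1], True               # fast path: no waiting for the user
        return self.refresh(address), False   # slow path: collect on demand


# Fake collector and fake clock so the example is deterministic.
calls: List[str] = []

def slow_fetch(address: str) -> Indicators:
    calls.append(address)
    return {"liquidity_usd": 1000.0}

now = [0.0]
cache = IndicatorCache(slow_fetch, max_age_s=60.0, clock=lambda: now[0])

cache.refresh("0xAAA")                # done ahead of time by the collector
_, fresh_hit = cache.get("0xaaa")     # served from the database
now[0] = 120.0                        # two minutes later the row is stale
_, stale_hit = cache.get("0xaaa")     # falls back to collecting again
print(fresh_hit, stale_hit, len(calls))
```

The trade-off is in the staleness limit: a short limit keeps indicators current but sends more users down the slow path, while a long limit does the opposite.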
Continuous data collection is also necessary because historical data is important for training models to work with smart contracts dynamically and track how indicators change over time. Time-series data provides additional information on the dynamics of indicators and allows for the use of new classes of models, such as recurrent neural networks, to achieve higher accuracy.
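As a small sketch of how continuously collected snapshots might be prepared for a sequence model, the functions below cut a contract's history into fixed-length windows and compute step-to-step changes. The two indicators used (liquidity and sell tax) and the numbers are made up, and the window length is an arbitrary assumption.

```python
from typing import List, Sequence

Snapshot = Sequence[float]   # one row of indicators at one point in time


def sliding_windows(history: Sequence[Snapshot], length: int) -> List[List[Snapshot]]:
    """All consecutive runs of `length` snapshots, oldest first."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [list(history[i:i + length]) for i in range(len(history) - length + 1)]


def deltas(window: Sequence[Snapshot]) -> List[List[float]]:
    """Step-to-step change of every indicator inside one window."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(window, window[1:])]


# Hypothetical history of (liquidity_usd, sell_tax_pct) for one contract.
history = [(1000.0, 5.0), (900.0, 5.0), (400.0, 20.0), (10.0, 99.0)]

windows = sliding_windows(history, length=3)
print(len(windows))
print(deltas(windows[0]))
```

Each window could then be fed to a recurrent network as one training sequence; the deltas are one example of a dynamic feature that a single snapshot cannot provide.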
It’s important to note that each blockchain requires a unique dataset and a new model to detect risk levels, but research is also being done on multi-chain learning, where contract samples from different networks are used to train the model.
Once there is enough training data, the focus is on data preprocessing, generating model features based on the collected data, selecting the appropriate model, and training it. During feature selection, as many different indicators as possible are collected, but some are discarded if they don’t contribute significantly to the final model’s performance.
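One simple way to discard indicators that do not contribute is backward elimination: drop a feature, re-score the model, and keep the feature out if the score does not get worse. The sketch below uses a tiny nearest-centroid classifier purely as a stand-in model, with made-up data; it is not the model or the selection procedure used in Wallet Guardian.

```python
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]
Sample = Tuple[Row, int]   # (features, label) with 1 = scam, 0 = legitimate


def centroid(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def accuracy(train: List[Sample], test: List[Sample], keep: List[int]) -> float:
    """Nearest-centroid accuracy using only the feature indices in `keep`."""
    def pick(row: Row) -> List[float]:
        return [row[i] for i in keep]

    cents: Dict[int, List[float]] = {}
    for label in sorted({y for _, y in train}):
        cents[label] = centroid([pick(x) for x, y in train if y == label])

    def predict(x: Row) -> int:
        px = pick(x)
        return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(px, cents[lab])))

    return sum(predict(x) == y for x, y in test) / len(test)


def backward_elimination(train: List[Sample], test: List[Sample],
                         n_features: int, tolerance: float = 0.0):
    """Drop each feature whose removal costs at most `tolerance` accuracy."""
    keep = list(range(n_features))
    best = accuracy(train, test, keep)
    for i in range(n_features):
        trial = [j for j in keep if j != i]
        if not trial:
            break                      # always keep at least one feature
        score = accuracy(train, test, trial)
        if score >= best - tolerance:
            keep, best = trial, score
    return keep, best


# Toy data: feature 0 is a sell tax fraction, feature 1 is pure noise.
train = [([0.02, 0.3], 0), ([0.04, 0.6], 0), ([0.90, 0.4], 1), ([0.98, 0.7], 1)]
valid = [([0.03, 0.9], 0), ([0.95, 0.1], 1), ([0.10, 0.2], 0), ([0.80, 0.8], 1)]

kept, score = backward_elimination(train, valid, n_features=2)
print(kept, score)
```

In a real pipeline the validation data used for selection should be separate from the data used for the final evaluation, otherwise the reported performance will be optimistic.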
In summary, having data and labeling it is key to the model’s success. Therefore, it’s important for the entire blockchain community to participate in creating a unified open knowledge base of scam smart contracts and creating universal tools that can help collect different indicators (including historical data), aggregate data from different exchanges, and cover as many smart contracts as possible.
About GBC.AI
Guardians of the Blockchain creates security tools supported by artificial intelligence to protect blockchain users. Vulnerabilities are proactively dealt with before they become problems, keeping blockchains efficient and users safe while allowing for exponential growth.
“Wallet Guardian” protects the wallets of cryptocurrency and digital asset users, providing one-click “Do Your Own Research” to detect malicious smart contracts before users interact with them.