Clustering Bitcoin addresses
Whether you use Bitcoin and value your privacy, or you want to collect the information scattered across the blockchain, it is important to know what clustering is and how it works.
Basic knowledge
Clustering addresses means assuming that they belong to the same owner, which could be a person, a firm, or a service such as an exchange, a darknet market or a payment processor.
This is possible once we know how a Bitcoin transaction works. The protocol lets anyone holding a valid private key sign transactions that forward previously received amounts to an arbitrary address. This implies that it is not possible to spend an arbitrary amount of bitcoin directly, but only amounts obtained by summing previously received ones that have not been spent yet. Those amounts are named UTxOs (unspent transaction outputs).
This looks like a limitation of the Bitcoin protocol, but it is easily overcome, because a single transaction can send bitcoins to several addresses. So, when a user tells his client the amount he wants to send, the client chooses enough UTxOs to reach this value, sends the desired amount to the specified recipient address and sends the remainder to another address (which could be the same sending address or a new one). This last address is called the change address, and the amount it receives is the change of the transaction.
Let's see an example.
Anubis received three transactions, one of 0.5 bitcoins, one of 0.9 bitcoins and one of 0.3 bitcoins. Now, Anubis wants to send 1 bitcoin to Ramses.
Anubis' wallet will choose enough UTxOs to reach the amount of 1 bitcoin (some clients always spend the entire available amount).
In this case, we can assume that the wallet chooses the UTxO of 0.9 bitcoins and the one of 0.3, reaching a sum of 1.2 bitcoins. So, the client is going to create a transaction that sends 1 bitcoin to Ramses' address and 0.2 (minus the transaction fees) to another address belonging to Anubis.
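The selection in this example can be reproduced with a short sketch. It assumes a deliberately naive rule (pick the subset of UTxOs that covers the target with the smallest excess) and ignores fees by default. Real wallets use their own coin selection algorithms, and the names `select_utxos` and `build_outputs` are hypothetical. Amounts are expressed in satoshis to avoid floating-point rounding.

```python
from itertools import combinations

SATS_PER_BTC = 100_000_000


def select_utxos(utxos, target):
    """Return the subset of UTxO values (in satoshis) that covers `target`
    with the smallest excess, or None if the funds are insufficient.

    Exhaustive search: fine for a handful of UTxOs, not for a real wallet.
    """
    best = None
    for size in range(1, len(utxos) + 1):
        for subset in combinations(utxos, size):
            total = sum(subset)
            if total >= target and (best is None or total < sum(best)):
                best = subset
    return best


def build_outputs(utxos, target, fee=0):
    """Split the selected inputs into payment, change and fee."""
    selected = select_utxos(utxos, target + fee)
    if selected is None:
        raise ValueError("insufficient funds")
    change = sum(selected) - target - fee
    return {
        "inputs": sorted(selected),
        "payment": target,
        "change": change,
        "fee": fee,
    }


# Anubis holds 0.5, 0.9 and 0.3 BTC and wants to pay Ramses 1 BTC.
anubis_utxos = [50_000_000, 90_000_000, 30_000_000]
tx = build_outputs(anubis_utxos, 1 * SATS_PER_BTC)
print(tx["inputs"], tx["change"])
```

With these values the sketch picks the 0.9 and 0.3 UTxOs and leaves 20,000,000 satoshis (0.2 bitcoins) of change before fees.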
If the UTxOs used by Anubis' wallet were received by the same address, the transaction will show only one sending address. Otherwise, we will see several sending addresses. To be valid, the transaction needs to be signed with the private keys corresponding to all the sending addresses (all belonging to Anubis).
Let's cluster!
With these premises, we can work out some methods for clustering addresses together. Here are some clustering techniques, with examples obtained using blockchain.com's explorer.
Co-spending
As stated above, when a transaction shows several input addresses, it means that it has been signed with the private key related to each of the sending addresses. It is technically possible to sign a transaction only partially and then have another person sign the rest, but this is a rare circumstance. So, in the ordinary case, it is possible to assume that all the sending addresses belong to the same entity and can be clustered together.
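A minimal sketch of the co-spending rule, assuming each transaction has already been reduced to a dictionary with an `inputs` list of addresses (a hypothetical format, not an explorer API). A union-find structure merges addresses transitively: if A is spent with B in one transaction and B with C in another, all three end up in the same cluster. The address names are placeholders.

```python
class UnionFind:
    """Disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_cospending(transactions):
    """Group addresses that appear together as inputs of a transaction."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for address in inputs:
            uf.find(address)  # register single-input addresses too
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    clusters = {}
    for address in list(uf.parent):
        clusters.setdefault(uf.find(address), set()).add(address)
    return sorted(sorted(members) for members in clusters.values())


transactions = [
    {"inputs": ["addr_A", "addr_B"]},
    {"inputs": ["addr_B", "addr_C"]},
    {"inputs": ["addr_D"]},
]
print(cluster_by_cospending(transactions))
```

The sketch applies the rule blindly, so collaborative transactions have to be filtered out before calling it.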
Some transaction types give us false results. For instance, there are CoinJoin transactions, produced by software like Samourai and Wasabi, where there are a lot of different input addresses that can't be clustered together. These transactions spend bitcoins belonging to different users, and clustering their addresses together would give us wrong information. This kind of transaction can usually be recognized because many of the amounts involved are identical, typically many outputs of the same value.
Other false results could be obtained with PayJoin transactions, supported by software like the above-mentioned Samourai and Wasabi. In this case, the UTxOs of different users are joined together in the same transaction, making it very difficult to follow any single flow of coins. This kind of transaction, unlike a CoinJoin, is almost impossible to recognize, because it has no distinctive feature.
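The repeated-amount pattern that gives CoinJoin transactions away can be turned into a rough filter. This sketch assumes that the pattern shows up as several outputs of identical value and that a transaction is a dictionary with `inputs` (addresses) and `outputs` (address, value in satoshis). The threshold `min_equal_outputs` is an arbitrary assumption, not a value taken from any wallet. PayJoin transactions would pass through undetected.

```python
from collections import Counter


def looks_like_coinjoin(tx, min_equal_outputs=3):
    """Flag transactions where many outputs share the same value
    and at least as many distinct input addresses take part."""
    output_values = [value for _, value in tx["outputs"]]
    if not output_values:
        return False
    _, count = Counter(output_values).most_common(1)[0]
    distinct_inputs = len(set(tx["inputs"]))
    return count >= min_equal_outputs and distinct_inputs >= min_equal_outputs


mix = {
    "inputs": ["in_1", "in_2", "in_3", "in_4"],
    "outputs": [
        ("out_1", 5_000_000),
        ("out_2", 5_000_000),
        ("out_3", 5_000_000),
        ("out_4", 5_000_000),
        ("out_5", 1_234_567),
    ],
}
payment = {
    "inputs": ["in_1", "in_2"],
    "outputs": [("out_1", 100_000_000), ("out_2", 20_000_000)],
}
print(looks_like_coinjoin(mix), looks_like_coinjoin(payment))
```

Transactions flagged this way would be excluded from co-spending clustering rather than merged.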
Change address
The way change addresses work lets us assume that not only the input addresses can be clustered together, but also the address receiving the change. The only problem is finding out which output is the change and which one is the effective recipient address (the one receiving the bitcoins that the user wanted to transfer).
Amounts
First of all, we can look at the transferred amounts. Usually, if an address receives an amount with very few decimal digits, it is likely to be the effective recipient address because:
- people tend to pay in round numbers;
- it is very unlikely that a payment with eight decimal digits would leave a change (after the transaction fees) with only a few decimal digits.
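The round-amount rule can be sketched as follows, assuming amounts are given in satoshis and outputs are (address, value) pairs. The function names and the `min_gap` threshold are hypothetical. The sketch refuses to answer when the two roundest outputs are too close to tell apart.

```python
def decimal_digits(sats):
    """Number of decimal digits needed to write `sats` satoshis in BTC."""
    if sats <= 0:
        raise ValueError("amount must be positive")
    digits = 8
    while digits > 0 and sats % 10 == 0:
        sats //= 10
        digits -= 1
    return digits


def likely_payment(outputs, min_gap=2):
    """Return the address receiving the roundest amount, or None when
    no output is clearly rounder than the others."""
    ranked = sorted(outputs, key=lambda output: decimal_digits(output[1]))
    if len(ranked) < 2:
        return None
    gap = decimal_digits(ranked[1][1]) - decimal_digits(ranked[0][1])
    return ranked[0][0] if gap >= min_gap else None


outputs = [
    ("addr_other", 2_734_519),      # 0.02734519 BTC, eight decimal digits
    ("addr_recipient", 1_000_000),  # 0.01 BTC, two decimal digits
]
print(likely_payment(outputs))
```

The other output would then be the candidate change address, to be added to the sender's cluster.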
Some clients choose just enough UTxOs to cover the amount that the user wants to transfer. In that case, the output that could not have been funded without all the involved UTxOs should be the effective recipient address. In the example of Anubis and Ramses, Anubis' client wouldn't have involved both UTxOs if he had intended to send only the smaller of the two outputs, because a single UTxO would have covered it, so the address receiving that amount is almost certainly the change address.
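This reasoning can be sketched as a check on whether an output could still have been paid after dropping the smallest input. The sketch ignores fees and assumes the wallet never adds unnecessary inputs, which many wallets do not guarantee. The function name and labels are hypothetical.

```python
def classify_outputs(input_values, outputs):
    """Label each output as 'likely payment' or 'likely change'.

    An output that could have been funded without the smallest input
    would not justify all the inputs, so it is probably the change.
    Returns an empty dict for single-input transactions, where the
    rule says nothing.
    """
    if len(input_values) < 2:
        return {}
    spare = sum(input_values) - min(input_values)
    labels = {}
    for address, value in outputs:
        if value <= spare:
            labels[address] = "likely change"
        else:
            labels[address] = "likely payment"
    return labels


# Anubis spends the 0.9 and 0.3 BTC UTxOs; outputs are 1 BTC and 0.2 BTC.
labels = classify_outputs(
    [90_000_000, 30_000_000],
    [("addr_ramses", 100_000_000), ("addr_unknown", 20_000_000)],
)
print(labels)
```

The result is only meaningful when exactly one output is labelled as the payment. If every output gets the same label, the rule is inconclusive.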
The same applies to the transaction shown in the picture: if the user had wanted to send 0.000888 bitcoins, he wouldn't have needed to involve three UTxOs. This rule works pretty well but, nowadays, many clients use their own optimization algorithms and do not let users select coins manually. In some cases, to avoid address reuse, preserve users' privacy or consolidate amounts, they spend all the available UTxOs of each address involved in the transaction.
Address types
Most Bitcoin wallets use addresses of a single type (for example, Electrum supports different kinds of addresses in one wallet only when the wallet is created by importing private keys). When a transaction pays two addresses of different types, the change address is usually the one in the same format as the sending addresses.
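A minimal sketch of this rule, assuming mainnet addresses and telling the types apart by prefix only: `1` for P2PKH, `3` for P2SH, `bc1q` for SegWit v0 and `bc1p` for Taproot. It does not validate checksums, and the addresses in the example are shortened placeholders, not real ones.

```python
def address_type(address):
    """Classify a mainnet address by its prefix (no checksum validation)."""
    if address.startswith("bc1p"):
        return "p2tr"
    if address.startswith("bc1q"):
        return "segwit_v0"
    if address.startswith("1"):
        return "p2pkh"
    if address.startswith("3"):
        return "p2sh"
    return "unknown"


def guess_change_by_type(input_addresses, output_addresses):
    """Return the single output whose type matches the inputs, or None
    when the inputs are mixed or the match is not unique."""
    input_types = {address_type(address) for address in input_addresses}
    if len(input_types) != 1 or "unknown" in input_types:
        return None
    (wallet_type,) = input_types
    matches = [
        address
        for address in output_addresses
        if address_type(address) == wallet_type
    ]
    return matches[0] if len(matches) == 1 else None


change = guess_change_by_type(
    ["bc1q_sender_1", "bc1q_sender_2"],
    ["3_recipient", "bc1q_candidate"],
)
print(change)
```

When both outputs share the type of the inputs the function returns None, and one of the other rules has to decide.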
Consolidation transactions
There is another situation to consider. Some users do not like to have many UTxOs spread over many addresses, or they sometimes move funds to cold storage or to a new wallet. In this case, we'll have a transaction without any change address, because the users are moving all of their funds. So all the addresses involved in the transaction (senders and receiver) can be traced back to the same person. Indeed, it is very unlikely that a user has UTxOs whose total is exactly the amount he intended to send plus the exact transaction fee.
We have to be careful with this kind of assertion, because users may also decide to empty their wallet by sending all their coins to an exchange, maybe to convert virtual currency into fiat currency or to do some trades. Clustering the recipient address with the sending ones would then mean clustering the user's addresses with an address belonging to an exchange. So, we need to observe how the recipient address is used before clustering it with the sending ones.
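The consolidation rule and its caveat can be sketched together. The sketch assumes a transaction reduced to `inputs` and `outputs` address lists, plus a set of addresses the analyst has already attributed to services such as exchanges. That set is hypothetical and stands in for the manual check on how the recipient address is used.

```python
def consolidation_cluster(tx, known_service_addresses=frozenset()):
    """Return the addresses that can be clustered for a no-change
    transaction, or None if the transaction has more than one output.

    The recipient is left out when it is already attributed to a
    service, so a deposit to an exchange is not merged with the user.
    """
    inputs = set(tx["inputs"])
    outputs = tx["outputs"]
    if not inputs or len(outputs) != 1:
        return None
    recipient = outputs[0]
    if recipient in known_service_addresses:
        return inputs
    return inputs | {recipient}


sweep = {"inputs": ["addr_A", "addr_B", "addr_C"], "outputs": ["addr_new"]}
deposit = {"inputs": ["addr_A", "addr_B"], "outputs": ["addr_exchange"]}
services = frozenset({"addr_exchange"})

print(sorted(consolidation_cluster(sweep, services)))
print(sorted(consolidation_cluster(deposit, services)))
```

An empty service set makes the function merge every sole recipient, so the quality of the result depends entirely on that labelling.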
Final thoughts
The techniques described here are all based on statistics and on the observation of a huge amount of data, so they can be considered valid in most cases. Someone who knows very well how Bitcoin works could circumvent these clustering methods, but it is very hard to do so without making any mistakes. We only need to remember that exceptions happen, and we have to be ready to question these methods at any time.