- Why is `m_assumeutxo_data` hardcoded in the first place if we don’t want to trust others’ UTXO sets? (We’re being forced to use only that UTXO set version.)
The concern is people putting up websites offering UTXO set downloads with instructions promising “even faster sync time!”. If such a site became popular and was then compromised, there is a non-negligible chance of this actually resulting in a malicious UTXO set being loaded and accepted by users, even if only briefly (anything is possible in such a UTXO set, including the attacker giving themselves 1 million BTC).
Putting the commitment hash in the source code makes it subject to Bitcoin Core’s review ecosystem. I think it’s unfair to call this just a case of “developers decide”, because:
- Active review community. Anyone can, and many people do, look over changes to the source code. A change to the `m_assumeutxo_data` value is easy to review (just check the hash against an existing node), and gets a lot of scrutiny. (A sketch of what such an entry looks like follows this list.)
- Bitcoin Core has reproducible builds. Anyone, including non-developers, can participate in building releases, and they should end up with bit-for-bit identical binaries to the ones published. This establishes confidence that the binaries people actually run match the released source code, including the `m_assumeutxo_data` value.
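For illustration, here is a minimal standalone sketch of what such a hardcoded entry might look like. This is not the actual Bitcoin Core code: the field names loosely follow the `AssumeutxoData` struct in `src/kernel/chainparams.h`, and all values are placeholders.

```cpp
// Simplified, standalone sketch (not the actual Bitcoin Core source) of a
// hardcoded assumeutxo entry; all values below are placeholders.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct AssumeutxoData {
    int height;                  // height of the snapshot block
    std::string hash_serialized; // hash of the serialized UTXO set at that height
    uint64_t chain_tx_count;     // total transactions in the chain up to that block
    std::string blockhash;       // hash of the snapshot block itself
};

// Reviewing a change to this table is cheap: recompute the UTXO set hash on an
// already-synced node at the stated height and compare it to hash_serialized.
const std::vector<AssumeutxoData> m_assumeutxo_data{
    {840'000,
     "<hash of the serialized UTXO set at height 840000>",
     /*chain_tx_count=*/0, // placeholder
     "<hash of block 840000>"},
};

int main()
{
    std::printf("snapshot height: %d\n", m_assumeutxo_data[0].height);
    return 0;
}
```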
If you think of “developers” as the entire group of people participating in these processes, then it’s of course not incorrect to state that it’s effectively this group making the decision. But I think the scale and transparency of the whole thing matters. This isn’t a single person picking a value before a release, without oversight, the way an instruction on a website might be. And of course, users are inherently trusting this group of people and this process anyway for the validation software itself, even if we try to minimize the extent to which that trust is needed.
- Why is `m_assumeutxo_data` set to 840,000 and not to the same block as `assumevalid`?
The original idea behind assumeutxo (even though nobody is currently working on completing it) included automatic snapshotting and distribution of snapshots over the network, so that users would not need to go find a source themselves.
In such a model, there would be a predefined schedule of heights at which snapshots are made. For example, there could be one every 52500 blocks (roughly once per year), and all nodes supporting the feature would make a snapshot when that height is reached, keeping the last few snapshots around for download over the P2P network. New nodes starting up, with `m_assumeutxo_data` values set to whatever the last multiple of 52500 was at the time of release, could then synchronize from any snapshot-providing node on the network, even if the provider is running older software than the receiver.
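As a sketch of the schedule arithmetic (the interval constant and function name here are illustrative, not anything from Bitcoin Core):

```cpp
// Sketch of the hypothetical schedule: with a snapshot at every multiple of
// 52500 blocks, the newest scheduled snapshot height at a given chain tip is
// the tip height rounded down to that multiple.
#include <cstdio>

constexpr int SNAPSHOT_INTERVAL{52'500}; // ~one year of blocks at ~144/day

constexpr int LatestSnapshotHeight(int tip_height)
{
    return tip_height - (tip_height % SNAPSHOT_INTERVAL);
}

int main()
{
    // With the tip at height 850000, the latest scheduled snapshot would be
    // at 840000 (16 * 52500), matching the hardcoded value asked about above.
    std::printf("%d\n", LatestSnapshotHeight(850'000));
    return 0;
}
```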
While there is currently no progress on the P2P side of this, it still suggests using a snapshot height schedule that is not tied to Bitcoin Core releases.
- I understand that we don’t want people to start trusting random UTXO sets out of laziness about waiting for a sync, but couldn’t we use some kind of self-signed UTXO sets? It would be great if, as a user, you could back up the actual UTXO set, sign it in some way, and be able to load and verify it in the future to sync a new node.
If it’s just for yourself, you can make a backup of the `chainstate` directory (while the node is not running). Assumeutxo has a number of features that matter in the wide-distribution model but don’t apply to personal backups:
- The snapshot data is canonical. Anyone can create a snapshot at a particular height, and everyone will obtain an identical snapshot file, making it easy to compare and to distribute (potentially from multiple sources, bittorrent-style).
- Snapshot loading still involves background revalidation. It gives you a node that is immediately synced to the snapshot point and can continue validation from there, but for security, the node will still separately perform a background revalidation of the snapshot itself (from genesis to the snapshot point).
If you trust the snapshot creator and loader completely (because you are both of them yourself), the overhead of these features is unnecessary. By making a backup of your chainstate (which holds the UTXO set), you can at any point, on any system, jump back to that point in validation. It’s a database, so it is not byte-for-byte comparable between systems, but it is compatible. The side “restoring” the backup won’t know it’s loading something created externally, so it won’t perform background revalidation; but if you ultimately trust the data anyway, that revalidation would just be duplicated work.
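As a concrete sketch of such a personal backup (both paths below are assumed defaults, not anything mandated by Bitcoin Core):

```cpp
// Sketch: a personal chainstate backup is just a recursive directory copy
// taken while the node is stopped. Both paths are illustrative assumptions;
// substitute your own datadir and backup location.
#include <filesystem>

int main()
{
    namespace fs = std::filesystem;
    const fs::path datadir{"/home/user/.bitcoin"};        // assumed default datadir
    const fs::path backup{"/mnt/backup/chainstate-copy"}; // assumed destination

    // The chainstate directory is a LevelDB database holding the UTXO set.
    // The node must not be running, or the copy may be inconsistent.
    fs::copy(datadir / "chainstate", backup,
             fs::copy_options::recursive);
    return 0;
}
```

Restoring is simply the reverse copy into the datadir of a stopped node, which then continues from the backed-up point as if it had validated to there itself.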