2024-09-03
This article primarily delves into a crucial mechanism within the AntDB database kernel: the MVCC mechanism.
Introduction to MVCC
MVCC (Multi-Version Concurrency Control) is the mechanism AntDB uses to implement transaction isolation levels. It lets multiple transactions read and write data concurrently without interfering with each other. Each transaction sees a specific version of the data when it reads, so reads and writes from different transactions can proceed at the same time without conflicts, which yields higher concurrency and better performance.
The core idea of MVCC is that an update does not modify the original data in place. Instead, a new version of the data is created and the modification is applied to that new version. Other transactions can still access the old version and are not affected by the ongoing modification. Only when the writing transaction commits does the new version take the place of the old one for other transactions, which keeps the data consistent.
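The idea can be illustrated with a minimal Python sketch. It is not AntDB code: the `Version` and `Row` classes and the `committed` flag are hypothetical names chosen for illustration, and snapshots are left out, so a reader simply takes the newest committed version.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    value: str
    created_by: int          # id of the transaction that wrote this version
    committed: bool = False  # set to True when that transaction commits


@dataclass
class Row:
    versions: list = field(default_factory=list)  # oldest version first

    def update(self, value, txid):
        # An update never overwrites in place; it appends a new version.
        version = Version(value, txid)
        self.versions.append(version)
        return version

    def read(self, txid):
        # A reader sees its own uncommitted write, otherwise the newest
        # committed version.
        for version in reversed(self.versions):
            if version.committed or version.created_by == txid:
                return version.value
        return None


row = Row()
row.update("balance=100", txid=1).committed = True
pending = row.update("balance=50", txid=2)  # transaction 2 has not committed

before = row.read(txid=3)  # another transaction still sees the old version
own = row.read(txid=2)     # the writer sees its own new version
pending.committed = True   # transaction 2 commits
after = row.read(txid=3)   # the new version is now visible to others
```

Here `before` is the old value while the update is in flight, and `after` is the new value once the writer has committed.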
Implementation Principles of MVCC
1. Hidden Fields
Before understanding MVCC, it's essential to introduce several hidden fields in the database kernel that collaborate to implement this mechanism:
oid: Object Identifier, a globally unique value assigned to tables, indexes, and views.
ctid: Physical location identifier for each record (tuple) within a table.
xmin: When a record (tuple) is created, this field records the current transaction ID.
xmax: Defaults to 0 upon tuple creation; upon deletion, it records the current transaction ID.
cmin/cmax: Sequence values starting from 0, used within the same transaction to determine version visibility among multiple statement commands.
2. Data Versions
In AntDB, each row version carries a creation version number called xmin and a deletion version number called xmax. These version numbers record which transactions created and deleted the row version. When a transaction modifies data, a new row version is created and stamped with that transaction's ID, so the modification can be tracked.
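As a rough illustration, the sketch below models how xmin and xmax could be stamped on row versions. The `TupleVersion` class and the helper functions are hypothetical, and treating an update as "mark the old version deleted, then insert a new one" is an assumption borrowed from PostgreSQL-style MVCC rather than something the text above states about AntDB.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    data: dict
    xmin: int      # xid of the transaction that created this version
    xmax: int = 0  # xid of the transaction that deleted it; 0 means live


def insert(table, data, xid):
    table.append(TupleVersion(data, xmin=xid))


def delete(table, index, xid):
    # The version is not removed; it is only stamped with the deleter's xid.
    table[index].xmax = xid


def update(table, index, new_data, xid):
    # Assumption: an update is modelled as delete-old plus insert-new.
    delete(table, index, xid)
    insert(table, new_data, xid)


table = []
insert(table, {"id": 1, "name": "a"}, xid=100)
update(table, 0, {"id": 1, "name": "b"}, xid=101)
```

After the update, the table holds two versions of the same logical row: the old one with xmin=100 and xmax=101, and the new one with xmin=101 and xmax=0.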
3. Transaction States
Each transaction has a unique transaction ID, called the xid, which is used to look up the transaction's state. When a transaction starts, it is assigned a unique xid, and its status is recorded in the transaction status log file. Through this log file, AntDB can track each transaction's state and select the appropriate data version based on the xid.
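A transaction status log of this kind can be sketched as a simple map from xid to state. The `TransactionLog` class and the three `Status` values below are illustrative assumptions, not AntDB's on-disk format.

```python
from enum import Enum
from itertools import count


class Status(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TransactionLog:
    """Toy in-memory stand-in for a transaction status log."""

    def __init__(self):
        self._next_xid = count(1)  # hypothetical: xids start at 1
        self._status = {}

    def begin(self):
        # Assign the next xid and record the transaction as in progress.
        xid = next(self._next_xid)
        self._status[xid] = Status.IN_PROGRESS
        return xid

    def commit(self, xid):
        self._status[xid] = Status.COMMITTED

    def abort(self, xid):
        self._status[xid] = Status.ABORTED

    def status(self, xid):
        return self._status[xid]


log = TransactionLog()
t1, t2 = log.begin(), log.begin()
log.commit(t1)
```

Any row version stamped with `t1` can now be treated as committed, while versions stamped with `t2` still belong to an in-progress transaction.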
4. Concurrency Control
AntDB controls concurrency through row versions rather than read locks. The basic idea is that while a transaction is modifying a row, other transactions cannot modify that same row concurrently, but they can still read its last committed version. By restricting only concurrent writes to the same data, AntDB avoids read-write conflicts.
When a transaction needs to read data, AntDB checks each row version's xmin and xmax against the transaction status log. If xmin belongs to another transaction that has not committed, the version is still being created and the reader skips it. If xmin is committed and xmax is 0, or xmax belongs to a transaction that is still in progress or was rolled back, the version is live and the reader can read it. If xmax belongs to a transaction that committed before the reader took its view of the data, the version has been deleted or superseded, and the reader uses the newer version instead.
When a transaction modifies data, AntDB writes a new row version stamped with the transaction's xid and records the transaction's status in the transaction status log file. The writing transaction sees its own new version, while other transactions keep reading the previous version until the writer commits. This approach enables AntDB to control concurrency and avoid read-write conflicts.
5. Visibility Determination
Visibility determination is crucial in concurrency control: it decides whether a transaction can see data committed by another. In AntDB, this is done by checking xids. If a transaction takes its view of the data after another transaction has committed, it can see the data that transaction committed. Conversely, if it took its view before that commit, it cannot.
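A simplified visibility check might look like the sketch below. It assumes a snapshot that records which xids had committed when the reader took its view of the data. The `Snapshot` class and `is_visible` function are hypothetical, and the cmin/cmax checks within a single transaction are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xid: int              # the reading transaction's own xid
    committed: frozenset  # xids already committed when the snapshot was taken


def is_visible(xmin, xmax, snap):
    """Return True if a row version with (xmin, xmax) is visible to snap."""
    # The creator must be the reader itself or committed before the snapshot.
    if xmin != snap.xid and xmin not in snap.committed:
        return False
    # xmax == 0 means nobody has deleted this version.
    if xmax == 0:
        return True
    # A deletion only counts if the reader did it or it committed in time.
    deleted = xmax == snap.xid or xmax in snap.committed
    return not deleted


snap = Snapshot(xid=30, committed=frozenset({10, 20}))

live = is_visible(10, 0, snap)             # created by committed xid 10
superseded = is_visible(10, 20, snap)      # deleted by committed xid 20
delete_pending = is_visible(10, 25, snap)  # deleter 25 had not committed
uncommitted = is_visible(25, 0, snap)      # creator 25 had not committed
own_write = is_visible(30, 0, snap)        # the reader's own insert
```

The `delete_pending` case shows why readers are not blocked by writers: a deletion by a transaction that has not committed leaves the old version visible.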
Advantages of MVCC
Concurrency Optimization
Reads and writes do not block each other, which improves concurrent access efficiency. A write does not block other transactions' reads: until the writing transaction commits, readers see the earlier version.
Fast Rollback
Transactions can roll back quickly. Each modified tuple carries the xid of the transaction that wrote it, so a rollback only needs to mark the corresponding transaction status in the log file.
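The following sketch illustrates why such a rollback is cheap. The dictionary-based `status` log and list-based `heap` are simplifications invented for this example, not AntDB structures.

```python
status = {}  # xid -> "in progress" | "committed" | "aborted"
heap = []    # list of (xmin, value) pairs standing in for stored tuples


def begin(xid):
    status[xid] = "in progress"


def write(xid, value):
    heap.append((xid, value))


def commit(xid):
    status[xid] = "committed"


def rollback(xid):
    # The only work is one status change; no tuple is touched or removed.
    status[xid] = "aborted"


def committed_values():
    # Readers ignore tuples whose creating transaction did not commit.
    return [value for xmin, value in heap if status[xmin] == "committed"]


begin(1)
write(1, "a")
commit(1)

begin(2)
for i in range(1000):
    write(2, f"row-{i}")
size_before = len(heap)
rollback(2)
```

Rolling back transaction 2 leaves all 1000 of its tuples in `heap`, yet none of them is returned by `committed_values()`. The leftover tuples become garbage to be cleaned up later.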
Disadvantages of MVCC
Index Maintenance Overhead
An UPDATE operation must add entries for the new row version to every index on the table. These index updates increase memory pressure and disk I/O, and the cost of updating a tuple grows with the number of indexes on the table.
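The cost can be made concrete with a small model in which every new row version adds one entry to each index. The `Table` class and its `index_writes` counter are hypothetical, and the sketch follows the description above literally; a real engine may optimize some cases.

```python
class Table:
    """Toy table whose indexes map a column value to a heap position."""

    def __init__(self, indexed_columns):
        self.heap = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # counts index entries written so far

    def _add_version(self, row):
        self.heap.append(row)
        position = len(self.heap) - 1
        # Every index receives an entry pointing at the new version.
        for col, entries in self.indexes.items():
            entries.append((row[col], position))
            self.index_writes += 1
        return position

    def insert(self, row):
        return self._add_version(row)

    def update(self, position, **changes):
        # The old version stays in the heap; a new version is appended.
        return self._add_version({**self.heap[position], **changes})


t = Table(["id", "email", "city"])
p = t.insert({"id": 1, "email": "a@x", "city": "Hefei", "visits": 0})
writes_after_insert = t.index_writes
t.update(p, visits=1)  # only a non-indexed column changes
```

With three indexes, the single-column update still writes three new index entries, so the per-update cost scales with the number of indexes.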
Transaction ID Wraparound Issues
Transaction ID wraparound can affect concurrent access and commit ordering, potentially causing incorrect visibility results and other issues. The xid counter is finite, so it eventually wraps around and starts reusing old values. When a new xid wraps around to an old value, comparisons between old and new transactions become unreliable, and row versions written by the old transaction may no longer be treated as properly committed.
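Reading the issue above as transaction ID wraparound (the xid counter reaching its maximum and restarting from a low value), the effect can be shown with circular xid comparison. The 32-bit counter width and the comparison rule are assumptions taken from PostgreSQL-style engines; the source does not state the counter size AntDB uses, and reserved special xids are ignored here.

```python
XID_BITS = 32            # assumption: 32-bit transaction ID counter
MOD = 1 << XID_BITS
HALF = 1 << (XID_BITS - 1)


def next_xid(xid):
    # The counter wraps to 0 after reaching its maximum value.
    return (xid + 1) % MOD


def precedes(a, b):
    """True if xid a is logically older than xid b on the circular scale."""
    # Equivalent to interpreting (a - b) as a signed 32-bit value below zero.
    return (a - b) % MOD >= HALF


# Ordinary case: the smaller xid is older.
normal = precedes(100, 200)

# Just after wraparound, a huge xid is still older than a small new one.
across_wrap = precedes(MOD - 5, 3)

# Danger: once more than half the xid space has passed, an old row's xid
# stops looking older and starts looking newer than current transactions.
old_row_xid = 100
current_xid = old_row_xid + HALF + 1
still_older = precedes(old_row_xid, current_xid)
looks_newer = precedes(current_xid, old_row_xid)
```

In this model `still_older` is false and `looks_newer` is true, which is the misjudgment that makes old data appear to come from the future unless old xids are dealt with before the counter gets that far.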
Garbage Data Issues
In MVCC, updated and deleted records are not physically removed right away, so outdated row versions accumulate in frequently modified tables. This consumes disk space and requires more I/O during scans, which reduces query efficiency. Expired data can be cleaned up with the VACUUM command.
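Conceptually, cleanup removes row versions that no running transaction can still see. The sketch below is a toy model of that rule, not the actual VACUUM implementation: the `vacuum` function, the `committed` set, and the `oldest_active_xid` parameter are hypothetical, and versions left behind by rolled-back transactions are not handled.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    value: str
    xmin: int      # xid that created the version
    xmax: int = 0  # xid that deleted or superseded it; 0 means live


def vacuum(heap, committed, oldest_active_xid):
    """Return the heap without versions that no transaction can still see."""

    def is_dead(t):
        # Dead: deleted by a committed transaction that is older than
        # every transaction still running.
        return (
            t.xmax != 0
            and t.xmax in committed
            and t.xmax < oldest_active_xid
        )

    return [t for t in heap if not is_dead(t)]


heap = [
    TupleVersion("v1", xmin=10, xmax=20),
    TupleVersion("v2", xmin=20, xmax=30),
    TupleVersion("v3", xmin=30),
]
committed = {10, 20, 30}

# A transaction with xid 25 is still running, so v2 may still be needed.
kept_early = [t.value for t in vacuum(heap, committed, oldest_active_xid=25)]

# Once every transaction older than 40 has finished, only v3 remains.
kept_late = [t.value for t in vacuum(heap, committed, oldest_active_xid=40)]
```

The first call removes only `v1`; the second also removes `v2`. This shows how a long-running transaction can delay the cleanup of outdated versions.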
Conclusion
This article has explained the working principles, advantages, and disadvantages of the MVCC mechanism and its impact on database transactions and concurrent access. MVCC is a powerful concurrency control mechanism that improves database concurrency and performance while reducing the performance issues and concurrency conflicts associated with traditional locking mechanisms.
However, its drawbacks mean that index design, transaction ID management, and cleanup of outdated data need attention when it is used.
About AsiaInfo Anhui AntDB
Established in 2008, AntDB serves over 1 billion users in 24 provinces, cities, and autonomous regions across the country on the core systems of service providers. It offers high performance, elastic scalability, and high reliability. It can handle millions of core communications transactions per second at peak, has kept systems running continuously and stably for over a decade, and has been successfully commercialized in industries such as communications, finance, transportation, energy, and the IoT.