The MVCC Mechanism in the AntDB-T Database Kernel (AsiaInfo Anhui)

2024-09-03 Asiainfo

This article primarily delves into a crucial mechanism within the AntDB database kernel: the MVCC mechanism. 

 

Introduction to MVCC 

 

MVCC (Multi-Version Concurrency Control) is the mechanism AntDB uses to implement transaction isolation levels. It allows multiple transactions to read, write, and modify data concurrently with minimal interference. Under MVCC, each transaction reads a specific version of the data, so reads and writes from different transactions can proceed simultaneously without conflicting. Because each transaction works with its own version of the data, the database achieves higher concurrency and better performance. 

 

The core idea of MVCC is that an update does not modify the original data in place. Instead, it creates a new version of the row and applies the changes to that version. Other transactions can therefore keep reading the old version without being affected by the ongoing modification. Only when the updating transaction commits does the new version become visible to other transactions, which keeps the data consistent. 

 

Implementation Principles of MVCC 


1. Hidden Fields 

 

Before going further into MVCC, it is necessary to introduce several hidden fields in the database kernel that work together to implement it: 

 

oid: Object Identifier, a globally unique value assigned to tables, indexes, and views. 

ctid: The physical location identifier of each record (tuple) within a table. 

xmin: When a record (tuple) is created, this field records the current transaction ID. 

xmax: 0 when the tuple is created; when the tuple is deleted, or superseded by an update, it records the ID of the deleting transaction. 

cmin/cmax: Command IDs, numbered from 0 within a transaction, that record which command (statement) created or deleted the tuple. They determine which versions each command can see when a single transaction runs several statements. 
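To make the role of cmin/cmax concrete, here is a minimal Python sketch of how a command might decide whether it can see a tuple that its own transaction inserted or deleted. It is modeled on PostgreSQL's heap visibility rules, which these hidden fields follow; AntDB's actual implementation may differ, and `TupleHeader` and `visible_to_own_command` are hypothetical names. Tuples written by other transactions need the snapshot-based check and are out of scope here.

```python
from dataclasses import dataclass

# Hypothetical, simplified tuple header (illustration only, not AntDB's layout).
@dataclass
class TupleHeader:
    xmin: int        # xid of the inserting transaction
    cmin: int        # command id (within xmin's transaction) that inserted it
    xmax: int = 0    # xid of the deleting transaction, 0 if never deleted
    cmax: int = 0    # command id that deleted it (meaningful only if xmax != 0)

def visible_to_own_command(t: TupleHeader, my_xid: int, cur_cid: int) -> bool:
    """Can command `cur_cid` of transaction `my_xid` see a tuple that transaction inserted?

    Rows inserted by earlier commands (cmin < cur_cid) are visible unless an
    earlier command already deleted them (cmax < cur_cid).
    """
    if t.xmin != my_xid:
        raise ValueError("sketch only handles tuples inserted by the current transaction")
    if t.cmin >= cur_cid:
        return False          # inserted by this command or a later one
    if t.xmax == my_xid and t.cmax < cur_cid:
        return False          # deleted by an earlier command of this transaction
    return True

# Usage: xid 100 inserts a row in command 0, then deletes it in command 1.
row = TupleHeader(xmin=100, cmin=0)
print(visible_to_own_command(row, my_xid=100, cur_cid=0))  # False: the inserting command itself
print(visible_to_own_command(row, my_xid=100, cur_cid=1))  # True
row.xmax, row.cmax = 100, 1
print(visible_to_own_command(row, my_xid=100, cur_cid=2))  # False: deleted by command 1
```

Comparing command IDs rather than transaction IDs is what lets one statement in a transaction see the rows written by the statements before it, but not the rows it is writing itself.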


2. Data Versions 

 

In AntDB, each row version (tuple) carries the ID of the transaction that created it (xmin) and the ID of the transaction that deleted it (xmax). These two fields track the creation and deletion of row versions. Reading data does not create a version; when a transaction modifies data, AntDB creates a new tuple version stamped with the current transaction ID instead of overwriting the existing one, so every modification can be traced back to the transaction that made it. 
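The following Python sketch illustrates keeping versions instead of overwriting rows: an update appends a new version stamped with the updater's xid and records the same xid in the old version's xmax. The `Heap` class, the list position standing in for a ctid, and the `next_ctid` link are simplifying assumptions for illustration, not AntDB's on-disk format.

```python
from dataclasses import dataclass

@dataclass
class TupleVersion:
    data: dict
    xmin: int             # xid of the transaction that created this version
    xmax: int = 0         # xid of the transaction that deleted/updated it (0 = live)
    next_ctid: int = -1   # position of the newer version, -1 if this is the newest

class Heap:
    """Toy append-only heap: a row version is never overwritten in place."""

    def __init__(self) -> None:
        self.versions: list[TupleVersion] = []

    def insert(self, data: dict, xid: int) -> int:
        self.versions.append(TupleVersion(dict(data), xmin=xid))
        return len(self.versions) - 1          # the new version's position ("ctid")

    def update(self, ctid: int, changes: dict, xid: int) -> int:
        old = self.versions[ctid]
        new_ctid = self.insert({**old.data, **changes}, xid)  # new version: xmin = xid
        old.xmax = xid                # old version: marked as superseded by xid
        old.next_ctid = new_ctid      # link old -> new, forming a version chain
        return new_ctid

heap = Heap()
c0 = heap.insert({"id": 1, "balance": 100}, xid=100)
heap.update(c0, {"balance": 80}, xid=101)
print([(v.data["balance"], v.xmin, v.xmax) for v in heap.versions])
# [(100, 100, 101), (80, 101, 0)]
```

Both versions stay in the heap, so a transaction that cannot yet see xid 101's commit can still read the balance of 100.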

 

3. Transaction States 


Each transaction is identified by a unique transaction ID (xid). When a transaction is assigned an xid, its status (in progress, committed, or aborted) is recorded in the transaction status log file, which tracks the status of every transaction. Using this log, AntDB can determine each transaction's state and, together with the xids stored in the tuples, select the appropriate data version.
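A transaction status log can be very compact because each transaction needs only a few status bits. The sketch below assumes a PostgreSQL-style commit log that stores two status bits per xid (four xids per byte); whether AntDB uses exactly this layout is an assumption, and `CommitLog` is a hypothetical class.

```python
IN_PROGRESS, COMMITTED, ABORTED = 0, 1, 2   # 2-bit status codes (PostgreSQL-style, assumed)

class CommitLog:
    """Toy transaction status log: two status bits per xid, four xids per byte."""

    def __init__(self, max_xid: int) -> None:
        # All bits start at zero, i.e. every xid is IN_PROGRESS.
        self.bits = bytearray(max_xid // 4 + 1)

    def _pos(self, xid: int) -> tuple[int, int]:
        return xid // 4, (xid % 4) * 2        # (byte index, bit shift)

    def set_status(self, xid: int, status: int) -> None:
        byte, shift = self._pos(xid)
        cleared = self.bits[byte] & ~(0b11 << shift)   # clear this xid's two bits
        self.bits[byte] = cleared | (status << shift)

    def get_status(self, xid: int) -> int:
        byte, shift = self._pos(xid)
        return (self.bits[byte] >> shift) & 0b11

clog = CommitLog(max_xid=100)
clog.set_status(10, COMMITTED)
clog.set_status(11, ABORTED)
print(clog.get_status(10), clog.get_status(11), clog.get_status(12))  # 1 2 0
```

Because the status lives in one place rather than in every tuple, committing or aborting a transaction is a single small write regardless of how many rows it touched.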

 

4. Concurrency Control

 

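For concurrent writes, AntDB complements MVCC with row-level locking. When a transaction is modifying a row, another transaction that tries to modify the same row must wait until the first one commits or rolls back. Readers, however, are not blocked: they read the row version that is visible to them. By restricting only concurrent modifications of the same data, AntDB avoids write-write conflicts without giving up read concurrency. 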
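The waiting rule for concurrent writers can be sketched by looking at the xmax of the row version an UPDATE wants to modify. This is a simplified model in the spirit of PostgreSQL's row-level locking, with hypothetical names (`Action`, `update_decision`) and a plain dict standing in for the transaction status log; a real kernel also handles explicit row locks and re-checks the row after waiting.

```python
from enum import Enum

class Action(Enum):
    PROCEED = "proceed"      # the row version can be modified now
    WAIT = "wait"            # another in-progress transaction is modifying it
    CONFLICT = "conflict"    # a committed transaction already replaced it

def update_decision(row_xmax: int, status: dict[int, str]) -> Action:
    """Decide what an UPDATE should do with the row version visible to it.

    status maps xid -> "in_progress" | "committed" | "aborted".
    """
    if row_xmax == 0:
        return Action.PROCEED          # nobody has deleted or updated this version
    state = status[row_xmax]
    if state == "in_progress":
        return Action.WAIT             # block until that transaction ends
    if state == "aborted":
        return Action.PROCEED          # the competing change never took effect
    return Action.CONFLICT             # the competing change committed

st = {200: "in_progress", 201: "committed", 202: "aborted"}
print(update_decision(200, st))  # Action.WAIT
```

What happens in the CONFLICT case depends on the isolation level: in PostgreSQL, Read Committed re-evaluates the newest version of the row, while Repeatable Read reports a serialization failure.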

 

When a transaction reads data, AntDB checks each tuple's xmin and xmax against the transaction's snapshot, its record of which transactions had committed when it started reading. A tuple is visible if the transaction that created it (xmin) is the current transaction or had committed when the snapshot was taken, and the tuple has not been deleted by a transaction visible to that snapshot: xmax is 0, or belongs to a transaction that aborted, or belongs to one that was still in progress (or had not yet started) at snapshot time. If the creating transaction is still in progress, the reader does not wait; it simply does not see that version and reads the older one instead. 
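The reader-side check can be sketched as a small function over xmin and xmax. Here a snapshot is simplified to the reader's own xid plus the set of xids whose commits it can see; cmin/cmax, hint bits, and subtransactions are ignored. The names `Snapshot`, `xid_visible`, and `tuple_visible` are hypothetical, and the logic follows PostgreSQL-style rules rather than AntDB's source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    my_xid: int                # the reading transaction's own xid
    committed: frozenset[int]  # xids whose commits are visible to this snapshot

def xid_visible(xid: int, snap: Snapshot) -> bool:
    """Are the effects of transaction `xid` visible to this snapshot?"""
    return xid == snap.my_xid or xid in snap.committed

def tuple_visible(xmin: int, xmax: int, snap: Snapshot) -> bool:
    """Simplified MVCC visibility test on a tuple's xmin/xmax."""
    if not xid_visible(xmin, snap):
        return False       # creator in progress, aborted, or committed after the snapshot
    if xmax == 0:
        return True        # never deleted
    # Deleted: the tuple is still visible if the deletion itself is not.
    return not xid_visible(xmax, snap)

snap = Snapshot(my_xid=105, committed=frozenset({100, 101}))
print(tuple_visible(100, 103, snap))  # True: deleter 103 is not visible to this snapshot
print(tuple_visible(100, 101, snap))  # False: deleted by a visible, committed transaction
```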

 

When a transaction modifies data, AntDB does not overwrite the existing tuple: it writes a new version whose xmin is the current xid and sets the xmax of the old version to the same xid. When the transaction commits, its status in the transaction status log changes to committed. Other transactions choose which version to read by applying the visibility check to xmin and xmax, so transactions that cannot yet see the commit keep reading the old version while later ones see the new one. This approach lets AntDB control concurrency and avoid read-write conflicts. 

 

5. Visibility Determination

 

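Visibility determination is crucial in concurrency control: it decides whether one transaction can see data written by another. In AntDB, this is based on xids and a snapshot. When a transaction (or, depending on the isolation level, each statement) starts reading, it takes a snapshot recording which transactions are still in progress. Changes made by transactions that committed before the snapshot was taken are visible; changes made by transactions that were still running at that moment, or that started afterward, are not, even if they commit later. In this way, AntDB realizes visibility determination.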
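One common way to represent such a snapshot, used by PostgreSQL, is a triple: xmin (every xid below it had already finished), xmax (the next xid to be assigned), and xip (the xids still running in between). The sketch below assumes AntDB follows the same model; `take_snapshot` and `commit_visible` are hypothetical names, and a real kernel handles the reader's own xid separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    xmin: int             # every xid below this had finished when the snapshot was taken
    xmax: int             # first xid not yet assigned at snapshot time
    xip: frozenset[int]   # xids still running at snapshot time

def take_snapshot(running: set[int], next_xid: int) -> Snapshot:
    return Snapshot(xmin=min(running, default=next_xid), xmax=next_xid,
                    xip=frozenset(running))

def commit_visible(xid: int, snap: Snapshot, committed: set[int]) -> bool:
    """True if xid committed before the snapshot was taken."""
    if xid >= snap.xmax or xid in snap.xip:
        return False          # started after, or was still running at, snapshot time
    return xid in committed   # finished before the snapshot: visible only if it committed

snap = take_snapshot({101, 103}, next_xid=105)
print(snap.xmin, snap.xmax)   # 101 105
committed = {100, 102, 104}
committed |= {101, 105}       # these commit after the snapshot was taken
print(commit_visible(102, snap, committed), commit_visible(101, snap, committed))  # True False
```

The snapshot is frozen at the moment it is taken, so later commits (101 and 105 here) do not change what it can see; that is what gives a reader a stable view of the data.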

 

Advantages of MVCC

 

Concurrency Optimization

 

Reads and writes do not block each other, which improves the efficiency of concurrent access. A write does not block other transactions' reads: until the writing transaction commits, other transactions read the earlier version.

 

Fast Rollback

 

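Transactions can roll back quickly. Because every tuple a transaction creates or deletes is stamped with its xid, rolling back only requires marking that transaction as aborted in the transaction status log; its new versions then become invisible and the versions it deleted become visible again, without undoing each change individually.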
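The following toy model shows why rollback is cheap: aborting is a single status change, and the visibility rule does the rest. It uses a simplified "latest committed state" check instead of a full snapshot, and `Version`, `visible`, and `rollback` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    xmin: int
    xmax: int = 0

def visible(v: Version, status: dict[int, str]) -> bool:
    """Latest-committed view: creator committed, deleter (if any) not committed."""
    if status.get(v.xmin) != "committed":
        return False
    return v.xmax == 0 or status.get(v.xmax) != "committed"

def rollback(xid: int, status: dict[int, str]) -> None:
    # The entire rollback: one status change; no tuple is touched or undone.
    status[xid] = "aborted"

status = {100: "committed"}
heap = [Version("old", xmin=100)]
status[101] = "in_progress"        # xid 101 updates the row...
heap[0].xmax = 101
heap.append(Version("new", xmin=101))
rollback(101, status)              # ...and then rolls back
print([v.value for v in heap if visible(v, status)])  # ['old']
```

The aborted version ("new") stays behind as a dead tuple until it is cleaned up.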

 

Disadvantages of MVCC

 

Index Maintenance Overhead

 

An UPDATE that creates a new tuple version generally must also add entries for that version to every index on the table. These index updates increase memory pressure and disk I/O, and the overhead of each update grows with the number of indexes on the table.

 

Transaction ID Wraparound

 

Transaction IDs come from a finite counter, so they eventually wrap around and old xid values are reused. Because visibility is decided by comparing xids, wraparound can make rows written by very old, long-committed transactions appear to belong to transactions "in the future", so they suddenly become invisible, which looks like data loss. To prevent this, old tuples must be periodically frozen (marked as visible to all transactions) by VACUUM before the xid counter wraps around.
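With a 32-bit xid counter (as in PostgreSQL; AntDB's xid width is an assumption here), xids have to be compared circularly. The sketch below mirrors the modulo-2^32 comparison PostgreSQL uses for normal xids (special xids such as the frozen xid are ignored), under the hypothetical name `xid_precedes`, and shows the failure mode: an xid that falls more than 2^31 transactions behind stops "preceding" the current one and appears to be in the future.

```python
XID_MOD = 2**32   # assumed 32-bit transaction ID space

def xid_precedes(a: int, b: int) -> bool:
    """Circular comparison: a precedes b if it lies less than 2**31 steps behind b."""
    diff = (a - b) % XID_MOD     # unsigned 32-bit subtraction
    if diff >= 2**31:            # reinterpret the result as a signed 32-bit integer
        diff -= XID_MOD
    return diff < 0

print(xid_precedes(100, 200))                   # True
print(xid_precedes(2**32 - 10, 5))              # True: still "older" across the wrap point
print(xid_precedes(100, 100 + 2**31 + 1))       # False: too old, now looks like the future
```

Freezing sidesteps this by replacing the xmin of sufficiently old tuples with a special value that is always treated as in the past.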

 

Garbage Data Issues

 

In MVCC, updated and deleted records are not physically removed right away, so obsolete versions (dead tuples) accumulate in frequently modified tables. They consume disk space and require extra I/O during scans, reducing query efficiency. These expired versions can be cleaned up with the VACUUM command once no transaction can still see them.
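A minimal sketch of the cleanup rule: a version can be removed once its creator aborted, or once it was deleted by a committed transaction older than the oldest xid that any running transaction may still need (`oldest_xmin`). This is a simplified model with hypothetical names; it ignores xid wraparound and index cleanup, and it rebuilds a list rather than reclaiming space inside pages as a real VACUUM does.

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    xmin: int
    xmax: int = 0

def vacuum(heap: list[Version], committed: set[int], aborted: set[int],
           oldest_xmin: int) -> list[Version]:
    """Return only the versions some transaction might still be able to see."""
    def is_dead(v: Version) -> bool:
        if v.xmin in aborted:
            return True                      # creator rolled back: nobody can see it
        # Deleted by a transaction that every running snapshot already sees as committed.
        return v.xmax in committed and v.xmax < oldest_xmin
    return [v for v in heap if not is_dead(v)]

heap = [Version("v1", 100, 101), Version("v2", 101, 105),
        Version("v3", 105), Version("x", 102)]
kept = vacuum(heap, committed={100, 101, 105}, aborted={102}, oldest_xmin=103)
print([v.value for v in kept])  # ['v2', 'v3']
```

"v2" survives even though its deleter committed, because a transaction whose snapshot predates xid 105 may still need it.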

 

Conclusion

 

This article has explained the working principles of MVCC, its advantages and disadvantages, and its impact on database transactions and concurrent access. MVCC is a powerful concurrency control mechanism: it improves database concurrency and performance while avoiding many of the blocking and conflict problems of traditional lock-based approaches.

 

However, because of the drawbacks described above, it still calls for careful tuning and maintenance.


About AsiaInfo Anhui AntDB

 

Established in 2008, AntDB runs the core systems of communications service providers, serving more than 1 billion users in 24 provinces, municipalities, and autonomous regions across the country. It offers high performance, elastic scalability, and high reliability, can process millions of core communications transactions per second at peak, and has kept these systems running continuously and stably for more than a decade. It has been commercially deployed in industries such as communications, finance, transportation, energy, and the IoT.