we can dive into the details of how MySQL implements Multi-Version Concurrency Control (MVCC)
Thank you for reading this article. More Interview Questions here:
https://programmerscareer.com/software-interview-set/
Topic: Unpacking MVCC
MultiVersion Concurrency Control (MVCC) is a technique used in database management systems to avoid conflicts between users accessing the same data simultaneously, thereby increasing processing efficiency.
In essence, MVCC allows multiple transactions to access the same data without conflicts or the need for read locks, which can slow down performance significantly. This is achieved by creating a “snapshot” of the data at the point a transaction begins. Other transactions happening concurrently won’t affect this snapshot.
Now, let’s break down the term MVCC:
- MultiVersion: This suggests that the database maintains several versions of the same record. The “version” is a snapshot of the data at a certain point in time.
- Concurrency Control: This implies that the technique is used to handle simultaneous transactions without conflict, ensuring that each transaction maintains the illusion that it’s the only one accessing the database.
MySQL implements MVCC in its InnoDB storage engine. When a transaction updates a row in InnoDB, the original row is not immediately overwritten or deleted. Instead, InnoDB stores the old version of the row internally for any other active transactions that need to see the original version. This is how “versions” of data rows come into existence.
So, why do we use MVCC in a system like MySQL? Primarily, it’s for performance reasons. By allowing multiple transactions to access the same snapshot of data without conflict, we avoid the need for read locks that can bottleneck the system. In a system where many users might be querying and updating the same database, this can significantly speed up processing times.
Topic: How MVCC works in MySQL
Let’s now zoom into the internals of how MVCC operates within MySQL.
When a transaction starts in MySQL (InnoDB), it gets assigned a unique transaction ID. This ID is used to create its view of the database. This view consists of all the committed data up until the time the transaction started, and any changes made by the transaction itself. The transaction can’t see any data modified by other simultaneous transactions, providing a consistent snapshot and ensuring isolation.
When a row is modified within a transaction, InnoDB won’t overwrite the existing data. Instead, it writes the new row version and maintains the old version in an area called an undo log. This log contains information needed to revert changes if a transaction is rolled back, and it provides older versions of a row to other transactions that might need them.
Now let’s talk a bit about some related topics: Read Views, Undo Logs, and Purging.
Read view is the mechanism InnoDB uses to implement consistent reads, i.e., reading the snapshot of the database corresponding to the point in time when the transaction started.
Undo logs are a crucial part of MVCC. As mentioned earlier, when a transaction modifies data, InnoDB writes the new row to the database and stores information necessary to reconstruct the older version of the row in an undo log record. If another transaction needs to see the older version of the row, InnoDB uses the undo log record to reconstruct it.
Purging relates to how InnoDB cleans up old versions of a row that are no longer needed by any ongoing transactions. Once all transactions that might need to access an older row version have completed, InnoDB can free space held by this version. This process is referred to as purging.
Topic: ACID Properties and MVCC
One of the crucial aspects of any reliable database management system is ensuring that it maintains certain properties defined by the ACID principle, which stands for Atomicity, Consistency, Isolation, and Durability.
- Atomicity: If a transaction involves multiple operations, atomicity means that either all the operations are executed successfully, or none of them are. There’s no in-between — a transaction can’t be partially complete. If an error happens during any operation in a transaction, the whole transaction is rolled back.
- Consistency: Consistency means that a transaction should bring the database from one consistent state to another, according to predefined rules. For example, if an account doesn’t have sufficient balance for a withdrawal, the transaction should be rejected to maintain consistency.
- Isolation: Isolation comes into play when multiple transactions are being executed simultaneously. It means each transaction should behave as if it’s the only one being executed. The intermediate state of a transaction should not be visible to other transactions.
- Durability: Durability ensures that once a transaction has been committed, it will remain, even in the event of power loss, crashes, or other system errors. In other words, the results of a transaction are permanent.
Now, how does MVCC (MultiVersion Concurrency Control) relate to the ACID properties? Here’s the connection:
In the context of MySQL (and more specifically its InnoDB storage engine), MVCC provides isolation and consistency.
Isolation is ensured as each transaction works with its snapshot of the data, isolated from the changes made by others. Even if multiple transactions are trying to read and write the same data simultaneously, each will see its consistent snapshot, as if it’s the only transaction happening.
Consistency is maintained thanks to the use of undo logs in MVCC. If a transaction fails or is rolled back, the changes made within that transaction can be undone to ensure the database remains in a consistent state. Furthermore, by creating a transaction-specific view of the data, MVCC ensures that the transaction always works with a consistent set of data.
Topic: Snapshot Read and Current Read
In MySQL, there are two main types of reads that are utilized when MVCC (MultiVersion Concurrency Control) comes into play: snapshot read and current read. Let’s dive into these concepts.
Snapshot Read
A snapshot read, as the name suggests, provides a consistent snapshot of the data as it was when the transaction started. It doesn’t see changes made by other concurrently executing transactions. This read is the default mode for SELECT
statements when not in LOCK IN SHARE MODE
or FOR UPDATE
modes. Snapshot read is crucial in providing the “consistent view” of data which is integral to the concept of MVCC.
Current Read
Unlike a snapshot read, a current read sees the latest committed data, including changes made by other transactions. Modes like SELECT…FOR UPDATE
and SELECT…LOCK IN SHARE MODE
use current reads. It’s also used when a query modifies data (like UPDATE
, INSERT
, DELETE
).
These two types of reads offer flexible ways of handling data in transactions. The use of snapshot read or current read depends on whether you want a transaction to see only the data as it was when the transaction began, or if it needs to see the latest data, including modifications made by other transactions.
Topic: Managing Deadlocks with MVCC
Now that we’ve explored the snapshot and current reads in MySQL’s MVCC, let’s understand another critical aspect of transaction handling — dealing with deadlocks.
A deadlock happens when two or more transactions mutually hold and request for locks, creating a cyclic dependency that can’t be resolved. Without intervention, these transactions could wait indefinitely, which is obviously not ideal.
MySQL handles deadlocks in MVCC by utilizing a wait-for graph. In layman’s terms, whenever transaction A waits for transaction B to release a row lock, an edge is added from A to B in the wait-for graph. If adding this edge creates a cycle, a deadlock is detected.
Upon detecting a deadlock, MySQL needs to resolve it. It does so by choosing a transaction as the ‘victim’ and rolling it back. In most cases, it chooses the one that has done the least amount of work, so less work is lost. After rolling back the victim transaction, the deadlock is resolved.
In MySQL, you can use SHOW ENGINE INNODB STATUS;
to get information about the most recent deadlock, which can aid in debugging.
Deadlock management, while mostly automatic, demands caution on the design and execution of transactions. It’s advisable to keep transactions as short as possible and commit them as soon as possible to minimize the chances of deadlocks.
Topic: Performance Implications of MVCC
Despite all the benefits that MultiVersion Concurrency Control underpins in MySQL, it’s important to recognize that MVCC isn’t without its performance trade-offs. Let’s delve into some of these:
- Disk Space: One of the main overheads of MVCC is increased disk space. Because MVCC keeps different versions of a row to provide isolated, consistent views to transactions, more disk space is required. This could be significant in heavy read-write mixed workloads.
- CPU and I/O Resources: The process of producing multiple versions of data, maintaining them, and removing unnecessary versions (purging) can put a burden on CPU and I/O resources.
- Locking Overhead: While MVCC reduces the need for locking, it does not eliminate it entirely, especially for write transactions (Inserts, Updates, Deletes). These locks add to the performance overhead.
- Increased Complexity: MVCC adds complexity to the database engine. It needs to manage multiple versions of data, handle undos, resolve conflicts, and clean up old versions. This complexity adds overhead to the overall performance.
When might you consider alternative methods to MVCC?
While MVCC offers excellent benefits regarding multi-user access, it might not be the perfect choice in specific scenarios. For instance, in cases where data is largely written once and read many times, like a logging application, it might be better to consider a storage engine like MyISAM which does not support MVCC.
Additionally, in applications where you want the absolute latest data, you can also look into other approaches since MVCC provides a “snapshot” of the data, not the latest version.
It’s necessary to understand the workload, performance expectations, and hardware resources available before making a decision regarding the use of MVCC.
Topic: Review and Assessments
Now that we’ve explored different aspects of MultiVersion Concurrency Control (MVCC) in MySQL, it’s time to recap some of the key takeaways:
- MVCC is a method used in database management systems to handle concurrent transactions.
- MVCC provides each transaction a ‘snapshot’ of the database, allowing multiple transactions to read (and write to) a single data item simultaneously, which dramatically improves the database’s performance and scalability.
- Snapshot and current read are two critical concepts in MySQL’s MVCC. Snapshot read gives a consistent view of the data as of the transaction start, and current read considers the latest committed data.
- The ACID properties in databases are closely related to MVCC, ensuring Atomicity, Consistency, Isolation, and Durability of transactions.
- MVCC helps in managing deadlocks in MySQL, which are situations where two transactions wait for each other to release resources.
- While MVCC offers several advantages, it’s not without some performance trade-offs, such as increased disk space and CPU usage.
These are some of the many aspects we’ve covered about MVCC in MySQL. With that, it’s now time to assess your understanding of the topic. Consider the following questions:
- How does MVCC improve concurrent transactions in MySQL?
- What’s the difference between snapshot read and current read in MySQL’s MVCC?
- How does MVCC in MySQL manage deadlocks?
- Can you explain the performance implications that come with using MVCC?
Reflect on these questions. You can write down your responses or simply think through them for self-evaluation.
- MVCC improves concurrent transactions by allowing multiple users to access the same row of a table at the same time, instead of locking the data entirely. Each transaction gets a “snapshot” of the data at the point the transaction began, ensuring a consistent state of the data is used throughout the transaction.
- Snapshot read and current read in MySQL’s MVCC are two different isolation levels for reading data. A snapshot read refers to the consistent read of the database state at the time the transaction starts, ensuring data consistency for that transaction. A current read, on the other hand, is a type of read where the most recent committed data (including those made by other transactions) is considered.
- MVCC manages deadlocks through a wait-for graph. When a transaction is waiting for a resource locked by another transaction, an edge is added to the wait-for graph. If adding this edge creates a cycle, a deadlock is detected. MySQL then chooses a transaction to be the ‘victim’ and rolls it back to resolve the deadlock.
- MVCC is associated with some performance trade-offs. It requires additional disk space to keep multiple versions of each row in a table. Additional CPU and I/O resources are used for managing these versions, handling undo operations, and cleaning up old row versions. The complexity of MVCC handling can also add to the overhead in the overall performance.
中文文章: https://programmerscareer.com/zh-cn/mysql-interview17/
Author: Wesley Wei – Twitter Wesley Wei – Medium
Note: If you choose to repost or use this article, please cite the original source.
Comments