Working hard at the Till |
This means potential concurrency issues. Now what sort of problems could we expect if we just built this system and never thought about concurrency?
Problem number 1: Dirty Reads
Trent Reznor - how many copies left? |
Problem number 2: Non-repeatable reads
Eric Saade is Popular! |
The only enhancement you could make here is to block the other user from making the changes until you're finished. This would mean you'd lock the rows you're reading. Because this would slow things down, it really depends on what your system needs to determine if this approach makes sense or not.
Problem number 3: Phantom reads
Andy Weatherall before the smoking ban! |
You decide to check again in the same transaction but before you do that, the inventory guy got a shipment with more of Andy's stuff. He has added it to the system and committed his transaction. When you re-read you get back the same as above but you also get back the new additions:
- Fabric19 - Andrew Weatherall
- Andrew Weatherall 9 O' Clock Drop
- and Andrew Weatherall's essential mix from 1993 which just got released on CD!
Again, this is probably not the riskiest thing in the world - at least you're being told the truth! The solution would again involve blocking the second transaction until you have finished your transaction completely. As was the case with non - repeatable reads, it really depends on your system requirements (and how probable this problems will happen) to decide if it is actually worth going to the hassle of locking rows.
Isolation levels
In databases, "isolation level" is a property which defines how much concurrent transactions can see or can't see of each other. Isolation is the 'I' in ubiquitous ACID database acronym coined by
Andreas Reuter and Theo Harder in 1983. Isolation level is a configurable property in most databases. So what can it be set to? Well there are usually four settings, namely:
Serializable, Repeatable Reads, Read Committed, Read uncommitted
The table below shows the differences.
Isolation level | Dirty reads | Non-repeatable reads | Phantoms |
---|---|---|---|
Read Uncommitted | Y | Y | Y |
Read Committed | N | Y | Y |
Repeatable Read | N | N | Y |
Serializable | N | N | N |
Y means that the problem can happen, N means it can't
My recommendation is set to Read Committed. The reason why is because serializable can be over protective and will impact performance because it incurs a lot of locking. Repeatable reads only protect you when you re-read the same data in a transaction. This is something that is uncommon in most systems because too much re-reading usually results in bad performance. You also need to consider if your transactions are generally long or short. If you don't have long running transactions it's less likely you are going to have collisions and hence you may not have to be so strict in how isolated the transactions need to be.
Dirty reads can be really problematic. They are not only misleading but produce unusual problems that are difficult to diagnose. This is because the incorrect data you got was never even persisted.
Therefore I would advise to set to at least read committed to eliminate dirty reads. Only go for a more stronger setting if your system needs it and if the problems such as phantom reads are probable.
As stated, more protection means more locking. The more locking means the slower your system will go as users get blocked. Details of the locking are in the table below.
Isolation level | Write Lock | Read Lock | Range Lock |
---|---|---|---|
Read Uncommitted | N | N | N |
Read Committed | Y | N | N |
Repeatable Read | Y | Y | N |
Serializable | Y | Y | Y |
Y means locking is incurred. N means it is not.
Is that all I need to do?
You wish! Isolation levels for concurrent transactions deal only with concurrent transactions. It is much more likely you'll have concurrent usage of the same data even though you don't have concurrent transactions on the same data. For example:
- User 1 reads in the data at the beginning of his session
- User 2 reads in the same data at the beginning of his session. User 2's session is independent of User 1's session.
- User 2 then makes some changes and commits change. His transaction is finished.
- User 1's data has become stale, before he even starts a transaction. He begins his transaction after User 2 has committed User 2's transaction. But User 1's data is stale. Surely he should have to respect the most up to date version of the data before making changes!
No comments:
Post a Comment