Summary: ORM tools like Objection.js allow you to write database transactions in code. This allows code logic to exist between multiple queries of a transaction. But this has the potential of slowing down the database significantly.
Transactions Via ORM
Frameworks like Objection.js allow us to start transactions via the code. This gives the developer immense power, as it allows the logic between 2 queries to be written in code instead of SQL.
Here is a sample transaction —
await OrderModel.transaction(async transaction => {
const highValueOrder = await OrderModel.query(trx)
.findOne({
type: ORDER_TYPE.HIGH_VALUE,
userId: userId
})
.forUpdate()
if(highValueOrder)
throw new Error('a user can have only one high value order')
return OrderModel.query(trx)
.insert({
userId: userId,
type: ORDER_TYPE.HIGH_VALUE,
productId: ...
<remaining details here>
})
});
The developer decided to write the code above in a transaction, instead of relying on database constraints and where clauses. For additional safety, this code also has a forUpdate
clause. This would lock the table and not allow another parallel thread to insert a high value order.
For the purpose of this article, I’ve intentionally taken an example which could have been been solved by adding proper constraints or a where clause. Real life could be a lot more complex, and the alternatives not so obvious.
How is the transaction performance measured?
To measure the transaction performance, I enabled logs on my local postgres, and got logs similar to the ones below. The logs I’ve created are fake, but are based on how they actually look in postgres —
05:08:56.867 UTC [1411] LOG: statement: BEGIN;
05:08:56.867 UTC [1411] LOG: duration: 0.070 ms
05:08:56.869 UTC [1411] LOG: duration: 0.061 ms parse <unnamed>: select * from "orders" where "type" = $1 and "user_id" = $2
05:08:56.869 UTC [1411] LOG: duration: 0.071 ms bind <unnamed>: select * from "orders" where "type" = $1 and "user_id" = $2
05:08:56.869 UTC [1411] DETAIL: parameters: $1 = 'HIGH_VALUE', $2 = '5121'
05:08:56.869 UTC [1411] LOG: execute <unnamed>: select * from "orders" where "type" = $1 and "user_id" = $2
05:08:56.869 UTC [1411] DETAIL: parameters: $1 = 'HIGH_VALUE', $2 = '5121'
05:08:56.869 UTC [1411] LOG: duration: 0.130 ms
<....remaining statements>
05:08:56.873 UTC [1411] LOG: duration: 0.094 ms
05:08:56.908 UTC [1421] LOG: duration: 0.288 ms
05:08:56.910 UTC [1411] LOG: statement: COMMIT;
05:08:56.911 UTC [1411] LOG: duration: 0.907 ms
In my real life work, a transaction slightly more complex than the one mentioned above took a long time to execute — 30–40 ms, which went upto 300ms when 2 of them were executing in parallel. This blog attempts to analyse why writing a transaction in code using an ORM can hit the performance.
Why is this transaction slow?
This example assumes Read-Commit Isolation.
To explain why transactions are significantly slower through code (in this case Objection.js), let’s use the same transaction mentioned above —
await OrderModel.transaction(async transaction => {
const highValueOrder = await OrderModel.query(trx)
.findOne({
type: ORDER_TYPE.HIGH_VALUE,
userId: userId
})
.forUpdate()
if(highValueOrder)
throw new Error('a user can have only one high value order')
return OrderModel.query(trx)
.insert({
userId: userId,
type: ORDER_TYPE.HIGH_VALUE,
productId: ...
<remaining details here>
})
});
Let’s list the steps of the transaction so we can analyse it —
Let’s assume that every query takes 1 ms, Code to DB latency is 10 ms, and every piece of code logic in the transaction takes 1 ms. So the total time the transaction takes is 25 ms, as described below. It’s not the queries (or the logic) that slow the transaction, but the latency -
To understand why transactions become slower when 2 of them execute in parallel, let’s look at the SELECT
query again. It is a SELECT FOR UPDATE
query. A SELECT FOR UPDATE
query waits for another SELECT FOR UPDATE / UPDATE
query to finish if they’re affecting the same rows. The sample transaction mentioned above needs it to avoid race conditions.
This means that if there are 2 select queries running in parallel that are going to affect the same row/rows, one transaction waits for the entire other transaction to finish before moving beyond the SELECT FOR UPDATE line.
This is shown in the diagram below —
In the example above, it is unlikely that the user will try to create a high-value order from 2 different places at the same time, but in real life scenarios these situations are common.
As mentioned in the beginning, these transactions would finish in 30–40 ms on the local system, but slow down significantly 2–3 were running in parallel, upto 10 times in some cases.
On servers, the latency would be even higher, resulting in worse transaction performance.
Summary of the bottlenecks
In summary, there are 2 bottlenecks. The transactions that take the most time are those that face both bottlenecks together -
- For update for 1 transaction blocks the other.
- The back and forth between the code and the transaction takes up a significant amount of time.
What is / are the solutions?
The solutions mentioned below are only valid in scenarios that allow it. Sometimes the scenario is so specific that none of these solutions are applicable.
- Remove the transaction: This is the ideal solution, and removes bottleneck number 1 above. Try working with database constraints to remove the additional select query (which is the best solution in the example above). If constraints don’t solve the problem, sometimes it can be done by either using a single query to achieve the job (which is simple for the example above), or writing multiple queries in a way to avoid overwriting results of a parallel query.
- Move the entire transaction into a single raw sql query: This removes bottleneck number 2 above. Put all the statements including the code logic into SQL. Most ORMs have a mechanism to execute raw queries. This way, there is no latency between code and database, and thus the performance is faster. This results in faster performance, but less readable code. Therefore it should be used when the first solution is not possible.
- Make Transactions Non-Blocking: In the example above, this means removing the ‘For Update’ from the select statement. This also removes bottleneck number 1 mentioned above. This is a good solution, but its important to figure out in which scenarios it works. In the above example, removing this would result in a race condition.
- Parallel Queries: This is not applicable to the example above. If the queries in the transaction are independent of each other, trigger them in parallel from the code, and wait for them together using Promise.all. While they will run serially at the database level, the wait due to code latency is reduced. The first example on this blog gives an example — link.
Summary
As with any other part of the code, performance tuning requires performance testing, measurement, and isolation of bottlenecks. Often the biggest bottleneck is just a small piece of code which can be tuned with specific changes in the design.
SQL can accomplish fairly complex tasks, and often large operations can be performed without blocking transactions. But if blocking transactions are absolutely unavoidable, first see if they can be written in a Raw SQL block, and if not, use them sparingly.
ORMs provide nice interfaces for developers to use, but sometimes the developer needs to peer under the hood of the framework to understand how it works in order to understand and solve specific issues.