Efficient Data Modification With Entity Framework Core

The rule of dumb tells that for optimal performance we need to make as few database requests as possible. This is especially relevant for insert and update scenarios, where we sometimes need to work with thousands of objects. Sending those objects to the database one by one is usually significantly slower than in a batch. (Spoiler - interactive examples of what's discussed below).

The number of possible cases is pretty small, but our options and best approach might differ significantly. So, the cases:

INSERT

For this case we have 2 possible optimizations

  • Insert multiple rows in a single statement (batch insert)
  • INSERT INTO ... SELECT ... construct, which inserts data from another table (bulk insert)

UPDATE

  • UPDATE ... WHERE / UPDATE JOIN - so called bulk update. Pay attention, this is not the conventional update where we want one value to be replaced with another one. This optimization makes the same change on multiple rows. And the gain is that it's performed in "one shot", without pulling the rows to application server.
  • DELETE ... - same idea for the delete case (bulk delete).

Upsert

Upsert means UPDATE or INSERT in a single statement. In cases when a table has more than 1 unique constraint (in addition to PK), a plain INSERT can fail on duplicate keys. In those cases, we usually want to ignore, replace, or combine some existing fields with new values. Most vendors support this capability, but use different syntax and provide slightly different features.

  • MERGE - SQL Server and Oracle. In fact this is the official standard. In its simplistic form, it lets to specify what to do WHEN MATCHED , i.e. when there is a unique key collision; and what to do WHEN NOT MATCHED, i.e. there is no collision so we can INSERT safely.
  • INSERT ... ON DUPLICATE ... - MySQL and Postgres. The syntax is much simpler but lets you handle only the most common cases (compared to the feature-packed MERGE).

Summary

Batch/bulk updates are usually an order of a magnitude faster than working with entities one by one. Optimizing those scenarios is usually a low hanging fruit improvement. Don't miss the opportunity! Here you can find the interactive examples.

Comments

Popular posts from this blog

Hierarchical Query in SQL with C# and Entity Framework Core

SQL: SELECT DISTINCT on Some Columns