How update multiple rows with different values using Repo.update_all
3 minutes to readSpecial Announcement
Hey folks, I launch my first Elixir/Phoenix LiveView course. In this course I am teaching you how to create the Wordle Game with Elixir and Phoenix LiveView. In this course I teach you about elixir, TDD, how to organize your project, how to create and use function components, how to create and use animations with TailwindCSS and Phoenix LiveView and in the end you will have a complete functional wordle game to show to your friends.
The course is with a special offer with 50% of discount. Go here and check it out:
https://indiecourses.com/catalog/54c9e6b0-f39e-43a5-b775-a0de3f634b58
A little bit about batch updates
Batch updates are awesome and super fast and it is the way to go when you need to update multiple rows and you know all of them can be updated with the same value. Let’s see an example. Imagine you have a shopping cart, with that you will probably have a business process that can remove items from carts that are forgotten there for more than a week or a month. There are multiple ways of doing this, the naive way would be to do something like the following:
two_weeks_ago = DateTime.add(DateTime.utc_now(), -14, :day)
forgotten_carts = Repo.all(from carts in Cart, where: cart.last_visited_at < ^two_weeks_ago)
Repo.transaction(fn ->
forgotten_carts
|> Enum.map(fn cart ->
Repo.update(cart, %{items: []})
end)
end)
Ok, something like that. I didn’t think about the details and for this example we really don’t care. The fact is that there is a better way to do this, instead of updating one by one we can send to the database a command to update all that matches the query.
We do this by using the update_all
function like this:
two_weeks_ago = DateTime.add(DateTime.utc_now(), -14, :day)
Repo.update_all(from carts in Cart, where: cart.last_visited_at < ^two_weeks_ago, [set: [items: []]])
This is way better and you are sending just one command to the database. But, what happens if your problem is slightly different than this. If you now need to update a lot of records but for each record the value that you need to update is different. What would you do now? Well, the most common scenario is to actually update the records one by one. You will probably open a transaction and issue a single update for each record that you need to update.
But calling Repo.update
for each record, although it is ok is not the only way to do it. There is another way of doing this that is faster and by doing it you will leverage all the niceties that postgres can give you. We can use the from
clause when issuing an update.
I will show you an example using SQL so you can understand a little bit better before we move to the elixir with ecto example.
With Postgres we can use the update...from
to issue an update for one or multiple fields where each row needs to be updated with a different value.
So, imagine you have an employees
table. Here it is the table:
Employees
- id: integer
- name: varchar
- employee_identification: varchar (this field is unique, can't have repeated values)
- contract_id: integer (this is a foreign key to a contracts table)
You have a lot of employees but they do not have the contract id set. This is a new field and you need to update all of them with the correct contract. In this weird example someone gave you a CSV report with two columns:
employee_identification, contract_id
1HEDH7, 1
ABCDE8, 2
QWERT5, 3
ZXDVD2, 4
...
Now you need to update the contract_id
in the employees
table. With Postgres you can get the information in the file and transform it into a list of tuples and use a SQL query like this:
update employees as employees set
contract_id = report.contract_id
from (values
('1HEDH7', 1),
('ABCDE8', 2),
...
) as report(employee_identification, contract_id)
where report.employee_identification = employees.employee_identification;
With this you will issue an update that will update all the rows in the employees
table with the correct contract_id
that exists in each row of the report.
This query is updating each employee contract id that was filtered by the employee identification in the where
clause. One thing you need to make sure is that the values in the from
clause need to be unique. In this case each value should be for a different employee_identification
otherwise the database will select one of the rows to update the employees
table. In other words the value from the from
list used in the where
clause should be unique in the list of values.
If, instead of receiving a CSV report you just received the task to get the data from another table to the employees
table your query would be like this:
update employees as employees set
contract_id = report.contract_id
from (
select identification, contract_id from old_employees_table
) as report(employee_identification, contract_id)
where report.employee_identification = employees.employee_identification;
Now you know what we need to do, but we need to do with elixir and Ecto.
How to update multiple rows with update…from clause with Ecto
With ecto you can’t issue a update...from
clause, you need to use join
with subqueries instead.
I will get the same example. You have an old employees table with employee_identification
and contract_id
and you need to update the new employees
table setting the correct contract_id
for each employee.
For this example we have an old employee schema:
OldEmployee:
- employee_id
- contract_id
With Ecto you would do something like this
old_table_query = from old_employee in OldEmployee
query =
from employees in Employee,
join: values in subquery(old_table_query),
on: values.employee_id == employees.employee_identification,
update: [
set: [{:contract_id, values.contract_id}, updated_at: ^DateTime.utc_now()]
]
Repo.update_all(query, [])
What you are doing here is joining another table and for each row in that table you are setting the respective contract id. It seems way easier with Ecto, right?
Yes, it is that simple, so now you know how to update multiple records issuing only on update...from
to the database.
I hope you like it.