Cross-shard UPDATE and DELETE
UPDATE and DELETE statements that provide none or more than one sharding key in the WHERE clause are cross-shard and will be sent to all shards concurrently.
For example, assuming the users table is sharded on the id column, filtering rows by any other column produces a cross-shard query:
Consistency
Much like cross-shard INSERT statements, any updates to multiple rows on multiple databases outside a two-phase transaction are not guaranteed to be atomic. It's always best to send updates inside a transaction, like so:
BEGIN;
UPDATE orders SET delivered_at = now() WHERE delivered_at IS NULL;
DELETE FROM orders WHERE user_id = $1;
COMMIT;
Both DELETE and UPDATE statements follow the same rules and work largely the same, with the notable exception of sharding key updates.
Sharding key updates
When the sharding key of a row is updated, its sharded mapping is no longer valid. Queries searching for that row will be sent to the wrong shard, causing data loss. To avoid this, PgDog supports moving the row between shards automatically.
How it works
Taking the same users table as an example (sharded on the id column), a statement mutating the sharding key will, frequently, take the following form:
When a client sends such a statement, PgDog will rewrite it into three statements and execute them automatically:
SELECTstatement to fetch the row from the databaseINSERTstatement to create that row on the new shard, with updated valuesDELETEstatement to remove the now stale row from the old shard
The entire exchange looks like this:
-- Old `id` value.
SELECT * FROM users WHERE id = $1;
-- New `id` value, with columns retrieved by the previous query.
INSERT INTO users (id, email, created_at)
VALUES ($1, $2, $3);
-- Old `id` value.
DELETE FROM users WHERE id = $1;
Transaction required
Since PgDog needs to execute multiple statements across different databases, a transaction is required to maintain data consistency (as per MVCC). PgDog expects all sharding key updates to be executed inside a transaction, like so:
If the application doesn't start a transaction, PgDog will return an error and abort the request.
Configuration
This feature is disabled by default and can be enabled with configuration in pgdog.toml:
Updating multiple rows
While multi-row updates are supported, changing multiple rows' sharding keys is not. If the UPDATE statement's WHERE clause matches more than one row, PgDog will abort the transaction and return an error.
This check happens early in the transaction and will not create partial updates to the database.
Efficiency
It's common practice for ORMs, like ActiveRecord or SQLAlchemy, to mutate entire rows when saving records. Take the following example:
While the sharding key (id) is technically updated in the query, its value doesn't change. To avoid unnecessary overhead, PgDog performs multiple checks on the new row before performing the three statement sharding key update flow:
- If the value of the sharding key parameter (
SET id = $1) is the same as it is in theWHEREclause (WHERE id = $1), the query is routed directly to the shard without modifications. - If the values are different, or the
WHEREclause doesn't specify the sharding key, PgDog will check the value of the key returned by the firstSELECTstatement in the flow. If both keys map to the same shard, PgDog will send the query directly to that shard, without modifications.
Updating the sharding key isn't a frequent operation and both of these mechanisms ensure that the more expensive algorithm isn't triggered unnecessarily.