Last month, I watched a product demo fail in a very familiar way. The app looked perfect on office Wi-Fi. Then someone walked to the lift, the phone jumped from Wi-Fi to patchy LTE, and a “saved” expense draft quietly disappeared. The user didn’t rage. They just stopped trusting the app.
That moment is why mobile sync reliability is not a “background task” problem. It is a trust problem. If users can’t predict what happens to their edits when networks flap, they assume your app is unsafe.
In this guide, I’ll show the architecture we now use for high-confidence mobile sync in unreliable networks, built on three boring but powerful ideas: ETag conditional requests, idempotency keys, and explicit conflict resolution. This is not theoretical. It is what keeps drafts and actions stable when users switch networks, kill the app, or tap “Save” twice.
Why retries alone keep hurting mobile apps
Most teams already retry failed requests. The issue is that retries without a contract can duplicate writes, overwrite newer content, or return “success” for stale data. In other words, retries increase traffic but not correctness.
Three standards-backed signals matter here:
- ETag + If-Match / If-None-Match: lets client and server agree on object version and freshness, with 304 Not Modified and 412 Precondition Failed behavior (RFC 9110 + MDN).
- Idempotency-Key on write APIs: allows safe retry of the same logical operation without duplicate side effects, a pattern documented clearly by Stripe.
- Crash-safe local queueing: if your local write buffer is fragile, network correctness won’t save you. SQLite WAL exists for a reason.
Tradeoff to accept early: this design adds server state (idempotency records), version bookkeeping (ETags), and conflict UX. It is extra complexity, but it moves complexity from user pain into engineering control.
The four contracts that make sync feel boring (in a good way)
1) Read contract: always carry validators
Fetch resources with ETags and cache them locally. On refresh, send If-None-Match. If unchanged, accept 304 and skip body parsing, battery use, and UI churn.
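The server side of that contract is a one-line comparison. This toy function is illustrative only (real servers also handle weak ETags and comma-separated validator lists, which this sketch ignores):

```kotlin
// Server-side view of a conditional GET: if the client's If-None-Match
// equals the current strong ETag, answer 304 and send no body.
fun revalidate(currentEtag: String, ifNoneMatch: String?): Int =
    if (ifNoneMatch != null && ifNoneMatch == currentEtag) 304 else 200
```

On the client, a 304 means "reuse what you already cached": skip parsing, skip re-rendering.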
2) Write contract: every mutation has an operation identity
Generate one UUID per logical action (for example, “create expense draft #8f…”) and reuse it for retries. Do not generate a new key after timeout if you are retrying the same user action.
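A minimal sketch of that rule, using an in-memory map as a stand-in for the local table that pins one UUID to one logical action (all names here are illustrative, not from any SDK):

```kotlin
import java.util.UUID

// One UUID per logical user action. In a real app this mapping is
// persisted BEFORE the first network attempt, so a crash cannot mint a
// second key for the same action.
object OpIds {
    private val byAction = mutableMapOf<String, String>()

    fun forAction(actionFingerprint: String): String =
        byAction.getOrPut(actionFingerprint) { UUID.randomUUID().toString() }
}
```

Every retry of the same action then asks `OpIds.forAction(...)` and gets the same key back.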
3) Concurrency contract: protect updates with If-Match
When updating an object you previously read, send the object’s ETag in If-Match. If someone else changed it, the server returns 412, and your app can show a merge screen instead of silent overwrite.
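The decision the server makes can be expressed as a pure function. This sketch assumes ETags shaped like `"expense-<id>-v<version>"` (an assumption for illustration; any stable version token works):

```kotlin
data class ExpenseVersion(val id: String, val version: Int)

// 412 when the client's If-Match no longer names the current version;
// 200 (proceed) otherwise. Absent If-Match is allowed here to match a
// lenient policy; stricter APIs return 428 Precondition Required instead.
fun preconditionStatus(current: ExpenseVersion, ifMatch: String?): Int {
    val expected = "\"expense-${current.id}-v${current.version}\""
    return if (ifMatch != null && ifMatch != expected) 412 else 200
}
```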
4) Local durability contract: queued ops survive app death
Persist unsent operations in SQLite, ideally WAL mode when your workload fits it. WAL improves writer/reader concurrency and crash resilience, but remember the operational tradeoffs from SQLite docs (checkpoint behavior, file growth, and environment constraints).
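The queue lifecycle itself is simple; what matters is the ordering. This in-memory sketch stands in for the SQLite-backed version (class and method names are illustrative):

```kotlin
// Queue lifecycle: persist before first send, mark delivered on 2xx,
// prune delivered and too-old ops so the local DB cannot grow unbounded.
class OpQueue(private val maxAgeMs: Long = 7L * 24 * 60 * 60 * 1000) {
    data class QueuedOp(val opId: String, val createdAtMs: Long, var delivered: Boolean = false)

    private val ops = mutableListOf<QueuedOp>()

    fun enqueue(op: QueuedOp) { ops.add(op) }        // persist BEFORE first send
    fun markDelivered(opId: String) {
        ops.find { it.opId == opId }?.delivered = true
    }
    fun prune(nowMs: Long) {                         // expire delivered + cap retry age
        ops.removeAll { it.delivered || nowMs - it.createdAtMs > maxAgeMs }
    }
    fun pending(): List<QueuedOp> = ops.filter { !it.delivered }
}
```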
Android client example: conditional sync + idempotent writes
This stripped-down Kotlin sample shows the core pattern. It avoids magic SDK assumptions and keeps behavior explicit.
```kotlin
import okhttp3.HttpUrl
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import okhttp3.Response

data class PendingOp(
    val opId: String,                 // UUID per logical action
    val endpoint: String,
    val method: String,
    val bodyJson: String,
    val ifMatch: String? = null,
    val createdAtMs: Long = System.currentTimeMillis()
)

class SyncApi(private val client: OkHttpClient, private val baseUrl: HttpUrl) {

    // Conditional read: send the cached ETag; a 304 means the cached body
    // is still fresh and can be reused without parsing a new payload.
    // Note: execute() blocks, so call this from Dispatchers.IO.
    suspend fun fetchExpense(id: String, cachedEtag: String?): Pair<Int, String?> {
        val req = Request.Builder()
            .url(baseUrl.newBuilder().addPathSegments("v1/expenses/$id").build())
            .header("Accept", "application/json")
            .apply {
                if (!cachedEtag.isNullOrBlank()) header("If-None-Match", cachedEtag)
            }
            .build()
        client.newCall(req).execute().use { res ->
            val newEtag = res.header("ETag")
            return res.code to newEtag
        }
    }

    // Idempotent write: the same opId is reused across retries of the same
    // logical action, so the server can deduplicate.
    suspend fun flush(op: PendingOp): Response {
        val req = Request.Builder()
            .url(baseUrl.resolve(op.endpoint)!!)
            .method(op.method, op.bodyJson.toRequestBody("application/json".toMediaType()))
            .header("Content-Type", "application/json")
            .header("Idempotency-Key", op.opId)
            .apply {
                if (!op.ifMatch.isNullOrBlank()) header("If-Match", op.ifMatch)
            }
            .build()
        return client.newCall(req).execute()
    }
}

// Handler idea:
//   2xx:         mark op delivered
//   412:         fetch latest, open conflict UI
//   5xx/timeout: retry same opId with backoff
```
Important behavior detail: retry with the same opId only for the same logical action. New user action, new key.
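For the "retry with backoff" branch, a schedule like the following is typical (the base, cap, and jitter range here are assumptions, not values from the post):

```kotlin
import kotlin.math.min
import kotlin.random.Random

// Exponential backoff with jitter for re-sending the SAME PendingOp (and
// therefore the same Idempotency-Key). Constants are illustrative.
fun backoffMs(attempt: Int, baseMs: Long = 500, capMs: Long = 30_000): Long {
    val ceiling = min(capMs, baseMs shl min(attempt, 16))
    return Random.nextLong(ceiling / 2, ceiling + 1) // jitter in [ceiling/2, ceiling]
}
```

Jitter matters on mobile: when a train full of phones regains signal at once, you do not want every client retrying on the same tick.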
Server example: enforce idempotency and precondition checks
Here is a practical Node/Express sketch. Production systems should add auth, tenant scoping, and stronger transaction boundaries, but the control points are the same.
```javascript
app.put('/v1/expenses/:id', async (req, res) => {
  const idempotencyKey = req.header('Idempotency-Key');
  const ifMatch = req.header('If-Match');
  const userId = req.auth.sub; // assumes upstream auth middleware

  if (!idempotencyKey) {
    return res.status(400).json({ error: 'Idempotency-Key is required' });
  }

  await db.tx(async (trx) => {
    // Replay path: if this key was already processed, return the stored
    // response instead of re-running the side effect.
    const idem = await trx.oneOrNone(
      `select status_code, body_json
         from api_idempotency
        where user_id = $1 and key = $2`,
      [userId, idempotencyKey]
    );
    if (idem) {
      return res.status(idem.status_code).json(idem.body_json);
    }

    const current = await trx.oneOrNone(
      `select id, amount, note, version
         from expenses
        where id = $1 and user_id = $2`,
      [req.params.id, userId]
    );
    if (!current) {
      const body = { error: 'Not found' };
      await trx.none(
        `insert into api_idempotency(user_id, key, status_code, body_json)
         values($1, $2, 404, $3::jsonb)`,
        [userId, idempotencyKey, JSON.stringify(body)]
      );
      return res.status(404).json(body);
    }

    // Precondition check: a stale If-Match means someone else updated the
    // row since this client last read it.
    const expectedEtag = `"expense-${current.id}-v${current.version}"`;
    if (ifMatch && ifMatch !== expectedEtag) {
      const body = {
        error: 'Precondition failed',
        latestEtag: expectedEtag,
        latest: current
      };
      await trx.none(
        `insert into api_idempotency(user_id, key, status_code, body_json)
         values($1, $2, 412, $3::jsonb)`,
        [userId, idempotencyKey, JSON.stringify(body)]
      );
      return res.status(412).json(body);
    }

    const next = await trx.one(
      `update expenses
          set amount = $1,
              note = $2,
              version = version + 1,
              updated_at = now()
        where id = $3 and user_id = $4
        returning id, amount, note, version`,
      [req.body.amount, req.body.note, req.params.id, userId]
    );

    const body = { ok: true, expense: next };
    // A unique (user_id, key) constraint on api_idempotency is what stops
    // two concurrent retries from both inserting here.
    await trx.none(
      `insert into api_idempotency(user_id, key, status_code, body_json)
       values($1, $2, 200, $3::jsonb)`,
      [userId, idempotencyKey, JSON.stringify(body)]
    );
    res.set('ETag', `"expense-${next.id}-v${next.version}"`);
    return res.status(200).json(body);
  });
});
```
Key tradeoff: storing every idempotency response forever is expensive. Most teams keep them for a bounded retention window (for example, 24-72 hours) and prune with a scheduled job.
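The expiry predicate behind that pruning job is one comparison. The 48-hour window below is an assumed policy, picked from the middle of the range above:

```kotlin
import java.time.Duration
import java.time.Instant

// TRUE when an idempotency record is past its retention window and safe
// to delete by the scheduled cleanup job.
fun isExpired(
    createdAt: Instant,
    now: Instant,
    retention: Duration = Duration.ofHours(48)
): Boolean = Duration.between(createdAt, now) > retention
```

Pick the window to comfortably exceed your longest realistic client retry horizon, including offline queues that drain days later.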
Where this fits with your existing stack
If you’ve already followed our deep-link reliability and push reliability patterns, this sync model complements them nicely. Deep links get users to the right screen, push wakes can trigger refresh, and this contract keeps data honest once network reality hits.
- Mobile deep linking reliability (App Links + Universal Links)
- Push notification reliability for FCM and APNs
- Backend reliability with outbox/inbox patterns
- ETag revalidation patterns on server APIs
A rollout plan that does not require a full rewrite
You do not need to freeze product work for a quarter to get this right. The least painful path is incremental:
- Pilot one high-value write path (for example, “save draft”) with idempotency keys and server-side replay.
- Add ETag preconditions to that same resource so overwrite bugs surface as controlled 412 flows.
- Move from “best effort” to policy: block new mutable endpoints from shipping without idempotency and precondition rules.
This sequence matters. Teams that start with a giant platform rewrite usually stall. Teams that harden one path, publish internal patterns, and then enforce templates tend to finish.
What to measure so you know trust is improving
If your dashboards only show request success rate, you will miss the actual problem. Track these instead:
- Duplicate-operation suppression rate: how many retries were safely deduped by idempotency keys.
- Precondition failure rate: percentage of writes returning 412, split by app version.
- Conflict resolution completion: users who saw conflict UI and successfully finished merge.
- Offline queue age distribution: oldest undelivered operation by cohort (helps catch stuck sync loops).
- User trust signal: support tickets tagged “lost changes” per 10k active users.
Tradeoff warning: when you first ship proper preconditions, your visible error rate might go up, because hidden overwrites become explicit conflicts. That is healthy. It means you are measuring reality, not hiding it.
Troubleshooting: when sync still misbehaves
Symptom: duplicate records after flaky network retries
Likely cause: key regenerated per retry, or idempotency key not scoped to user/action.
Fix: create key once per user action, persist locally before first send, include key in retry telemetry.
Symptom: users report “my edit vanished” after reconnect
Likely cause: blind overwrite without If-Match.
Fix: require If-Match on mutable resources and return merge payload on 412.
Symptom: local DB grows and app slows after long offline periods
Likely cause: queue cleanup/checkpoint policy missing.
Fix: expire delivered operations, cap retry age, and monitor SQLite WAL/checkpoint metrics.
FAQ
1) Should I send Idempotency-Key on GET requests?
No. GET is already idempotent by HTTP semantics. Use idempotency keys on mutation endpoints where retries can duplicate side effects.
2) Is ETag enough for conflict resolution UX?
ETag detects conflicts, but UX still needs a strategy, like field-level merge, “keep mine/keep latest,” or explicit diff. Detection and resolution are separate concerns.
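One field-level strategy can be sketched as a three-way merge keyed on which fields the user actually changed (purely illustrative; field names and the string-map shape are assumptions):

```kotlin
// Keep the local value only for fields the user edited relative to the
// snapshot the edit started from; otherwise accept the server's latest.
fun mergeFields(
    base: Map<String, String>,   // what the client originally read
    mine: Map<String, String>,   // local edits on top of base
    latest: Map<String, String>  // server state that caused the 412
): Map<String, String> =
    latest.mapValues { (field, serverValue) ->
        if (mine[field] != base[field]) mine[field] ?: serverValue else serverValue
    }
```

When both sides changed the same field, this sketch silently prefers the local edit; that is exactly the case where real apps should show a diff instead.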
3) Does SQLite WAL automatically solve offline queue reliability?
Not by itself. WAL improves durability and concurrency characteristics, but you still need queue lifecycle rules, checkpoint awareness, and crash-recovery tests.
Actionable takeaways for this week
- Pick one write endpoint and enforce Idempotency-Key end-to-end, including retry logs.
- Add If-Match to one update path and intentionally test a conflict to validate your 412 UX.
- Store operation IDs locally before network send, not after response.
- Define idempotency retention policy (TTL + cleanup job) before scale forces emergency fixes.
- Run a “tunnel test”: start edit on Wi-Fi, switch to cellular, force-close app, reopen, verify no duplicate and no lost edit.
If you do just those five, your sync layer will stop feeling like a mystery and start feeling like infrastructure.