At 8:40 a.m. on a Monday, our support inbox lit up with the same complaint: “I got the payment reminder, but my receipt still shows pending.” The push had arrived. The data sync had not. By lunch, product called it a notification bug. It was actually a background work design bug, and Android was doing exactly what we asked, not what we intended.
This is a practical Android 16 runbook for teams that need background task reliability without draining battery or abusing system privileges. The core idea is simple: let push wake the app only when needed, do the minimum immediate work, and hand off durable work to WorkManager with explicit quota behavior.
If this sounds close to your stack, you might also like our guides on Android startup performance under real-world conditions, mobile deep linking fallbacks, queue discipline on the backend, and idempotent event ingestion in PHP.
Why teams still lose reliability on modern Android
Most failures come from one of three assumptions:
- “A high-priority push guarantees full processing.” It does not. Firebase documents a short processing window in onMessageReceived. If you try to do heavy network work there, you risk being cut off mid-flight.
- “If the device sleeps, the sync will run anyway.” Under Android Doze mode, background network and jobs are constrained. Non-urgent work gets deferred.
- “Expedited means unlimited fast lane.” WorkManager expedited jobs are quota-limited and can fall back depending on your out-of-quota policy.
So the goal is not to “beat” Android power management. The goal is to design a pipeline that remains correct under those constraints.
The architecture that works in production
Use a two-stage model:
- Stage 1, immediate UX: In onMessageReceived, parse payload, validate minimum fields, and show a user-visible notification quickly when appropriate.
- Stage 2, durable sync: Enqueue a unique WorkManager job for any network or disk-heavy reconciliation.
This design maps directly to current platform guidance. Firebase recommends immediate handling in the message callback and using WorkManager for additional processing. Android’s Doze/App Standby guidance also favors FCM for wake-up events instead of app-owned persistent sockets.
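The "parse payload, validate minimum fields" step of Stage 1 can be kept as a pure function with no I/O, which makes it cheap to run in the callback and easy to unit test. The sketch below is one way to do that, not the only one. OrderPushEvent and parseOrderPush are hypothetical names; the order_id and event_id keys are the ones used in this article's data payload.

```kotlin
// Hypothetical typed view of the push data payload.
data class OrderPushEvent(val orderId: String, val eventId: String)

/**
 * Returns a typed event, or null when a required field is missing or blank.
 * Takes a plain Map<String, String>, the same shape as RemoteMessage.data.
 */
fun parseOrderPush(data: Map<String, String>): OrderPushEvent? {
    val orderId = data["order_id"]?.trim().orEmpty()
    val eventId = data["event_id"]?.trim().orEmpty()
    // Reject incomplete payloads before any notification or work is created
    if (orderId.isEmpty() || eventId.isEmpty()) return null
    return OrderPushEvent(orderId, eventId)
}
```

In the message callback, a null result would mean "return without showing anything or enqueueing work".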
Code pattern #1, keep push callback short and intentional
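import androidx.work.ExistingWorkPolicy
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.OutOfQuotaPolicy
import androidx.work.WorkManager
import androidx.work.workDataOf
import com.google.firebase.messaging.FirebaseMessagingService
import com.google.firebase.messaging.RemoteMessage

class AppMessagingService : FirebaseMessagingService() {
    override fun onMessageReceived(message: RemoteMessage) {
        // Bail out early if the payload lacks the fields the sync needs
        val orderId = message.data["order_id"] ?: return
        val eventId = message.data["event_id"] ?: return

        // 1) User-visible signal first (only if event is actually user-facing).
        // NotificationRenderer is an app-level helper, not shown here.
        NotificationRenderer.showOrderUpdate(
            context = this,
            orderId = orderId,
            title = message.notification?.title ?: "Order update",
            body = message.notification?.body ?: "Tap to refresh status"
        )

        // 2) Durable sync handoff
        val input = workDataOf(
            "order_id" to orderId,
            "event_id" to eventId
        )
        val work = OneTimeWorkRequestBuilder<OrderSyncWorker>()
            .setInputData(input)
            // Runs as regular work when the expedited quota is exhausted
            .setExpedited(OutOfQuotaPolicy.RUN_AS_NON_EXPEDITED_WORK_REQUEST)
            .addTag("push-sync")
            .build()

        // KEEP ignores this request if work for the same order is still pending
        WorkManager.getInstance(this).enqueueUniqueWork(
            "order-sync-$orderId",
            ExistingWorkPolicy.KEEP,
            work
        )
    }
}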
Tradeoff: RUN_AS_NON_EXPEDITED_WORK_REQUEST protects correctness when quota is exhausted, but latency can increase. For flows where a late sync is worth nothing to the user, you may choose DROP_WORK_REQUEST and rely on user pull-to-refresh, but only if your product explicitly accepts that behavior.
Code pattern #2, make the worker resumable and idempotent
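import android.content.Context
import androidx.work.BackoffPolicy
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import androidx.work.workDataOf
import java.io.IOException
import java.util.concurrent.TimeUnit
import retrofit2.HttpException // assumes a Retrofit-based api client

// api and cache are app-level dependencies, not shown here. On Android versions
// before 12, expedited work runs as a foreground service, so the worker would
// also need to override getForegroundInfo().
class OrderSyncWorker(
    appContext: Context,
    params: WorkerParameters
) : CoroutineWorker(appContext, params) {
    override suspend fun doWork(): Result {
        val orderId = inputData.getString("order_id") ?: return Result.failure()
        val eventId = inputData.getString("event_id") ?: return Result.failure()
        return try {
            // Backend should treat event_id as idempotency key
            api.reconcileOrder(orderId = orderId, eventId = eventId)
            cache.markOrderFresh(orderId)
            Result.success()
        } catch (e: IOException) {
            Result.retry() // transient network failure
        } catch (e: HttpException) {
            // Retry server errors; treat other HTTP errors as permanent
            if (e.code() in 500..599) Result.retry() else Result.failure()
        }
    }
}

// Sync enqueued outside the push callback
fun enqueueColdSync(context: Context, orderId: String, eventId: String) {
    val constraints = Constraints.Builder()
        .setRequiredNetworkType(NetworkType.CONNECTED)
        .build()
    val request = OneTimeWorkRequestBuilder<OrderSyncWorker>()
        // Without these inputs the worker returns Result.failure() immediately
        .setInputData(workDataOf("order_id" to orderId, "event_id" to eventId))
        .setConstraints(constraints)
        .setBackoffCriteria(BackoffPolicy.EXPONENTIAL, 30, TimeUnit.SECONDS)
        .build()
    WorkManager.getInstance(context).enqueue(request)
}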
The worker matters more than the callback. If retries are unsafe or duplicate writes are possible, your “reliability fix” just moves the bug from delivery to data integrity.
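To make "retries are safe" concrete, here is a minimal in-memory sketch of what the backend side of the idempotency contract could look like. It is written in Kotlin only to match the client code; it does not describe any particular backend. IdempotentReconciler, ReconcileResult, and the applyEvent callback are hypothetical names, and a real service would keep the seen-events table in durable storage rather than in a HashMap.

```kotlin
// Hypothetical response type; `replayed` marks a repeat of a known event_id.
data class ReconcileResult(val orderId: String, val status: String, val replayed: Boolean)

class IdempotentReconciler(
    // Performs the real write and returns the resulting order status
    private val applyEvent: (orderId: String, eventId: String) -> String
) {
    // In-memory stand-in for a durable "processed events" table
    private val seen = HashMap<String, ReconcileResult>()

    @Synchronized
    fun reconcile(orderId: String, eventId: String): ReconcileResult {
        // Repeat delivery: return the stored outcome without writing again
        seen[eventId]?.let { return it.copy(replayed = true) }
        val result = ReconcileResult(orderId, applyEvent(orderId, eventId), replayed = false)
        seen[eventId] = result
        return result
    }
}
```

With this shape, a worker that calls Result.retry() after a timeout can repeat the same request without producing a second write, because the second call is answered from the stored result.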
Server payload discipline, where many Android bugs actually start
For truly urgent, user-visible updates, use FCM high priority messages sparingly and include enough payload for immediate rendering. For non-urgent refreshes, use normal priority and let Android batch delivery into maintenance windows while the device is in Doze. Sending everything as high priority can trigger deprioritization over time if behavior suggests messages are not genuinely user-visible.
{
  "message": {
    "token": "DEVICE_TOKEN",
    "notification": {
      "title": "Payment received",
      "body": "Tap to view your updated receipt"
    },
    "data": {
      "order_id": "ord_7843",
      "event_id": "evt_2026_04_26_001"
    },
    "android": {
      "priority": "high",
      "ttl": "300s"
    },
    "fcm_options": {
      "analytics_label": "receipt_update"
    }
  }
}
Tradeoff: Short TTL protects users from stale notifications but can reduce eventual delivery during connectivity issues. Longer TTL improves eventual delivery, but risks “late and wrong” UX for time-sensitive events.
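One way to keep priority and TTL choices consistent is to decide them in a single place on the sending side instead of per call site. The sketch below assumes two lanes. The event type names and the 86400s routine TTL are made-up placeholders; only the "high" priority with a 300s TTL comes from the payload above.

```kotlin
enum class Lane { URGENT, ROUTINE }

// Values destined for the "android" block of the FCM message
data class AndroidDelivery(val priority: String, val ttl: String)

// Hypothetical event types that the product has agreed are user-visible and urgent
val urgentEventTypes = setOf("payment_received", "order_cancelled")

// Anything not explicitly listed as urgent stays in the routine lane
fun laneFor(eventType: String): Lane =
    if (eventType in urgentEventTypes) Lane.URGENT else Lane.ROUTINE

fun deliveryFor(lane: Lane): AndroidDelivery = when (lane) {
    // Short TTL: a late payment notification is worse than none
    Lane.URGENT -> AndroidDelivery(priority = "high", ttl = "300s")
    // Placeholder TTL: favors eventual delivery over timeliness
    Lane.ROUTINE -> AndroidDelivery(priority = "normal", ttl = "86400s")
}
```

Defaulting unknown event types to the routine lane keeps high priority an explicit opt-in, which matches the advice to use it sparingly.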
Measure the pipeline end to end, not just send success
One operational mistake I still see is celebrating “message accepted” as if that equals “user got consistent state.” Firebase delivery docs separate accepted, received, impressions, and opens, and those signals are not interchangeable. On Android, the FCM Data API also exposes aggregated delay and drop patterns, including priority-lowered behavior, but the data is aggregated and delayed, so it is a trend tool, not a per-message debugger.
In practice, pair three metrics in the same chart: (1) provider acceptance, (2) worker completion within your target window, and (3) user-visible freshness at screen open. With all three on one chart, on-call debugging becomes faster and product discussions become less emotional, because the point where the lines diverge shows whether the problem is send classification, OS scheduling, or backend reconciliation.
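As a rough illustration, the three metrics can be computed from one per-event trace record. The sketch assumes your own telemetry already joins send logs, worker logs, and screen-open checks by event ID. EventTrace, PipelineRates, and pipelineRates are hypothetical names, not part of any Firebase or WorkManager API.

```kotlin
// Hypothetical joined trace for one push-triggered event
data class EventTrace(
    val eventId: String,
    val acceptedAtMs: Long?,         // when the provider accepted the send, if it did
    val workerDoneAtMs: Long?,       // when the sync worker succeeded, if it did
    val freshAtScreenOpen: Boolean?  // null when the user never opened the screen
)

data class PipelineRates(val accepted: Double, val syncedInWindow: Double, val freshAtOpen: Double)

private fun ratio(part: Int, whole: Int): Double =
    if (whole == 0) 0.0 else part.toDouble() / whole

fun pipelineRates(traces: List<EventTrace>, targetWindowMs: Long): PipelineRates {
    val acceptedCount = traces.count { it.acceptedAtMs != null }
    // Worker completion is measured against accepted events only
    val syncedCount = traces.count { t ->
        val sent = t.acceptedAtMs
        val done = t.workerDoneAtMs
        sent != null && done != null && done - sent <= targetWindowMs
    }
    // Freshness is measured against events whose screen was actually opened
    val openedCount = traces.count { it.freshAtScreenOpen != null }
    val freshCount = traces.count { it.freshAtScreenOpen == true }
    return PipelineRates(
        accepted = ratio(acceptedCount, traces.size),
        syncedInWindow = ratio(syncedCount, acceptedCount),
        freshAtOpen = ratio(freshCount, openedCount)
    )
}
```

Each rate uses the previous stage as its denominator, so a drop in one line is not hidden by a drop upstream.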
Troubleshooting, when push arrives but data is stale
1) Notification shown, but detail screen still old
Likely cause: heavy async work in onMessageReceived that never completed. Fix: move all non-trivial work to WorkManager and keep callback minimal.
2) Delays increase on devices left idle overnight
Likely cause: relying on normal-priority traffic for urgent use cases during Doze windows. Fix: reclassify only truly urgent events to high priority, keep the rest normal.
3) Some events never reconcile when traffic spikes
Likely cause: expedited quota exhaustion plus drop policy. Fix: switch to RUN_AS_NON_EXPEDITED_WORK_REQUEST unless strict latency is non-negotiable.
4) Duplicate backend writes after retries
Likely cause: non-idempotent server endpoint. Fix: make event_id an idempotency key server-side and return deterministic responses for repeats.
FAQ
Do expedited jobs bypass Android battery restrictions completely?
No. They are less likely to be delayed than ordinary work, but they are still governed by system load and app quota. Expedited is a priority hint, not a bypass switch.
Should every high-priority FCM message create an expedited worker?
Not always. If the notification is already complete from payload and no immediate reconciliation is needed, skip extra work. Reserve expedited work for user-visible flows where stale data causes real confusion.
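That rule of thumb can be written as a small decision function so it is applied the same way for every message type. This is a sketch under assumed inputs: PushTraits, SyncMode, and both flags are hypothetical, and how you derive them from a payload is up to your app.

```kotlin
enum class SyncMode { NONE, DEFERRED, EXPEDITED }

// Hypothetical per-message traits derived from the payload
data class PushTraits(
    val needsReconcile: Boolean,  // false when the payload alone is complete
    val staleDataConfuses: Boolean // true for user-visible flows such as receipts
)

fun syncModeFor(traits: PushTraits): SyncMode = when {
    // Payload is self-sufficient: skip extra work entirely
    !traits.needsReconcile -> SyncMode.NONE
    // Stale data would visibly contradict the notification: spend expedited quota
    traits.staleDataConfuses -> SyncMode.EXPEDITED
    // Otherwise ordinary WorkManager scheduling is enough
    else -> SyncMode.DEFERRED
}
```

Only the EXPEDITED branch would call setExpedited when building the work request; DEFERRED would enqueue a plain one-time request.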
Is requesting battery-optimization exemption a good fix?
Usually no. Android guidance treats exemption as a narrow exception for specific app classes. Most apps should get reliability by combining FCM, WorkManager, and clean retry/idempotency design.
Actionable takeaways you can ship this week
- Audit onMessageReceived and remove any network call that can move into WorkManager.
- Adopt one explicit expedited out-of-quota policy and document why.
- Add idempotency keys to every push-triggered backend reconcile endpoint.
- Split event types into urgent and non-urgent lanes, with different priority and TTL defaults.
- Track delivery labels and worker success rates together, not as separate dashboards.
Reliable Android background execution in 2026 is less about tricks and more about honest system design. If you respect Doze, treat priority as a scarce resource, and keep sync flows idempotent, users see what they expect: the notification and the data matching each other when it matters.
