Last month, one of our Android builds looked excellent in the office. On Pixel test phones connected to fast Wi-Fi, cold launch felt snappy. The release went out on Friday evening. By Saturday noon, support tickets started landing: “App freezes after tapping notification,” “white screen on first open,” “sometimes asks me to close app.”
Nothing crashed. Our dashboards were green. But Play Console started surfacing user-perceived ANRs, and the pattern was obvious in hindsight. We optimized for “developer desk startup,” not for real startup paths on mid-range phones, warm storage, and background contention.
This is the runbook we now use for Android startup performance in production. It combines three pieces that work better together than alone: Macrobenchmark for repeatable timing, Baseline Profiles for first-run code optimization, and Perfetto tracing for root-cause analysis when numbers drift.
The hallway test that changed our launch process
Before we talk about tools, here is the mindset shift. We stopped asking, “Is startup fast?” and started asking:
- Is startup fast on a throttled phone after a fresh install?
- Is startup still responsive when push handling runs in parallel?
- Can we explain every extra 300 ms in TTID/TTFD?
If you have read our reliability pieces on mobile push delivery failure modes or backpressure and graceful shutdown in workers, this will sound familiar: production behavior is almost always about load shape, not ideal-path averages.
What the platform signals actually mean
According to Android documentation, an ANR can occur when input dispatch times out (commonly the 5-second path), and the user-perceived ANR rate is a core Play vital. Google also publishes bad-behavior thresholds that affect discoverability (overall and per-device). The exact numbers may evolve, but the product takeaway is stable: startup and UI-thread stalls are not just UX debt; they can become growth debt.
For startup, focus on two metrics:
- TTID (time to initial display): first frame appears.
- TTFD (time to fully drawn): the screen is genuinely usable.
Chasing TTID only can mislead you. A fast first frame with blocked interactivity still feels broken.
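TTFD is something you report yourself. A minimal sketch of wiring it up, assuming a hypothetical `HomeActivity` with an illustrative `loadFeed` callback (the names and layout are placeholders, not our actual code):

```kotlin
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity

class HomeActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_home) // R.layout.activity_home is illustrative
        loadFeed {
            // Marks the real end of TTFD for the system and for StartupTimingMetric,
            // instead of letting "first frame drawn" stand in for "usable".
            reportFullyDrawn()
        }
    }

    // Illustrative: load data off the main thread, then invoke onReady once rendered.
    private fun loadFeed(onReady: () -> Unit) { /* ... */ }
}
```

Without `reportFullyDrawn()`, TTFD falls back to heuristics, and the “fast first frame, blocked interactivity” failure mode stays invisible.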
Step 1: lock measurement with Macrobenchmark (not ad hoc stopwatch tests)
Macrobenchmark gives you repeatable startup measurements and trace artifacts you can diff in CI. We run startup benchmarks against a benchmark build type that mirrors release behavior (minified/shrunk), not debug defaults.
```kotlin
// app/build.gradle.kts (excerpt)
plugins {
    id("com.android.application")
    kotlin("android")
    id("androidx.baselineprofile")
}

android {
    buildTypes {
        getByName("release") {
            isMinifyEnabled = true
            isShrinkResources = true
        }
        create("benchmark") {
            initWith(getByName("release"))
            signingConfig = signingConfigs.getByName("debug")
            matchingFallbacks += listOf("release")
        }
    }
}

dependencies {
    implementation("androidx.profileinstaller:profileinstaller:1.4.1")
    androidTestImplementation("androidx.benchmark:benchmark-macro-junit4:1.4.1")
}
```
Then define a startup benchmark that reflects user journeys you care about, not just opening the launcher activity and stopping there.
```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun coldStartup() = benchmarkRule.measureRepeated(
        packageName = "co.in.seventech.app",
        metrics = listOf(StartupTimingMetric()),
        iterations = 10,
        startupMode = StartupMode.COLD
    ) {
        pressHome()
        startActivityAndWait()
        // Simulate first interaction to catch "looks loaded but frozen" flows.
        device.waitForIdle()
        device.findObject(By.res("co.in.seventech.app:id/home_feed"))
            .fling(Direction.DOWN)
    }
}
```
Tradeoff: benchmarks take setup effort and test devices can be noisy. But without this discipline, teams usually regress silently between releases.
Step 2: ship Baseline Profiles for first-run wins
Baseline Profiles tell ART which critical code paths to compile ahead of time. Google's documentation reports around 30% faster code execution in first-launch scenarios for many apps, though your exact gains depend on architecture and hot paths.
In practice, Baseline Profiles helped us most when:
- Cold starts were dominated by Kotlin/Jetpack code paths that were previously interpreted/JIT-compiled on device.
- Feature modules loaded heavy dependency graphs during first session.
- We regenerated profiles every release, not once and forgotten.
Tradeoff: profile generation belongs in release engineering. If you skip regeneration after structural changes, stale profiles can produce false confidence.
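A minimal profile generator, sketched with `BaselineProfileRule` from the same `benchmark-macro-junit4` artifact already in the build (the package name and journey are placeholders; walk your real critical flows here):

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generateStartupProfile() = baselineProfileRule.collect(
        packageName = "co.in.seventech.app"
    ) {
        // Exercise the journeys you want precompiled, not just the launcher screen.
        pressHome()
        startActivityAndWait()
    }
}
```

Running this as part of release engineering (rather than once, by hand) is what keeps profiles from going stale after structural changes.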
Step 3: use Perfetto when metrics move in the wrong direction
Macrobenchmark tells you that startup changed. Perfetto helps you see why. Android docs now frame Perfetto as the primary system tracing tool on modern Android versions. We use it to answer three questions quickly:
- What blocked the main thread before first interactive frame?
- Did binder calls or disk reads bunch up during launch?
- Did background work (analytics, remote config, push processing) compete at the wrong moment?
```shell
# Capture a startup-focused trace from a test device
adb shell perfetto -o /data/misc/perfetto-traces/startup_trace.pftrace -t 15s \
    sched freq idle am wm gfx view binder_driver hal dalvik
adb pull /data/misc/perfetto-traces/startup_trace.pftrace ./startup_trace.pftrace
# Open in https://ui.perfetto.dev and inspect Android App Startups + main thread lanes
```
We also annotate suspect blocks with tracing sections in app code, so the trace jumps from “thread busy” to a specific business operation instead of guesswork.
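One lightweight way to do that is the `trace` helper from `androidx.tracing:tracing-ktx`; the section names then show up as labeled slices in Perfetto. A sketch, where `remoteConfig` and `featureRegistry` are illustrative stand-ins for our own components:

```kotlin
import androidx.tracing.trace

// remoteConfig and featureRegistry are hypothetical app components.
fun warmUpCriticalPath() {
    trace("RemoteConfig.fetch") {
        remoteConfig.fetchAndActivate()
    }
    trace("FeatureRegistry.init") {
        featureRegistry.initialize()
    }
}
```
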
Common startup anti-patterns we still see in 2026
- Configuration fan-out in Application.onCreate(): too many network and SDK initializations on app process start.
- Synchronous disk/network on main thread: still appears via legacy wrappers and third-party SDK callbacks.
- Over-eager dependency injection graphs: creating everything for all screens before first interaction.
- Background queues without backpressure: analytics/upload tasks stealing CPU during first draw, a pattern we discussed similarly in our data reliability post.
- UI smoothness tunnel vision: optimizing animation while launch pipeline remains unbounded, the same strategic mistake behind poor INP on web apps in our INP deep dive.
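For the over-eager DI graph specifically, the usual fix is lazy construction. A pure-Kotlin sketch with illustrative stand-in types (`Analytics`, `ImageLoader`, `AppGraph` are placeholders, not a real DI framework):

```kotlin
// Illustrative stand-ins for real SDK objects that are expensive to construct.
class Analytics
class ImageLoader

class AppGraph {
    // Created on first access, not at process start; subsequent reads reuse the instance.
    val analytics: Analytics by lazy { Analytics() }
    val imageLoader: ImageLoader by lazy { ImageLoader() }
}
```

The effect is that `Application.onCreate()` only builds the graph object itself; screen-specific dependencies pay their construction cost when (and if) that screen is reached.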
Troubleshooting: when startup is still slow after Baseline Profiles
Symptom 1: TTID improves, but users still report “hang on open”
Likely cause: first frame is drawn, but post-draw work blocks interaction. Check TTFD and main-thread slices after first frame in Perfetto.
Fix: defer non-critical work, split heavy initializers, and enforce main-thread budgets per component.
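One common idiom for deferring work until after the first frame is a double-post: a message posted to the decor view runs after the first layout/draw traversal, and the nested post pushes the work behind any frames already queued. A hedged sketch, not a guarantee on every device or Android version:

```kotlin
import android.app.Activity
import android.os.Handler
import android.os.Looper

// Sketch: run non-critical initializers after the first frame has been committed.
fun Activity.deferAfterFirstFrame(block: () -> Unit) {
    window.decorView.post {
        Handler(Looper.getMainLooper()).post(block)
    }
}

// Usage (initAnalytics is an illustrative name):
// deferAfterFirstFrame { initAnalytics() }
```
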
Symptom 2: Benchmarks are stable locally, unstable in CI devices
Likely cause: thermal throttling, background app noise, or inconsistent device state.
Fix: standardize device prep (airplane mode where possible, battery threshold, cooldown window), and compare percentile bands instead of one-off medians.
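Our device-prep step looks roughly like this; the exact settings keys vary by Android version, so verify each command against your own fleet before relying on it:

```shell
# CI device prep before a benchmark run (verify flags on your Android versions).
adb shell settings put global airplane_mode_on 1
adb shell am broadcast -a android.intent.action.AIRPLANE_MODE

# Disable animations so frame timing noise does not leak into startup numbers.
adb shell settings put global window_animation_scale 0
adb shell settings put global transition_animation_scale 0
adb shell settings put global animator_duration_scale 0

# Cooldown window between runs to reduce thermal throttling variance.
sleep 120
```
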
Symptom 3: ANRs persist despite better startup timings
Likely cause: ANRs occur in specific interaction paths (broadcast receiver/service/job) outside launcher startup benchmark.
Fix: add scenario benchmarks for notification open, deep-link open, and resume-from-background flows. Startup-only tests are necessary but not sufficient.
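A deep-link variant fits the same `MacrobenchmarkRule` shape, since `startActivityAndWait` also accepts an explicit `Intent`. A sketch inside the existing `StartupBenchmark` class, where `https://example.com/feed` is a placeholder for a real deep link:

```kotlin
import android.content.Intent
import android.net.Uri
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import org.junit.Test

@Test
fun deepLinkColdStartup() = benchmarkRule.measureRepeated(
    packageName = "co.in.seventech.app",
    metrics = listOf(StartupTimingMetric()),
    iterations = 10,
    startupMode = StartupMode.COLD
) {
    pressHome()
    // Placeholder URI; point this at a link your manifest actually handles.
    startActivityAndWait(
        Intent(Intent.ACTION_VIEW, Uri.parse("https://example.com/feed"))
    )
}
```

Notification-open and resume-from-background scenarios follow the same pattern with `StartupMode.WARM` or `HOT` where appropriate.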
FAQ
1) Should every app team adopt Baseline Profiles immediately?
If your app has meaningful cold-start traffic or first-session drop-off, yes, it is usually worth it. For tiny apps with very short startup and low release frequency, measurement discipline may deliver more value first.
2) Is Macrobenchmark enough without Perfetto?
No. Macrobenchmark gives trend detection and regression gates. Perfetto gives causal debugging. You will need both once the first unexplained regression happens.
3) Can I move everything off the main thread and call it done?
Not safely. Blind offloading can create lock contention, binder stalls, or priority inversion. The goal is not “zero main-thread work,” it is “bounded, intentional main-thread work.”
Actionable takeaways
- Pick one Android startup performance SLO (for example p95 TTID/TTFD by device tier) and publish it weekly.
- Add a benchmark build + Macrobenchmark startup test in CI before your next release branch cut.
- Generate and package Baseline Profiles every release, then verify effect on representative mid-range devices.
- Capture one Perfetto tracing session for each startup regression, and require trace-backed root cause in postmortems.
- Expand from launcher startup to deep-link/notification/resume scenarios so ANR risk is covered where users actually enter.