Android in 2026: Build an Offline OCR Scanner App with CameraX, ML Kit, and Room FTS Search

Want to ship an Android offline OCR app that still feels fast and reliable in poor network conditions? In this guide, you will build a production-ready document scanner pipeline using CameraX for capture, ML Kit for on-device text recognition, and Room FTS for instant local search. By the end, you will have a practical architecture for scanning receipts, invoices, or notes directly on device, with background indexing and clean upgrade paths for cloud sync later.

Why this architecture works in 2026

Many scanner apps still depend on server-side OCR, which adds latency, cost, and privacy risk. An offline-first stack solves those pain points:

  • Low latency: users get extracted text in seconds.
  • Privacy by design: sensitive documents stay on-device.
  • Resilience: features keep working without internet.
  • Cost control: less backend OCR infrastructure.

We will use a simple flow: capture image, preprocess, run OCR, persist structured data, then index text for search.

Project setup and dependencies

Core modules

  • camera: CameraX preview + image capture
  • ocr: ML Kit text recognizer
  • data: Room entities, DAO, FTS table
  • work: WorkManager jobs for indexing/retry

Gradle dependencies

dependencies {
    // CameraX
    implementation("androidx.camera:camera-core:1.4.0")
    implementation("androidx.camera:camera-camera2:1.4.0")
    implementation("androidx.camera:camera-lifecycle:1.4.0")
    implementation("androidx.camera:camera-view:1.4.0")

    // ML Kit OCR (on-device)
    implementation("com.google.mlkit:text-recognition:16.0.0")

    // Room + FTS
    implementation("androidx.room:room-runtime:2.7.0")
    implementation("androidx.room:room-ktx:2.7.0")
    ksp("androidx.room:room-compiler:2.7.0")

    // Background work
    implementation("androidx.work:work-runtime-ktx:2.10.0")
}

Data model for OCR documents and full-text search

Store original metadata and OCR text separately. That gives you clean updates and faster queries.

@Entity(tableName = "documents")
data class DocumentEntity(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val uri: String,
    val createdAt: Long,
    val title: String,
    val confidenceAvg: Float
)

@Entity(tableName = "document_text")
data class DocumentTextEntity(
    @PrimaryKey val documentId: Long,
    val rawText: String,
    val normalizedText: String
)

@Fts4(contentEntity = DocumentTextEntity::class)
@Entity(tableName = "document_text_fts")
data class DocumentTextFts(
    val normalizedText: String
)

data class DocumentListItem(
    val id: Long,
    val title: String,
    val createdAt: Long
)

@Dao
interface DocumentDao {
    @Insert suspend fun insertDocument(doc: DocumentEntity): Long

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun upsertText(text: DocumentTextEntity)

    @Query("""
        SELECT d.id, d.title, d.createdAt
        FROM documents d
        JOIN document_text_fts fts ON fts.rowid = d.id
        WHERE document_text_fts MATCH :q
        ORDER BY d.createdAt DESC
    """)
    suspend fun search(q: String): List<DocumentListItem>
}

This schema keeps lookups quick even when the user scans hundreds of pages.
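One caveat before wiring search to a text field: FTS MATCH treats the query string as a mini-language (operators like OR and NEAR, quoting, prefixes), so raw user input can throw a syntax error mid-keystroke. A minimal sketch of a sanitizer that quotes each token and adds a prefix wildcard (buildFtsQuery is a hypothetical helper, not part of Room):

```kotlin
// Turns free-form user input into a safe FTS4 MATCH expression: each
// token becomes a quoted single-token phrase with a trailing * so
// partial words match as the user types. Multiple phrases separated
// by spaces are implicitly AND-ed.
fun buildFtsQuery(input: String): String =
    input.trim()
        .split(Regex("\\s+"))
        .filter { it.isNotBlank() }
        .joinToString(" ") { token ->
            // Strip embedded quotes so a token cannot escape its quoting.
            "\"${token.replace("\"", "")}*\""
        }
```

Pass the result to dao.search(...), and skip the query entirely when the builder returns an empty string, since MATCH against an empty expression is an error.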

CameraX + ML Kit pipeline

Capture and run text recognition

After capture, wrap the frame in an InputImage and hand it to ML Kit. The recognizer runs on its own executor, so rather than blocking a thread, wrap the returned Task in a suspending call and keep the UI responsive.

class OcrProcessor {
    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    suspend fun extractText(bitmap: Bitmap, rotation: Int): OcrResult =
        suspendCancellableCoroutine { cont ->
            val image = InputImage.fromBitmap(bitmap, rotation)
            recognizer.process(image)
                .addOnSuccessListener { visionText ->
                    val blocks = visionText.textBlocks
                    val text = blocks.joinToString("\n") { it.text }
                    val confidence = blocks
                        .flatMap { it.lines }
                        .mapNotNull { it.confidence }
                        .average()
                        .toFloat()
                        .takeIf { !it.isNaN() } ?: 0.0f

                    cont.resume(OcrResult(text = text, avgConfidence = confidence))
                }
                .addOnFailureListener { e -> cont.resumeWithException(e) }
        }
}

suspend fun saveScan(uri: Uri, bitmap: Bitmap, dao: DocumentDao, ocr: OcrProcessor) {
    val result = ocr.extractText(bitmap, rotation = 0) // in production, pass the real rotation from the capture's ImageInfo
    val id = dao.insertDocument(
        DocumentEntity(
            uri = uri.toString(),
            createdAt = System.currentTimeMillis(),
            title = "Scan ${System.currentTimeMillis()}",
            confidenceAvg = result.avgConfidence
        )
    )
    dao.upsertText(
        DocumentTextEntity(
            documentId = id,
            rawText = result.text,
            normalizedText = result.text.lowercase()
        )
    )
}
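saveScan above normalizes with a plain lowercase(). A slightly richer normalizer (a sketch; the exact rules are up to your product) also folds diacritics and collapses whitespace, so a scan containing "Café" matches a search for "cafe":

```kotlin
import java.text.Normalizer

// Normalizes OCR output for indexing: lowercase, strip diacritics
// (é -> e) via NFD decomposition, and collapse runs of whitespace.
fun normalizeForSearch(raw: String): String =
    Normalizer.normalize(raw.lowercase(), Normalizer.Form.NFD)
        .replace(Regex("\\p{Mn}+"), "") // drop combining accent marks
        .replace(Regex("\\s+"), " ")
        .trim()
```

Whatever rules you pick, apply the same normalization to the stored text and to incoming search queries, otherwise the two will drift apart.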

Quality and performance tips

1) Improve OCR accuracy before recognition

  • Crop to detected document edges.
  • Apply grayscale + adaptive threshold.
  • Run light deskew for tilted pages.
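As a concrete example of the grayscale step, here is a minimal luminance conversion over packed ARGB pixel ints (the kind Bitmap.getPixels returns). The BT.601 weights are one common choice, and a real pipeline would follow this with adaptive thresholding:

```kotlin
// Converts packed ARGB_8888 pixels to grayscale in place using the
// BT.601 luma approximation: Y = 0.299 R + 0.587 G + 0.114 B.
fun toGrayscale(pixels: IntArray) {
    for (i in pixels.indices) {
        val p = pixels[i]
        val a = p ushr 24 and 0xFF
        val r = p ushr 16 and 0xFF
        val g = p ushr 8 and 0xFF
        val b = p and 0xFF
        val y = (299 * r + 587 * g + 114 * b) / 1000
        pixels[i] = (a shl 24) or (y shl 16) or (y shl 8) or y
    }
}
```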

2) Keep scanning responsive

  • Do not OCR every preview frame; run OCR only on captured frames.
  • Cache resized bitmaps to avoid repeated allocations.
  • Use WorkManager for deferred indexing when the app is backgrounded.
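For the resizing tip, a small helper (hypothetical, not a CameraX API) that caps the longest side while preserving aspect ratio; for document text, a long edge around 2000 px is usually plenty for recognition, though the right cap depends on your source material:

```kotlin
// Computes the (width, height) to downscale to so the longest side is
// at most maxSide, preserving aspect ratio. Returns the original size
// if the image is already small enough.
fun targetSize(width: Int, height: Int, maxSide: Int): Pair<Int, Int> {
    val longest = maxOf(width, height)
    if (longest <= maxSide) return width to height
    val scale = maxSide.toDouble() / longest
    return (width * scale).toInt() to (height * scale).toInt()
}
```

Feed the result to Bitmap.createScaledBitmap once per capture and reuse the output, rather than rescaling on every processing pass.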

3) Add confidence-aware UX

If confidence is low, prompt users to retake with better lighting. This improves trust and data quality immediately.
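One way to wire that prompt is a small pure function between the OCR result and the UI; the 0.6 threshold here is an illustrative value to tune against your own scans:

```kotlin
sealed interface ScanVerdict {
    object Accept : ScanVerdict
    data class SuggestRetake(val reason: String) : ScanVerdict
}

// Maps an OCR result to a UX decision: accept it, or nudge the user
// to retake the photo with a specific, actionable reason.
fun judgeScan(
    avgConfidence: Float,
    textLength: Int,
    minConfidence: Float = 0.6f
): ScanVerdict = when {
    textLength == 0 -> ScanVerdict.SuggestRetake("No text detected - check framing and focus")
    avgConfidence < minConfidence -> ScanVerdict.SuggestRetake("Low confidence - try better lighting")
    else -> ScanVerdict.Accept
}
```

Keeping the decision in a pure function makes the threshold trivial to unit test and to adjust from remote config later.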

Background indexing with WorkManager

For multi-page scans, queue indexing work so capture flow stays fast.

// A Worker with constructor parameters beyond (Context, WorkerParameters)
// needs a custom WorkerFactory (or Hilt's @HiltWorker) to be instantiated.
class ReindexWorker(
    appContext: Context,
    params: WorkerParameters,
    private val repository: ScanRepository
) : CoroutineWorker(appContext, params) {

    override suspend fun doWork(): Result {
        return try {
            repository.reindexPendingDocuments()
            Result.success()
        } catch (e: CancellationException) {
            throw e // never convert coroutine cancellation into a retry
        } catch (e: Exception) {
            Result.retry()
        }
    }
}

fun scheduleReindex(context: Context) {
    val request = OneTimeWorkRequestBuilder<ReindexWorker>()
        .setBackoffCriteria(
            BackoffPolicy.EXPONENTIAL,
            30, TimeUnit.SECONDS
        )
        .build()

    WorkManager.getInstance(context)
        .enqueueUniqueWork("ocr_reindex", ExistingWorkPolicy.KEEP, request)
}

Security and privacy baseline

  • Encrypt local DB or sensitive fields where required.
  • Strip EXIF metadata from stored images if location is not needed.
  • Gate cloud export behind explicit user consent.

If your app later adds API sync, apply strong backend hardening patterns similar to this Node.js security guide: Node.js in 2026: Secure Your Backend.

Conclusion

Building an Android offline OCR app is now straightforward when you combine CameraX, ML Kit, and Room FTS with a clean background-work strategy. You get speed, privacy, and better reliability than cloud-only OCR workflows, while keeping room to scale into sync, analytics, and enterprise controls later. Start small with single-page scans, validate recognition confidence in real lighting conditions, then expand into batch scan and export flows.

FAQ

Is ML Kit OCR accurate enough for receipts and invoices?

Yes for many common layouts, especially with good preprocessing. Accuracy usually improves a lot after cropping, denoising, and deskewing images.

Why use Room FTS instead of plain LIKE queries?

Room FTS is significantly faster and more relevant for large text datasets. It keeps search responsive as scanned document count grows.

Can I sync scanned text to a backend later?

Absolutely. Keep local IDs stable, add a sync queue, and use idempotent server endpoints to avoid duplicates during retries.
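For the idempotency point, one sketch: derive a stable key from the local document id plus a revision counter, so a retried upload deduplicates server-side (the scheme is an assumption, not a prescribed API):

```kotlin
import java.util.UUID

// Deterministic idempotency key: the same (documentId, revision) pair
// always yields the same UUID, so the server can safely ignore
// duplicate uploads caused by retries.
fun idempotencyKey(documentId: Long, revision: Int): String =
    UUID.nameUUIDFromBytes("doc:$documentId:rev:$revision".toByteArray()).toString()
```

Send the key in a header such as Idempotency-Key and bump the revision only when the document's content actually changes.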

What is the best first metric to track in production?

Track OCR confidence and user correction rate together. That pair quickly reveals where camera UX or preprocessing needs improvement.


© 7Tech – Programming and Tech Tutorials