Want to ship an Android offline OCR app that still feels fast and reliable in poor network conditions? In this guide, you will build a production-ready document scanner pipeline using CameraX for capture, ML Kit for on-device text recognition, and Room FTS for instant local search. By the end, you will have a practical architecture for scanning receipts, invoices, or notes directly on device, with background indexing and clean upgrade paths for cloud sync later.
Why this architecture works in 2026
Many scanner apps still depend on server-side OCR, which adds latency, cost, and privacy risk. An offline-first stack solves those pain points:
- Low latency: users get extracted text in seconds.
- Privacy by design: sensitive documents stay on-device.
- Resilience: features keep working without internet.
- Cost control: less backend OCR infrastructure.
We will use a simple flow: capture image, preprocess, run OCR, persist structured data, then index text for search.
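That flow can be sketched as a small pipeline of pluggable steps. This is a minimal sketch with hypothetical types (Scan, OcrText, ScanPipeline are illustrative names, not library APIs); the real implementations of each step follow later in this guide.

```kotlin
// Hypothetical step types; concrete implementations appear later.
data class Scan(val imageBytes: ByteArray)
data class OcrText(val raw: String)

class ScanPipeline(
    private val preprocess: (Scan) -> Scan,
    private val recognize: (Scan) -> OcrText,
    private val persist: (OcrText) -> Long,   // returns the new document id
    private val index: (Long, OcrText) -> Unit
) {
    fun run(scan: Scan): Long {
        val cleaned = preprocess(scan)        // crop, grayscale, deskew
        val text = recognize(cleaned)         // on-device OCR
        val id = persist(text)                // Room insert
        index(id, text)                       // FTS index
        return id
    }
}
```

Keeping each stage a plain function makes it easy to swap the OCR engine or move indexing into a background worker later without touching the capture flow.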
Project setup and dependencies
Core modules
- camera: CameraX preview + image capture
- ocr: ML Kit text recognizer
- data: Room entities, DAO, FTS table
- work: WorkManager jobs for indexing/retry
Gradle dependencies
dependencies {
// CameraX
implementation("androidx.camera:camera-core:1.4.0")
implementation("androidx.camera:camera-camera2:1.4.0")
implementation("androidx.camera:camera-lifecycle:1.4.0")
implementation("androidx.camera:camera-view:1.4.0")
// ML Kit OCR (on-device)
implementation("com.google.mlkit:text-recognition:16.0.0")
// Room + FTS
implementation("androidx.room:room-runtime:2.7.0")
implementation("androidx.room:room-ktx:2.7.0")
ksp("androidx.room:room-compiler:2.7.0")
// Background work
implementation("androidx.work:work-runtime-ktx:2.10.0")
}
Data model for OCR documents and full-text search
Store original metadata and OCR text separately. That gives you clean updates and faster queries.
@Entity(tableName = "documents")
data class DocumentEntity(
@PrimaryKey(autoGenerate = true) val id: Long = 0,
val uri: String,
val createdAt: Long,
val title: String,
val confidenceAvg: Float
)
@Entity(tableName = "document_text")
data class DocumentTextEntity(
@PrimaryKey val documentId: Long,
val rawText: String,
val normalizedText: String
)
@Fts4(contentEntity = DocumentTextEntity)
@Entity(tableName = "document_text_fts")
data class DocumentTextFts(
val normalizedText: String
)
@Dao
interface DocumentDao {
@Insert suspend fun insertDocument(doc: DocumentEntity): Long
@Insert(onConflict = OnConflictStrategy.REPLACE) suspend fun upsertText(text: DocumentTextEntity)
@Query("""
SELECT d.id, d.title, d.createdAt
FROM documents d
JOIN document_text_fts fts ON fts.rowid = d.id
WHERE document_text_fts MATCH :q
ORDER BY d.createdAt DESC
""")
suspend fun search(q: String): List<DocumentListItem>
}
This schema keeps lookups quick even when the user scans hundreds of pages.
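Search quality depends on normalizing text the same way at index time and at query time, and on keeping raw user punctuation from breaking the MATCH syntax. A minimal sketch (normalize and toFtsQuery are hypothetical helpers, not part of Room):

```kotlin
// Apply the same normalization to OCR text before indexing and to user
// queries before matching, so tokens line up on both sides.
fun normalize(text: String): String =
    text.lowercase()
        .replace(Regex("[^\\p{L}\\p{N}\\s]"), " ")
        .trim()

// Build an FTS4 MATCH expression with per-token prefix matching, so
// typing "rece" already finds "receipt".
fun toFtsQuery(input: String): String =
    normalize(input)
        .split(Regex("\\s+"))
        .filter { it.isNotBlank() }
        .joinToString(" ") { "$it*" }
```

Call dao.search(toFtsQuery(userInput)) from your view model, and store normalize(result.text) as normalizedText when persisting a scan.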
CameraX + ML Kit pipeline
Capture and run text recognition
After capture, wrap the bitmap in an InputImage and hand it to ML Kit. The recognizer runs on its own internal executor, so the main thread is never blocked; bridging the Task API with suspendCancellableCoroutine keeps the call site clean.
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlinx.coroutines.suspendCancellableCoroutine

class OcrProcessor {
private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
suspend fun extractText(bitmap: Bitmap, rotation: Int): OcrResult =
suspendCancellableCoroutine { cont ->
val image = InputImage.fromBitmap(bitmap, rotation)
recognizer.process(image)
.addOnSuccessListener { visionText ->
val blocks = visionText.textBlocks
val text = blocks.joinToString("\n") { it.text }
val confidence = blocks
.flatMap { it.lines }
.mapNotNull { it.confidence }
.average()
.toFloat()
.takeIf { !it.isNaN() } ?: 0.0f
cont.resume(OcrResult(text = text, avgConfidence = confidence))
}
.addOnFailureListener { e -> cont.resumeWithException(e) }
}
}
suspend fun saveScan(uri: Uri, bitmap: Bitmap, dao: DocumentDao, ocr: OcrProcessor) {
val result = ocr.extractText(bitmap, rotation = 0)
val id = dao.insertDocument(
DocumentEntity(
uri = uri.toString(),
createdAt = System.currentTimeMillis(),
title = "Scan ${System.currentTimeMillis()}",
confidenceAvg = result.avgConfidence
)
)
dao.upsertText(
DocumentTextEntity(
documentId = id,
rawText = result.text,
normalizedText = result.text.lowercase()
)
)
}
Quality and performance tips
1) Improve OCR accuracy before recognition
- Crop to detected document edges.
- Apply grayscale + adaptive threshold.
- Run light deskew for tilted pages.
2) Keep scanning responsive
- Do not OCR every preview frame; run OCR only on captured frames.
- Cache resized bitmaps to avoid repeated allocations.
- Use WorkManager for deferred indexing when the app is backgrounded.
3) Add confidence-aware UX
If confidence is low, prompt users to retake with better lighting. This improves trust and data quality immediately.
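Tip 1's grayscale-plus-threshold step can be sketched in pure Kotlin over ARGB pixel ints (the format Bitmap.getPixels returns). This is a simplified sketch using a global mean threshold; a production version would use adaptive (local) thresholding, and luminance/binarize are hypothetical helper names.

```kotlin
// Integer Rec. 601 luma weights: exact for pure white/black, no float drift.
fun luminance(argb: Int): Int {
    val r = (argb shr 16) and 0xFF
    val g = (argb shr 8) and 0xFF
    val b = argb and 0xFF
    return (r * 299 + g * 587 + b * 114) / 1000
}

// Convert to grayscale, then binarize against the mean luminance.
// High-contrast black-on-white output is what OCR engines handle best.
fun binarize(pixels: IntArray): IntArray {
    val gray = IntArray(pixels.size) { luminance(pixels[it]) }
    val threshold = gray.average()
    return IntArray(gray.size) { i ->
        if (gray[i] > threshold) 0xFFFFFFFF.toInt() else 0xFF000000.toInt()
    }
}
```

On Android you would pull the pixel buffer out with Bitmap.getPixels, run binarize, and write it back with Bitmap.setPixels before handing the bitmap to ML Kit.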
Background indexing with WorkManager
For multi-page scans, queue indexing work so capture flow stays fast.
// Note: a constructor-injected repository requires a custom WorkerFactory
// (or Hilt's @HiltWorker); WorkManager's default factory only knows the
// two-argument (Context, WorkerParameters) constructor.
class ReindexWorker(
appContext: Context,
params: WorkerParameters,
private val repository: ScanRepository
) : CoroutineWorker(appContext, params) {
override suspend fun doWork(): Result {
return try {
repository.reindexPendingDocuments()
Result.success()
} catch (e: Exception) {
// Cap retries so a permanently failing document cannot loop forever.
if (runAttemptCount >= 3) Result.failure() else Result.retry()
}
}
}
fun scheduleReindex(context: Context) {
val request = OneTimeWorkRequestBuilder<ReindexWorker>()
.setBackoffCriteria(
BackoffPolicy.EXPONENTIAL,
30, TimeUnit.SECONDS
)
.build()
WorkManager.getInstance(context)
.enqueueUniqueWork("ocr_reindex", ExistingWorkPolicy.KEEP, request)
}
Security and privacy baseline
- Encrypt local DB or sensitive fields where required.
- Strip EXIF metadata from stored images if location is not needed.
- Gate cloud export behind explicit user consent.
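The EXIF-stripping bullet can be handled with androidx.exifinterface, or, as a dependency-free sketch of the mechanics, by dropping APP1 segments (marker 0xFFE1, where EXIF including GPS data lives) from the JPEG byte stream before persisting. stripExif is a hypothetical helper; it copies everything from the start-of-scan marker onward verbatim.

```kotlin
import java.io.ByteArrayOutputStream

// Remove APP1 (EXIF) segments from a JPEG, keeping all other segments
// and the compressed image data intact.
fun stripExif(jpeg: ByteArray): ByteArray {
    require(jpeg.size >= 2 && jpeg[0] == 0xFF.toByte() && jpeg[1] == 0xD8.toByte()) {
        "not a JPEG stream"
    }
    val out = ByteArrayOutputStream(jpeg.size)
    out.write(jpeg, 0, 2)                                   // SOI marker
    var i = 2
    while (i + 3 < jpeg.size && jpeg[i] == 0xFF.toByte()) {
        val marker = jpeg[i + 1].toInt() and 0xFF
        if (marker == 0xDA) break                           // start of scan
        // Segment length is big-endian and includes its own two bytes.
        val len = ((jpeg[i + 2].toInt() and 0xFF) shl 8) or (jpeg[i + 3].toInt() and 0xFF)
        val segEnd = i + 2 + len
        if (marker != 0xE1) out.write(jpeg, i, segEnd - i)  // keep non-APP1
        i = segEnd
    }
    out.write(jpeg, i, jpeg.size - i)                       // image data onward
    return out.toByteArray()
}
```

Run this on the captured JPEG bytes before writing them to disk, so location metadata never reaches storage in the first place.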
If your app later adds API sync, apply backend hardening patterns like those covered in Node.js in 2026: Secure Your Backend.
How this fits your broader engineering stack
A robust scanner feature often becomes one piece of a larger product platform. You can combine this approach with:
- Offline sync design patterns from Flutter Offline Sync in 2026.
- Progressive rollout ideas from React Feature Flags with Progressive Delivery.
- Phishing-resistant auth guidance from Passkey-First Authentication.
- Reliable integration client patterns from Production-Ready Python API Clients.
Conclusion
Building an Android offline OCR app is now straightforward when you combine CameraX, ML Kit, and Room FTS with a clean background-work strategy. You get speed, privacy, and better reliability than cloud-only OCR workflows, while keeping room to scale into sync, analytics, and enterprise controls later. Start small with single-page scans, validate recognition confidence in real lighting conditions, then expand into batch scan and export flows.
FAQ
Is ML Kit OCR accurate enough for receipts and invoices?
Yes for many common layouts, especially with good preprocessing. Accuracy usually improves a lot after cropping, denoising, and deskewing images.
Why use Room FTS instead of plain LIKE queries?
Room FTS is significantly faster than LIKE '%term%' scans on large text sets and returns more relevant matches through token and prefix queries. It keeps search responsive as the scanned document count grows.
Can I sync scanned text to a backend later?
Absolutely. Keep local IDs stable, add a sync queue, and use idempotent server endpoints to avoid duplicates during retries.
What is the best first metric to track in production?
Track OCR confidence and user correction rate together. That pair quickly reveals where camera UX or preprocessing needs improvement.
