-
Good question! Let's break it down. Here's the pipeline:
1. Receive a live feed of frames from the camera, as in-memory images.
2. Feed each image into MLKit for text recognition.
3. Augment the camera preview with the results.
Seems like https://github.com/tensorflow/tfjs/blob/d8a8afeeb9218e39655712c3d8d26977371054bf/tfjs-react-native/src/camera/camera_stream.tsx#L369 would be a good reference for 1 and 3. I would start with 1 and 3, i.e. make sure that I am able to receive a feed of images and augment them (with static text). Step 1 would produce a stream of in-memory, platform-specific abstractions of an "image". We'll need to feed those images into MLKit. This library needs an enhancement: the ability to receive the "image" object instead of a file URL. That is already supported in the native packages; we'll just need to expose those methods to RN. A sketch of what that could look like is below.
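To make the proposed enhancement concrete, here is a minimal TypeScript sketch of what the JS-side surface might look like. The module and method names (`MlkitOcr`, `detectFromUri`, `detectFromFrame`) and the `TextBlock` shape are assumptions for illustration, not this package's actual API.

```ts
import { NativeModules } from 'react-native';

// Assumed result shape; the real package may differ.
type TextBlock = {
  text: string;
  bounding: { left: number; top: number; width: number; height: number };
};

interface MlkitOcrModule {
  // File-based entry point (the style this library supports today).
  detectFromUri(uri: string): Promise<TextBlock[]>;
  // Proposed addition: accept a handle to an in-memory, platform-specific
  // image (e.g. a CMSampleBuffer on iOS, a media.Image on Android) so no
  // file round-trip is needed per frame.
  detectFromFrame(frameRef: number): Promise<TextBlock[]>;
}

// Hypothetical native module name.
const MlkitOcr = NativeModules.MlkitOcr as MlkitOcrModule;
export default MlkitOcr;
```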
-
Thanks for outlining the steps! I don't have time to do this now, but I'm definitely interested in tackling this and making the enhancement to this library in a month or so. A few questions:
Thanks!
-
Thanks all for sharing this. It's complicated; I was also trying a frame processor but was having a tough time.
-
I am not too familiar with Tensors and the related terminology. The referenced file https://github.com/tensorflow/tfjs/blob/d8a8afeeb9218e39655712c3d8d26977371054bf/tfjs-react-native/src/camera/camera_stream.tsx#L291 integrates with the Expo Camera package. It "renders" two React elements:
The first one "starts" the Expo camera, and the second one shows the camera's output. The output is captured in a frame loop, and that loop seems to be where all the magic happens. Replacing the Tensor logic there with MLKit logic would produce the desired effect, I think.
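For illustration, here is a minimal sketch of that loop, modeled on the linked camera_stream.tsx. The MLKit call is a placeholder (`runMlkitOnFrame` does not exist in this package yet), and the texture/resize numbers are illustrative only.

```tsx
import React from 'react';
import { Camera } from 'expo-camera';
import * as tf from '@tensorflow/tfjs';
import { cameraWithTensors } from '@tensorflow/tfjs-react-native';

// Wraps expo-camera so that onReady yields frames as in-memory tensors.
const TensorCamera = cameraWithTensors(Camera);

function handleCameraStream(images: IterableIterator<tf.Tensor3D>) {
  const loop = () => {
    const frame = images.next().value; // in-memory frame, no file I/O
    if (frame) {
      // This is where the Tensor logic lives in camera_stream.tsx;
      // an MLKit binding would replace it, e.g.:
      // await runMlkitOnFrame(frame); // hypothetical
      tf.dispose(frame);
    }
    requestAnimationFrame(loop);
  };
  loop();
}

export function LiveOcrCamera() {
  return (
    <TensorCamera
      style={{ flex: 1 }}
      // Illustrative values; see camera_stream.tsx for how these are used.
      useCustomShadersToResize={false}
      cameraTextureWidth={1080}
      cameraTextureHeight={1920}
      resizeWidth={152}
      resizeHeight={200}
      resizeDepth={3}
      autorender={true}
      onReady={handleCameraStream}
    />
  );
}
```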
Just follow the official RN native modules guide; the documentation is quite detailed, and this package is very "usual" in terms of implementation: it's just one class for each platform.
-
Really more of a question (or a feature request), but is it possible to perform OCR on the live frame data from the Camera?
In my app I was trying to get around this by repeatedly taking pictures at regular intervals and scanning them to show red bounding boxes for the text (video below). This would've worked fine, except that iOS makes a loud shutter noise when you take a picture (and a shutter noise every 500 ms is really, really annoying).
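For context, here is a minimal sketch of that interval workaround, assuming expo-camera's `takePictureAsync`; the `detectFromUri` call is an assumed name for this package's file-based OCR entry point.

```tsx
import { useEffect, useRef } from 'react';
import { Camera } from 'expo-camera';

// Hook that snaps a photo every `intervalMs` and runs OCR on the saved file.
export function useIntervalOcr(intervalMs = 500) {
  const cameraRef = useRef<Camera>(null);

  useEffect(() => {
    let busy = false;
    const id = setInterval(async () => {
      if (busy || !cameraRef.current) return;
      busy = true;
      try {
        // iOS plays the shutter sound on every capture, which is what
        // makes this approach so annoying at a 500 ms cadence.
        const photo = await cameraRef.current.takePictureAsync({
          skipProcessing: true,
        });
        // Assumed API: run OCR on the file and draw boxes from the result.
        // const blocks = await MlkitOcr.detectFromUri(photo.uri);
        console.log('captured', photo.uri);
      } finally {
        busy = false;
      }
    }, intervalMs);
    return () => clearInterval(id);
  }, [intervalMs]);

  return cameraRef; // attach via <Camera ref={cameraRef} ... />
}
```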
It looks like expo-camera (which is what I'm using) doesn't expose the live frame data. There are some other options out there that do, but I don't know how to get them to work with MLKit:
Thanks! Appreciate this package a ton and understand what I'm asking here might be difficult to do. If so, I'm also curious to learn more and help out.
RPReplay_Final1632580413.mov