Gemini 3 OCR - Quick Findings

TL;DR

Gemini 3 has good OCR, but its output is unstructured and its page-level control is limited. Tensorlake adds precise page slicing and well-structured JSON output that needs no cleanup.

Tensorlake Integration vs. Direct Gemini 3

Gemini 3 Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it is usable in downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I captured a few quick observations comparing direct Gemini 3 usage with running it through Tensorlake’s unified extraction layer.

PDF Handling

Gemini 3 accepts PDFs directly but does not handle page slicing.
Control over which pages get parsed is limited: to parse only a subset of pages, you have to split the PDF manually and stitch the results back together.
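
To give a sense of the bookkeeping that the manual route implies, here is a minimal sketch that expands a page-range string into page numbers and stitches per-page results back into document order. The helper names (`expand_page_range`, `stitch_results`) and the per-page result shape are hypothetical, chosen only for illustration; the PDF splitting itself and the Gemini call are deliberately left out.

```python
def expand_page_range(page_range: str) -> list[int]:
    """Expand a string such as "1-3,7" into sorted, 1-based page numbers."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


def stitch_results(per_page: dict[int, str]) -> str:
    """Join per-page OCR text in page order, regardless of completion order."""
    return "\n\n".join(per_page[page] for page in sorted(per_page))
```

With these helpers, `expand_page_range("1-3")` yields the pages to cut out of the PDF, and `stitch_results` reassembles whatever text comes back for each page, keyed by page number.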

Tensorlake supports precise page slicing out-of-the-box:

    parse_id = doc_ai.read(
        file_url=file_url,
        page_range="1-3",  # Parse only pages 1–3
        parsing_options=parsing_options,
    )

Result: Users can extract specific pages or ranges without processing the entire document.

Output Structure

Gemini 3 can generate HTML, but the structure is not well organized for downstream use:

  • sections are not clearly separated
  • layout elements aren’t grouped
  • users must manually reorganize the structure

Consider this example, taken from the top portion of an invoice PDF.

HTML output generated by Gemini 3:

    <!-- Page 1 -->
    <div class="page-container">
      <div class="header">
        <div class="company-info">
          <h1>ARK GLOSS CLOTHING</h1>
          <p>123 SAN SEBASTIAN ST.</p>
          <p>LOS ANGELES, CA 90015 (US)</p>
          <p>(123) 555-1234</p>
          <p>info@arkglossclothing.com</p>
          <p style="margin-top: 10px;">Sales Rep. :</p>
        </div>
        <div class="invoice-title">
          <h1>I N V O I C E</h1>
          <h2>INV-20212</h2>
          <div class="invoice-details">
            <table>
              <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
              <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
              <tr><td>PO NUMBER</td><td></td></tr>
              <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
            </table>
          </div>
        </div>
      </div>
    ....
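
To illustrate the kind of manual reorganization this HTML calls for, the sketch below pulls the invoice-details table out as key/value pairs using only Python's standard `html.parser` module. The class and function names (`TableRowExtractor`, `extract_key_values`) are hypothetical, and the approach assumes simple, non-nested tables with two cells per row, as in the sample above; real Gemini output may need more defensive handling.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_key_values(html):
    """Return {first cell: second cell} for every two-cell table row."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# The invoice-details table from the sample output above.
snippet = """
<div class="invoice-details">
  <table>
    <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
    <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
    <tr><td>PO NUMBER</td><td></td></tr>
    <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
  </table>
</div>
"""
invoice_details = extract_key_values(snippet)
```

Even for this small table, the caller has to decide how rows map to fields and how to treat empty cells such as the PO number.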

Output as Tensorlake’s unified JSON (with Gemini 3 plugged in):

    {
      "page_number": 1,
      "page_fragments": [
        {
          "fragment_type": "title",
          "content": {
            "content": "INVOICE"
          },
          "reading_order": 1
        },
        {
          "fragment_type": "text",
          "content": {
            "content": "INV-20212"
          },
          "reading_order": 2
        },
        {
          "fragment_type": "text",
          "content": {
            "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
          },
          "reading_order": 3
        },
        {
          "fragment_type": "table",
          "content": {
            "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
            "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
            "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER | |\n| SHIP DATE | 01/26/2024 |"
          },
          "reading_order": 4
        },
    ...
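
As a rough illustration of how this structure can be consumed downstream, the sketch below walks the fragments in reading order and renders tables through their Markdown form. The field names (`page_fragments`, `fragment_type`, `content`, `reading_order`, `markdown`) follow the sample above; the helper functions are hypothetical, and the sample data is abridged and deliberately shuffled to show that ordering comes from `reading_order` and not from list position.

```python
# Abridged, reordered copy of the page shown above, as a Python dict.
page = {
    "page_number": 1,
    "page_fragments": [
        {
            "fragment_type": "text",
            "content": {"content": "INV-20212"},
            "reading_order": 2,
        },
        {
            "fragment_type": "title",
            "content": {"content": "INVOICE"},
            "reading_order": 1,
        },
        {
            "fragment_type": "table",
            "content": {
                "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTORE",
                "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |",
            },
            "reading_order": 4,
        },
    ],
}


def fragments_in_reading_order(page):
    """Return the page's fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda f: f["reading_order"])


def render_fragment(fragment):
    """Prefer the Markdown form for tables; fall back to plain content."""
    content = fragment["content"]
    if fragment["fragment_type"] == "table":
        return content.get("markdown", content["content"])
    return content["content"]


def page_to_text(page):
    """Render a whole page as text, one fragment per paragraph."""
    return "\n\n".join(
        render_fragment(f) for f in fragments_in_reading_order(page)
    )


page_text = page_to_text(page)
```

Because every fragment carries its type and position explicitly, no HTML parsing or layout guessing is needed to rebuild the page.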

How Tensorlake Differs

Tensorlake’s integration produces clean, well-organized structured output, including:

  • clear layout groups
  • well-defined document sections
  • table structures represented cleanly in both HTML and Markdown
  • consistent fragment types that work across all OCR/VLM backends

Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.

Advanced Usage vs. Simple Usage

Advanced Gemini users can approximate a similar level of structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.

What's Next

Try Tensorlake free

Want to discuss your specific use case?
Schedule a technical demo with our team.

Questions about the benchmark?
Join our Slack community

Dr Shanshan Wang

Founding Data Scientist & Document AI Lead at Tensorlake

Founding Data Scientist & Document AI Lead who specializes in document parsing, multimodal OCR, structure extraction, and production-grade applied AI systems.
