Working with Files and OCR

You can now upload PDFs and images to Genum and use LLMs to describe, extract, or transform them into structured data — from receipts and invoices to any document-based workflow.

Genum supports document processing directly in prompts. Upload files through the web interface or the API, and the model receives them alongside your prompt, so it can read their text and interpret their visual layout. This enables OCR, document extraction, and structured data conversion without external preprocessing.

Supported File Types

Type	Formats
PDF	`application/pdf`
Images	PNG, JPEG, GIF, WebP

Limits per request: up to 3 files, 50 MB combined.
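
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of Genum: the `Attachment` class and `validate_attachments` function are hypothetical names, the MIME types are the standard ones for the listed image formats, and the reading of 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits stated in this guide.
MAX_FILES = 3
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

# Standard MIME types for the supported formats (PDF, PNG, JPEG, GIF, WebP).
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
}


@dataclass(frozen=True)
class Attachment:
    """Hypothetical local description of a file about to be uploaded."""

    name: str
    mime_type: str
    size_bytes: int


def validate_attachments(files: list[Attachment]) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems: list[str] = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total = sum(f.size_bytes for f in files)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"combined size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    for f in files:
        if f.mime_type not in ALLOWED_MIME_TYPES:
            problems.append(f"unsupported type for {f.name}: {f.mime_type}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a batch at once.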

Files are passed to the LLM as part of the prompt context. Vision-capable models can read and interpret text, layout, tables, and visual elements.

What You Can Do

1. OCR and Text Extraction

Extract text from scanned documents, screenshots, and photos
Recognize handwriting and printed text
Process multi-page PDFs

2. Structured Data Extraction

Transform unstructured documents into JSON, CSV, or other schemas:

Receipts — Items, totals, dates, merchant names
Invoices — Line items, amounts, tax, due dates
Forms — Fields and values
Narrative documents — Summaries, entities, key facts
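
When a prompt asks the model for JSON, it helps to validate the response before using it downstream. The sketch below assumes a receipt schema with `merchant`, `date`, `items`, and `total` fields, mirroring the receipt fields listed above. The exact field names, the `Receipt` and `ReceiptItem` classes, and the `parse_receipt` function are illustrative assumptions, not a format that Genum prescribes.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class ReceiptItem:
    name: str
    price: float


@dataclass
class Receipt:
    merchant: str
    date: str
    items: list[ReceiptItem]
    total: float


REQUIRED_FIELDS = {"merchant", "date", "items", "total"}


def parse_receipt(raw: str) -> Receipt:
    """Parse model output expected to be a JSON object; raise ValueError otherwise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    try:
        items = [ReceiptItem(str(i["name"]), float(i["price"])) for i in data["items"]]
        total = float(data["total"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed receipt data: {exc}") from exc
    return Receipt(str(data["merchant"]), str(data["date"]), items, total)
```

Failing loudly on missing or malformed fields keeps partially extracted receipts from slipping into later steps unnoticed.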

3. Description and Analysis

Generate summaries of documents
Answer questions about document content
Classify or categorize documents
Compare multiple documents

Using Files in Prompts

Managing Files

The Files section in the side menu lists all files in the current project. From there you can:

Upload — Add new PDF or image files to the project
Open — View or preview a file
Delete — Remove files from the project

Adding Files to a Prompt

On the prompt page, in the playground under the input field, use the Add files button. This opens a dialog that shows the list of files in the project — select the files you want to include in the current run. You can also add new files directly from that dialog.

When you run the prompt, the selected files are sent along with your input; the LLM receives them in context.

For API usage, see API Integration.

Example Use Cases

Use Case	Input	Output
Receipt parser	Receipt photo	`{ items, total, date, merchant }`
Invoice extraction	PDF invoice	Line items, amounts, tax, vendor
Form digitization	Scanned form	Structured field-value pairs
Document Q&A	PDF + question	Answer based on document content
Receipt matching	Receipt + order	Match items and flag discrepancies
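
For the receipt matching use case, the comparison itself can also run in ordinary code once the items have been extracted. This is a minimal sketch under the assumption that both the receipt and the order have already been reduced to a mapping from item name to price. The function name `find_discrepancies` and the tolerance parameter are hypothetical.

```python
from __future__ import annotations


def find_discrepancies(
    receipt_items: dict[str, float],
    order_items: dict[str, float],
    tolerance: float = 0.01,
) -> list[str]:
    """Compare two item-to-price mappings and describe every mismatch."""
    issues: list[str] = []
    # Walk the union of names in sorted order so the report is deterministic.
    for name in sorted(receipt_items.keys() | order_items.keys()):
        if name not in order_items:
            issues.append(f"{name}: on receipt but not in order")
        elif name not in receipt_items:
            issues.append(f"{name}: in order but not on receipt")
        elif abs(receipt_items[name] - order_items[name]) > tolerance:
            issues.append(
                f"{name}: price differs "
                f"(receipt {receipt_items[name]:.2f}, order {order_items[name]:.2f})"
            )
    return issues


receipt = {"coffee": 3.50, "bagel": 4.00, "tip": 1.00}
order = {"coffee": 3.50, "bagel": 4.25, "juice": 2.00}
for issue in find_discrepancies(receipt, order):
    print(issue)
```

Matching by exact item name is a simplification; real receipts often abbreviate names, so a production version would likely need fuzzy matching or let the model do the alignment.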

Limits and Considerations

File count: Up to 3 files per request
Total size: 50 MB for all files combined
Models: Use vision-capable models (e.g. GPT-4o, Gemini) for images and PDFs
Language: Extraction and OCR work in multiple languages depending on the model

Related

Prompts — Prompt types and authoring
API Integration — Run prompts with files via HTTP
