Working with Files and OCR
You can now upload PDFs and images to Genum and use LLMs to describe, extract, or transform them into structured data, from receipts and invoices to any document-based workflow.
Genum supports document processing directly in prompts. Upload files via the web interface or the API, and they are sent to the model together with your prompt input, so a vision-capable model can read the file content itself. This enables OCR, document extraction, and structured data conversion without external preprocessing.
Supported File Types
| Type | Formats |
|---|---|
| Documents | PDF (application/pdf) |
| Images | PNG, JPEG, GIF, WebP |
Limits per request: up to 3 files, 50MB combined.
Files are passed to the LLM as part of the prompt context. Vision-capable models can read and interpret text, layout, tables, and visual elements.
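If you call prompts from your own code, it can help to reject a file set that would exceed these limits before sending it. The sketch below is a hypothetical client-side check: `check_request_files`, its extension list, and reading "50MB" as 50,000,000 bytes are assumptions of this example, not part of Genum. Genum itself remains the authority on what a request accepts.

```python
from pathlib import Path

# Supported formats from the table above, keyed by file extension.
# Deciding by extension (rather than inspecting file contents) is a simplification.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".webp"}
MAX_FILES = 3
# "50MB" could mean 50 * 10**6 or 50 * 2**20 bytes; the smaller value is the safer bound.
MAX_TOTAL_BYTES = 50 * 10**6


def check_request_files(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs against the per-request limits.

    Returns a list of human-readable problems; an empty list means the set looks OK.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} (max {MAX_FILES})")
    for name, _size in files:
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported file type: {name}")
    total = sum(size for _name, size in files)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems


print(check_request_files([("receipt.jpg", 2_000_000), ("invoice.pdf", 10_000_000)]))  # []
print(check_request_files([("notes.docx", 1_000)]))  # ['unsupported file type: notes.docx']
```

Using the smaller byte count means the check may occasionally reject a set Genum would accept, but it should not let through a set that Genum rejects for size.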
What You Can Do
1. OCR and Text Extraction
- Extract text from scanned documents, screenshots, and photos
- Recognize handwriting and printed text
- Process multi-page PDFs
2. Structured Data Extraction
Transform unstructured documents into JSON, CSV, or other schemas:
- Receipts — Items, totals, dates, merchant names
- Invoices — Line items, amounts, tax, due dates
- Forms — Fields and values
- Narrative documents — Summaries, entities, key facts
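When a prompt returns structured data, validating the model's JSON reply before using it catches missing fields and misread numbers early. A minimal sketch for the receipt case, assuming the prompt was instructed to return an object with `merchant`, `date`, `items` (each with `name`, `quantity`, `price`), and `total`. This schema and the names `parse_receipt` and `total_matches` are hypothetical, chosen for illustration.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    name: str
    quantity: int
    price: Decimal  # unit price


@dataclass
class Receipt:
    merchant: str
    date: str  # e.g. "2024-05-01", if the prompt asks for ISO 8601 dates
    items: list[LineItem]
    total: Decimal


def to_decimal(value) -> Decimal:
    # Going through str() avoids binary float artifacts such as 3.4999999...
    return Decimal(str(value))


def parse_receipt(raw: str) -> Receipt:
    """Parse the model's JSON reply; a missing key raises KeyError instead of passing silently."""
    data = json.loads(raw)
    items = [
        LineItem(name=i["name"], quantity=int(i["quantity"]), price=to_decimal(i["price"]))
        for i in data["items"]
    ]
    return Receipt(data["merchant"], data["date"], items, to_decimal(data["total"]))


def total_matches(receipt: Receipt) -> bool:
    """Sanity check: do the line items add up to the printed total? (Ignores tax and discounts.)"""
    computed = sum((i.price * i.quantity for i in receipt.items), Decimal("0"))
    return computed == receipt.total


sample = (
    '{"merchant": "Corner Cafe", "date": "2024-05-01", '
    '"items": [{"name": "Coffee", "quantity": 2, "price": 3.50}, '
    '{"name": "Bagel", "quantity": 1, "price": 2.25}], "total": 9.25}'
)
receipt = parse_receipt(sample)
print(receipt.merchant, receipt.total, total_matches(receipt))  # Corner Cafe 9.25 True
```

A failed total check is a useful signal to re-run the prompt or flag the document for review, since it often points to a misread line item.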
3. Description and Analysis
- Generate summaries of documents
- Answer questions about document content
- Classify or categorize documents
- Compare multiple documents
Using Files in Prompts
Managing Files
The Files section in the side menu lists all files in the current project. From there you can:
- Upload — Add new PDF or image files to the project
- Open — View or preview a file
- Delete — Remove files from the project
Adding Files to a Prompt
On the prompt page, use the Add files button in the playground, below the input field. It opens a dialog listing the files in the project; select the ones to include in the current run. You can also upload new files directly from that dialog.
When you run the prompt, the selected files are sent along with your input; the LLM receives them in context.
For API usage, see API Integration.
Example Use Cases
| Use Case | Input | Output |
|---|---|---|
| Receipt parser | Receipt photo | { items, total, date, merchant } |
| Invoice extraction | PDF invoice | Line items, amounts, tax, vendor |
| Form digitization | Scanned form | Structured field-value pairs |
| Document Q&A | PDF + question | Answer based on document content |
| Receipt matching | Receipt + order | Match items and flag discrepancies |
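For receipt matching, one option is to have the prompt extract items from both documents and then compare the results in code. The sketch below assumes each document has already been reduced to a list of (name, quantity) pairs; the function names and the exact-name matching (after lowercasing and collapsing whitespace) are assumptions of this example, not how Genum performs matching.

```python
def normalize(items: list[tuple[str, int]]) -> dict[str, int]:
    """Sum quantities per item, keyed by a lowercased, whitespace-collapsed name."""
    totals: dict[str, int] = {}
    for name, qty in items:
        key = " ".join(name.lower().split())
        totals[key] = totals.get(key, 0) + qty
    return totals


def match_receipt_to_order(
    receipt: list[tuple[str, int]], order: list[tuple[str, int]]
) -> dict[str, list[str]]:
    """Flag ordered items missing from the receipt, unordered receipt items, and quantity mismatches."""
    got, expected = normalize(receipt), normalize(order)
    report: dict[str, list[str]] = {
        "missing_from_receipt": [],
        "not_ordered": [],
        "quantity_mismatch": [],
    }
    for name, qty in expected.items():
        if name not in got:
            report["missing_from_receipt"].append(name)
        elif got[name] != qty:
            report["quantity_mismatch"].append(f"{name}: ordered {qty}, receipt shows {got[name]}")
    for name in got:
        if name not in expected:
            report["not_ordered"].append(name)
    return report


receipt = [("Coffee", 2), ("Bagel", 1), ("Muffin", 1)]
order = [("coffee", 2), ("bagel", 2), ("Juice", 1)]
print(match_receipt_to_order(receipt, order))
```

Real receipts often abbreviate or truncate item names, so exact matching is brittle; fuzzy matching, or asking the LLM to pair items in the same prompt, may work better in practice.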
Limits and Considerations
- File count: Up to 3 files per request
- Total size: 50MB combined across all files
- Models: Use vision-capable models (e.g. GPT-4o, Gemini) for images and PDFs
- Language: OCR and extraction work in multiple languages; which languages are supported, and how well, depends on the model
Related
- Prompts — Prompt types and authoring
- API Integration — Run prompts with files via HTTP