Fully Automated Invoice Reimbursement with Local AI Agent: From Process Breakdown to Deployment

Using a local AI Agent to compress tedious reimbursement workflows from hours to minutes
Bilibili creator Panther demonstrates how to automate reimbursement workflows using a locally deployed AI Agent. The core methodology is to first map out the manual process, then identify AI intervention points—delegating OCR recognition, information extraction, and table generation to AI while humans only perform final review. The tech stack consists of MinerU + Qwen 3 59B + Qianwen Po, with local deployment hardware costs ranging from ¥20,000-50,000, suitable for data-sensitive scenarios. AI is positioned to "replace tasks" rather than "substitute humans," shifting the human role from executor to reviewer.
The Reimbursement Pain Point: Every Office Worker's Nightmare
You return from a business trip facing a pile of crumpled high-speed rail tickets, taxi receipts, restaurant invoices, and hotel bills. You need to open Excel or your company's OA system and manually enter each invoice's information into a reimbursement form, then arrange and paste them in order before submitting to finance. If it's a week-long or even months-long trip, just organizing invoices and filling out the reimbursement form can eat up most of your day.
Bilibili creator Panther demonstrated in his latest video how to compress this tedious process down to just a few minutes using a locally deployed AI Agent (which he calls "Lobster").

Core Methodology: Forget About AI When Building AI Automation
Panther repeatedly emphasizes a counterintuitive methodology in the video: When building AI automation, first forget about AI. This means first mapping out the process from a human perspective, then looking back to see where AI can help.
The logic behind this methodology is: many people jump straight to asking "what can AI do?" and end up falling into a technology-driven trap—using AI for the sake of using AI, ultimately producing something that's neither user-friendly nor stable. The correct approach is to first conduct Process Analysis, identifying which steps are high-frequency, low-value, rule-based mechanical operations. These are the optimal intervention points for AI.
Four-Step Breakdown of the Reimbursement Process
Breaking down the reimbursement process, there are actually only four steps:
- Print electronic invoices and organize them
- Manually read invoices and enter information into Excel/OA system per finance requirements
- Fill out the reimbursement form and paste invoices in order
- Submit to finance for review
Steps one and four take anywhere from seconds to a few minutes. What truly drives people crazy are the middle two steps—reading, transcribing, organizing, and categorizing. These tasks require no thinking or judgment, only time, patience, and accuracy—which is precisely what AI excels at.

The AI-Optimized Reimbursement Workflow
After introducing AI, the new reimbursement process becomes:
- Manual organization: Place all invoice files (photos or digital copies) into a single folder
- AI performs OCR: Extract structured information from all invoices via OCR
- AI generates reimbursement form: Extract key information from structured data and automatically generate a reimbursement spreadsheet
- AI archives and numbers: Number and archive all invoice files according to the reimbursement form
- Human review: Spend a few minutes checking results, confirm accuracy, then submit
The human workload goes from hours of manual labor to "drop files → give instructions → check results"—totaling just a few minutes.
This workflow design embodies best practices in human-machine collaboration: AI handles high-throughput information processing while humans handle final quality control and exception handling. This pattern is known in industry as "Human-in-the-Loop"—leveraging AI's efficiency advantages while retaining human ultimate control over result correctness.

Technical Solution and Hardware Configuration
Two Deployment Routes
Panther divides the solution into two scenarios:
- Non-confidential scenarios: Use online APIs directly—simple, hassle-free, best results
- Confidential scenarios: Invoice information is sensitive company data and must be processed locally
Local models do lag behind online APIs, but based on Panther's six months of project experience, local deployment has reached a "usable" level, with results proportional to investment. The core consideration for choosing local deployment is data security—invoices contain company names, taxpayer identification numbers, transaction amounts, consumption details, and other sensitive information. Once transmitted through external APIs, there's a compliance risk of data leakage. For finance, government, and defense industries, local deployment is virtually the only option.
Hardware Configurations and Budget Comparison
Panther tested multiple hardware configurations that can run reasonably smoothly:
| Platform | Configuration | Budget |
|---|---|---|
| Windows | RTX 5090 32G + 64G RAM + 9950X3D | ¥45,000-50,000 |
| Mac | MacBook M5 Pro 48G unified memory | ~¥20,000 |
| Linux (AMD) | AMD 395 chip AIPC, up to 128GB unified memory | ~¥25,000 |
| Linux (NVIDIA) | DGX Spark, up to 128GB unified memory | ~¥33,000 |
The Windows solution is fastest, the Mac solution wins on portability, and the two Linux devices are compact with performance between the two.
A note on unified memory architecture: Unified Memory refers to CPU and GPU sharing the same physical memory pool, eliminating data copying between them. Apple Silicon's M-series chips pioneered large-scale adoption of this architecture in consumer products, while AMD's Strix Halo (395 series) and NVIDIA's DGX Spark employ similar designs. For large model inference, the biggest advantage of unified memory is that model weights can be directly accessed by GPU cores without being limited by traditional discrete GPU VRAM capacity. For example, a device with 128GB unified memory can theoretically load nearly 128GB of model weights, whereas even an RTX 5090 only has 32GB of VRAM in a traditional setup. The tradeoff is that unified memory bandwidth is typically lower than dedicated VRAM (such as HBM), so inference speed is somewhat slower—but this is perfectly acceptable for non-real-time batch processing tasks like reimbursement form generation.

Software Stack: OCR + LLM + Agent Framework
OCR Model: MinerU
While MinerU's accuracy isn't the highest, its deployment is extremely simple and very beginner-friendly. For those with technical expertise, Panther recommends exploring PaddleOCR, where preprocessing and fine-tuning can significantly improve recognition rates.
OCR (Optical Character Recognition) is a technology that converts text in images into machine-editable text. Traditional OCR relies on template matching and feature extraction, while modern OCR heavily incorporates deep learning, especially combinations of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). In invoice recognition scenarios, OCR faces challenges including: diverse ticket formats (VAT special invoices, general invoices, electronic invoices, train tickets, taxi receipts, etc.), inconsistent print quality, and varying photo angles and lighting conditions. MinerU, as an open-source document parsing tool, primarily targets structured extraction from PDFs and images, with its advantage being out-of-the-box usability. PaddleOCR is Baidu's open-source OCR toolkit supporting a complete pipeline of text detection, direction classification, and text recognition—through fine-tuning for specific ticket types, recognition accuracy can be significantly improved.
Large Language Model: Qwen 3 59B
For tasks like information extraction, table generation, and archiving, the current Qwen 3 59B model is more than capable.
Qwen 3 is the third-generation large language model series from Alibaba Cloud's Tongyi Qianwen team. The 59B (59 billion parameter) version is a medium-to-large model in the series, excelling in reasoning ability, instruction following, and structured output. Compared to larger models (70B+), the 59B version has relatively manageable VRAM and memory requirements for local deployment—approximately 30-35GB under 4-bit quantization—making it runnable on consumer-grade hardware. For invoice information extraction tasks, the model needs accurate JSON/table format output capabilities, understanding of Chinese financial terminology, and fault-tolerant reasoning when facing OCR recognition errors. The 59B-level model has demonstrated sufficient reliability in these areas.
AI Agent Framework: Qianwen Po
Panther chose Alibaba's Qianwen Po as the Agent framework (the "Lobster"). Compared to alternatives like OpenClaw, Qianwen Po has a remarkably high level of completeness, and most critically, domestic deployment requires no VPN access, making it very beginner-friendly. The Agent's "brain" directly uses Qwen 3 59B.
Qianwen Po is an Agent development framework built on Alibaba's Tongyi Qianwen ecosystem, providing core capabilities including tool registration, multi-step planning, memory management, and multi-Agent collaboration. Compared to popular overseas frameworks like LangChain, AutoGen, and CrewAI, Qianwen Po's advantages include: native support for Qwen series model features (such as function calling format), comprehensive Chinese documentation and community support, and deployment that doesn't depend on overseas servers or APIs. In practice, developers can define multiple tools for the Agent (such as "read images from folder," "call OCR model," "generate Excel spreadsheet," "rename files," etc.), and the Agent will automatically orchestrate the calling sequence of these tools based on natural language instructions, achieving end-to-end automation.
An AI Agent (intelligent agent) is one of the core paradigms in current large model applications, fundamentally different from simple conversational AI. An Agent can not only understand natural language instructions but also autonomously plan task steps, invoke external tools (such as file system operations, API calls, code execution, etc.), dynamically adjust strategies based on intermediate results, and ultimately complete complex multi-step tasks. A typical Agent architecture includes: perception layer (receiving user input and environmental information), planning layer (decomposing complex tasks into subtasks), execution layer (invoking tools to complete specific operations), and reflection layer (evaluating execution results and deciding whether corrections are needed). In the reimbursement scenario, the Agent needs to coordinate OCR tools, file management, table generation, and other components—this is precisely where Agents excel compared to single LLM calls.
Reflections on AI's Role: Replacement, Not Substitution
In his summary, Panther raises a thought-provoking point: we use AI to replace the mechanical, repetitive parts of the reimbursement process—but note the wording is "replace" (specific tasks), not "substitute" (the entire human role).
The human role hasn't disappeared from the process; it has shifted from laborer to reviewer. You no longer need to spend hours copying and transcribing—you just need a few minutes to check AI's output. The time saved can be used for:
- Judging whether reimbursements are compliant
- Communicating special cases with finance
- Or—leaving work early
This distinction between "replacing tasks" and "substituting humans" reflects an important consensus in current AI application deployment: in most real business scenarios, AI reliability hasn't yet reached the level of 100% unsupervised operation. Especially in finance, a single numerical error could cause tax compliance issues, so human review isn't just an efficiency consideration—it's a necessary risk control measure. This also explains why the industry tends to position current-stage AI as a "Copilot" rather than "Autopilot."
The right way to use AI: Find the bottlenecks in your workflow, use AI to break through them, then spend the saved time and energy on more valuable things. Let AI do what AI does best, and let humans do what humans do best.

Conclusion
This case demonstrates how AI Agents can be practically deployed in real office scenarios. The key isn't how complex the technology is, but whether the process breakdown is clear and whether AI intervention points are precise. For enterprise users with confidentiality requirements, local deployment solutions require some hardware investment but already offer practical value. Panther will be releasing detailed deployment tutorials in follow-up content, which is worth keeping an eye on.
From a broader perspective, this reimbursement automation case is actually a microcosm of the "RPA (Robotic Process Automation) + AI" convergence trend. Traditional RPA excels at handling repetitive operations with clear rules and fixed interfaces, but struggles with unstructured inputs (like various invoice image formats). The addition of large models fills exactly this gap, expanding automation's applicability from "structured data processing" to "unstructured information understanding." In the future, similar AI Agent solutions are expected to land in more office scenarios such as contract review, resume screening, and customer service ticket processing.
Key Takeaways
- Core methodology for reimbursement automation: first map out the manual process, then identify steps AI can replace
- Local deployment is suitable for confidential scenarios, with hardware costs ranging from ¥20,000 to ¥50,000
- Recommended software stack: MinerU (OCR) + Qwen 3 59B (LLM) + Qianwen Po (Agent framework)
- AI's role is to replace mechanical repetitive tasks; the human role shifts from executor to reviewer
- The entire reimbursement process is compressed from hours to just minutes of review time
Related articles
TutorialsCursor + Codex Dual-IDE Collaboration: A Practical Methodology for Open-Source Project Customization
A complete methodology for open-source project customization based on real-world experience, detailing the Cursor+Codex dual-IDE workflow, seven-stage process, MVP validation, and AI source code reading techniques.
TutorialsCursor Multi-Agent in Practice: Building a Full-Stack Next.js Blog in 50 Minutes
Build a full-stack blog in 50 minutes using Cursor IDE's multi-Agent mode with Next.js, Clerk auth, and Supabase. Learn the 4-phase AI Agent workflow and key integration pitfalls.
TutorialsBuilding an AI Software Factory from Scratch: A Cursor Engineer's Hands-On Experience with Multi-Agent Collaboration
Cursor engineer Eric shares practical insights on building an AI software factory: automation levels, guardrail design, parallel Agent management, and scaling to 1000+ Agents for 24/7 development.