Fully Automated Invoice Reimbursement with Local AI Agent: From Process Breakdown to Deployment

The Reimbursement Pain Point: Every Office Worker's Nightmare

You return from a business trip facing a pile of crumpled high-speed rail tickets, taxi receipts, restaurant invoices, and hotel bills. You need to open Excel or your company's OA system and manually enter each invoice's information into a reimbursement form, then arrange and paste them in order before submitting to finance. If it's a week-long or even months-long trip, just organizing invoices and filling out the reimbursement form can eat up most of your day.

Bilibili creator Panther demonstrated in his latest video how to compress this tedious process down to just a few minutes using a locally deployed AI Agent (which he calls "Lobster").

bilibili source: 本地部署AI龙虾搞定全自动报销——思路&演示篇

Core Methodology: Forget About AI When Building AI Automation

Panther repeatedly emphasizes a counterintuitive methodology in the video: When building AI automation, first forget about AI. This means first mapping out the process from a human perspective, then looking back to see where AI can help.

The logic behind this methodology is: many people jump straight to asking "what can AI do?" and end up falling into a technology-driven trap—using AI for the sake of using AI, ultimately producing something that's neither user-friendly nor stable. The correct approach is to first conduct Process Analysis, identifying which steps are high-frequency, low-value, rule-based mechanical operations. These are the optimal intervention points for AI.

Four-Step Breakdown of the Reimbursement Process

Breaking down the reimbursement process, there are actually only four steps:

Print electronic invoices and organize them
Manually read invoices and enter information into Excel/OA system per finance requirements
Fill out the reimbursement form and paste invoices in order
Submit to finance for review

Steps one and four take anywhere from seconds to a few minutes. What truly drives people crazy are the middle two steps—reading, transcribing, organizing, and categorizing. These tasks require no thinking or judgment, only time, patience, and accuracy—which is precisely what AI excels at.

然后我们需要人工的一张一张识别发票

The AI-Optimized Reimbursement Workflow

After introducing AI, the new reimbursement process becomes:

Manual organization: Place all invoice files (photos or digital copies) into a single folder
AI performs OCR: Extract structured information from all invoices via OCR
AI generates reimbursement form: Extract key information from structured data and automatically generate a reimbursement spreadsheet
AI archives and numbers: Number and archive all invoice files according to the reimbursement form
Human review: Spend a few minutes checking results, confirm accuracy, then submit

The human workload goes from hours of manual labor to "drop files → give instructions → check results"—totaling just a few minutes.

This workflow design embodies best practices in human-machine collaboration: AI handles high-throughput information processing while humans handle final quality control and exception handling. This pattern is known in industry as "Human-in-the-Loop"—leveraging AI's efficiency advantages while retaining human ultimate control over result correctness.

人你需要花几分钟时间检查一遍结果

Technical Solution and Hardware Configuration

Two Deployment Routes

Panther divides the solution into two scenarios:

Non-confidential scenarios: Use online APIs directly—simple, hassle-free, best results
Confidential scenarios: Invoice information is sensitive company data and must be processed locally

Local models do lag behind online APIs, but based on Panther's six months of project experience, local deployment has reached a "usable" level, with results proportional to investment. The core consideration for choosing local deployment is data security—invoices contain company names, taxpayer identification numbers, transaction amounts, consumption details, and other sensitive information. Once transmitted through external APIs, there's a compliance risk of data leakage. For finance, government, and defense industries, local deployment is virtually the only option.

Hardware Configurations and Budget Comparison

Panther tested multiple hardware configurations that can run reasonably smoothly:

Platform	Configuration	Budget
Windows	RTX 5090 32G + 64G RAM + 9950X3D	¥45,000-50,000
Mac	MacBook M5 Pro 48G unified memory	~¥20,000
Linux (AMD)	AMD 395 chip AIPC, up to 128GB unified memory	~¥25,000
Linux (NVIDIA)	DGX Spark, up to 128GB unified memory	~¥33,000

The Windows solution is fastest, the Mac solution wins on portability, and the two Linux devices are compact with performance between the two.

A note on unified memory architecture: Unified Memory refers to CPU and GPU sharing the same physical memory pool, eliminating data copying between them. Apple Silicon's M-series chips pioneered large-scale adoption of this architecture in consumer products, while AMD's Strix Halo (395 series) and NVIDIA's DGX Spark employ similar designs. For large model inference, the biggest advantage of unified memory is that model weights can be directly accessed by GPU cores without being limited by traditional discrete GPU VRAM capacity. For example, a device with 128GB unified memory can theoretically load nearly 128GB of model weights, whereas even an RTX 5090 only has 32GB of VRAM in a traditional setup. The tradeoff is that unified memory bandwidth is typically lower than dedicated VRAM (such as HBM), so inference speed is somewhat slower—but this is perfectly acceptable for non-real-time batch processing tasks like reimbursement form generation.

市场价大约在3.3万元左右

Software Stack: OCR + LLM + Agent Framework

OCR Model: MinerU

While MinerU's accuracy isn't the highest, its deployment is extremely simple and very beginner-friendly. For those with technical expertise, Panther recommends exploring PaddleOCR, where preprocessing and fine-tuning can significantly improve recognition rates.

OCR (Optical Character Recognition) is a technology that converts text in images into machine-editable text. Traditional OCR relies on template matching and feature extraction, while modern OCR heavily incorporates deep learning, especially combinations of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). In invoice recognition scenarios, OCR faces challenges including: diverse ticket formats (VAT special invoices, general invoices, electronic invoices, train tickets, taxi receipts, etc.), inconsistent print quality, and varying photo angles and lighting conditions. MinerU, as an open-source document parsing tool, primarily targets structured extraction from PDFs and images, with its advantage being out-of-the-box usability. PaddleOCR is Baidu's open-source OCR toolkit supporting a complete pipeline of text detection, direction classification, and text recognition—through fine-tuning for specific ticket types, recognition accuracy can be significantly improved.

Large Language Model: Qwen 3 59B

For tasks like information extraction, table generation, and archiving, the current Qwen 3 59B model is more than capable.

Qwen 3 is the third-generation large language model series from Alibaba Cloud's Tongyi Qianwen team. The 59B (59 billion parameter) version is a medium-to-large model in the series, excelling in reasoning ability, instruction following, and structured output. Compared to larger models (70B+), the 59B version has relatively manageable VRAM and memory requirements for local deployment—approximately 30-35GB under 4-bit quantization—making it runnable on consumer-grade hardware. For invoice information extraction tasks, the model needs accurate JSON/table format output capabilities, understanding of Chinese financial terminology, and fault-tolerant reasoning when facing OCR recognition errors. The 59B-level model has demonstrated sufficient reliability in these areas.

AI Agent Framework: Qianwen Po

Panther chose Alibaba's Qianwen Po as the Agent framework (the "Lobster"). Compared to alternatives like OpenClaw, Qianwen Po has a remarkably high level of completeness, and most critically, domestic deployment requires no VPN access, making it very beginner-friendly. The Agent's "brain" directly uses Qwen 3 59B.

Qianwen Po is an Agent development framework built on Alibaba's Tongyi Qianwen ecosystem, providing core capabilities including tool registration, multi-step planning, memory management, and multi-Agent collaboration. Compared to popular overseas frameworks like LangChain, AutoGen, and CrewAI, Qianwen Po's advantages include: native support for Qwen series model features (such as function calling format), comprehensive Chinese documentation and community support, and deployment that doesn't depend on overseas servers or APIs. In practice, developers can define multiple tools for the Agent (such as "read images from folder," "call OCR model," "generate Excel spreadsheet," "rename files," etc.), and the Agent will automatically orchestrate the calling sequence of these tools based on natural language instructions, achieving end-to-end automation.

An AI Agent (intelligent agent) is one of the core paradigms in current large model applications, fundamentally different from simple conversational AI. An Agent can not only understand natural language instructions but also autonomously plan task steps, invoke external tools (such as file system operations, API calls, code execution, etc.), dynamically adjust strategies based on intermediate results, and ultimately complete complex multi-step tasks. A typical Agent architecture includes: perception layer (receiving user input and environmental information), planning layer (decomposing complex tasks into subtasks), execution layer (invoking tools to complete specific operations), and reflection layer (evaluating execution results and deciding whether corrections are needed). In the reimbursement scenario, the Agent needs to coordinate OCR tools, file management, table generation, and other components—this is precisely where Agents excel compared to single LLM calls.

Reflections on AI's Role: Replacement, Not Substitution

In his summary, Panther raises a thought-provoking point: we use AI to replace the mechanical, repetitive parts of the reimbursement process—but note the wording is "replace" (specific tasks), not "substitute" (the entire human role).

The human role hasn't disappeared from the process; it has shifted from laborer to reviewer. You no longer need to spend hours copying and transcribing—you just need a few minutes to check AI's output. The time saved can be used for:

Judging whether reimbursements are compliant
Communicating special cases with finance
Or—leaving work early

This distinction between "replacing tasks" and "substituting humans" reflects an important consensus in current AI application deployment: in most real business scenarios, AI reliability hasn't yet reached the level of 100% unsupervised operation. Especially in finance, a single numerical error could cause tax compliance issues, so human review isn't just an efficiency consideration—it's a necessary risk control measure. This also explains why the industry tends to position current-stage AI as a "Copilot" rather than "Autopilot."

The right way to use AI: Find the bottlenecks in your workflow, use AI to break through them, then spend the saved time and energy on more valuable things. Let AI do what AI does best, and let humans do what humans do best.

关注潘瑟

Conclusion

This case demonstrates how AI Agents can be practically deployed in real office scenarios. The key isn't how complex the technology is, but whether the process breakdown is clear and whether AI intervention points are precise. For enterprise users with confidentiality requirements, local deployment solutions require some hardware investment but already offer practical value. Panther will be releasing detailed deployment tutorials in follow-up content, which is worth keeping an eye on.

From a broader perspective, this reimbursement automation case is actually a microcosm of the "RPA (Robotic Process Automation) + AI" convergence trend. Traditional RPA excels at handling repetitive operations with clear rules and fixed interfaces, but struggles with unstructured inputs (like various invoice image formats). The addition of large models fills exactly this gap, expanding automation's applicability from "structured data processing" to "unstructured information understanding." In the future, similar AI Agent solutions are expected to land in more office scenarios such as contract review, resume screening, and customer service ticket processing.

Key Takeaways

Core methodology for reimbursement automation: first map out the manual process, then identify steps AI can replace
Local deployment is suitable for confidential scenarios, with hardware costs ranging from ¥20,000 to ¥50,000
Recommended software stack: MinerU (OCR) + Qwen 3 59B (LLM) + Qianwen Po (Agent framework)
AI's role is to replace mechanical repetitive tasks; the human role shifts from executor to reviewer
The entire reimbursement process is compressed from hours to just minutes of review time