Credit Settlement for AI Async Tasks: Deduct First or Complete First?

Introduction: The "Unfair Terms" Problem in SaaS Platforms

When building AI generation SaaS platforms, a seemingly simple yet easily overlooked question is: When exactly should credits (or fees) be deducted?

Many platforms take the approach of deducting credits immediately after a user submits a task, regardless of whether the task ultimately completes successfully. For users, this is essentially an "unfair clause" — you've paid, but the service may never be delivered.

This article uses a real-world local SaaS generation platform as an example to thoroughly examine the correct sequencing of AI async tasks and credit settlement, helping developers build fairer and more robust billing systems.

bilibili source: Reject "Unfair Terms"! Thoroughly Clarify the Sequencing of AI Async Tasks and Credit Settlement

Why "Deduct on Submit" Is Wrong

The Typical Flawed Flow

In many AI generation platforms' initial implementations, the flow typically looks like this:

1. User authentication
2. User submits a generation task
3. Credits deducted immediately
4. Call AI provider API
5. Wait for async result to return

The core problem with this flow is: credits are deducted at step 3, but the AI provider's processing result isn't confirmed until step 5. If anything goes wrong in between — network timeout, provider downtime, generation failure — the user's credits have already been deducted without receiving the corresponding service.

From a software engineering perspective, this is fundamentally a transactional consistency problem. In monolithic applications, we can use database transactions to guarantee atomicity between "deducting fees" and "delivering service" — either both succeed or both roll back. But in distributed scenarios involving third-party AI providers, traditional ACID transactions cannot cross system boundaries. We simply cannot include external API calls within a local database transaction. This requires us to adopt different strategies at the architectural level to ensure eventual consistency.

The Uncertainty of Async Tasks

AI generation tasks (such as image or video generation) are typically asynchronous, with processing times ranging from seconds to minutes. During this period, several things can go wrong:

Provider processing failure
Generated results rejected by the system for not meeting requirements
Network interruption causing lost callbacks
Provider queue overflow causing tasks to be dropped

To understand why these failure scenarios are so common, it helps to look at the typical architecture of AI providers. Many AI generation services (such as Midjourney, Stable Diffusion API, and RunwayML) use a message queue-driven asynchronous processing architecture: user requests first enter a task queue (such as RabbitMQ, Kafka, or a cloud messaging service), and GPU worker nodes then pull tasks from the queue for processing. This architecture has multiple failure points: queues can overflow, GPU nodes can run out of memory (OOM), model inference can crash on abnormal inputs, and generated results can be blocked by content safety filters. As a rough, unsourced industry figure, mainstream AI generation APIs have task success rates of about 92% to 98%, meaning 2 to 8 of every 100 tasks may fail.

In any of these failure scenarios, if credits have been pre-deducted, additional refund logic is needed. This not only increases system complexity but also easily leads to data inconsistency issues. For example, the refund operation itself can fail (database connection interruption, concurrency conflicts, etc.), causing permanent loss of user credits. Even worse, without comprehensive monitoring and alerting, these "silent failures" can persist undetected for extended periods, only surfacing when users complain.

The Correct Approach: Post-Completion Deduction Based on Webhook Callbacks

The Optimized Complete Flow

The correct approach is to move the credit deduction logic into the Webhook callback — only deducting after confirming the task has completed successfully. The optimized flow is:

User submits → Create task record (status: processing)
Call AI provider → Submit task to provider
Return to frontend → Inform user the task has been submitted and is being processed
Provider completes processing → Notifies server via Webhook callback
Webhook handling → Update task status + Deduct credits at this point
Notify user → Task complete, results available

A Webhook is an HTTP-based event notification mechanism, essentially a "reverse API call." Under the traditional polling model, the client actively queries the provider every few seconds to ask "Is my task done yet?" With a Webhook, the provider instead sends an HTTP POST request to your pre-registered URL when an event occurs. This "push" model has clear advantages over polling: it reduces unnecessary network requests, lowers server load, and delivers near-real-time event notification. Many mainstream AI providers (such as Replicate, Stability AI, and Runway) offer callback mechanisms of this kind, though the details vary and should be checked against each provider's documentation. Developers specify a callback URL when submitting a task, and the provider POSTs the result (success or failure status, generated resource URLs, and so on) to that address upon completion.

However, Webhooks are not entirely reliable in distributed environments. Network jitter can cause callback requests to be lost, and the provider's callback service itself can experience failures. Therefore, mature systems typically employ a dual mechanism of "Webhook + polling fallback": primarily relying on Webhooks for real-time notifications while setting up scheduled tasks to poll the status of tasks that haven't received callbacks for an extended period.
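The polling fallback described above can be sketched as follows. This is a minimal illustration, not a provider-specific implementation: the Task record, the poll_stale_tasks function, the 300-second staleness window, and the fetch_status callable (standing in for whatever status endpoint your provider exposes) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    task_id: str
    status: str          # "processing", "completed", or "failed"
    submitted_at: float  # Unix timestamp of submission


def poll_stale_tasks(
    tasks: Dict[str, Task],
    fetch_status: Callable[[str], str],  # hypothetical provider status lookup
    now: float,
    stale_after_seconds: float = 300.0,  # assumed window; tune per provider
) -> List[str]:
    """Ask the provider about tasks that have waited too long for a Webhook.

    Returns the IDs of tasks whose status changed as a result of polling.
    """
    changed = []
    for task in tasks.values():
        if task.status != "processing":
            continue  # already resolved by a Webhook or an earlier poll
        if now - task.submitted_at < stale_after_seconds:
            continue  # still inside the window where a Webhook is expected
        provider_status = fetch_status(task.task_id)
        if provider_status in ("completed", "failed"):
            task.status = provider_status
            changed.append(task.task_id)
    return changed
```

A scheduled job would call poll_stale_tasks periodically. Any status change it finds should go through the same settlement code path as a Webhook, so that credits are handled identically whichever channel delivers the result.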

Key Implementation Points

At the code level, the core change is moving the credit deduction API call from the task submission endpoint to the Webhook handler:

// Original location (incorrect): in the submit endpoint
// submitTask() → deductCredits() → callProvider()

// Correct location: in the webhook callback
// webhookHandler() → updateRecord(status: 'completed') → deductCredits()

This ensures that credits are only deducted after the provider confirms the task has completed successfully.
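To make the change concrete, here is a small in-memory sketch of the corrected ordering. The Platform class and its submit_task and handle_webhook methods are hypothetical names chosen for illustration; a real system would back them with a database and a provider client.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Platform:
    credits: Dict[str, int] = field(default_factory=dict)  # user_id -> balance
    tasks: Dict[str, dict] = field(default_factory=dict)   # task_id -> record

    def submit_task(self, task_id: str, user_id: str, cost: int) -> None:
        # Only record the task; no credits move at submission time.
        self.tasks[task_id] = {
            "user_id": user_id,
            "cost": cost,
            "status": "processing",
        }

    def handle_webhook(self, task_id: str, succeeded: bool) -> None:
        task = self.tasks[task_id]
        if task["status"] != "processing":
            return  # duplicate or late callback: nothing left to settle
        if succeeded:
            task["status"] = "completed"
            self.credits[task["user_id"]] -= task["cost"]  # charge on success
        else:
            task["status"] = "failed"  # failed tasks cost the user nothing
```

Because no credits move at submission, a failed task needs no refund logic at all.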

It's worth noting that Webhook handler functions need special attention to response time. Most providers have timeout limits for Webhook callbacks (typically 5-30 seconds). If your processing logic is too complex (involving multiple database operations, file downloads, etc.), the provider may determine the callback failed and trigger a retry. The best practice is: have the Webhook receiver quickly respond with a 200 status code to acknowledge receipt, then place the actual business processing (deduction, user notification, etc.) into a local message queue for asynchronous execution.
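The "acknowledge first, process later" practice might look like the sketch below. It uses Python's in-process queue and a worker thread purely for illustration; WebhookReceiver and its methods are hypothetical names, and the process callable stands in for your deduction and notification logic.

```python
import queue
import threading
from typing import Callable


class WebhookReceiver:
    """Acknowledge callbacks immediately and settle them on a worker thread."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        self._process = process
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def receive(self, payload: dict) -> int:
        # No business logic here: enqueue the payload and return 200 at once.
        self._jobs.put(payload)
        return 200

    def _run(self) -> None:
        while True:
            payload = self._jobs.get()
            try:
                self._process(payload)  # deduction, user notification, etc.
            except Exception as exc:
                # A real system would log this and retry or dead-letter the job.
                print(f"processing failed for {payload!r}: {exc}")
            finally:
                self._jobs.task_done()

    def wait_until_idle(self) -> None:
        # Block until every queued payload has been processed.
        self._jobs.join()
```

One caveat with this design: once the receiver has answered 200, the provider will not retry, so an in-memory queue like this one loses work if the process crashes. In production the payload should be written to a durable queue or database before the acknowledgement is sent.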

Edge Cases and Additional Considerations

Idempotency Design to Prevent Duplicate Deductions

Webhooks may be called multiple times due to network retries, so idempotency checks must be included in the deduction logic:

Check whether the task has already been charged
Use the unique task ID as an idempotency key
Use transactions at the database level to guarantee atomicity

Idempotency is a core concept in distributed system design, meaning that executing the same operation once or multiple times produces exactly the same effect. In Webhook scenarios, providers typically auto-retry when the initial callback times out or receives a non-2xx response (common strategy: exponential backoff with 3-5 retries), meaning your Webhook handler may be called multiple times for the same task. Without idempotency protection, users could be charged 2x or even 5x the credits.

Common idempotency implementation approaches include:

Database unique constraints: Set a unique index on the task ID in the credit transaction table; the database will directly reject duplicate inserts
State machine checks: Query the task status before deducting; only execute deduction when the status is "pending deduction," then immediately update to "deducted"
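Redis distributed locks: Use the SET task:{id}:deducted true NX EX 3600 command; the NX option (set only if the key does not exist) makes the check-and-set atomic, so only the first caller proceeds. Because the key expires after 3600 seconds and is set before the deduction actually runs, it should back up a database-level check rather than replace it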
Optimistic locking with version numbers: Maintain a version number field in the task record; include the version number as a condition during updates to prevent concurrent modifications

In production environments, it's recommended to combine multiple approaches (e.g., state machine + database unique constraints) to form multi-layered protection.

Pre-Freeze Mechanism to Prevent Abuse

The post-deduction model requires preventing users from mass-submitting tasks when they have insufficient credits. A "pre-freeze" mechanism can be adopted:

On submission: freeze the corresponding credits (not deducted, but unavailable for other tasks)
On success callback: convert frozen amount to actual deduction
On failure callback: unfreeze credits, return to user

This approach balances user experience with platform security.

The pre-freeze mechanism originates from the pre-authorization model in the financial industry. When you check into a hotel, the front desk pre-authorizes your credit card (freezing a certain amount); upon checkout, settlement is based on actual consumption, and excess amounts are automatically unfrozen. The same pattern is widely used in e-commerce — inventory is frozen when an order is placed and only actually deducted after payment succeeds.

From a technical architecture perspective, pre-freezing is essentially a simplified application of the Two-Phase Commit (2PC) concept. Phase one (Prepare): freeze credits, indicating "I'm preparing to deduct this fee"; Phase two (Commit/Rollback): decide whether to confirm the deduction or roll back and release based on the task result. In database design, this typically requires maintaining two fields in the user credits table: balance (total balance, including any frozen credits) and frozen (frozen amount). The user's actual available credits = balance - frozen. When submitting a task, increase the frozen value; on callback, either deduct from balance and decrease frozen by the same amount (success), or simply decrease frozen (unfreeze on failure).

It's particularly important to note that the freeze operation itself also requires atomicity guarantees. In high-concurrency scenarios, if a user rapidly submits multiple tasks in succession, over-freezing can occur. It's recommended to use conditional update statements like UPDATE ... WHERE balance - frozen >= required_amount, leveraging database row locks to ensure concurrency safety.

Timeout Handling Strategy

If the provider doesn't call back for an extended period, a timeout mechanism is needed:

Set a reasonable timeout duration (e.g., 30 minutes)
Automatically release frozen credits after timeout
Log anomalies for subsequent investigation

Timeout handling is a specific application of the Compensating Transaction pattern in distributed systems. In distributed environments where strong consistency cannot be guaranteed, we achieve eventual consistency through "forward operation + timeout compensation." In terms of implementation, there are typically several technical approaches:

Scheduled task scanning: Set up a Cron Job (e.g., executing every 5 minutes) to scan all tasks with "processing" status whose creation time exceeds the threshold, and execute timeout handling logic for these tasks
Delay queues: When creating a task, simultaneously publish a timeout check message to a delay queue (such as RabbitMQ's TTL queue, Redis Sorted Set, or cloud provider's delayed messages), which automatically triggers a check upon expiration
Dead Letter Queue (DLQ): Route failed Webhook messages to a dead letter queue, where dedicated consumers perform manual or automated exception handling

The timeout duration should be determined based on the provider's SLA (Service Level Agreement). For example, if the provider promises that 95% of tasks complete within 10 minutes, the timeout can be set to 30 minutes (allowing sufficient buffer). It's also recommended to set multi-level timeouts: trigger an alert to notify operations staff at 15 minutes, and automatically release frozen credits and mark the task as "timeout failure" at 30 minutes. For cases where a late callback arrives after timeout (i.e., a "late success"), additional reconciliation logic is needed to handle this edge scenario — whether to retroactively deduct credits and deliver results, or ignore the late callback, depends on specific business strategy.
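The multi-level timeout can be sketched as a function that a scheduled job runs every few minutes. The 15-minute and 30-minute thresholds come from the example above; the Task record, the alerted flag, the timeout_failed status name, and the sweep function are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

ALERT_AFTER_SECONDS = 15 * 60    # notify operations staff
TIMEOUT_AFTER_SECONDS = 30 * 60  # give up and release frozen credits


@dataclass
class Task:
    task_id: str
    status: str        # "processing", "completed", "failed", "timeout_failed"
    created_at: float  # Unix timestamp
    alerted: bool = False


def sweep(tasks: Iterable[Task], now: float) -> Tuple[List[str], List[str]]:
    """Scan processing tasks and apply the two timeout levels.

    Returns (task IDs to alert on, task IDs that just timed out).
    """
    alerts: List[str] = []
    timed_out: List[str] = []
    for task in tasks:
        if task.status != "processing":
            continue  # already settled by a callback or an earlier sweep
        age = now - task.created_at
        if age >= TIMEOUT_AFTER_SECONDS:
            task.status = "timeout_failed"
            timed_out.append(task.task_id)  # caller releases frozen credits
        elif age >= ALERT_AFTER_SECONDS and not task.alerted:
            task.alerted = True  # alert once, not on every sweep
            alerts.append(task.task_id)
    return alerts, timed_out
```

Because a timed-out task leaves the processing state, a late callback for it will not match the normal settlement path. That makes the "late success" case an explicit branch for the business to decide rather than an accidental double settlement.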

Summary

For AI async task billing systems, the core principle is: Only those who deliver the service have the right to charge for it. Moving credit deduction from the task submission phase to the Webhook callback phase is a simple but significant architectural optimization. It not only protects user rights but also reduces refund disputes and customer service pressure, making the entire system more robust and reliable.

In actual development, it's recommended to combine the "pre-freeze + callback confirmation" dual mechanism to both prevent abuse and ensure users never pay for failed tasks. This is the billing logic that a responsible SaaS platform should implement.

From a broader perspective, this problem reflects the universal challenge SaaS platforms face in finding balance between service reliability and billing fairness. As AI services grow increasingly complex (multimodal generation, long-duration video rendering, multi-step workflows, etc.), the uncertainty of async tasks will only increase. Establishing the correct billing architecture early will lay a solid foundation for the platform's long-term development.