Building a Generative Recommendation System with Go and AI Agents: Architecture Design and Practical Analysis

Introduction: Why Traditional Recommendation Systems Need Large Language Models

Rapid iteration in AI technology is reshaping the software development ecosystem, and traditional frontend and backend development is moving toward AI full-stack work. AI Agent development, one of the fastest-growing fields of the past two years, has drawn a large number of developers. Recently, an open-source project started by a team of student interns sparked considerable discussion in the community: they built a generative recommendation system powered by large language models in Go, codenamed "Ride the Wind and Waves" (乘风破浪).

The highlight of this project lies not only in the technical solution itself but also in its positioning: an entry-level open-source project aimed at AI Agent beginners. For developers who want to break into AI Agent development but struggle to find a suitable hands-on project, this is a starting point worth paying attention to.

Overall Architecture of the Go-Based Recommendation System: The Dual-Engine of "Wind" and "Waves"

The project name "Ride the Wind and Waves" corresponds to a clear architectural layering:

"Wind" (Backend Service Layer): Handles the construction of the traditional backend architecture, including Go server infrastructure, RESTful API design, data storage, and other classic engineering practices. This layer represents the inheritance and refinement of mature technology stacks.
"Waves" (Recommendation System Layer): The core of the project, representing innovative exploration in the AI era. The team leveraged large language models to optimize the traditional recommendation system across multiple dimensions, deeply integrating Agent capabilities into the recommendation pipeline.

The project founder explicitly stated that the backend exists to serve the recommendation system, which is the true focus of the entire project. This architectural philosophy reflects an important trend: in the age of AI applications, engineering capability at the infrastructure level remains indispensable, but the real differentiating value comes from deep integration of AI capabilities.

Three Core Differences Between Generative and Traditional Recommendation Systems

Recommendation systems are far from a new topic. From early collaborative filtering to feature-engineering-based machine learning approaches, movie and music recommendation systems have long been classic case studies in computer science courses.

It's worth recalling that Collaborative Filtering, the most classic algorithmic paradigm in the recommendation system field, was born in the 1990s. Its core idea is "birds of a feather flock together" — by analyzing the historical behavioral data of user groups, it identifies users with similar interests or items liked by similar users to generate recommendations. Subsequently, recommendation systems went through multiple rounds of technological iteration: from matrix factorization (e.g., SVD), content-based recommendation, to deep learning recommendation models (e.g., Wide & Deep, DeepFM, DIN, etc.). Each generation of technology attempted to solve the same core problem: how to more accurately model the matching relationship between user preferences and item features. The common limitation of traditional approaches is that they are essentially searching for patterns within the statistical distributions of existing data, lacking the ability to deeply understand content semantics.
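For readers who have not implemented collaborative filtering before, the idea can be sketched in a few lines of Go. This is a minimal illustration of the user-based variant, assuming implicit "liked" sets and Jaccard overlap as the similarity measure. The `likes` type, the `recommend` function, and the sample data are all hypothetical and are not taken from the project.

```go
package main

import (
	"fmt"
	"sort"
)

// likes maps a user ID to the set of item IDs that user interacted with.
// (Hypothetical in-memory layout, for illustration only.)
type likes map[string]map[string]bool

// jaccard returns |a ∩ b| / |a ∪ b|, a simple overlap-based user similarity.
func jaccard(a, b map[string]bool) float64 {
	inter := 0
	for item := range a {
		if b[item] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// recommend scores every item the target has not seen by summing the
// similarity of the users who liked it, then returns the top n item IDs.
func recommend(data likes, target string, n int) []string {
	scores := map[string]float64{}
	for user, items := range data {
		if user == target {
			continue
		}
		sim := jaccard(data[target], items)
		if sim == 0 {
			continue // users with no overlap contribute nothing
		}
		for item := range items {
			if !data[target][item] {
				scores[item] += sim
			}
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest score first; ties broken by ID so the order is deterministic.
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	if len(ids) > n {
		ids = ids[:n]
	}
	return ids
}

func main() {
	data := likes{
		"alice": {"go-basics": true, "llm-intro": true},
		"bob":   {"go-basics": true, "llm-intro": true, "vector-db": true},
		"carol": {"cooking": true},
	}
	fmt.Println(recommend(data, "alice", 2))
}
```

Note that the sketch only counts co-occurrence: it never looks at what the items say, which is exactly the limitation described above.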

So what exactly makes this LLM-based generative recommendation system "new"?

Difference 1: Deep Semantic Understanding Based on LLM Embedding

Traditional recommendation systems primarily rely on manual tagging or keywords to describe item features. This approach has obvious shortcomings — tag granularity is coarse and struggles to capture the deep semantics of content.

This project introduces the Embedding capabilities of large language models, constructing article content into a three-level hierarchical structure:

Level 1: The article's main direction and core themes
Level 2: Subheadings of each paragraph, i.e., the paragraph structure hierarchy
Level 3: Chunk splitting for overly long paragraphs
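
The three levels above could be modeled roughly as follows. This is a sketch under stated assumptions, not the project's actual data model: the `Article` and `Section` types, the `splitChunks` helper, and the rune limit are hypothetical. The splitter packs whitespace-separated words, so it suits space-delimited text; Chinese text would need a different segmentation strategy.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Section is Level 2: one subheading plus its Level 3 chunks.
type Section struct {
	Heading string
	Chunks  []string
}

// Article is Level 1: the overall theme plus its sections.
type Article struct {
	Theme    string
	Sections []Section
}

// splitChunks packs whitespace-separated words into chunks of at most
// maxRunes runes. A single word longer than maxRunes becomes its own chunk.
func splitChunks(text string, maxRunes int) []string {
	var chunks []string
	var cur strings.Builder
	curLen := 0
	for _, w := range strings.Fields(text) {
		wLen := utf8.RuneCountInString(w)
		// Flush the current chunk if adding " " + w would exceed the limit.
		if curLen > 0 && curLen+1+wLen > maxRunes {
			chunks = append(chunks, cur.String())
			cur.Reset()
			curLen = 0
		}
		if curLen > 0 {
			cur.WriteByte(' ')
			curLen++
		}
		cur.WriteString(w)
		curLen += wLen
	}
	if curLen > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

// newSection builds a Level 2 node; short paragraphs end up as one chunk.
func newSection(heading, paragraph string, maxRunes int) Section {
	return Section{Heading: heading, Chunks: splitChunks(paragraph, maxRunes)}
}

func main() {
	a := Article{
		Theme: "Vector search basics",
		Sections: []Section{
			newSection("Why embeddings",
				"Embeddings place similar texts close together in vector space", 30),
		},
	}
	for _, s := range a.Sections {
		for i, c := range s.Chunks {
			fmt.Printf("%s / %s / chunk %d: %q\n", a.Theme, s.Heading, i, c)
		}
	}
}
```

Each chunk would then be embedded on its own, with the theme and heading kept alongside it as context for recall.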

Embedding is a technique that maps high-dimensional discrete data (such as text and images) into a low-dimensional continuous vector space. LLM Embedding is powerful because the model has been pre-trained on massive corpora, which lets it map semantically similar content to nearby positions in the vector space. For example, "Machine Learning Beginner's Guide" and "Deep Learning Fundamentals Tutorial" share few keywords but would sit very close together in the LLM Embedding space. The chunk splitting at Level 3 is standard practice in the RAG (Retrieval-Augmented Generation) domain: models have limited context windows, and a single vector for a very long passage blurs its individual topics, so long documents are split into appropriately sized segments that are embedded individually to keep the semantic representation precise.
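
"Close in the vector space" is usually measured with cosine similarity. The sketch below shows the computation in Go. The three-dimensional vectors are hand-picked toy values for illustration only; real embeddings come from a model and have hundreds or thousands of dimensions.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosine returns the cosine of the angle between a and b:
// 1 means same direction, 0 means orthogonal (unrelated).
func cosine(a, b []float64) (float64, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("zero vector has no direction")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

func main() {
	// Toy vectors standing in for real embeddings (assumed values).
	mlGuide := []float64{0.9, 0.1, 0.3}
	dlTutorial := []float64{0.8, 0.2, 0.4}
	cookbook := []float64{0.1, 0.9, 0.0}

	near, _ := cosine(mlGuide, dlTutorial)
	far, _ := cosine(mlGuide, cookbook)
	fmt.Printf("ML guide vs DL tutorial: %.3f\n", near)
	fmt.Printf("ML guide vs cookbook:    %.3f\n", far)
}
```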

During the recall stage, the system combines coarse recall and fine recall. Coarse recall performs initial filtering based on titles, cover images, manual tags, and content types; fine recall uses vector search to match semantically against the Level 2 subheadings and paragraph content. Vector search is a technique for efficiently finding the vectors most similar to a query in a vector space. Commonly used approximate nearest neighbor algorithms include HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index), typically used together with specialized vector databases such as Milvus, Pinecone, and Weaviate.
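
The two-stage recall can be sketched as a cheap metadata filter followed by a similarity ranking. The example below is an assumption-laden illustration: the `Article` type and both functions are hypothetical, coarse recall is reduced to a content-type match, and fine recall uses an exhaustive scan in place of an HNSW or IVF index, which is only reasonable for small candidate sets.

```go
package main

import (
	"fmt"
	"sort"
)

// Article carries the metadata used by coarse recall and the embedding used
// by fine recall. Vec is assumed to be normalized to unit length, so the dot
// product equals cosine similarity.
type Article struct {
	ID   string
	Type string
	Vec  []float64
}

// coarseRecall keeps only articles whose content type matches.
func coarseRecall(pool []Article, contentType string) []Article {
	var out []Article
	for _, a := range pool {
		if a.Type == contentType {
			out = append(out, a)
		}
	}
	return out
}

// dot returns the dot product over the shared prefix of a and b.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		if i < len(b) {
			s += a[i] * b[i]
		}
	}
	return s
}

// fineRecall ranks candidates by similarity to the query vector and returns
// the top k. This is a brute-force scan, not an approximate index.
func fineRecall(cands []Article, query []float64, k int) []Article {
	if k < 0 {
		k = 0
	}
	ranked := append([]Article(nil), cands...) // copy so the input is untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return dot(ranked[i].Vec, query) > dot(ranked[j].Vec, query)
	})
	if len(ranked) > k {
		ranked = ranked[:k]
	}
	return ranked
}

func main() {
	pool := []Article{
		{ID: "a", Type: "tech", Vec: []float64{1, 0}},
		{ID: "b", Type: "tech", Vec: []float64{0, 1}},
		{ID: "c", Type: "tech", Vec: []float64{0.6, 0.8}},
		{ID: "d", Type: "life", Vec: []float64{1, 0}},
	}
	query := []float64{1, 0} // toy query embedding
	for _, a := range fineRecall(coarseRecall(pool, "tech"), query, 2) {
		fmt.Println(a.ID)
	}
}
```

Running the cheap filter first keeps the number of vector comparisons small, which is the usual reason for splitting recall into two stages.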

The team also plans to introduce a graph database in subsequent versions to build stronger relational indexing of article structures. Graph Databases store and query data using graph structures (nodes and edges), with representative products including Neo4j, Amazon Neptune, and TigerGraph. In recommendation systems, the value of graph databases lies in their natural ability to express complex relationship networks between entities — for example, "User A read Article B," "Article B belongs to Topic C," "Topic C is related to Topic D," and other multi-hop relationships. Through graph traversal and graph algorithms (such as PageRank, community detection, etc.), the system can discover implicit associations that traditional relational databases struggle to express efficiently. Knowledge Graphs, as an important application form of graph databases, can organize item attributes, categories, creator information, and more into structured knowledge networks, significantly improving the explainability and accuracy of recommendations.
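
To make the multi-hop idea concrete, the sketch below walks the example relationships with a breadth-first search over an in-memory adjacency list. It stands in for a real graph database query and makes no claim about the team's planned schema: the `Graph` type, the node naming scheme, and "Article E" are hypothetical, and edges are treated as undirected for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// Graph is an undirected adjacency list keyed by node ID.
type Graph map[string][]string

// AddEdge links a and b in both directions.
func (g Graph) AddEdge(a, b string) {
	g[a] = append(g[a], b)
	g[b] = append(g[b], a)
}

// withinHops runs a breadth-first search from start and returns the hop
// distance of every node reachable in at most maxHops steps.
func (g Graph) withinHops(start string, maxHops int) map[string]int {
	dist := map[string]int{start: 0}
	queue := []string{start}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if dist[node] == maxHops {
			continue // do not expand beyond the hop limit
		}
		for _, next := range g[node] {
			if _, seen := dist[next]; !seen {
				dist[next] = dist[node] + 1
				queue = append(queue, next)
			}
		}
	}
	return dist
}

func main() {
	g := Graph{}
	g.AddEdge("user:A", "article:B")  // User A read Article B
	g.AddEdge("article:B", "topic:C") // Article B belongs to Topic C
	g.AddEdge("topic:C", "topic:D")   // Topic C is related to Topic D
	g.AddEdge("topic:D", "article:E") // hypothetical Article E under Topic D

	// Articles more than one hop away are ones the user has not read yet.
	for node, hops := range g.withinHops("user:A", 4) {
		if strings.HasPrefix(node, "article:") && hops > 1 {
			fmt.Printf("%s reachable in %d hops\n", node, hops)
		}
	}
}
```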

Difference 2: Three-Layer User Memory Architecture

This is one of the project's most innovative designs. Traditional recommendation systems typically perform statistical modeling based on users' historical behavior, while generative recommendation systems need to "feed" user profiles to the large model so it can truly understand user preferences.

The team designed a three-layer memory architecture to build comprehensive user profiles:

Long-term Memory: Initial interest preferences obtained through questionnaires during user registration, serving as the recommendation baseline during the cold start phase
Short-term Memory: Records of content the user has recently browsed and interacted with, reflecting immediate interests
Periodic Memory: Captures patterns of interest changes across different time periods — for example, preferring technical articles in the morning and lighter content in the evening
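
One way to picture the three layers is as a single struct that is rendered into text before being handed to the model. The following is a sketch only: the `UserMemory` type, the time-slot boundaries, and the prompt wording are assumptions made for illustration, not the project's actual schema.

```go
package main

import (
	"fmt"
	"strings"
)

// UserMemory groups the three memory layers for one user.
type UserMemory struct {
	LongTerm  []string            // interests from the registration questionnaire
	ShortTerm []string            // titles of recently read articles
	Periodic  map[string][]string // time slot -> categories preferred in that slot
}

// timeSlot buckets an hour (0-23) into a coarse slot.
// The boundaries are arbitrary choices for this sketch.
func timeSlot(hour int) string {
	switch {
	case hour >= 5 && hour < 12:
		return "morning"
	case hour >= 12 && hour < 18:
		return "afternoon"
	default:
		return "evening"
	}
}

// Prompt renders the three layers as plain text suitable for an LLM prompt,
// selecting the periodic preferences that match the current hour.
func (m UserMemory) Prompt(hour int) string {
	slot := timeSlot(hour)
	var b strings.Builder
	fmt.Fprintf(&b, "Long-term interests: %s\n", strings.Join(m.LongTerm, ", "))
	fmt.Fprintf(&b, "Recently read: %s\n", strings.Join(m.ShortTerm, ", "))
	fmt.Fprintf(&b, "Usual preference in the %s: %s\n",
		slot, strings.Join(m.Periodic[slot], ", "))
	return b.String()
}

func main() {
	m := UserMemory{
		LongTerm:  []string{"Go", "distributed systems"},
		ShortTerm: []string{"Intro to vector databases"},
		Periodic: map[string][]string{
			"morning": {"technical articles"},
			"evening": {"light reading"},
		},
	}
	fmt.Print(m.Prompt(8))
}
```

For a brand-new user only `LongTerm` is populated, which matches its role as the cold start baseline.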

The long-term memory design directly addresses one of the most classic challenges in the recommendation system field — the Cold Start Problem. Cold start refers to the situation where, when a new user registers or a new item goes live, the system lacks sufficient historical interaction data to make effective recommendations. Traditional solutions include demographic-based recommendations, popular content fallbacks, and so on. The introduction of large language models provides a new approach to this problem — even with minimal user behavioral data, the model can perform semantic-level understanding and reasoning based on limited preference descriptions to generate reasonable initial recommendations. This represents a significant advantage over traditional statistical methods in cold start scenarios.

Future plans also include incorporating search history recall and trending information fusion to further enrich the dimensions of user profiles.

Difference 3: AI Directly Participates in Recommendation Decisions and Content Generation

This is the core meaning of the word "generative." In traditional recommendation systems, the system merely filters and ranks from an existing content pool; in this project, AI directly participates in recommendation decisions and can even generate personalized content to recommend to users.

What type of articles a user should see — this judgment is made by the large language model, rather than by simple rule engines or statistical models. The advantage of this approach is its ability to process more complex contextual information and deliver more personalized recommendations that better match user needs.
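
Letting a model make the decision still requires guarding its output. The sketch below assumes the model is asked to reply with a JSON array of article IDs, and it discards any ID that is not in the candidate pool. The `LLM` interface is a hypothetical minimal abstraction, not the API of the Qwen SDK or any other real client, and `stubLLM` returns a canned reply so the example runs without network access.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// LLM is a hypothetical minimal client interface (an assumption of this sketch).
type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// stubLLM returns a fixed reply instead of calling a real model.
type stubLLM struct{ reply string }

func (s stubLLM) Complete(context.Context, string) (string, error) {
	return s.reply, nil
}

// parseSelection decodes the reply as a JSON array of article IDs and keeps
// only IDs that exist in the candidate pool, dropping duplicates.
func parseSelection(raw string, candidates []string) ([]string, error) {
	var ids []string
	if err := json.Unmarshal([]byte(raw), &ids); err != nil {
		return nil, fmt.Errorf("model reply is not a JSON string array: %w", err)
	}
	allowed := make(map[string]bool, len(candidates))
	for _, c := range candidates {
		allowed[c] = true
	}
	var picked []string
	for _, id := range ids {
		if allowed[id] {
			picked = append(picked, id)
			allowed[id] = false // a second occurrence is ignored
		}
	}
	if len(picked) == 0 {
		return nil, errors.New("model selected no valid candidate")
	}
	return picked, nil
}

// recommend asks the model to choose from the recalled candidates.
func recommend(ctx context.Context, llm LLM, profile string, candidates []string) ([]string, error) {
	prompt := fmt.Sprintf(
		"User profile:\n%s\nCandidates: %v\nReply with a JSON array of the article IDs to show, best first.",
		profile, candidates)
	raw, err := llm.Complete(ctx, prompt)
	if err != nil {
		return nil, err
	}
	return parseSelection(raw, candidates)
}

func main() {
	llm := stubLLM{reply: `["a2", "zz", "a1"]`} // "zz" simulates an invented ID
	ids, err := recommend(context.Background(), llm,
		"likes Go and LLMs", []string{"a1", "a2", "a3"})
	fmt.Println(ids, err)
}
```

Validating against the candidate pool keeps the recall stage as the source of truth, so a hallucinated ID never reaches the user.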

It's worth noting that this is not an original idea from the team. The project documentation references a practical case shared by Xiaohongshu (Little Red Book) at CNCC — integrating large language models into recommendation systems is already a direction being actively explored across the industry.

Five-Dimensional User Feedback System: Fine-Grained Capture of User Interests

To capture user interests, the team designed a feedback mechanism with five dimensions, aimed at reconstructing users' true preferences as fully as possible:

Explicit Feedback: Active behaviors such as likes, bookmarks, and comments, which carry the strongest signal
Implicit Consumption Behavior: Passive signals such as article dwell time, whether the full article was read, and quick scrolling past
Negative Feedback: Rejection behaviors such as clicking "not interested" or skipping, used to filter irrelevant content
Periodic Behavior: Analyzing patterns of interest changes across different time periods (early morning, commute, weekends)
Geolocation and Context Information: For example, recommending a newly opened restaurant nearby (not yet implemented)
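
A simple way to combine such signals is to give each one a weight and accumulate a score per content category. The sketch below covers only the first three dimensions (explicit, implicit, and negative feedback). The signal names and every weight are hypothetical values chosen for illustration; the project's actual scoring is not described in the source.

```go
package main

import "fmt"

// Signal identifies one kind of user feedback event.
type Signal int

const (
	Like Signal = iota
	Bookmark
	Comment
	ReadToEnd
	QuickScroll
	NotInterested
)

// weights maps each signal to its contribution. All values are assumptions:
// explicit actions weigh most, implicit ones less, rejections are negative.
var weights = map[Signal]float64{
	Like:          2,
	Bookmark:      3,
	Comment:       3,
	ReadToEnd:     1,
	QuickScroll:   -0.5,
	NotInterested: -3,
}

// Event is one feedback signal on an article of a given category.
type Event struct {
	Category string
	Signal   Signal
}

// interestScores sums the weighted signals per category.
func interestScores(events []Event) map[string]float64 {
	scores := make(map[string]float64)
	for _, e := range events {
		scores[e.Category] += weights[e.Signal]
	}
	return scores
}

func main() {
	events := []Event{
		{Category: "go", Signal: Like},
		{Category: "go", Signal: ReadToEnd},
		{Category: "gossip", Signal: QuickScroll},
		{Category: "gossip", Signal: NotInterested},
	}
	for category, score := range interestScores(events) {
		fmt.Printf("%s: %.1f\n", category, score)
	}
}
```

Categories with a negative total could be filtered out before recall, which is the role the negative feedback dimension plays in the list above.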

The design of this feedback system is quite comprehensive. Although some features have not yet been fully implemented, the overall framework demonstrates the team's deep thinking about user modeling in recommendation systems.

Project Status and Iteration Roadmap

The project has completed its first version of development, using Alibaba's Tongyi Qianwen (Qwen) model API as the LLM foundation. The team also candidly acknowledged the limitations of the current version and the work planned to address them:

Only a single model API is supported; more LLMs will be integrated in the future
The recommendation system does not yet have online learning capabilities and cannot continuously self-optimize from user positive/negative feedback
Multi-Agent collaboration is not yet in place; the team plans to introduce it to improve recommendation decision accuracy

Online Learning is a training paradigm in machine learning that contrasts with traditional Batch Learning. In recommendation systems, online learning means the model continuously updates its parameters from real-time user feedback (such as clicks, skips, and purchases) without waiting for offline retraining. This is crucial for capturing immediate changes in user interests: when a user suddenly becomes interested in a trending topic, for example, an online learning system can adjust its recommendation strategy within minutes. Common online learning approaches in industry include incremental Embedding updates, real-time feature engineering, and reinforcement learning-based Exploration-Exploitation strategies such as Multi-Armed Bandit and Contextual Bandit. Because the current version lacks this capability, the model cannot evolve from post-deployment user interactions, which makes online learning a key optimization direction for future iterations.
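
Of the approaches listed, the Multi-Armed Bandit is the easiest to show in code. The sketch below implements an epsilon-greedy bandit whose estimates are updated after every single feedback event, which is the essence of online learning. The `Bandit` type, the click-through rates, and the simulated user are all made up for illustration; the project does not implement this yet.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bandit is an epsilon-greedy multi-armed bandit. Each arm could stand for
// a content category; values holds the running mean reward per arm.
type Bandit struct {
	epsilon float64
	counts  []int
	values  []float64
	rng     *rand.Rand
}

// NewBandit creates a bandit with the given number of arms.
func NewBandit(arms int, epsilon float64, seed int64) *Bandit {
	return &Bandit{
		epsilon: epsilon,
		counts:  make([]int, arms),
		values:  make([]float64, arms),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// Select explores a random arm with probability epsilon and otherwise
// exploits the arm with the highest estimated reward.
func (b *Bandit) Select() int {
	if b.rng.Float64() < b.epsilon {
		return b.rng.Intn(len(b.values))
	}
	best := 0
	for arm, v := range b.values {
		if v > b.values[best] {
			best = arm
		}
	}
	return best
}

// Update folds one observed reward into the running mean of the arm,
// so no batch retraining step is needed.
func (b *Bandit) Update(arm int, reward float64) {
	b.counts[arm]++
	b.values[arm] += (reward - b.values[arm]) / float64(b.counts[arm])
}

func main() {
	// Hypothetical click-through rates for three content categories.
	ctr := []float64{0.1, 0.5, 0.3}
	b := NewBandit(len(ctr), 0.1, 42)
	user := rand.New(rand.NewSource(7)) // simulated user behavior

	for i := 0; i < 5000; i++ {
		arm := b.Select()
		reward := 0.0
		if user.Float64() < ctr[arm] {
			reward = 1 // the simulated user clicked
		}
		b.Update(arm, reward)
	}
	fmt.Println("pulls per arm:", b.counts)
	fmt.Println("estimated CTR:", b.values)
}
```

A Contextual Bandit extends the same loop by conditioning the choice on user features instead of keeping one global estimate per arm.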

Multi-Agent Collaboration is one of the most cutting-edge research directions in the AI Agent field today. Compared to a single Agent, multi-Agent systems achieve stronger reasoning capabilities and higher task completion quality by having multiple AI Agents with different roles and capabilities collaborate on complex tasks. Typical frameworks include AutoGen (Microsoft), CrewAI, MetaGPT, and others. In recommendation system scenarios, multi-Agent collaboration can manifest as: one Agent responsible for user profile analysis, one Agent for content understanding and recall, one Agent for ranking decisions, and another Agent for controlling result diversity and novelty. The Agents reach final recommendation decisions through structured message passing and negotiation mechanisms. The advantage of this architecture lies in separation of concerns, independent optimization, and a decision-making model that more closely resembles human team collaboration.
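
The division of labor described above can be sketched as agents that pass a structured message along a pipeline. This is the simplest possible form, a fixed sequential chain with no negotiation, and each agent here is a plain rule-based stub rather than an LLM call. The `Agent` interface, the `Message` type, and the four agents are hypothetical and do not follow the API of AutoGen, CrewAI, or MetaGPT.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a candidate article with a relevance score.
type Item struct {
	ID, Topic string
	Score     float64
}

// Message is the structured state handed from one agent to the next.
type Message struct {
	UserID    string
	Interests []string
	Items     []Item
}

// Agent is one role in the pipeline.
type Agent interface {
	Handle(Message) (Message, error)
}

// profileAgent fills in the user's interests.
type profileAgent struct{ interests map[string][]string }

func (a profileAgent) Handle(m Message) (Message, error) {
	m.Interests = a.interests[m.UserID]
	return m, nil
}

// recallAgent selects catalog items matching the interests.
type recallAgent struct{ catalog []Item }

func (a recallAgent) Handle(m Message) (Message, error) {
	want := map[string]bool{}
	for _, t := range m.Interests {
		want[t] = true
	}
	m.Items = nil
	for _, it := range a.catalog {
		if want[it.Topic] {
			m.Items = append(m.Items, it)
		}
	}
	return m, nil
}

// rankAgent orders items by score, highest first.
type rankAgent struct{}

func (rankAgent) Handle(m Message) (Message, error) {
	sort.SliceStable(m.Items, func(i, j int) bool {
		return m.Items[i].Score > m.Items[j].Score
	})
	return m, nil
}

// diversityAgent caps the number of items per topic.
type diversityAgent struct{ perTopic int }

func (a diversityAgent) Handle(m Message) (Message, error) {
	seen := map[string]int{}
	var kept []Item
	for _, it := range m.Items {
		if seen[it.Topic] < a.perTopic {
			kept = append(kept, it)
			seen[it.Topic]++
		}
	}
	m.Items = kept
	return m, nil
}

// runPipeline passes the message through each agent in order.
func runPipeline(m Message, agents ...Agent) (Message, error) {
	for _, a := range agents {
		var err error
		if m, err = a.Handle(m); err != nil {
			return m, err
		}
	}
	return m, nil
}

func main() {
	catalog := []Item{
		{ID: "g1", Topic: "go", Score: 0.9},
		{ID: "g2", Topic: "go", Score: 0.8},
		{ID: "l1", Topic: "llm", Score: 0.95},
		{ID: "c1", Topic: "cooking", Score: 0.99},
	}
	out, err := runPipeline(Message{UserID: "u1"},
		profileAgent{interests: map[string][]string{"u1": {"go", "llm"}}},
		recallAgent{catalog: catalog},
		rankAgent{},
		diversityAgent{perTopic: 1},
	)
	fmt.Println(out.Items, err)
}
```

Because every role sits behind the same interface, any single stub could later be swapped for an LLM-backed agent without touching the others, which is the separation of concerns the paragraph above refers to.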

In terms of development cadence, the team has laid out a clear three-phase plan:

Near-term (during spring recruitment season): Maintain the existing version, handle minor feature requests and bug fixes
Mid-term (summer 2025): Organize the team for a major version iteration, adding multi-modal content support such as video, and comprehensively upgrading the recommendation algorithms
Long-term: Build a sustainably operated open-source platform, exploring community activities such as Open Source Summer of Code

Three Takeaways for AI Agent Developers

Although this project is still in its early stages, it conveys several signals worth developers' attention:

First, AI tools are lowering the development barrier for complex systems. As the project founder noted, with AI assistance, developers can experience the entire product development process from a higher vantage point. Even a student team can build an AI application system with considerable complexity.

Second, generative recommendation is a direction worth deep exploration. Based on the practices of major companies like Xiaohongshu, the combination of large language models and recommendation systems is not hype — it's a technology trend that is actively being implemented. Developers who master this direction will have a clear advantage in the job market.

Third, participating in open-source projects is the best path to getting started with AI Agent development. For those who want to break into AI Agent development, participating in a real open-source project is far more efficient than working in isolation. The team also welcomes developers with zero experience to take their first step by submitting a PR.

At a time when demand for AI Agent development positions continues to grow, open-source projects like this one — tightly combining theory with practice — may be exactly the "first hands-on project" that many developers need.