AI Scientist: A Deep Dive into Sakana AI's Automated Research Framework

Sakana AI's AI Scientist uses LLMs to automate the entire scientific research pipeline from idea to paper.
AI Scientist by Sakana AI is an open-source framework that leverages Large Language Models to automate the complete scientific research workflow. It operates through four stages—idea generation, experiment execution, paper writing, and automated review—primarily targeting computational experiments. While it represents a milestone in AI-driven science, current limitations include shallow innovation, restricted experimental scope, and reliability concerns requiring human oversight.
Project Overview
AI Scientist is a groundbreaking open-source project from Sakana AI that uses Large Language Models (LLMs) to automate the entire scientific research workflow. LLMs are deep learning models based on the Transformer architecture that learn statistical patterns and knowledge representations by pre-training on massive text datasets. Representative models include OpenAI's GPT series, Google's PaLM/Gemini, and Anthropic's Claude. These models have strong text generation, reasoning, and code-writing capabilities, with parameter counts ranging from billions to over a trillion. A frequently cited property of LLMs is emergent abilities: capabilities such as logical reasoning, mathematical problem solving, and code debugging that were never explicitly taught and that appear to improve sharply once model scale passes a certain threshold (researchers still debate how abrupt these jumps really are). These capabilities provide the technical foundation for complex applications like AI Scientist.
The project is hosted on GitHub, with Jupyter Notebook as its primary language, and demonstrates how AI can complete an entire research loop: proposing hypotheses, designing experiments, running code, and writing papers.

Background on Sakana AI and AI Scientist
Who is Sakana AI
Sakana AI is an artificial intelligence research company headquartered in Tokyo, founded by former Google researchers. Google Brain was one of Google's most influential AI research laboratories and merged with DeepMind in 2023 to form Google DeepMind. Its research achievements include the TensorFlow framework, the Transformer architecture (introduced in the famous paper "Attention Is All You Need"), and numerous other cutting-edge AI technologies. Sakana AI co-founder Llion Jones is one of the eight authors of the Transformer paper, and co-founder David Ha was a research scientist on the Google Brain team in Tokyo. This top-tier research background gives Sakana AI deep expertise in fundamental AI research.
The company name "Sakana" comes from the Japanese word for "fish," symbolizing the idea of solving complex problems through collective intelligence, much like a school of fish. The company focuses on exploring nature-inspired AI methods and is dedicated to building more efficient and creative AI systems.
The Core Philosophy of AI Scientist
The core philosophy of AI Scientist is to make AI a "full-stack scientist." In traditional research workflows, researchers invest significant time in literature review, experimental design, code writing, and paper drafting. AI Scientist automates these steps, enabling AI to:
- Autonomously generate research ideas: Propose new research directions based on existing literature and domain knowledge
- Design and execute experiments: Write experimental code, run it, and collect data results
- Write complete papers: Generate full papers with abstracts, methods, experiments, and conclusions following academic standards
- Conduct peer review: Perform automated review and scoring of generated papers
Detailed Technical Architecture of AI Scientist
Four-Stage Workflow
The automated research workflow of AI Scientist is divided into four stages:
- Idea Generation: The LLM generates multiple candidate research directions from a given research template and domain background, and evaluates the feasibility of each idea. This stage borrows from "brainstorming": multi-turn dialogue and self-reflection help the model balance breadth and depth before it filters out the most research-worthy directions.
- Experiment Execution: The system automatically writes and modifies experimental code, runs experiments, and records the results. The key challenge in this stage is code correctness and robustness. Much like a human researcher debugging iteratively, the AI must handle runtime errors, adjust hyperparameters, and automatically debug and retry when an experiment fails.
- Paper Writing: Experimental results are organized into a LaTeX paper with a complete, academically standard section structure. LaTeX is a professional typesetting system built on TeX and widely used in academia, especially in computer science, mathematics, and physics. Unlike WYSIWYG editors such as Microsoft Word, LaTeX uses a markup language to describe document structure, giving precise control over formula typesetting, reference management, figure numbering, and other core requirements of academic papers. Because AI Scientist generates LaTeX, its output matches the formatting conventions of mainstream academic publishing, including the arXiv preprint server and academic conferences.
- Automated Review: A separate LLM instance reviews and scores the paper along multiple dimensions. Traditional peer review is the core quality-control mechanism of scientific publishing: typically two to four anonymous domain experts evaluate a paper's novelty, methodological correctness, experimental adequacy, and clarity of presentation. However, it suffers from long cycles (often months, sometimes longer), heavy reviewer workloads, and subjective bias. AI Scientist's automated review module attempts to simulate this process with LLMs. It cannot yet replace the deep judgment of human experts, but it can serve as a preliminary screen that quickly flags obvious flaws and offers reference opinions for subsequent human review.
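The four stages can be read as one loop. The Python sketch below is a minimal, hypothetical illustration of that control flow, not Sakana AI's code: the `llm` and `execute` callables, the function names, the prompts, the 1 to 10 scoring scale, and the retry limit are all assumptions chosen for readability. It shows self-reflection during idea generation, the debug-and-retry loop during experiments, LaTeX assembly during writing, and multi-criterion scoring during review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not AI Scientist's real API):
# an LLM maps a prompt to a completion, and an executor runs experiment code.
LLM = Callable[[str], str]


@dataclass
class Idea:
    description: str
    feasibility: int  # self-rated score on an assumed 1-10 scale


@dataclass
class RunResult:
    ok: bool
    output: str  # metrics on success, error text on failure


Executor = Callable[[str], RunResult]

REVIEW_CRITERIA = ("novelty", "correctness", "experiments", "clarity")


def generate_ideas(llm: LLM, template: str, n_ideas: int = 3, reflections: int = 2) -> List[Idea]:
    """Stage 1: draft ideas, refine each via self-reflection, rank by feasibility."""
    ideas = []
    for i in range(n_ideas):
        draft = llm(f"Propose research idea {i + 1} for this template: {template}")
        for _ in range(reflections):
            draft = llm(f"Critique and improve this idea: {draft}")
        score = int(llm(f"Rate feasibility from 1 to 10: {draft}"))
        ideas.append(Idea(draft, score))
    return sorted(ideas, key=lambda idea: idea.feasibility, reverse=True)


def run_with_retries(llm: LLM, code: str, execute: Executor, max_attempts: int = 4) -> Optional[RunResult]:
    """Stage 2: run the experiment; on failure, ask the LLM to patch the code and retry."""
    for _ in range(max_attempts):
        result = execute(code)
        if result.ok:
            return result
        code = llm(f"Fix this code.\nError:\n{result.output}\nCode:\n{code}")
    return None  # give up after max_attempts failed runs


def write_paper(llm: LLM, idea: Idea, results: str) -> str:
    """Stage 3: assemble LLM-written sections into a LaTeX document."""
    parts = ["\\documentclass{article}", "\\begin{document}"]
    for section in ("Introduction", "Method", "Experiments", "Conclusion"):
        text = llm(f"Write the {section} section about {idea.description} using results: {results}")
        parts.append(f"\\section{{{section}}}\n{text}")
    parts.append("\\end{document}")
    return "\n".join(parts)


def review(llm: LLM, paper: str) -> Dict[str, float]:
    """Stage 4: a separate LLM call scores each criterion; 'overall' is their mean."""
    scores: Dict[str, float] = {
        c: int(llm(f"Score this paper's {c} from 1 to 10:\n{paper}")) for c in REVIEW_CRITERIA
    }
    scores["overall"] = sum(scores.values()) / len(REVIEW_CRITERIA)
    return scores


def pipeline(llm: LLM, execute: Executor, template: str) -> Optional[Tuple[str, Dict[str, float]]]:
    """Chain the four stages; stop early if no experiment ever runs cleanly."""
    best = generate_ideas(llm, template)[0]
    code = llm(f"Write experiment code for: {best.description}")
    result = run_with_retries(llm, code, execute)
    if result is None:
        return None  # no successful run, so no paper is written
    paper = write_paper(llm, best, result.output)
    return paper, review(llm, paper)
```

Passing `llm` and `execute` in as plain functions keeps the sketch testable with stubs. A real system would wrap an actual model API and run generated code in a sandbox with time limits, since LLM-written code can fail or misbehave.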
Tech Stack and Development Environment
The project uses Jupyter Notebook as its primary development environment, a choice driven by several key considerations:
- Facilitates interactive development and debugging, allowing researchers to verify each step incrementally
- Suitable for intuitively displaying experimental processes and intermediate results
- Lowers the barrier for project reproduction and learning, encouraging community participation
Jupyter Notebook originated from the IPython project, with its name derived from the combination of three programming languages: Julia, Python, and R. It uses a "cell" organizational structure that allows users to interleave code, text explanations, and visualization outputs to form an executable interactive document. This format is particularly well-suited for exploratory research in data science and machine learning, making AI Scientist's workflow more transparent and reproducible for external researchers.
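To make the cell structure concrete, the sketch below builds a minimal two-cell notebook as a plain Python dictionary and serializes it to JSON, following the nbformat version 4 layout used by `.ipynb` files. The helper names and cell contents are illustrative assumptions; in practice Jupyter itself or the `nbformat` package would normally produce this structure.

```python
import json
from typing import Any, Dict, List

Cell = Dict[str, Any]


def markdown_cell(text: str) -> Cell:
    # Markdown cells hold the prose explanation between code steps.
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source: str) -> Cell:
    # Code cells start unexecuted; the kernel fills in outputs and execution_count.
    return {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [], "source": source}


def build_notebook(cells: List[Cell]) -> Dict[str, Any]:
    # Top-level layout of an nbformat v4 notebook (minor version 4 does not require cell ids).
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook([
    markdown_cell("## Baseline run\nTrain with default hyperparameters and record the loss."),
    code_cell("losses = [0.91, 0.57, 0.42]\nprint(min(losses))"),
])

# Serialized notebook text; saving it to a file ending in .ipynb should let Jupyter open it.
text = json.dumps(notebook, indent=1)
print(text)
```

Interleaving a markdown cell that states the intent with a code cell that performs it is what makes each experimental step readable and re-runnable on its own.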
Significance and Limitations of AI-Automated Research
Pioneering Significance
AI Scientist represents an important milestone in AI-assisted research. It is not merely a code generation tool but attempts to simulate the entire scientific thinking process, from identifying problems to validating hypotheses. This could help accelerate scientific discovery and lower research barriers, and it opens new possibilities for resource-limited research teams in particular.
From a broader perspective, the emergence of AI Scientist echoes the global research trend of "AI for Science" (AI-driven scientific discovery). In recent years, AI has achieved breakthrough results in protein structure prediction (AlphaFold), mathematical theorem proving (AlphaProof), and materials discovery (GNoME). What makes AI Scientist unique is that it is not an AI tool targeting a specific scientific problem but an attempt to build a general automated research framework that could give AI cross-disciplinary research capabilities.
Analysis of Current Limitations
However, AI Scientist's limitations also deserve a realistic assessment:
- Limited innovation depth: Currently, AI-generated research ideas are mostly incremental improvements, and disruptive innovations are rare. This likely reflects how LLMs work: they essentially sample and recombine within the distribution of existing knowledge, whereas true scientific breakthroughs often require intuitive leaps beyond existing paradigms.
- Restricted experimental scope: The system is primarily applicable to purely computational experiments (such as machine learning model training and numerical simulations) and cannot handle research that requires physical laboratory equipment. Wet-lab experiments in biology, particle collision experiments in physics, and synthesis experiments in chemistry remain beyond its capabilities.
- Inconsistent paper quality: Automatically generated papers still have significant room for improvement in logical rigor and depth of analysis. Particularly in the Discussion section, AI often struggles to analyze the significance, limitations, and potential impact of results as deeply as human researchers.
- Reliability risks: AI may produce conclusions that appear reasonable but are actually incorrect (the "hallucination" problem), requiring human review and oversight. In scientific research, the propagation of erroneous conclusions can have serious consequences, making human supervision indispensable at the current stage.
Future Outlook: From Research Tool to Research Partner
Although the AI Scientist project is still in its early stages, it points to a clear development direction: AI will gradually evolve from an auxiliary research tool into a true research partner. With the continued improvement of LLM capabilities and the development of multimodal technology, future AI scientists are expected to handle more complex interdisciplinary research problems and produce more valuable scientific discoveries.
Multimodal AI refers to artificial intelligence systems capable of simultaneously processing and understanding multiple data forms such as text, images, audio, and video. Representative technologies include GPT-4V (visual understanding), DALL-E (image generation), and various vision-language models. In scientific research, multimodal capabilities mean AI can analyze microscope images, understand photos of experimental setups, interpret chart data, and even process 3D representations of molecular structures. This is crucial for extending AI Scientist to disciplines that rely on visual data, such as biology, materials science, and astronomy, and represents a key technical pathway from pure computational experiments to broader research scenarios.
Furthermore, as agent technology matures, future AI scientists may no longer be limited to the capabilities of a single model. Instead, multiple specialized agents could collaborate on a research project, with one handling literature retrieval, one experimental design, one data analysis, and one paper writing, forming a division of labor similar to that of a human research team.
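The division-of-labor idea can be sketched as specialists passing a shared state along a chain. The Python below is a hypothetical illustration, not a description of any existing framework: the `Agent` fields, the `run_team` function, and the lambda stand-ins for role-specific LLM calls are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass
class Agent:
    role: str
    reads: str   # key in the shared state this agent consumes
    writes: str  # key this agent produces
    act: Callable[[str], str]  # stand-in for an LLM call with a role-specific prompt


def run_team(agents: List[Agent], state: State) -> State:
    """Pass a shared state through each specialist in turn (a simple sequential hand-off)."""
    state = dict(state)  # copy so the caller's dict is not mutated
    for agent in agents:
        state[agent.writes] = agent.act(state[agent.reads])
    return state


team = [
    Agent("literature retrieval", "question", "related_work", lambda q: f"papers about {q}"),
    Agent("experimental design", "related_work", "plan", lambda r: f"plan based on {r}"),
    Agent("data analysis", "plan", "findings", lambda p: f"findings from {p}"),
    Agent("paper writing", "findings", "draft", lambda f: f"draft reporting {f}"),
]

final = run_team(team, {"question": "sparse attention"})
print(final["draft"])
# → draft reporting findings from plan based on papers about sparse attention
```

A sequential chain is the simplest arrangement. A coordinator agent that routes work, or agents running in parallel, are plausible alternatives, at the cost of extra coordination logic.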
For researchers, rather than worrying about being replaced by AI, it's better to actively learn how to collaborate efficiently with AI—delegating repetitive data processing and literature organization to machines while devoting more energy to the higher-level scientific thinking that requires human intuition and creativity. The human-AI collaborative research model is likely to become the new normal in academia over the next decade.