ChatGPT Financial Services Workflow in Practice: A Complete Guide from Investment Research to Financial Modeling

Overview

In her latest presentation, OpenAI Solutions Engineer Stephanie Anani provided a detailed demonstration of how ChatGPT delivers transformative value to individual professionals in the financial services industry. From investment research to financial modeling to decision presentations, a complete AI-assisted workflow is redefining how analysts work.

The Core Philosophy: AI Unlocks Human Judgment

Stephanie opened by highlighting a pain point common across enterprises: employees spend enormous amounts of time on work that "must be done but doesn't require human judgment." OpenAI's survey data shows that 75% of ChatGPT users can now complete tasks they previously couldn't accomplish independently — this isn't just about speed, it's about elevating individual operational capabilities.

This philosophy echoes cognitive load theory, a concept from educational psychology that is often borrowed in management discussions. In financial services, analysts spend a significant portion of their day on mechanical tasks such as data cleaning, formatting, and report layout. The work that truly requires professional judgment (risk assessment, investment thesis construction, anomaly detection) often gets squeezed into narrow time windows. This mismatch, in which highly paid talent does low-value work, has long been an efficiency bottleneck on Wall Street and at major financial institutions.

To support this capability uplift, OpenAI believes employees need support across five key dimensions, and ChatGPT is the platform designed to deliver it — positioning AI as "cognitive infrastructure" rather than a simple productivity tool.

GPT 5.5: An AI Model Optimized for Financial Services

Industry Knowledge Embedded in Model Intelligence

One of the major announcements in this presentation was the direction of GPT 5.5. OpenAI has directly embedded financial services workflows into the model's intelligence layer, meaning the model not only understands general language but also deeply comprehends the professional logic and workflows specific to the financial industry.

Traditional large language models face a core challenge in finance: general-purpose models have powerful language capabilities but lack a deep understanding of financial professional logic, such as discount rate selection in DCF valuations, synergy calculations in M&A transactions, and multi-factor frameworks in credit ratings. Embedding financial services workflows into the model's intelligence layer suggests that training has been tuned for financial scenarios, although the presentation as summarized here does not detail which training stages were involved. The stated goal is for the model to understand the business meaning and logical relationships behind industry terminology the way a seasoned financial professional would.

OpenAI Banker Bench Benchmark

To quantify model performance on financial tasks, OpenAI developed the "Banker Bench" benchmark tool. According to Stephanie's presentation, GPT 5.5 outperforms all other frontier models on financial services tasks, achieving best-in-class performance. This provides financial institutions with a quantifiable reference for selecting AI tools.

The launch of Banker Bench reflects an important trend in the AI industry: general-purpose benchmarks (such as MMLU and HumanEval) are no longer sufficient to measure a model's real-world performance in vertical domains. The industry needs evaluation standards that more closely mirror actual work scenarios. This is similar to MedQA in the medical field, but focused on real-world tasks across financial sub-sectors like investment banking, asset management, and commercial banking — including financial statement analysis, deal structuring, risk pricing, and other specialized scenarios.

Trusted Data Sources: Financial-Grade Context Connections

Financial services demands extremely high data source credibility. A single erroneous data point could lead to investment decision mistakes worth millions of dollars, or even trigger compliance risks. OpenAI has established App Connectors with several authoritative financial data providers, including:

Dow Jones: Its Factiva platform is one of the world's largest business news and information databases, covering 33,000 sources across more than 200 countries
LSEG (London Stock Exchange Group): Its data business includes the former Refinitiv (previously the Financial & Risk unit of Thomson Reuters), providing real-time market data, trade execution, and risk management tools
S&P Global: A leading provider of credit ratings, indices, and data analytics

These connections mean that when analysts use ChatGPT for deep research, the underlying data comes from verified, up-to-date, authoritative providers rather than generic web search results. From a technical architecture perspective, this design resembles an enterprise-grade implementation of RAG (Retrieval-Augmented Generation): it combines the model's generative capabilities with trusted external data sources to reduce, though not eliminate, the risk of LLM "hallucinations" in financial scenarios.
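The retrieval pattern described above can be illustrated with a minimal sketch. It assumes a hypothetical allow-list of source identifiers and a naive keyword-overlap ranking; the provider names come from the presentation, but the identifiers, the Passage type, and both functions are illustrative and do not represent how ChatGPT's connectors actually work.

```python
from dataclasses import dataclass

# Hypothetical allow-list. The identifiers are assumptions made for this sketch.
TRUSTED_SOURCES = {"dow_jones", "lseg", "sp_global"}


@dataclass(frozen=True)
class Passage:
    source: str
    text: str


def retrieve(query: str, corpus: list[Passage], top_k: int = 3) -> list[Passage]:
    """Keep only trusted passages, then rank them by naive keyword overlap."""
    terms = set(query.lower().split())
    trusted = [p for p in corpus if p.source in TRUSTED_SOURCES]
    scored = [(len(terms & set(p.text.lower().split())), p) for p in trusted]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked if score > 0][:top_k]


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Assemble a grounded prompt that labels each passage with its source."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return f"Answer using only the context below.\n{context}\n\nQuestion: {query}"


corpus = [
    Passage("dow_jones", "QXO announced an acquisition"),
    Passage("random_blog", "QXO acquisition rumor"),
    Passage("lseg", "bond yields rose today"),
]
hits = retrieve("QXO acquisition", corpus)
prompt = build_prompt("QXO acquisition", hits)
```

Filtering before ranking is the design choice worth noting: an untrusted passage never reaches the prompt, however well it matches the query.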

Complete Workflow Demo: From Research to Investment Decisions

Stephanie walked through a complete demonstration using a realistic scenario: as an investment analyst at the fictional Blossom Bank, she needed to prepare an investment analysis on QXO for an Investment Committee (IC) meeting within a single day.

Step 1: Deep Research Generates an Investment Dossier

Using ChatGPT's Deep Research feature, connected to trusted data sources like Dow Jones and focused on key information such as SEC filings and earnings call transcripts, the system automatically generates an Investment Dossier. It first creates a research plan that users can review and adjust before execution.

Deep Research is fundamentally different from regular ChatGPT conversations. Regular conversations provide instant single-turn or multi-turn responses, while Deep Research operates like a human researcher — formulating a research plan, executing information retrieval step by step, cross-referencing multiple sources, and ultimately synthesizing a structured report. In financial scenarios, the system automatically identifies and prioritizes SEC (Securities and Exchange Commission) filings including 10-K annual reports, 10-Q quarterly reports, and 8-K current event reports, as well as earnings call transcripts. These documents are the foundation of investment analysis — 10-K filings contain complete financial statements and Management Discussion & Analysis (MD&A), while 8-K filings disclose material events such as acquisitions and executive changes.
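As a rough illustration of what prioritizing filings could look like, the sketch below orders a list of documents by form type and then by recency. The priority order, the field names ("form", "filed"), and the sample dates are assumptions made for the example; the presentation does not describe how Deep Research actually ranks sources.

```python
# Illustrative priority order (lower number = read first). This is an assumption,
# not a documented ranking.
PRIORITY = {"10-K": 0, "10-Q": 1, "8-K": 2, "earnings_call": 3}


def prioritize(documents: list[dict]) -> list[dict]:
    """Drop unknown document types, then order by form priority, newest first."""
    known = [d for d in documents if d["form"] in PRIORITY]
    # ISO dates sort correctly as strings; sort by date first, then rely on
    # the stability of sorted() to keep that order within each form type.
    by_date = sorted(known, key=lambda d: d["filed"], reverse=True)
    return sorted(by_date, key=lambda d: PRIORITY[d["form"]])


docs = [
    {"form": "8-K", "filed": "2025-03-01"},
    {"form": "10-K", "filed": "2024-02-15"},
    {"form": "press_release", "filed": "2025-04-01"},
    {"form": "10-K", "filed": "2025-02-14"},
    {"form": "10-Q", "filed": "2025-05-08"},
]
ordered = prioritize(docs)
```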

During the research process, ChatGPT uncovered a critical piece of information: QXO was acquiring Kodiak, which had significant implications for the investment decision. This discovery perfectly illustrates the value of Deep Research — automatically identifying key events with material impact on the investment thesis from within massive volumes of information.

Step 2: ChatGPT in Excel Generates Financial Models

This was the standout feature of the entire workflow. Using a pre-configured "Blossom Bank Three-Statement Model" Skill, analysts can generate a complete financial model Excel workbook with brief instructions — no need to write detailed prompts.

The Three-Statement Model is a foundational tool in investment banking and financial analysis. It links the Income Statement, Balance Sheet, and Cash Flow Statement through accounting logic. For example, net income from the Income Statement flows into retained earnings on the Balance Sheet while also serving as the starting point for the Cash Flow Statement. Building a complete three-statement model typically requires 8–20 hours of work from a junior analyst, including historical data entry, driver assumption setup, formula linking, and error checking.
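The linkage between the three statements can be made concrete with a deliberately simplified sketch. It assumes a company whose only assets are cash and fixed assets, with no debt or working capital, and all figures are made up for illustration; a real three-statement model has many more line items.

```python
from dataclasses import dataclass


@dataclass
class Period:
    revenue: float
    expenses: float      # all costs including tax; depreciation is included here
    depreciation: float  # non-cash charge, already counted in expenses
    capex: float
    dividends: float


def roll_forward(cash: float, ppe: float, retained: float, p: Period) -> dict:
    """Roll one period forward, linking the three statements."""
    # Income Statement
    net_income = p.revenue - p.expenses
    # Cash Flow Statement: starts from net income, adds back non-cash depreciation
    cash_from_ops = net_income + p.depreciation
    change_in_cash = cash_from_ops - p.capex - p.dividends
    # Balance Sheet: closing balances derived from the other two statements
    return {
        "net_income": net_income,
        "cash": cash + change_in_cash,
        "ppe": ppe + p.capex - p.depreciation,
        "retained": retained + net_income - p.dividends,
    }


SHARE_CAPITAL = 250.0  # held constant; opening assets of 300 = 250 + retained 50
period = Period(revenue=500.0, expenses=440.0, depreciation=20.0,
                capex=30.0, dividends=10.0)
closing = roll_forward(cash=100.0, ppe=200.0, retained=50.0, p=period)

total_assets = closing["cash"] + closing["ppe"]
total_equity = SHARE_CAPITAL + closing["retained"]
```

The check that total assets equal total equity after the roll-forward is the same kind of balance test an analyst runs during error checking.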

Key features include:

Not a black box: ChatGPT leaves annotations in cells about assumptions and data sources, ensuring every number is traceable
Formula-driven: All numbers are backed by auditable formulas rather than static values — meaning senior analysts can modify any assumption parameter and the model will automatically recalculate all linked values, preserving the flexibility of traditional financial models
Scenario analysis: Quickly generates valuations across bear case, base case, and bull case scenarios

Bear case, base case, and bull case scenario analysis is a standard methodology in investment decision-making. Each scenario corresponds to different macroeconomic assumptions, industry growth expectations, and company-specific risk factors. For example, when evaluating a building materials distribution company, the bull case might assume a strong real estate market recovery with full realization of M&A synergies, while the bear case might assume persistently high interest rates and integration challenges. By comparing valuation ranges across the three scenarios (typically using discounted cash flow (DCF) analysis or comparable company multiples), the investment committee can more clearly understand the risk-return profile of an investment. Traditionally, each additional scenario meant hours of extra model adjustment work for analysts.
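A minimal sketch of scenario-based valuation follows, using a textbook DCF with a Gordon-growth terminal value. The starting cash flow and every scenario parameter are invented for illustration and are not figures from the demo.

```python
def dcf_value(fcf0: float, growth: float, discount_rate: float,
              terminal_growth: float, years: int = 5) -> float:
    """Present value of projected free cash flows plus a terminal value."""
    if discount_rate <= terminal_growth:
        raise ValueError("discount_rate must exceed terminal_growth")
    value = 0.0
    fcf = fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth                        # project next year's cash flow
        value += fcf / (1 + discount_rate) ** t  # discount it back to today
    # Gordon growth terminal value at the end of the projection, then discounted
    terminal = fcf * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return value + terminal / (1 + discount_rate) ** years


# Hypothetical assumption sets, one per scenario
SCENARIOS = {
    "bear": {"growth": 0.02, "discount_rate": 0.11, "terminal_growth": 0.015},
    "base": {"growth": 0.05, "discount_rate": 0.10, "terminal_growth": 0.020},
    "bull": {"growth": 0.08, "discount_rate": 0.09, "terminal_growth": 0.025},
}

valuations = {name: dcf_value(100.0, **params) for name, params in SCENARIOS.items()}
for name, value in valuations.items():
    print(f"{name}: {value:,.1f}")
```

Because each scenario is only a set of parameters, adding a fourth case means adding one dictionary entry rather than rebuilding the model.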

Stephanie emphasized that work that previously took hours or even days can now be completed in minutes. "We can almost keep up with the speed of the market."

Step 3: Auto-Generating Decision Presentations

The final step transforms all analysis into a presentation ready for the IC meeting. Using the "Blossom Bank Deck Build" Skill combined with Extended Thinking mode, ChatGPT integrates the deep research report and valuation model into a structured decision presentation.

Extended Thinking is a capability introduced by OpenAI that allows the model to engage in deeper reasoning before generating its final answer. For complex tasks like generating investment decision presentations — which require synthesizing multi-dimensional information and weighing trade-offs — Extended Thinking helps the model better organize argumentative logic, identify potential contradictions, and ensure consistency between conclusions and evidence.

The generated presentation includes:

A clear decision recommendation (e.g., "conditional approval")
Visualized charts displaying valuation results and key decision points
Speaker notes containing decision logic explanations, supporting end-to-end auditability

Skills: Standardized Packaging of Enterprise Knowledge

The "Skill" concept that appeared repeatedly throughout the demo deserves deeper understanding. It is essentially the packaging of an organization's standardized work methods (such as specific formatting requirements for three-statement models or style guidelines for presentations) into reusable instruction sets.

The design philosophy behind Skills draws from the concept of "encapsulation" in software engineering and "Standard Operating Procedures" (SOPs) in enterprise management. In large financial institutions, AI proficiency varies enormously across different teams and seniority levels. Prompt Engineering has itself become a specialized skill, but expecting every analyst to become a prompt expert is unrealistic. Skills encapsulate an organization's best practices — including output format requirements, data source priorities, analytical framework preferences, and compliance checklists — into standardized, one-click modules.

This solves a practical problem: not every employee excels at writing prompts, but through Skills, organizational best practices can be standardized and applied across every team member's work. From a knowledge management perspective, Skills also serve as a mechanism for converting tacit knowledge (the experiential judgment and working methods of senior analysts) into explicit knowledge (reusable AI instruction sets) — which has significant implications for the talent mobility and knowledge transfer challenges prevalent in financial institutions.
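To make the idea of a reusable instruction set concrete, the sketch below models a Skill as a small immutable record that expands a brief request into a full set of instructions. The Skill class, its fields, and the rendered layout are all hypothetical; the actual format of ChatGPT Skills is not described in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical container for an organization's standardized instructions."""
    name: str
    output_format: str
    source_priority: tuple[str, ...]
    checklist: tuple[str, ...]

    def render(self, request: str) -> str:
        """Expand a short user request into the full standardized instructions."""
        lines = [
            f"Skill: {self.name}",
            f"Output format: {self.output_format}",
            "Preferred sources: " + ", ".join(self.source_priority),
            "Checklist:",
        ]
        lines += [f"- {item}" for item in self.checklist]
        lines.append(f"Request: {request}")
        return "\n".join(lines)


three_statement = Skill(
    name="Blossom Bank Three-Statement Model",
    output_format="Excel workbook with formula-driven cells",
    source_priority=("SEC filings", "earnings call transcripts"),
    checklist=("annotate assumptions", "cite data sources", "balance check passes"),
)
instructions = three_statement.render("Build a model for QXO")
```

The analyst supplies only the last line; everything above it is organizational knowledge that travels with the Skill.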

Core Insight: AI Doesn't Replace Judgment — It Creates Space for It

In her summary, Stephanie stated clearly: "AI doesn't replace the need for human judgment — it just frees up your time so you can use that time to make those judgments."

The value of this workflow lies not in any single feature, but in the complete value chain — from trusted context to deep research, from research to financial models, from models to decision points that drive the business forward. In financial services, an industry that relies heavily on human professional judgment, AI's role is precisely positioned as a "judgment amplifier" rather than a "judgment replacement."

Practical Implications for the Financial Services Industry

For financial services institutions, this AI workflow brings changes across several dimensions:

Multiplied junior analyst productivity: Repetitive data collection and model-building work is dramatically compressed, enabling junior analysts to participate in high-value judgment work much earlier
Faster decision-making: Compressed from days to hours — in rapidly changing market environments, this speed advantage can translate directly into investment returns
Auditability assurance: Every step includes source citations and logic explanations, which helps institutions meet the expectations of financial regulators (such as the SEC, FCA, and MAS) for documented investment decision processes
Standardized organizational knowledge: Best practices are codified through Skills, reducing training costs while minimizing knowledge loss from staff turnover

Of course, this also raises deeper questions about redefining the value of junior roles, managing model hallucination risks, and assigning responsibility for AI-assisted decisions — issues the industry should continue to monitor. On the regulatory front in particular, financial regulators across countries have yet to establish a unified framework for AI's role in investment decisions, which will be a shared challenge for the industry in the years ahead.