Analysis Pipeline Overview
Once the Episteme backend receives a request to analyze a specific stock ticker, it initiates a comprehensive, multi-step pipeline to gather, process, and structure investment theses and related data. This pipeline is designed to transform raw, unstructured information from sources like Reddit and Seeking Alpha into actionable insights for the user.
The core process begins after the initial data scraping phase and focuses on refining this data, extracting meaningful points, validating them through peer review (comments), and enriching them with contextual information.
The key stages involved in this pipeline include:
- Scraping Integration: Combining results from different scrapers (e.g., Reddit, Seeking Alpha) that run concurrently (sketched below).
- Error Handling: Implementing robust error handling for the entire pipeline and for specific database operations (sketched below).
- Post Filtering: Efficiently removing posts that have already been analyzed and stored in the database, to avoid redundant processing (sketched below).
- Database Updates: Saving new, unique posts and associated metadata to the database.
- Point Extraction & Sentiment Analysis: Using AI (like GPT-4o) to extract factual investment thesis points from post content and assign sentiment scores (sketched below).
- Company Profile Fetching: Gathering basic company information (profile, financials) from external APIs like Financial Modeling Prep and Yahoo Finance (sketched below).
- AI-Generated Description: Creating a concise, AI-generated summary of the company based on fetched data and web context.
- Duplicate Point Filtering: Employing text embeddings and cosine similarity (using models like MiniLM), combined with AI verification, to identify and filter out semantically duplicate thesis points (sketched below).
- Ticker Sentiment Calculation: Aggregating individual point sentiments to calculate an overall sentiment score for the ticker (sketched below).
- Criticism Extraction: Analyzing comments associated with posts to extract valid criticisms, linking them to specific thesis points, and assigning validity scores.
- Finalization: Saving all processed points, criticisms, and updated ticker information (like the `last_analyzed` timestamp) to the database and preparing the final data structure for the frontend.
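The sketches below illustrate several of these stages. All function names, prompts, thresholds, and storage details are illustrative assumptions, not the project's actual code. First, the scraping integration stage: two hypothetical async scrapers run concurrently via `asyncio.gather`, with `return_exceptions=True` keeping one failing source from aborting the other.

```python
import asyncio

async def scrape_reddit(ticker: str) -> list[dict]:
    # Stand-in for the real Reddit scraper.
    return [{"id": "r1", "source": "reddit", "title": f"DD on {ticker}"}]

async def scrape_seeking_alpha(ticker: str) -> list[dict]:
    # Stand-in for the real Seeking Alpha scraper.
    return [{"id": "sa1", "source": "seeking_alpha", "title": f"{ticker} thesis"}]

async def gather_posts(ticker: str) -> list[dict]:
    # Run both scrapers concurrently and merge their results.
    results = await asyncio.gather(
        scrape_reddit(ticker),
        scrape_seeking_alpha(ticker),
        return_exceptions=True,  # a failed scraper yields an Exception, not a crash
    )
    posts: list[dict] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # real code would log the failure here
        posts.extend(result)
    return posts

# Usage: posts = asyncio.run(gather_posts("AAPL"))
```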
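Pipeline-wide error handling can be sketched as a generic stage wrapper that logs which stage failed before re-raising, so failures are attributable; this is one plausible pattern, not necessarily the one Episteme uses.

```python
import logging

logger = logging.getLogger("analysis_pipeline")

def run_stage(stage_name: str, func, *args, **kwargs):
    # Wrap a pipeline stage so any exception is logged with the stage name
    # before propagating; the caller decides whether to abort or continue.
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.exception("Pipeline stage %r failed", stage_name)
        raise
```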
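Post filtering reduces to a set difference once the already-analyzed IDs are fetched in a single bulk query; the `existing_ids` source and the `id` field name are assumptions about the schema.

```python
def filter_new_posts(posts: list[dict], existing_ids: set[str]) -> list[dict]:
    # existing_ids would come from one bulk lookup, e.g.
    # SELECT id FROM posts WHERE id IN (...), rather than one query per post.
    return [post for post in posts if post["id"] not in existing_ids]
```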
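Point extraction and sentiment scoring can be sketched as a single GPT-4o call requesting JSON output. The prompt wording and response schema below are illustrative guesses; the criticism-extraction and AI-description stages would follow the same call pattern with different prompts.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_points(post_text: str) -> list[dict]:
    # Ask for structured JSON: one entry per factual thesis point, each with
    # a sentiment score. Schema and prompt wording are assumptions.
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the factual investment thesis points from the post. "
                    'Reply with JSON: {"points": [{"text": "...", '
                    '"sentiment": -1.0 to 1.0}]}'
                ),
            },
            {"role": "user", "content": post_text},
        ],
    )
    return json.loads(response.choices[0].message.content)["points"]
```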
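Company profile fetching is a plain HTTP call; the sketch below targets Financial Modeling Prep's v3 profile endpoint, whose exact URL and response shape should be confirmed against the FMP docs.

```python
import os
import requests

def fetch_company_profile(ticker: str) -> dict:
    # FMP's profile endpoint returns a one-element list for a single ticker.
    url = f"https://financialmodelingprep.com/api/v3/profile/{ticker}"
    resp = requests.get(url, params={"apikey": os.environ["FMP_API_KEY"]}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return data[0] if data else {}
```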
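Duplicate point filtering embeds each point with a MiniLM model and compares pairs by cosine similarity. In the sketch below, pairs above an illustrative 0.85 threshold are treated as duplicates and greedily dropped; the real pipeline additionally passes candidate duplicates through AI verification rather than discarding them outright.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def filter_duplicate_points(points: list[str], threshold: float = 0.85) -> list[str]:
    # Normalized embeddings make the dot product equal cosine similarity.
    embeddings = model.encode(points, normalize_embeddings=True)
    kept: list[int] = []
    for i in range(len(points)):
        # Greedily keep a point only if it is not too similar to any kept one.
        if all(float(np.dot(embeddings[i], embeddings[j])) < threshold for j in kept):
            kept.append(i)
    return [points[i] for i in kept]
```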
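Finally, the ticker-level sentiment can be as simple as a mean over per-point scores; the unweighted average below is an assumption, since the real calculation might weight points by engagement or validity.

```python
def ticker_sentiment(point_sentiments: list[float]) -> float:
    # Unweighted mean of per-point sentiment scores in [-1, 1];
    # returns a neutral 0.0 when no points were extracted.
    if not point_sentiments:
        return 0.0
    return sum(point_sentiments) / len(point_sentiments)
```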
Each step is crucial for ensuring the final output presented to the user is accurate, unique, and comprehensive. The subsequent pages in this section detail the implementation specifics of each stage.