Running Scrapers
Let’s now start with the first step of the analysis process: the scraping.
As documented, I wrote both of the scrapers (reddit and seekingalpha) at the very start of the project. Now comes the part where I actually put them to use. I want to run the scrapers concurrently for better efficiency and speed, so I’ll use a ThreadPoolExecutor. I create a new scraping.py file which runs the scrapers in separate threads, tracks their progress, and returns the results of both scrapers in a dictionary once they’re completed.
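To make this concrete, here is a minimal sketch of what such a scraping.py could look like. The functions scrape_reddit and scrape_seekingalpha stand in for the two scrapers written earlier; their module names and signatures are assumptions for illustration, as is the progress logging.

# scraping.py - minimal sketch; scraper module names and signatures are assumed
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

from reddit_scraper import scrape_reddit              # assumed module name
from seekingalpha_scraper import scrape_seekingalpha  # assumed module name

logger = logging.getLogger(__name__)

def run_scrapers(ticker, subreddits, reddit_timeframe,
                 reddit_num_posts, seekingalpha_num_posts):
    """Run both scrapers in separate threads and collect their results."""
    results = {}
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Submit both scrapers so they run concurrently in their own threads.
        futures = {
            executor.submit(scrape_reddit, ticker, subreddits,
                            reddit_timeframe, reddit_num_posts): "reddit",
            executor.submit(scrape_seekingalpha, ticker,
                            seekingalpha_num_posts): "seekingalpha",
        }
        # as_completed yields each future as soon as it finishes, which makes
        # it easy to track progress instead of blocking on both at once.
        for future in as_completed(futures):
            source = futures[future]
            logger.info("%s scraper finished", source)
            results[source] = future.result()
    return results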
The scraping functions have some basic input parameters that can be used to adjust what gets scraped. The user can set these options in their initial request to create an analysis:
background_tasks: BackgroundTasks,
ticker: str = Query(..., description="The ticker symbol for the analysis"),
title: str = Query(..., description="The stock name"),
subreddits: Optional[List[str]] = Query(default=default_subreddits, description="List of subreddits to scrape"),
reddit_timeframe: Optional[str] = Query(default=default_reddit_timeframe, description="Timeframe to scrape posts (e.g., 'hour', 'day', 'week', 'month', 'year', 'all')"),
reddit_num_posts: Optional[int] = Query(default=default_reddit_num_posts, description="Number of reddit posts to scrape"),
seekingalpha_num_posts: Optional[int] = Query(default=default_seekingalpha_num_posts, description="Number of seekingalpha posts to scrape")
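For completeness, a hedged sketch of how these parameters could be wired into the endpoint and handed to the run_scrapers helper sketched above. The route path, function name, and default values are assumptions for illustration, not the project's exact implementation.

# Sketch of the surrounding endpoint; route path, function name, and
# default values are assumed for illustration.
from typing import List, Optional
from fastapi import BackgroundTasks, FastAPI, Query

from scraping import run_scrapers  # the runner sketched earlier

app = FastAPI()

default_subreddits = ["stocks", "investing"]  # placeholder defaults
default_reddit_timeframe = "week"
default_reddit_num_posts = 50
default_seekingalpha_num_posts = 20

@app.post("/analysis")
async def create_analysis(
    background_tasks: BackgroundTasks,
    ticker: str = Query(...),
    title: str = Query(...),
    subreddits: Optional[List[str]] = Query(default=default_subreddits),
    reddit_timeframe: Optional[str] = Query(default=default_reddit_timeframe),
    reddit_num_posts: Optional[int] = Query(default=default_reddit_num_posts),
    seekingalpha_num_posts: Optional[int] = Query(default=default_seekingalpha_num_posts),
):
    # Run the scraping as a background task so the request returns immediately.
    background_tasks.add_task(run_scrapers, ticker, subreddits,
                              reddit_timeframe, reddit_num_posts,
                              seekingalpha_num_posts)
    return {"status": "scraping started", "ticker": ticker}

Using BackgroundTasks here keeps the HTTP request snappy: the response goes out right away while the thread pool does the actual scraping work behind the scenes.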