Building an interactive NLP UI for a custom NLP pipeline in spaCy

Feb 18, 2023

Summary

For the stonk pipeline, I focus on stock tickers drawn from various automatically generated CSV files.

The init_nlp function shown in this section initializes spaCy with some custom configuration and data, such as company names, stock symbols, and exchange information. It also adds an entity_ruler to the pipeline, which lets spaCy identify specific entities in text, such as stocks and companies, based on predefined patterns. The entity_ruler is populated with patterns built from the data in various CSV files.
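To make the entity_ruler step concrete, here is a minimal sketch of how patterns could be built from ticker rows and added to a pipeline. It is not the article's actual implementation: the STOCK and COMPANY labels and the two sample rows are assumptions for illustration, and a blank English pipeline is used instead of en_core_web_sm so the sketch runs without a model download.

```python
import spacy

# Assumption: a blank English pipeline stands in for en_core_web_sm
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical rows standing in for the ticker CSV data (Code and Name columns)
rows = [
    {"Code": "AAPL", "Name": "Apple Inc"},
    {"Code": "MSFT", "Name": "Microsoft Corporation"},
]

# Build one phrase pattern per ticker symbol and one per company name.
# The labels STOCK and COMPANY are illustrative, not taken from the article.
patterns = []
for row in rows:
    patterns.append({"label": "STOCK", "pattern": row["Code"]})
    patterns.append({"label": "COMPANY", "pattern": row["Name"]})
ruler.add_patterns(patterns)

doc = nlp("Apple Inc and MSFT both reported earnings.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Because the patterns are plain strings, matching is exact and case-sensitive, so the quality of the CSV data directly determines what the ruler can find.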

import pandas as pd
import spacy

def init_nlp(exchange_data_path: str = "https://raw.githubusercontent.com/dli-invest/fin_news_nlp/main/nlp_articles/core/data/exchanges.tsv", indicies_data_path: str = "https://raw.githubusercontent.com/dli-invest/fin_news_nlp/main/nlp_articles/core/data/indicies.tsv"):
    SPLIT_COMPANY_INTO_WORDS = False
    BEAR_MARKET_ADJUSTMENT = True
    # Load the small English model as the base pipeline
    nlp = spacy.load("en_core_web_sm")
    # Read the US ticker list, which has Code and Name columns
    ticker_df = pd.read_csv(
        "https://raw.githubusercontent.com/dli-invest/eod_tickers/main/data/us.csv"
    )
    # Drop rows with a missing Code or Name
    ticker_df = ticker_df.dropna(subset=['Code', 'Name'])
    # Exclude rows whose Name contains "Wall Street"
    ticker_df = ticker_df[~ticker_df.Name.str.contains("Wall Street", na=False)]

The function body reads a CSV file of stock ticker data using the pandas library. It then drops rows that have a missing value in either the “Code” or the “Name” column, as well as rows whose “Name” contains the string “Wall Street”. These operations…
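The effect of the two filtering steps is easier to see on a small in-memory table. The sketch below applies the same dropna and str.contains calls to a hypothetical DataFrame; the rows are invented for illustration and only the Code and Name column names come from the article.

```python
import pandas as pd

# Hypothetical sample data standing in for us.csv
ticker_df = pd.DataFrame({
    "Code": ["AAPL", None, "WSJ", "MSFT"],
    "Name": ["Apple Inc", "Missing Code Corp", "Wall Street Journal Fund", None],
})

# Drop any row where Code or Name is missing (rows 1 and 3 here)
ticker_df = ticker_df.dropna(subset=["Code", "Name"])

# Drop rows whose Name contains "Wall Street" (row 2 here);
# na=False treats missing names as non-matching instead of propagating NaN
ticker_df = ticker_df[~ticker_df.Name.str.contains("Wall Street", na=False)]

remaining_codes = ticker_df["Code"].tolist()
print(remaining_codes)
```

Only the row with both fields present and no "Wall Street" in its name survives both filters.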


Written by David Li
