Topic models are a simple and popular tool for the statistical analysis of textual data. Their identification and estimation are typically enabled by assuming the existence of anchor words; that is, words that are exclusive to specific topics. In this paper we show that the existence of anchor words is statistically testable: There exists a hypothesis test with correct size that has nontrivial power. This means that the anchor-words assumption cannot be viewed simply as a convenient normalization. Central to our results is a simple characterization of when a column-stochastic matrix with known nonnegative rank admits a separable factorization. We test for the existence of anchor words in two different datasets derived from monetary policy discussions in the Federal Reserve and reject the null hypothesis that anchor words exist in one of them.
View the Full Working Paper
Working Paper
On the Testability of the Anchor-Words Assumption in Topic Models
April 2025
WP 25-14 – What does the Fed talk about in its monetary policy discussions? We introduce a new statistical methodology to analyze text documents, and we use that methodology to recover the topics discussed during FOMC meetings.
Share
Download