The data and code from my papers posted below may be used for non-commercial purposes free of charge. They are provided as is, without any guarantee of correctness. Please reference the relevant paper for construction details. Email me if you encounter any issues.
Text Selection (with Bryan Kelly and Alan Moreira)
Journal of Business & Economic Statistics, 2021, Vol 39, Issue 4, pp. 859–879 | Citation
HurdleDMR.jl Code for our HurdleDMR package for Julia. It can be called from many other programming languages, such as Python and R. The package provides computationally efficient distributed estimation of the multiple hurdles over parallel processes, generation of sufficient reduction projections, and inverse regressions with selected text. It supports elastic net type convex combinations of L1 (Lasso) and L2 (Ridge) regularization as in glmnet (Friedman et al., 2010), and concave regularization paths as in gamlr (Taddy, 2017).
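As a rough illustration of what an elastic net type convex combination of L1 and L2 regularization means, the sketch below computes the penalty in the glmnet parameterization. It is written in Python for exposition only; the function name `elastic_net_penalty` and its arguments are hypothetical and are not part of the HurdleDMR.jl API.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Illustrative glmnet-style penalty (hypothetical helper, not HurdleDMR.jl code).

    penalty = lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)

    alpha = 1 gives the pure L1 (Lasso) penalty,
    alpha = 0 gives the pure L2 (Ridge) penalty,
    and values in between mix the two.
    """
    l1 = sum(abs(b) for b in beta)        # L1 norm of the coefficients
    l2_sq = sum(b * b for b in beta)      # squared L2 norm of the coefficients
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)
```

With `alpha` strictly between 0 and 1, the L1 term can still set some coefficients exactly to zero while the L2 term shrinks the rest.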
Journal of Financial Economics, 2017, Vol 126, Issue 1, pp. 1–35 (Lead article) | Citation
Intermediary capital risk factor, 1970Q1–2018Q3 Available at quarterly and monthly frequencies, and at daily frequency starting 2000-01-01. Also includes portfolio returns used in our cross-sectional tests. See readme.txt inside for details and replication code.
Journal of Financial Economics, 2017, Vol 123, Issue 1, pp. 137–162 | Citation
NVIX, 1889-07 to 2016-03 Also includes a decomposition into categories.
Phrase counts (n-gram frequencies) of Wall Street Journal front-page titles and abstracts. See readme.txt inside for details and replication code.
structureofnews.com An interactive website that allows users to visualize and inspect a wide variety of features from our estimated topic model. Also allows researchers to download our WSJ news attention time series for use in their own projects.