Deep Learning: It works in practice, but does it work in theory?
There is a big gap between theoretical and applied research progress in deep learning. This is a departure from past trends in ML, such as kernel methods, which have deep roots in theory and have enjoyed interest from both practitioners and theoreticians.
Uninformative priors say something informative about the posterior predictions
I took a class in Bayesian statistics in year two of graduate school. I wouldn't call myself a Bayesian -- as an industry practitioner, I take the "whatever works" approach. It's done me a lot of good to have a broad toolset, pulling out one tool or another based on what I think is best for the business problem at hand. This tends to be the attitude in ML research, but statistics can still be pretty clannish, with various "Bayesian" societies and affiliations.
Spurious correlation, unit roots and cointegration
I learned about the spurious regression problem during a course at the Booth School of Business. It’s well known among econometricians because it appears in the classic text by Hamilton, but I don’t think it’s known more widely.
A first-order measure of association between two variables \(x,y\) is their correlation. Equivalently, we can fit a univariate linear regression to the data:
\[ y = \alpha + \beta x \]
If we have \(N\) independent observations then, under a couple of mild assumptions, we get a CLT:
\[ \sqrt{N}(\hat{\beta}-\beta) \xrightarrow{d} \mathcal{N}(0,\sigma_{y\mid x}^2/\sigma_x^2), \]
where \(\sigma_x^2 = \text{var}( x)\) and \(\sigma_{y\mid x}^2 = \text{var}( y-\alpha-\beta x)\).
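As a quick sanity check on this limit, here is a minimal Monte Carlo sketch. It assumes Gaussian \(x\) and Gaussian residuals, and the names and parameter values (`simulate_slope_errors`, `sigma_x=1.5`, `sigma_resid=0.5`, and so on) are my own illustrative choices. It compares the empirical variance of \(\sqrt{N}(\hat{\beta}-\beta)\) across many simulated datasets with \(\sigma_{y\mid x}^2/\sigma_x^2\).

```python
import numpy as np

def simulate_slope_errors(n_obs=500, n_reps=2000, alpha=1.0, beta=2.0,
                          sigma_x=1.5, sigma_resid=0.5, seed=0):
    """Return sqrt(N) * (beta_hat - beta) for n_reps independent datasets."""
    rng = np.random.default_rng(seed)
    # Each row is one dataset of n_obs independent (x, y) pairs.
    x = rng.normal(0.0, sigma_x, size=(n_reps, n_obs))
    y = alpha + beta * x + rng.normal(0.0, sigma_resid, size=(n_reps, n_obs))
    # OLS slope = sample cov(x, y) / sample var(x), computed per row.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    beta_hat = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)
    return np.sqrt(n_obs) * (beta_hat - beta)

scaled_errors = simulate_slope_errors()
empirical_var = scaled_errors.var()
theoretical_var = 0.5 ** 2 / 1.5 ** 2  # sigma_{y|x}^2 / sigma_x^2

print(f"empirical variance:   {empirical_var:.4f}")
print(f"theoretical variance: {theoretical_var:.4f}")
```

With independent draws, the two numbers should agree up to Monte Carlo noise.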
We can test for association (\(\beta \neq 0\)) using a standard F-test.
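To see what the test looks like when the observations really are independent, here is a rough sketch that draws unrelated \(x\) and \(y\) (so \(\beta = 0\)) and counts how often the F-statistic exceeds a 5% critical value. The helper `slope_f_statistic` is a hypothetical name of mine, and to stay NumPy-only I use the \(\chi^2_1\) quantile 3.841 as a large-sample stand-in for the exact \(F_{1,N-2}\) critical value.

```python
import numpy as np

def slope_f_statistic(x, y):
    """F-statistic for H0: beta = 0 in the regression y = alpha + beta * x."""
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    beta_hat = (xc @ yc) / (xc @ xc)
    rss_full = np.sum((yc - beta_hat * xc) ** 2)  # residuals with slope fitted
    rss_null = np.sum(yc ** 2)                    # residuals with intercept only
    # One restriction in the numerator, n - 2 degrees of freedom in the denominator.
    return (rss_null - rss_full) / (rss_full / (n - 2))

rng = np.random.default_rng(1)
n_obs, n_reps = 200, 2000
critical_value = 3.841  # assumption: chi-square(1) 95% quantile, large-N approximation

rejections = 0
for _ in range(n_reps):
    x = rng.normal(size=n_obs)  # independent of y, so the null is true
    y = rng.normal(size=n_obs)
    rejections += slope_f_statistic(x, y) > critical_value

rejection_rate = rejections / n_reps
print(f"rejection rate under the null: {rejection_rate:.3f}")
```

Under independence the rejection rate should land near the nominal 5%, which is the baseline to keep in mind for what follows.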
The independence assumption is crucial. Without it, the regression can behave in very surprising ways.
Consider observations of pairs \(x_t,y_t\), which are generated from random walks: