Finding out whether a variable is really collinear #95

Closed
sergiocorreia opened this issue Apr 17, 2017 · 0 comments

Comments

@sergiocorreia
Owner

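* Reproducible example: y and z are independent, but y is scaled by 1e8
clear
cls
set obs 100
gen double y = runiform() * 1e8
gen double z = runiform()

* With no absorbed fixed effects, both commands fit the same OLS model
reghdfe z y, noab
reg z y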

A variable is omitted if:

a) After partialling-out, it is zero (or within EPS of zero, where EPS is usually 1e-10), or
b) The regression returns a coefficient of exactly zero for it, or invsym() detects that it is collinear and drops it (the second approach is probably better). An extra issue is that the numerical error from step a) gets carried over to step b).
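
A minimal sketch of the two checks, meant to be run as a do-file. This is not the reghdfe code: the variables x1-x3 and the Mata names (EPS, is_zero, is_dropped) are made up for illustration, and partialling-out is reduced to removing the column means.

```stata
clear
set obs 100
set seed 12345
gen double x1 = runiform()
gen double x2 = 2 * x1            // exactly collinear with x1
gen double x3 = 5                 // constant: zero after demeaning

mata:
    X = st_data(., ("x1", "x2", "x3"))
    X = X :- mean(X)              // partial out the constant only

    // (a) absolute check: every entry of the column is within EPS of zero
    EPS = 1e-10
    is_zero = colmax(abs(X)) :< EPS
    is_zero

    // (b) invsym() zeroes the row and column of a regressor it drops,
    //     so a zero on the diagonal marks an omitted variable
    XX_inv = invsym(quadcross(X, X))
    is_dropped = (diagonal(XX_inv)' :== 0)
    is_dropped
end
```

Check (a) depends on the units of each column, while check (b) works on the cross-product matrix, so whatever error is left over from the demeaning step feeds directly into it.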

Not sure if there is an easy way around this, but in any case the tolerance used to detect omitted variables should be linked to the tolerance used to partial out the variables.

Also, if both Y and X are completely absorbed, their residuals will be very small but of the same order of magnitude as each other, so regressing one on the other gives a spurious result.

sergiocorreia added a commit that referenced this issue Apr 17, 2017
Issues resolved:

1) qrsolve(XX, XY) suffers from numerical inaccuracies in some cases, so
we switch to qrsolve(X, Y) if we don't have weights

(Note: this might be a bit slower, as XX and XY are already
precomputed. It could also be optimized, but for now let's leave it as it
is.)
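
For reference, a small do-file sketch of the two ways of calling qrsolve(), using the data from the example at the top of the issue. The Mata names (b_normal, b_direct) are illustrative, and nothing here guarantees that the two solutions visibly differ on a given draw. The general point is that the condition number of X'X is the square of that of X, so solving the normal equations loses more precision when the columns are badly scaled.

```stata
clear
set obs 100
set seed 12345
gen double y = runiform() * 1e8
gen double z = runiform()

mata:
    Y = st_data(., "z")                      // dependent variable
    X = st_data(., "y"), J(rows(Y), 1, 1)    // regressor plus a constant

    // solve the normal equations (X'X) b = X'Y
    b_normal = qrsolve(quadcross(X, X), quadcross(X, Y))

    // solve the least-squares problem directly on the data (no weights)
    b_direct = qrsolve(X, Y)

    // show both solutions side by side
    b_normal, b_direct
end
```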

2) The methods used to find omitted variables differed at
different points. When demeaning, we checked whether the regressors were close
to zero, but used an absolute threshold (1e-8), which doesn't work if there is
a huge scaling difference between Y and X. Now we will assume the variable
has been absorbed by the absvars if the ratio w'w/z'z < 1e-9, where
z = original variable and w = demeaned variable. This 1e-9 will also increase
with tolerance(), as it's just tolerance*1e-1

We also use invsym() as the main driver for whether to drop a variable or not, as
that is what the built-in tools use. The inputs of
qrsolve() will then exclude the omitted variables, to prevent any issues.
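
A rough do-file sketch of that flow, under stated assumptions: it is not the committed code, the group loop is a stand-in for the real partialling-out step, tolerance = 1e-8 is the value implied by "tolerance*1e-1 = 1e-9", and all variable and Mata names are hypothetical.

```stata
clear
set obs 100
set seed 12345
gen int    id = ceil(_n / 10)
gen double x1 = runiform() * 1e8
gen double x2 = id * 1e8              // fully absorbed by the id fixed effect
gen double y  = runiform()

mata:
    tolerance = 1e-8                  // assumed value of tolerance()
    id = st_data(., "id")
    Z  = st_data(., ("x1", "x2"))     // original regressors
    Y  = st_data(., "y")

    // demean Y and the regressors within id groups
    W = Z
    for (g = 1; g <= max(id); g++) {
        idx = selectindex(id :== g)
        W[idx, .] = Z[idx, .] :- mean(Z[idx, .])
        Y[idx]    = Y[idx] :- mean(Y[idx])
    }

    // relative check: demeaned sum of squares over original sum of squares
    ratio    = diagonal(quadcross(W, W))' :/ diagonal(quadcross(Z, Z))'
    absorbed = ratio :< tolerance * 1e-1

    // zero out absorbed columns, then let invsym() decide what is dropped
    W    = W :* (1 :- absorbed)
    kept = (diagonal(invsym(quadcross(W, W)))' :!= 0)

    // qrsolve() only sees the kept columns
    b_kept = qrsolve(select(W, kept), Y)
    kept
    b_kept
end
```

Because the check is a ratio, rescaling x1 or x2 by any constant leaves the absorbed flags unchanged, which is the property the absolute threshold lacked.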