Hello All,
Section tomorrow (and for the foreseeable future unless noted
otherwise) will be unified. So everyone should come at the earlier
time.
Also, please let Jacob know tomorrow during section whether you prefer
section next week to be on Tuesday or Wednesday. I think I detected
some hesitation about Tuesday. It doesn't matter to us because Jacob
is in town on Wed. We were just worried, from past experience, that
classes on the Wed before Thanksgiving are often sparsely attended.
Cheers,
JS.
Wherever it says "employment" in an equation it should say
"unemployment". I see that the typo is also in one equation in
question #3. I'll correct the online file.
Jas.
Jason Lakin writes:
> Greetings all. In problem 2, is the independent variable in the model correctly named, or should it say "unemployment"?
>
> thanks.
> jason
>
On Monday, after answering any questions you may have for me about the
homework, I will start on page 151 of the lecture notes ("Hypothesis
Testing"). I hope to get to page 172 ("Full Results of the 5 Models
B").
Please read the following article by Monday December 1:
MacKuen, Michael B., Robert S. Erikson, and James A. Stimson. 1992.
"Peasants or Bankers? The American Electorate and the U.S. Economy."
American Political Science Review 86(3): 597-611.
(The article is available via JSTOR).
BTW, one of you has asked for a quick calculus review book. It isn't
necessary to look at this book because the calculus material is NOT
REQUIRED for this course, but knowledge of calculus obviously helps
when trying to understand how we obtain the OLS beta. The following
book is on reserve in Littauer for this purpose:
Kleppner, Daniel. Quick calculus: for self-study or classroom
use. New York, Wiley [1972]. QA303 .K673
Cheers,
JS.
----- Original Message -----
From: Jason Lakin
To: Jasjeet Singh Sekhon
Cc: gov1000-list-request(a)fas.harvard.edu
Sent: Friday, November 21, 2003 8:13 AM
Subject: RE: HW clarification
Greetings all. In problem 2, is the independent variable in the model correctly named, or should it say "unemployment"?
thanks.
jason
Hi,
Homework #6 has been posted to the course website. See
http://www.courses.fas.harvard.edu/~gov1000/assignments
There is also an R file which offers R hints. It is located at
http://jsekhon.fas.harvard.edu/gov1000/R.html and is named
"ApprovalRegression1.R". You can run this file by simply typing (in
R)
>source("http://jsekhon.fas.harvard.edu/gov1000/ApprovalRegression1.R")
(You obviously need network access for this to work).
Since this homework is due in section next week (somewhat earlier than
usual because of the Thanksgiving break), I highly recommend that you
take a close look at the homework by Monday so you can ask questions
in class.
Cheers,
JS.
Hello All,
This is just to let you know that section this week (and for the
foreseeable future unless otherwise noted) will be unified. So, everyone
should come to the earlier time.
Also, please let Jacob know if you would prefer section next week to be on
Tuesday or Wednesday. I detected some hesitation about Tuesday. It
doesn't matter to us because we're around. But from past experience we've
noted that classes on the Wed before Thanksgiving are often sparsely
attended. I should also note that since Wednesday is the standard time,
if someone has a strong preference for Wednesday, Wednesday is the day it
will be. One thing which may influence your decision is that the homework
will be due in section next week.
Cheers,
JS.
[BTW, this is the second copy of this I've emailed; I didn't get a copy of
my first email sent back so I'm worried that it wasn't sent. You may get
two copies.]
Hi Jason,
Thanks for the regression questions. Here are some answers.
> So I am still trying to puzzle through the logic of regression,
> particularly the importance of the error term.
> My recollection is that there must be nothing in the error term that is systematically
> related to both the independent and the dependent variable. Is
> this correct?
Everything in the error term must be either unrelated (i.e.,
orthogonal) to the dependent variable (Y) **OR** unrelated to the
independent variables (X). Stuff in the error term need not be
unrelated to BOTH Y AND X.
Recall this is what makes an experiment work. In an experiment X is
the treatment and Y the outcome. There is lots of stuff in the error
term related to Y, but none of it is systematically related to X
(because X is randomly assigned).
As an aside, there is a somewhat small exception to this weakened
condition which allows *some* stuff in the error term to be related to
both X and Y. This exception is the concept of "post treatment bias",
but we have not yet discussed it in class. I will probably get to it
today in lecture.
> If so, then the question is: is it accurate to then say that noise
> in the error term that is systematically related to the dependent
> variable but NOT to the independent variables will in fact change
> the value of the coefficients on the included independent variable
> terms, but not the general relationship?
No. Anything in the error term which is unrelated (i.e., orthogonal)
to the independent variables cannot affect (in expectation) the
estimates of the coefficients associated with the independent
variables---no matter what the relationship is between the stuff in
the error term and our dependent variable. Including new variables
which are orthogonal to the included independent variables can change
our coefficient estimates of the existing variables only because of
issues related to efficiency (we will talk about this next week). But
in expectation, the coefficient estimates of the existing variables
will remain the same.
>Alternatively, it seems to me that an omitted variable that was
>unrelated to the independent variables could still explain all of the
>variance in the dependent variable. Then the coefficients would be
>entirely meaningless, except in a sense devoid of any causal
>inference.
Let Z be our left out variable. And X denote our independent
variables and Y our dependent variable. Then
IF
A) the left out variable, Z, explains all the variance of dependent
variable, Y.
AND
B) our independent variables (X) are related to Y.
THEN it is not possible that Z is unrelated (i.e., orthogonal) to X.
To put it another way (note that "abs()" denotes absolute value):
If abs(cor(Y,X)) > 0 and abs(cor(Y,Z)) = 1, then cor(Z,X) cannot
equal zero.
(there are knife edge exceptions having to do with sampling
uncertainty, but they are irrelevant for the general point).
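The reason is that abs(cor(Y,Z)) = 1 forces Z to be an exact linear
function of Y, say Z = a*Y + b with a != 0, and correlation is
unchanged (up to sign) by linear transformations, so cor(Z,X) =
+/- cor(Y,X), which is nonzero. A quick numerical check (again
sketched in Python; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)   # X systematically related to Y
z = 3.0 * y + 7.0                     # abs(cor(Y, Z)) = 1 exactly

cor_yx = np.corrcoef(y, x)[0, 1]
cor_zx = np.corrcoef(z, x)[0, 1]
print(cor_yx, cor_zx)  # identical: Z inherits Y's correlation with X
```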
> (i am oversimplifying). let's say, just for the sake of argument,
> that the "true" determinant of level of foreign ownership was a
> completely exogenous factor unrelated to these independent
> variables-- like a WTO Rule that strictly regulated levels of
> foreign ownership based on another set of criteria like how close
> the president of the host country was to the president of the WTO.
> Say that one could in fact model the entire dependent variable
> perfectly against this independent variable if one knew about it,
> and could measure "friendliness." But this researcher did not know
> about it, nor how to measure it.
> Would I be correct to infer that the coefficients from the model he
> did use (if this model were OLS and the assumption of linearity
> held) were useful in summarizing the relationship between these
> independent variables and the dependent variable (in the way of
> cross-tabs, as we discussed last time) but that they tell you
> nothing about the causal relationship?
Whether the coefficients tell us anything about a causal relationship
depends on a ton of stuff related to research design which I cannot
speak to in this example because I'm not familiar with the article to
which you are referring. *BUT* it is not possible for this left out
exogenous variable to be unrelated to the independent variables but
explain all of the variance of the dependent variable even though the
independent variables are systematically related to the dependent
variable (see above).
> Would i be further correct
> in saying that these coefficients are "correct" insofar as we take
> them to be measuring this direct relationship and not a causal
> relationship? Or are they "biased" or "incorrect" because of the
> exclusion of the key variable?
The exclusion of a variable which is orthogonal will not bias our
estimates.
> in this fake example, am i correct
> to assume that the inclusion of the key independent variable
> (assuming no measurement error) would reduce the other coefficients
> to zero? Or is this not mathematically quite right?
We don't get to this question because of the previous issues. But we
may get to a related question, which you should ask in class today.
Cheers,
JS.
Jason Lakin writes:
> Hi Jas. How are you?
>
> So I am still trying to puzzle through the logic of regression, particularly the importance of the error term. My recollection is that there must be nothing in the error term that is systematically related to both the independent and the dependent variable. Is this correct?
>
> If so, then the question is: is it accurate to then say that noise in the error term that is systematically related to the dependent variable but NOT to the independent variables will in fact change the value of the coefficients on the included independent variable terms, but not the general relationship? Alternatively, it seems to me that an omitted variable that was unrelated to the independent variables could still explain all of the variance in the dependent variable. Then the coefficients would be entirely meaningless, except in a sense devoid of any causal inference.
>
> to take an example:
>
> in a paper from my IPE class, the model (which does not use OLS, actually, but something called Tobit because the distribution is truncated at the top-- forget about this for the moment), is the following:
>
> dependent variable: percent of US versus Host Country ownership of U.S. multi-national subsidiaries in LDC's
> the independent variables are: index of bargaining power of gov't, index of bargaining power of Multi-National Corporation, economic controls.
>
> (i am oversimplifying). let's say, just for the sake of argument, that the "true" determinant of level of foreign ownership was a completely exogenous factor unrelated to these independent variables-- like a WTO Rule that strictly regulated levels of foreign ownership based on another set of criteria like how close the president of the host country was to the president of the WTO. Say that one could in fact model the entire dependent variable perfectly against this independent variable if one knew about it, and could measure "friendliness." But this researcher did not know about it, nor how to measure it.
>
> Would I be correct to infer that the coefficients from the model he did use (if this model were OLS and the assumption of linearity held) were useful in summarizing the relationship between these independent variables and the dependent variable (in the way of cross-tabs, as we discussed last time) but that they tell you nothing about the causal relationship? Would i be further correct in saying that these coefficients are "correct" insofar as we take them to be measuring this direct relationship and not a causal relationship? Or are they "biased" or "incorrect" because of the exclusion of the key variable? in this fake example, am i correct to assume that the inclusion of the key independent variable (assuming no measurement error) would reduce the other coefficients to zero? Or is this not mathematically quite right?
>
> i hope this is not too confusing...
>
> thanks
> jason
Hello All,
On Monday I will continue to lecture on regression. Because there was
no lecture last week, it may be a good idea to review the material I
covered two weeks ago (pages 120-133). Looking ahead to page 149
would also be a good idea. If you don't know calculus, feel free to
skip the section in which least squares is derived.
Please start reading the Achen monograph on regression ("Interpreting
and Using Regression"). This little book provides very useful
information which is missing from the textbooks.
A new homework assignment will be handed out next week which will be
due Wed the 26th (a little early because of the Thanksgiving break).
Cheers,
JS.