Taking your question literally, I would argue that there are no statistical tests or rules of thumb that can be used as a basis for excluding outliers in linear regression analysis (as opposed to determining whether or not a given observation is an outlier). That basis must come from subject-area knowledge.

I think the best way to start is to ask whether the outliers even make sense, especially given the other variables you've collected. For example, is it really reasonable that you have a 600-pound woman in your study, which recruited from various sports injury clinics? Or isn't it strange that a person lists 55 years of professional experience when they're only 60 years old? And so forth. Hopefully, you then have a reasonable basis for either throwing them out or getting the data compilers to double-check the records for you.

I would also suggest robust regression methods and the transparent reporting of dropped observations, as suggested by Rob and Chris respectively.

Implicit in getting the full benefit of linear regression is that the noise follows a normal distribution. Ideally you have mostly data and a little noise. You can test for normality after the linear fit by looking at the residuals. You can also filter input data before the linear fit for obvious, glaring errors.

Here are some types of noise in garbage input data that do not typically fit a normal distribution:

- Digits missing or added in hand-entered data (off by a factor of 10 or more).
- Wrong or incorrectly converted units (grams vs. kilos vs. pounds; meters, feet, miles, km), possibly from merging multiple data sets. (Note: NASA's Mars Climate Orbiter was lost this way, so even rocket scientists can make this mistake.)
- Codes like 0, -1, -99999 or 99999 used to mean something non-numeric, such as "not applicable" or "column unavailable", and then dumped into a linear model along with valid data.

Writing a spec for what counts as "valid data" for each column can help you tag invalid data. For instance, a person's height in cm should fall in a range, say, 100-300 cm.
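The idea of a per-column "valid data" spec can be sketched in a few lines of Python. This is only an illustration: the column names, the weight range, and the sample rows are made up for the example, and the sentinel codes are the ones mentioned in the text. Only the 100-300 cm height range comes from the discussion itself.

```python
# Sentinel codes that mean "not applicable" / "unavailable" rather than a real value.
SENTINELS = {0, -1, -99999, 99999}

# Hypothetical spec: plausible (min, max) per column.
SPEC = {
    "height_cm": (100, 300),  # range suggested in the text
    "weight_kg": (20, 300),   # assumed range, for illustration only
}

def tag_row(row):
    """Return a list of reasons why this row looks invalid (empty if it passes)."""
    problems = []
    for column, (low, high) in SPEC.items():
        value = row.get(column)
        if value is None:
            problems.append(f"{column}: missing")
        elif value in SENTINELS:
            problems.append(f"{column}: sentinel code {value}")
        elif not (low <= value <= high):
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    return problems

# Made-up sample records.
rows = [
    {"id": 1, "height_cm": 172, "weight_kg": 68},
    {"id": 2, "height_cm": 17.2, "weight_kg": 70},    # decimal slip: off by a factor of 10
    {"id": 3, "height_cm": -99999, "weight_kg": 81},  # sentinel code, not a measurement
]

clean = [r for r in rows if not tag_row(r)]
flagged = {r["id"]: tag_row(r) for r in rows if tag_row(r)}

for row_id, reasons in flagged.items():
    print(row_id, reasons)
```

Rows are tagged rather than silently deleted, so the flagged records can be reported or sent back to the data compilers for checking before anything is dropped from the fit.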
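As a rough sketch of checking residuals for normality after a fit, the following fits a straight line by ordinary least squares and computes the Jarque-Bera statistic from the skewness and kurtosis of the residuals. The choice of Jarque-Bera is an assumption made for this example (a Q-Q plot or another normality test would serve equally well), and the data are synthetic. Under normality the statistic is approximately chi-square with 2 degrees of freedom, so values above about 5.99 are suspicious at the 5% level, although that approximation is crude for small samples.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def jarque_bera(res):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4)."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    m4 = sum((r - mean) ** 4 for r in res) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Synthetic data: y = 2x + 1 plus small, fixed noise (made up for the example).
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1] * 2
xs = list(range(20))
ys_clean = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Same data with one hand-entry error: a value off by a factor of 10.
ys_bad = list(ys_clean)
ys_bad[10] *= 10

slope_clean, intercept_clean = fit_line(xs, ys_clean)
slope_bad, intercept_bad = fit_line(xs, ys_bad)

jb_clean = jarque_bera(residuals(xs, ys_clean, slope_clean, intercept_clean))
jb_bad = jarque_bera(residuals(xs, ys_bad, slope_bad, intercept_bad))

print("clean:", round(jb_clean, 2), "with entry error:", round(jb_bad, 2))
```

A large statistic says only that the residuals do not look normal. Whether the offending observation should be corrected, kept, or dropped is still a subject-area decision.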