Normally distributed data have lots going for them. Statisticians generally find economists over-enthusiastic about this particular transformation of the data. This, I think, is because they judge my point 8 and the second half of my point 3 to be very important. Where the data are not log-normally distributed, or where logging the data does not result in the transformed data having equal variance across observations, a statistician will tend not to like the transformation very much. The economist is likely to plunge ahead anyway, since what we really like about the transformation are points 1, 2, and 4-7.

First let's see what typically happens when we take logs of something that's right skew.

[Figure: histograms of samples from three right-skewed distributions $x$, $y$ and $z$ (top row) and of their logs (bottom row)]

The top row contains histograms for samples from three different, increasingly skewed distributions. The bottom row contains histograms for their logs. You can see that the center case ($y$) has been transformed to something close to symmetry, while the more mildly right skew case ($x$) is now somewhat left skew. On the other hand, the most skew variable ($z$) is still (slightly) right skew, even after taking logs. If we wanted our distributions to look more symmetric, and perhaps more normal, the transformation clearly improved the second and third case. We can see that this might help, at least sometimes, to reduce the amount of right-skewness.

Note that when we're looking at a picture of the distributional shape, we're not considering the mean or the standard deviation - that just affects the labels on the axis. So we can imagine looking at some kind of "standardized" variables (while remaining positive, all have similar location and spread, say).

Taking logs "pulls in" more extreme values on the right (high values) relative to the median, while values at the far left (low values) tend to get stretched back, further away from the median.

In the first diagram, $x$, $y$ and $z$ all have means near 178, all have medians close to 150, and their logs all have medians near 5. When we look at the original data, a value at the far right - say around 750 - is sitting far above the median. In the case of $y$, it's 5 interquartile ranges above the median. But when we take logs, it gets pulled back toward the median: after taking logs it's only about 2 interquartile ranges above the median.

Meanwhile a low value like 30 (only 4 values in the sample of size 1000 are below it) is a bit less than one interquartile range below the median of $y$. After taking logs, it is about two interquartile ranges below the new median, roughly as far below as 750 is above.
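The interquartile-range comparison can be checked numerically. The sketch below is only an illustration: the text does not say how the samples were generated, so it assumes a lognormal stand-in for $y$ with median 150 and mean near 178. The helper `iqr_distance` and the fixed seed are likewise made up for this example.

```python
import math
import random
import statistics


def iqr_distance(values, point):
    """Signed distance of `point` from the median of `values`, in IQR units."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (point - median) / (q3 - q1)


# Assumption: y is lognormal with median 150 and mean near 178.
# For a lognormal, median = exp(mu) and mean = exp(mu + sigma**2 / 2).
MU = math.log(150)
SIGMA = math.sqrt(2 * math.log(178 / 150))

rng = random.Random(0)  # fixed seed so the run is repeatable
y = [rng.lognormvariate(MU, SIGMA) for _ in range(1000)]
log_y = [math.log(v) for v in y]

# Compare a high value and a low value before and after taking logs.
for point in (750, 30):
    raw = iqr_distance(y, point)
    logged = iqr_distance(log_y, math.log(point))
    print(f"{point}: {raw:+.2f} IQRs raw, {logged:+.2f} IQRs after logs")
```

Under this assumption the high value should land much closer to the median after logging, while the low value moves further away. The exact figures depend on the sample drawn.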