The Behaviour of the Sample Mean under I.I.D. Sampling
Introductory Remarks About Distributions
The Normal Distribution
This is the classic "bell-shaped" curve associated with Gauss (hence,
it is sometimes called the Gaussian distribution). It is symmetric,
centered at the mean and has points of inflexion one standard
deviation on either side of the mean.
This is the easiest case for the LLN: the sample mean of observations
from such a distribution is itself Normally distributed, with the same
true mean and with variance N times smaller than the original one,
where N is the number of observations.
This case illustrates very clearly the concept of convergence in
mean square error.
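One way to see this on a computer (a minimal sketch in Python with
NumPy; the library choice, the seed, and the parameter values are my
own and not part of these notes) is to draw many samples of size N and
check that the variance of the resulting sample means tracks the
population variance divided by N:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0                      # population mean and sd (arbitrary)
for N in (10, 100, 1000):
    # 10,000 independent samples of size N, one sample per row
    samples = rng.normal(mu, sigma, size=(10_000, N))
    means = samples.mean(axis=1)          # one sample mean per sample
    print(N, means.var(), sigma**2 / N)   # empirical vs theoretical variance

The shrinking spread of the sample mean around the true mean is
precisely convergence in mean square error.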
The Bernoulli Distribution
This discrete distribution describes a random variable that takes two
possible values, 1 (success) with probability p, and 0 (failure) with
probability 1-p.
The LLN implies that we should expect the sample mean (the proportion
of successes) to converge to the true population proportion p, and this
happens irrespective of the fact that the underlying distribution
we are drawing from is discrete.
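A quick way to watch this happen is to track the running proportion of
successes in one long stream of draws (the same Python/NumPy
assumptions as above; p = 0.3 is an arbitrary choice of mine):

import numpy as np

rng = np.random.default_rng(1)
p = 0.3
draws = rng.binomial(1, p, size=100_000)        # a stream of 0/1 outcomes
running = draws.cumsum() / np.arange(1, draws.size + 1)
for n in (10, 100, 1_000, 100_000):
    print(n, running[n - 1])                    # settles near p = 0.3

Even though every single draw is a bare 0 or 1, the running proportion
settles near p.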
The Exponential Distribution
This is a non-negative distribution that is found to be a good
model for quantities such as the lifetimes of light bulbs. It has a
single parameter: its mean. Its density decreases monotonically
towards zero as the value of the random variable grows to infinity.
The sample mean of observations drawn from such a distribution has a
(rescaled) Gamma distribution. Hence, you should watch out for the
LLN taking place, whereby the sample mean converges to the population
mean, despite the fact that the underlying distribution is skewed
(to the right).
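To see the skewness and the convergence at once, one can look at how
the distribution of the sample mean tightens around the population
mean (same Python/NumPy assumptions; the mean of 3.0 is arbitrary):

import numpy as np

rng = np.random.default_rng(2)
mean = 3.0                                      # the single parameter
for N in (5, 50, 500):
    means = rng.exponential(mean, size=(10_000, N)).mean(axis=1)
    # the (Gamma-distributed) means concentrate around 3.0 as N grows
    print(N, means.mean(), means.std())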
The Cauchy Distribution
This is a pathological distribution: although it looks just like
the normal curve, being symmetric and bell-shaped, it does not have
any finite moments, not even a mean. This is caused by its tails
being "too fat". As a result, its two parameters are its median
and its scale.
What this implies for the LLN experiments is that the sample average
does not converge to the true location: every LLN requires the
mean of the underlying distribution to be finite, and that condition
is violated here, so convergence will not take place. Indeed, a
surprising fact is that the sample mean has exactly the same Cauchy
distribution as the underlying observations themselves, irrespective
of the sample size!
Watch out for this fact.
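A sketch of the failure (Python/NumPy again; we compare interquartile
ranges rather than variances, since the latter do not exist here):

import numpy as np

rng = np.random.default_rng(3)
for N in (10, 1_000, 10_000):
    means = rng.standard_cauchy(size=(2_000, N)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    print(N, q75 - q25)      # stays near 2, the IQR of a standard Cauchy

The spread of the sample mean does not shrink at all as N grows.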
The Pareto Distribution
This case is very interesting because it allows one to make the LLN,
the CLT, or both fail at will by choosing the single parameter of
this distribution, theta, appropriately.
This is because the Pareto distribution does not have a finite
variance unless theta exceeds 2, and it does not even have a finite
mean unless theta exceeds 1.
Hence, choosing a theta that exceeds 3 should show both the LLN and the
CLT holding (any theta above 2 gives a finite variance, but 3 keeps the
simulation comfortably away from the boundary). A theta smaller than 1
should exhibit failure of both the sample mean and the sample
T-statistic to converge, whereas a theta between 1 and 2 allows one to
obtain convergence in probability of the sample mean (the LLN), but not
convergence of the T-statistic to the Normal curve (the CLT).
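All three regimes can be probed with a sketch like the following
(Python/NumPy; note that NumPy's pareto routine draws from the Lomax
form, so we add 1 to obtain the classic Pareto with minimum value 1,
whose mean is theta/(theta - 1) whenever theta exceeds 1):

import numpy as np

rng = np.random.default_rng(4)
for theta in (3.0, 1.5, 0.5):
    draws = 1.0 + rng.pareto(theta, size=1_000_000)    # classic Pareto
    running = draws.cumsum() / np.arange(1, draws.size + 1)
    # theta = 3.0: settles near 1.5; theta = 1.5: settles, but slowly
    # and erratically; theta = 0.5: wanders upward without settling
    print(theta, running[999], running[99_999], running[-1])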