39 Cards in this Set
- Front
- Back
Measure space |
A triple (Omega, F, mu). Omega: a set of points ("states of nature"); F: a set of subsets ("events") of Omega, which forms a sigma-algebra; mu: a measure (a mapping F -> R+ U {infty}) |
|
sigma-algebra |
(i) Omega \in F; (ii) if A \in F, then A^c \in F; (iii) if A_j \in F, j = 1, 2, ..., then union_{j=1}^infty A_j \in F |
|
Measure |
Positivity: mu(A) >= 0. Sigma-additivity: if A_j \in F, j = 1, 2, ... are disjoint, then mu(union_{j=1}^infty A_j) = sum_{j=1}^infty mu(A_j). In particular mu(emptyset) = 0 |
|
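A minimal numerical sketch of the measure axioms above, assuming a finite Omega with the power set as F; the point masses are made up for illustration:

```python
# Sketch: a measure on a finite Omega, with F the full power set.
# Names (Omega, F, mu) follow the cards; the weights are illustrative.
from itertools import chain, combinations

Omega = {"a", "b", "c"}
weights = {"a": 0.25, "b": 0.25, "c": 0.5}  # point masses (exact binary fractions)

def mu(A):
    """Measure of an event A (a subset of Omega)."""
    return sum(weights[w] for w in A)

def power_set(S):
    s = list(S)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = power_set(Omega)  # the power set is always a sigma-algebra

# Positivity, mu(emptyset) = 0, and additivity over disjoint events
# (on a finite Omega, finite additivity is all sigma-additivity requires).
assert mu(frozenset()) == 0
for A in F:
    assert mu(A) >= 0
    for B in F:
        if A.isdisjoint(B):
            assert abs(mu(A | B) - (mu(A) + mu(B))) < 1e-12
```

Since mu(Omega) = 1 here, this mu is also a probability measure in the sense of the next card.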
Probability space / probability measure |
A measure space (Omega, F, mu) with mu(Omega) = 1; mu is then called a probability measure |
|
Borel-sigma-algebra |
smallest sigma-algebra which contains all open subsets of Omega |
|
Measurable |
A function f: Omega -> R^k is called F-measurable if f^-1(B) \in F for every Borel set B \in B(R^k) |
|
Radon-Nikodym |
Let F be a sigma-algebra on Omega and let mu and nu be two measures on F with mu(Omega) < infty and nu(Omega) < infty. Suppose nu is absolutely continuous with respect to mu, i.e. mu(A) = 0 implies nu(A) = 0 for A in F. Then there exists a positive measurable function g: Omega -> R+, called the Radon-Nikodym derivative and written g = dnu/dmu, with nu(A) = integral_A g dmu = integral_A (dnu/dmu) dmu |
|
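On a finite Omega the Radon-Nikodym derivative reduces to a ratio of point masses, which makes the integral identity on the card easy to check; a sketch with made-up measures:

```python
# Sketch: Radon-Nikodym derivative on a finite Omega, where both measures
# are given by point masses (names mu, nu, g follow the card; values illustrative).
Omega = ["a", "b", "c"]
mu_w = {"a": 0.25, "b": 0.25, "c": 0.5}   # reference measure mu
nu_w = {"a": 0.1,  "b": 0.4,  "c": 0.5}   # absolutely continuous w.r.t. mu

# On a finite space, g = dnu/dmu is the ratio of point masses
# (well-defined because mu puts positive mass on every point here).
g = {w: nu_w[w] / mu_w[w] for w in Omega}

def nu_via_integral(A):
    """nu(A) = integral_A g dmu, which on a finite Omega is a sum."""
    return sum(g[w] * mu_w[w] for w in A)

# The defining property nu(A) = integral_A g dmu, event by event.
for A in [["a"], ["a", "b"], Omega]:
    assert abs(nu_via_integral(A) - sum(nu_w[w] for w in A)) < 1e-12
```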
Conditional Expectation |
Let f: Omega -> R be F2-measurable, with F1 a sub-sigma-algebra of F2. Find an F1-measurable function g: Omega -> R, g = E[f|F1] = E1[f], with the property: integral_A g dmu = integral_A f dmu for all A in F1 |
|
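The defining property above can be sketched on a finite probability space, where F1 is generated by a partition and the conditional expectation is a block-wise average; all numbers are illustrative:

```python
# Sketch: conditional expectation on a finite probability space.
# F2 is the power set; F1 is generated by a partition of Omega
# (names follow the cards; the numbers are illustrative).
Omega = [1, 2, 3, 4]
p = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # probability measure mu
partition = [[1, 2], [3, 4]]               # generates F1
f = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}   # F2-measurable function

# g = E[f|F1]: on each block of the partition, g is the mu-weighted
# average of f over that block, so g is constant on blocks (F1-measurable).
g = {}
for block in partition:
    mass = sum(p[w] for w in block)
    avg = sum(f[w] * p[w] for w in block) / mass
    for w in block:
        g[w] = avg

# Defining property: integral_A g dmu = integral_A f dmu for all A in F1.
for A in [[], [1, 2], [3, 4], [1, 2, 3, 4]]:
    lhs = sum(g[w] * p[w] for w in A)
    rhs = sum(f[w] * p[w] for w in A)
    assert abs(lhs - rhs) < 1e-12
```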
Score |
s(theta | y) = dl(theta | y) / dtheta, a column vector (the gradient of the log-likelihood) |
|
Information matrix |
I(theta) = E[s(theta | y) s(theta | y)'] = -E[d^2 l(theta | y) / dtheta dtheta'] (information matrix equality) |
|
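The information matrix equality on the card can be verified exactly for a Bernoulli(p) observation, where expectations over y in {0, 1} are finite sums; the value of p is illustrative:

```python
# Sketch: E[s s'] = -E[d2l / dp2] checked exactly for one Bernoulli(p) draw
# (scalar theta = p; the value of p is illustrative).
p = 0.3

def score(y):
    # l(p|y) = y log p + (1-y) log(1-p), so s = dl/dp = (y - p) / (p (1 - p))
    return (y - p) / (p * (1 - p))

def neg_hessian(y):
    # -d2l/dp2 = y / p^2 + (1 - y) / (1 - p)^2
    return y / p**2 + (1 - y) / (1 - p)**2

# Expectations over y in {0, 1} are finite sums.
E_s2 = p * score(1)**2 + (1 - p) * score(0)**2
E_nH = p * neg_hessian(1) + (1 - p) * neg_hessian(0)

assert abs(E_s2 - E_nH) < 1e-9              # information matrix equality
assert abs(E_s2 - 1 / (p * (1 - p))) < 1e-9  # closed form I(p) for Bernoulli
```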
MLE asymptotic |
sqrt(n) (thetahat_n - theta_0) ->d N(0, I(theta_0)^-1) |
|
Cramer Rao lower bound |
If T(y) is an unbiased estimator of theta, then Var_theta(T(y)) >= I(theta)^-1 (in the positive semi-definite sense) |
|
Identification (MLE) |
theta is identified if there is no other thetatilde with L(theta | y) = L(thetatilde | y) for all y |
|
Neyman-Pearson Lemma |
beta(delta, theta): probability that test delta rejects at theta; beta(delta, theta_0) is the type-I error probability / size (e.g. 5% significance). Consider testing theta = theta_0 vs. theta = theta_1. Given Psi, let delta(y) be some test with size equal to or smaller than the size of the test delta_Psi^(LR). Then its power at the alternative is smaller too: beta(delta, theta_0) <= beta(delta_Psi^(LR), theta_0) implies beta(delta, theta_1) <= beta(delta_Psi^(LR), theta_1). In words: tests delta that are at least as "careful" in avoiding type-I errors as a likelihood-ratio test will make at least as many type-II errors as a likelihood-ratio test |
|
Wald test |
H_0: a(theta_0) = 0, with a taking values in R^k. Notation: A(theta) = da(theta)/dtheta, A = A(theta_0), I = I(theta_0). W = n a(thetahat)' [A(thetahat) Ihat^-1 A(thetahat)']^-1 a(thetahat) ->d chisq_k |
|
Lagrange multiplier/score test |
H_0: a(theta_0) = 0, I = I(theta_0). LM = n s(thetahat_c)' Ihat^-1 s(thetahat_c) ->d chisq_k, where thetahat_c is the constrained MLE |
|
Likelihood ratio test |
LR = 2n (l(thetahat) - l(thetahat_c)) ->d chisq_k, where l is the average log-likelihood and thetahat_c the constrained MLE |
|
GMM maximization |
Q_n(theta) = -1/2 g_n(theta)' What_n g_n(theta), with g_n(theta) = (1/n) sum_j g(y_j; theta) |
|
Extremum consistency |
Identification: theta_0 = argmax Q_0(theta) is unique. Uniform convergence: Q_n converges uniformly in probability to Q_0, i.e. sup_theta |Q_n(theta) - Q_0(theta)| ->p 0 |
|
M-Estimator asymptotics |
sqrt(n) (thetahat_n - theta_0) ->d N(0, Psi^-1 Sigma Psi^-1), with Psi = -E_theta_0[H(y; theta_0)], Sigma = sum_k Gamma_k, Gamma_k = E[s(y_j; theta_0) s(y_{j+k}; theta_0)'] |
|
GMM asymptotics |
sqrt(n) (thetahat_n - theta_0) ->d N(0, Psi^-1 Sigma Psi^-1), with Psi = G'WG, G = dg_n(theta) / dtheta, Sigma = G'WSWG, S = sum_k Gamma_k, Gamma_k = E[g(y_j; theta_0) g(y_{j+k}; theta_0)'] |
|
IV and GMM |
E[Z_t (y_t - X_t beta)] = 0. Generalized: with moment function g([X_t, Z_t]; theta) = Z_t' f(X_t; theta), the moment condition is E[Z_t' f(X_t; theta)] = 0 |
|
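The just-identified case of the moment condition above can be sketched with synthetic data: the sample analogue of E[Z_t (y_t - X_t beta)] = 0 gives betahat = (Z'X)^{-1} Z'y. The data-generating process below (coefficients, seed) is entirely illustrative:

```python
# Sketch: just-identified IV via the sample moment condition
# (1/n) Z'(y - X beta) = 0, i.e. betahat = (Z'X)^{-1} Z'y (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor (correlated with u)
y = 2.0 * x + u                               # true slope beta = 2.0

Z = np.column_stack([np.ones(n), z])          # instruments incl. constant
X = np.column_stack([np.ones(n), x])          # regressors incl. constant

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # solves Z'X b = Z'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLS, for comparison
# beta_iv[1] is close to the true 2.0; beta_ols[1] is biased upward
# because x is correlated with u.
```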
Bayesian posterior |
Prior: pi(theta). Posterior: pi(theta|x) = L(theta|x) pi(theta) / integral_Theta L(theta|x) pi(theta) mu(dtheta) |
|
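The posterior formula above can be sketched numerically for Bernoulli data with a Beta prior, where the normalizing integral is computed on a grid and compared with the exact conjugate answer Beta(a + k, b + n - k); the prior parameters and data are illustrative:

```python
# Sketch: posterior = likelihood * prior / marginal, evaluated on a grid
# for Bernoulli data with a Beta(a, b) prior (illustrative values).
import math

a, b = 2.0, 2.0                              # Beta prior pi(theta)
y = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
n, k = len(y), sum(y)                        # n = 10, k = 7 successes

def beta_pdf(t, a, b):
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return t ** (a - 1) * (1 - t) ** (b - 1) / B

grid = [(i + 0.5) / 1000 for i in range(1000)]          # midpoint grid on (0, 1)
lik = [t ** k * (1 - t) ** (n - k) for t in grid]       # L(theta | x)
unnorm = [L * beta_pdf(t, a, b) for L, t in zip(lik, grid)]
Z = sum(unnorm) / 1000                                  # marginal (midpoint rule)
post = [u / Z for u in unnorm]                          # pi(theta | x) on the grid

# By conjugacy the exact posterior is Beta(a + k, b + n - k).
exact = [beta_pdf(t, a + k, b + n - k) for t in grid]
err = max(abs(p - e) for p, e in zip(post, exact))
assert err < 1e-3
```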
Posterior expected loss |
Decision: delta(x) rho(pi, delta(x)) = E_pi[L(theta, delta(x))] = integral Theta L(theta, delta(x)) pi(theta|x) dtheta |
|
Integrated risk |
Decision: delta(x). Marginal distribution: m(x) = integral_Theta L(theta | x) pi(theta) mu(dtheta). r(pi, delta) = E_pi[R(theta, delta)] = integral_Theta integral_X L(theta, delta(x)) f(x | theta) pi(theta) dx dtheta = integral_X rho(pi, delta(x)) m(x) dx |
|
Admissibility |
Frequentist expected loss: R(theta, delta) = integral_X L(theta, delta(x)) f(x | theta) dx. An estimator delta_0 is admissible if there is no estimator delta_1 which dominates delta_0, i.e. which satisfies R(theta, delta_0) >= R(theta, delta_1) for all theta, with ">" for at least one value of theta |
|
Bayes estimator |
delta^pi(x) = arg min_d rho(pi, d | x) |
|
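A small sketch of the minimization above: under squared-error loss the Bayes estimator is the posterior mean, which a grid search over decisions d recovers; the discrete posterior used here is made up:

```python
# Sketch: delta^pi(x) = argmin_d rho(pi, d | x) under squared-error loss,
# minimized numerically over a grid of decisions (illustrative posterior).
support = [0.0, 1.0, 2.0, 3.0]
post = [0.1, 0.2, 0.4, 0.3]            # pi(theta | x), sums to 1

def rho(d):
    """Posterior expected loss for L(theta, d) = (theta - d)^2."""
    return sum(p * (t - d) ** 2 for t, p in zip(support, post))

grid = [i / 1000 for i in range(3001)] # candidate decisions in [0, 3]
d_star = min(grid, key=rho)            # numerical Bayes estimator

# Under squared loss the minimizer is the posterior mean.
post_mean = sum(t * p for t, p in zip(support, post))
assert abs(d_star - post_mean) < 1e-3
```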
Bayes risk |
r(pi) = r(pi, delta^pi) |
|
Bayes estimators are admissible |
If pi is strictly positive on Theta with finite Bayes risk and the risk function R(theta, delta) is a continuous function of theta for every delta, then the Bayes estimator delta^pi is admissible. If the Bayes estimator associated with a prior pi is unique, it is admissible. |
|
Admissible estimators are Bayes estimators |
Suppose Theta is compact and R is convex. If all estimators have a continuous risk function, then for every non-Bayes estimator delta' there is a Bayes estimator delta^pi for some pi which dominates delta'. Under some mild conditions, all admissible estimators are limits of sequences of Bayes estimators. |
|
Sufficiency |
A function ("statistic") T of x is sufficient if the distribution of x conditional on T(x) does not depend on theta |
|
Sufficiency principle |
Two observations x, y which lead to the same value of a sufficient statistic T, T(x) = T(y), shall lead to the same inference regarding theta. |
|
Conditionality Principle |
If two experiments on theta are available, and if exactly one of these experiments is carried out with some probability p, then the resulting inference on theta should only depend on the selected experiment and the resulting observation. |
|
Likelihood principle |
The information brought about by an observation x about theta is entirely contained in the likelihood function L(theta | x). If two observations x1 and x2 lead to proportional likelihood functions, L(theta | x1) = c L(theta | x2) for some c > 0, then they shall lead to the same inference regarding theta |
|
Exponential families |
If there are real-valued functions c1, ..., ck and d of theta, real-valued functions T1, ..., Tk, S on R^n, and a set A in R^n such that f(x|theta) = exp(sum_{i=1}^k ci(theta) Ti(x) + d(theta) + S(x)) 1_A(x) for all theta, then {f(.|theta)|theta} is called a k-parameter exponential family |
|
Natural sufficient statistic |
T(x) = (T1(x), ..., Tk(x)) is sufficient |
|
Conjugacy |
If the prior pi is a member of a parametric family of distributions so that the posterior pi(theta|x) also belongs to that family, then this family is called conjugate to {f(.|theta)|theta} |
|
Conjugacy for exponential families |
The (k+1)-parameter exponential family pi(theta; (t1, ..., t_{k+1})) = exp(sum_{j=1}^k cj(theta) tj + t_{k+1} d(theta) - log omega(t1, ..., t_{k+1})) is conjugate to the k-parameter exponential family f(x|theta) above. The posterior is given by pi(theta | x) = pi(theta; (t1 + T1(x), ..., tk + Tk(x), t_{k+1} + 1)) |
|
Jeffreys prior |
Proportional to the square root of the determinant of the information matrix I(theta) = E_theta[(d log f(x|theta) / dtheta)(d log f(x|theta) / dtheta)']. Flat if f(x|theta) is N(theta, sigma^2)
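The recipe above can be sketched for the Bernoulli(theta) model, where I(theta) = 1 / (theta (1 - theta)) (computed exactly below as a two-point expectation), so sqrt(I(theta)) is proportional to the Beta(1/2, 1/2) density:

```python
# Sketch: Jeffreys prior for a Bernoulli(theta) model.
# sqrt(I(theta)) = theta^{-1/2} (1 - theta)^{-1/2}, i.e. Beta(1/2, 1/2)
# up to normalization.
import math

def info(theta):
    """Fisher information of one Bernoulli(theta) draw, computed exactly
    as the expectation of the squared score over y in {0, 1}."""
    score = lambda y: (y - theta) / (theta * (1 - theta))
    return theta * score(1) ** 2 + (1 - theta) * score(0) ** 2

for theta in [0.1, 0.3, 0.5, 0.9]:
    jeffreys = math.sqrt(info(theta))                 # unnormalized Jeffreys prior
    target = theta ** -0.5 * (1 - theta) ** -0.5      # Beta(1/2, 1/2) kernel
    assert abs(jeffreys - target) < 1e-9
```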