Measure space

Omega: a set of points ("states of nature")




F: a set of subsets ("events") of Omega that forms a sigma-algebra




mu: measure (mapping F -> R+ U {infty})
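
For illustration (a minimal made-up example): for a single coin toss one can take Omega = {H, T}, F = {emptyset, {H}, {T}, Omega}, and mu({H}) = mu({T}) = 1/2; since mu(Omega) = 1, this measure space is also a probability space.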

sigma-algebra

Omega \in F


If A \in F, then A^c \in F


A_j \in F, j = 1, 2, ... implies (union j = 1 to infty A_j) \in F

Measure

Positivity: mu(A) >= 0




sigma-additivity: if A_j \in F, j = 1, 2, ... are disjoint, then: mu(union j = 1 to infty A_j) = sum j = 1 to infty mu(A_j)




mu(emptyset) = 0

Probability space / probability measure

mu(Omega) = 1

Borel-sigma-algebra

smallest sigma-algebra which contains all open subsets of Omega

Measurable

A function f: Omega -> R^k is called F-measurable if f^-1(B) \in F for every Borel set B \in B(R^k)

Radon-Nikodym

Let F be a sigma-algebra on Omega. Let mu and nu be two measures on F. Suppose that mu(Omega) < infty and nu(Omega) < infty. Suppose that nu is absolutely continuous with respect to mu, i.e. mu(A) = 0 implies nu(A) = 0 for A in F. Then there exists a positive measurable function g, called the Radon-Nikodym derivative,




g: Omega -> R+, also written g = dnu / dmu, with




nu(A) = integral A g dmu = integral A dnu / dmu dmu
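
As a minimal Python sketch (with made-up measures on a three-point Omega), the Radon-Nikodym derivative reduces to a ratio of point masses and the identity nu(A) = integral A g dmu can be checked directly:

    mu = {"a": 0.2, "b": 0.3, "c": 0.5}   # measure mu on Omega = {a, b, c}
    nu = {"a": 0.1, "b": 0.6, "c": 0.3}   # nu absolutely continuous w.r.t. mu

    g = {w: nu[w] / mu[w] for w in mu}    # Radon-Nikodym derivative dnu/dmu

    def integral(A, h, m):
        # integral over the event A of h with respect to the measure m
        return sum(h[w] * m[w] for w in A)

    A = {"a", "c"}
    assert abs(integral(A, g, mu) - (nu["a"] + nu["c"])) < 1e-12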

Conditional Expectation

Let f: Omega -> R be F2-measurable, where F1 is a sub-sigma-algebra of F2. Find an F1-measurable function g: Omega -> R, g = E[f|F1] = E1[f], with the property:




integral A g dmu = integral A f dmu for all A in F1
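
As a minimal Python sketch (with a made-up four-point Omega, measure, function, and partition), when F1 is generated by a partition, E[f|F1] is the mu-weighted average of f on each cell, and the defining property holds for every cell A in F1:

    mu = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}   # probability measure on Omega = {1, 2, 3, 4}
    f = {1: 5.0, 2: 1.0, 3: 2.0, 4: 8.0}    # an F2-measurable function
    partition = [{1, 2}, {3, 4}]            # cells generating the coarser F1

    g = {}
    for cell in partition:
        avg = sum(f[w] * mu[w] for w in cell) / sum(mu[w] for w in cell)
        for w in cell:
            g[w] = avg                      # E[f|F1] is constant on each cell

    for A in partition:                     # check: integral_A g dmu = integral_A f dmu
        assert abs(sum(g[w] * mu[w] for w in A) - sum(f[w] * mu[w] for w in A)) < 1e-12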

Score

s(theta | y) = dl(theta | y) / dtheta




column vector

Information matrix

E[s(theta | y) s(theta | y)']




-E[d2l(theta | y) / dtheta dtheta']
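
As a minimal Python check (a single Bernoulli(theta) observation with an arbitrarily chosen theta), the two expressions coincide: E[s s'] = -E[d2l / dtheta dtheta'] = 1 / (theta (1 - theta)):

    theta = 0.3                             # arbitrary true parameter

    def score(y):                           # dl/dtheta for l = y log(theta) + (1 - y) log(1 - theta)
        return y / theta - (1 - y) / (1 - theta)

    def hessian(y):                         # d2l/dtheta2
        return -y / theta**2 - (1 - y) / (1 - theta)**2

    probs = {0: 1 - theta, 1: theta}
    outer = sum(p * score(y)**2 for y, p in probs.items())      # E[s s']
    neg_hess = -sum(p * hessian(y) for y, p in probs.items())   # -E[d2l/dtheta dtheta']
    assert abs(outer - neg_hess) < 1e-12
    assert abs(outer - 1 / (theta * (1 - theta))) < 1e-12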

MLE asymptotic

sqrt(n) (thetahat_n - theta_0) -> d N(0, I(theta_0)^-1)

Cramer Rao lower bound

If T(y) is an unbiased estimator of theta, then Var_theta(T(y)) >= I(theta)^-1.

Identification (MLE)

theta is identified if there is no other thetatilde with L(thetatilde | y) = L(theta | y) for all y.

Neyman-Pearson Lemma

beta(delta, theta): probability that the test delta rejects when theta is the true parameter. At theta_0 this is the probability of rejecting a true hypothesis (type-I error / size, e.g. 5% significance); at theta_1 it is the power.




Consider testing theta = theta_0 vs. theta = theta_1. Given Psi, let delta(y) be some test with size equal to or smaller than the size of the likelihood-ratio test delta_Psi^(LR). Then its power at the alternative is also no larger:




beta(delta, theta_0) <= beta(delta_Psi^(LR), theta_0) implies beta(delta, theta_1) <= beta(delta_Psi^(LR), theta_1)




In words: tests delta that are at least as "careful" in avoiding type-I errors as a likelihood-ratio test will make at least as many type-II errors as a likelihood-ratio test.

Wald test

H_0: a(theta_0) = 0




A(theta) = da(theta) / dtheta


A = A(theta_0)


I = I(theta_0)




W = n a(thetahat)' (A(thetahat) Ihat^-1 A(thetahat)')^-1 a(thetahat) -> d chisq_k

Lagrange multiplier/score test

H_0: a(theta_0) = 0




I = I(theta_0)




LM = n s(thetahat_c)' Ihat^-1 s(thetahat_c) -> d chisq_k

Likelihood ratio test

LR = 2n (l(thetahat) - l(thetahat_c)) -> d chisq_k
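
As a minimal Python sketch (with made-up data and theta_0), all three statistics for an i.i.d. Bernoulli(theta) sample with H_0: theta = theta_0, so that a(theta) = theta - theta_0, A(theta) = 1, and k = 1:

    import math

    y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # hypothetical data
    theta0 = 0.5
    n = len(y)
    theta_hat = sum(y) / n                      # unrestricted MLE
    theta_c = theta0                            # restricted MLE under H_0

    def avg_loglik(t):                          # l(theta), average log-likelihood
        return sum(yi * math.log(t) + (1 - yi) * math.log(1 - t) for yi in y) / n

    def avg_score(t):                           # s(theta) = dl/dtheta
        return sum(yi / t - (1 - yi) / (1 - t) for yi in y) / n

    def info(t):                                # I(theta) for one Bernoulli draw
        return 1.0 / (t * (1.0 - t))

    W = n * (theta_hat - theta0)**2 * info(theta_hat)
    LM = n * avg_score(theta_c)**2 / info(theta_c)
    LR = 2 * n * (avg_loglik(theta_hat) - avg_loglik(theta_c))
    print(W, LM, LR)                            # each is asymptotically chi-squared with 1 df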

GMM maximization

Q_n(theta) = -1/2 g_n(theta)' What_n g_n(theta)




g_n(theta) = 1/n sum g(y_j; theta)

Extremum consistency

Identification: theta_0 = argmax Q_0(theta) is unique




Uniform convergence: Q_n converges uniformly in probability to Q_0, i.e.:




sup |Q_n(theta) - Q_0(theta)| -> p 0

M-Estimator asymptotics

sqrt(n) (thetahat_n - theta_0) -> d N(0, Psi^-1 Sigma Psi^-1)




Psi = -E_theta_0[H(y; theta_0)]


Sigma = sum Gamma_k


Gamma_k = E[s(y_j; theta_0) s(y_j+k; theta_0)']

GMM asymptotics

sqrt(n) (thetahat_n - theta_0) -> d N(0, Psi^-1 Sigma Psi^-1)




Psi = G'WG


G = dg_n(theta) / dtheta


Sigma = G'WSWG


S = sum Gamma_k


Gamma_k = E[g(y_j; theta_0) g(y_j+k; theta_0)']

IV and GMM

E[Z_t (y_t - X_t beta)] = 0




Generalized:




g([X_t, Z_t]; theta) = Z_t' f(X_t; theta), with E[g([X_t, Z_t]; theta)] = 0
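
As a minimal Python/NumPy sketch (simulated data with a made-up true beta = 2): with as many instruments as regressors, the GMM estimator sets the sample moment g_n(beta) = 1/n sum Z_t'(y_t - X_t beta) exactly to zero, giving betahat = (Z'X)^-1 Z'y:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    z = rng.normal(size=n)                      # instrument
    u = rng.normal(size=n)                      # structural error
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # regressor, correlated with u
    y = 2.0 * x + u                             # true beta = 2

    X = np.column_stack([np.ones(n), x])        # include a constant
    Z = np.column_stack([np.ones(n), z])

    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y) # zeros the sample moment condition
    g_n = Z.T @ (y - X @ beta_iv) / n
    print(beta_iv)                              # approximately [0, 2]
    print(np.max(np.abs(g_n)))                  # numerically zero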

Bayesian posterior

Prior: pi(theta)


Posterior: pi(theta|x)




pi(theta|x) = L(theta|x) pi(theta) / [integral Theta L(theta|x) pi(theta) mu(dtheta)]
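
As a minimal Python sketch (made-up Bernoulli data and a Beta(a, b) prior), the posterior formula evaluated on a grid gives a posterior mean matching the conjugate closed form (a + sum y) / (a + b + n):

    y = [1, 0, 1, 1, 0, 1]                      # hypothetical data
    a, b = 2.0, 2.0                             # Beta(a, b) prior hyperparameters
    s, n = sum(y), len(y)

    grid = [i / 1000 for i in range(1, 1000)]
    unnorm = [t**(a - 1 + s) * (1 - t)**(b - 1 + n - s) for t in grid]   # L(theta|x) * pi(theta)
    norm = sum(unnorm) * 0.001                  # approximates the normalizing integral
    posterior = [u / norm for u in unnorm]      # pi(theta|x) on the grid

    mean_grid = sum(t * p for t, p in zip(grid, posterior)) * 0.001
    print(mean_grid, (a + s) / (a + b + n))     # both approximately 0.6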

Posterior expected loss

Decision: delta(x)




rho(pi, delta(x)) = E_pi[L(theta, delta(x))] = integral Theta L(theta, delta(x)) pi(theta|x) dtheta

Integrated risk

Decision: delta(x)


Marginal distribution: m(x) = integral Theta L(theta | x) pi(theta) mu(dtheta)




r(pi, delta) = E_pi[R(theta, delta)]




= integral Theta integral X L(theta, delta(x)) f(x | theta) pi(theta) dx dtheta




= integral X rho(pi, delta(x)) m(x) dx

Admissibility



Frequentist expected loss: R(theta, delta) = integral X L(theta, delta(x)) f(x | theta) dx




An estimator delta_0 is admissible if there is no estimator delta_1 which dominates delta_0, i.e. which satisfies




R(theta, delta_0) >= R(theta, delta_1) for all theta




and ">" for at least one value of theta_0.

Bayes estimator

delta^pi(x) = arg min_d rho(pi, d | x)

Bayes risk

r(pi) = r(pi, delta^pi)

Bayes estimators are admissible

If pi is strictly positive on Theta with finite Bayes risk and the risk function R(theta, delta) is a continuous function of theta for every delta, then the Bayes estimator delta^pi is admissible.




If the Bayes estimator associated with a prior pi is unique, it is admissible.

Admissible estimators are Bayes estimators

Suppose Theta is compact and R is convex. If all estimators have a continuous risk function, then for every non-Bayes estimator delta' there is a Bayes estimator delta^pi for some pi which dominates delta'.




Under some mild conditions, all admissible estimators are limits of sequences of Bayes estimators.

Sufficiency

A function ("statistic") T of x is sufficient, if the distribution of x conditional on T(x) does not depend in theta.

Sufficiency principle

Two observations x, y which lead to the same value of a sufficient statistic T, T(x) = T(y), shall lead to the same inference regarding theta.

Conditionality Principle

If two experiments on theta are available, and if exactly one of these experiments is carried out with some probability p, then the resulting inference on theta should only depend on the selected experiment and the resulting observation.

Likelihood principle

The information brought about by an observation x about theta is entirely contained in the likelihood function L(theta | x).




If two observations x1 and x2 lead to proportional likelihood functions, L(theta | x1) = c L(theta | x2) for some c > 0,




then they shall lead to the same inference regarding theta.

Exponential families

If there are real-valued functions c1, ..., ck and d of theta, real-valued functions T1, ..., Tk and S on R^n, and a set A in R^n such that




f(x|theta) = exp(sum i = 1 to k ci(theta) Ti(x) + d(theta) + S(x)) 1_A(x)




for all theta, then {f(.|theta)|theta} is called a k-parameter exponential family
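
For illustration: the Bernoulli family f(x|theta) = theta^x (1-theta)^(1-x) on A = {0, 1} can be written as exp(x log(theta/(1-theta)) + log(1-theta)), so it is a 1-parameter exponential family with c1(theta) = log(theta/(1-theta)), T1(x) = x, d(theta) = log(1-theta), and S(x) = 0.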

Natural sufficient statistic

T(x) = (T1(x), ..., Tk(x)) is sufficient

Conjugacy

If the prior pi is a member of a parametric family of distributions so that the posterior pi(theta|x) also belongs to that family, then this family is called conjugate to {f(.|theta)|theta}

Conjugacy for exponential families

The (k+1)-parameter exponential family



pi(theta; (t1, ..., tk+1)) = exp(sum j=1 to k cj(theta) tj + tk+1 d(theta) - log omega(t1, ..., tk+1))



Is conjugate to the exponential family {f(.|theta)|theta} defined above. The posterior is given by



pi(theta | x) = pi(theta; (t1 + T1(x), ..., tk + Tk(x), tk+1 + 1))
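
Continuing the Bernoulli illustration: with c1(theta) = log(theta/(1-theta)) and d(theta) = log(1-theta), the conjugate family is pi(theta; (t1, t2)) proportional to exp(t1 log(theta/(1-theta)) + t2 log(1-theta)) = theta^t1 (1-theta)^(t2 - t1), i.e. a Beta(t1 + 1, t2 - t1 + 1) prior, and observing x updates (t1, t2) to (t1 + x, t2 + 1).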

Jeffreys prior

Proportional to square root of determinant of information matrix




I(theta) = E_theta[dlog(f(x|theta))/dtheta dlog(f(x|theta))/dtheta']




Flat if f(x|theta) is N(theta, sigma^2) with sigma^2 known
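
For illustration: for Bernoulli(theta), I(theta) = 1/(theta(1-theta)), so the Jeffreys prior is proportional to theta^(-1/2) (1-theta)^(-1/2), i.e. Beta(1/2, 1/2). For f(x|theta) = N(theta, sigma^2) with sigma^2 known, I(theta) = 1/sigma^2 does not depend on theta, which is why the Jeffreys prior is flat.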