Measure space

Omega: a set of points ("states of nature")




F: a set of subsets ("events") of Omega that forms a sigma-algebra




mu: measure (mapping F -> R+ U {infty})
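
For illustration (a minimal made-up example): for a single coin toss one can take Omega = {H, T}, F = {emptyset, {H}, {T}, Omega}, and mu({H}) = mu({T}) = 1/2; since mu(Omega) = 1, this measure space is also a probability space.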

sigma-algebra

Omega \in F


If A \in F, then A^c \in F


A_j \in F, j = 1, 2, ... implies (union j = 1 to infty A_j) \in F

Measure

Positivity: mu(A) >= 0




sigma-additivity: if A_j \in F, j = 1, 2, ... are disjoint, then: mu(union j = 1 to infty A_j) = sum j = 1 to infty mu(A_j)




mu(emptyset) = 0

Probability space / probability measure

mu(Omega) = 1

Borel-sigma-algebra

smallest sigma-algebra which contains all open subsets of Omega

Measurable

A function f: Omega -> R^k is called F-measurable if f^-1(B) \in F for every Borel set B \in B(R^k)

Radon-Nikodym

Let F be a sigma-algebra on Omega. Let mu and nu be two measures on F. Suppose that mu(Omega) < infty and nu(Omega) < infty. Suppose that nu is absolutely continuous with respect to mu, i.e. mu(A) = 0 implies nu(A) = 0 for A in F. Then there exists a positive measurable function g, called the Radon-Nikodym derivative,




g: Omega -> R+, also written g = dnu / dmu, with




nu(A) = integral A g dmu = integral A dnu / dmu dmu
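
As a minimal Python sketch (with made-up measures on a three-point Omega), the Radon-Nikodym derivative reduces to a ratio of point masses and the identity nu(A) = integral A g dmu can be checked directly:

    mu = {"a": 0.2, "b": 0.3, "c": 0.5}   # measure mu on Omega = {a, b, c}
    nu = {"a": 0.1, "b": 0.6, "c": 0.3}   # nu absolutely continuous w.r.t. mu

    g = {w: nu[w] / mu[w] for w in mu}    # Radon-Nikodym derivative dnu/dmu

    def integral(A, h, m):
        # integral over the event A of h with respect to the measure m
        return sum(h[w] * m[w] for w in A)

    A = {"a", "c"}
    assert abs(integral(A, g, mu) - (nu["a"] + nu["c"])) < 1e-12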

Conditional Expectation

Let f: Omega -> R be F2-measurable, where F1 is a sub-sigma-algebra of F2. Find an F1-measurable function g: Omega -> R, g = E[f|F1] = E1[f], with the property:




integral A g dmu = integral A f dmu for all A in F1
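
As a minimal Python sketch (with a made-up four-point Omega, measure, function, and partition), when F1 is generated by a partition, E[f|F1] is the mu-weighted average of f on each cell, and the defining property holds for every cell A in F1:

    mu = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}   # probability measure on Omega = {1, 2, 3, 4}
    f = {1: 5.0, 2: 1.0, 3: 2.0, 4: 8.0}    # an F2-measurable function
    partition = [{1, 2}, {3, 4}]            # cells generating the coarser F1

    g = {}
    for cell in partition:
        avg = sum(f[w] * mu[w] for w in cell) / sum(mu[w] for w in cell)
        for w in cell:
            g[w] = avg                      # E[f|F1] is constant on each cell

    for A in partition:                     # check: integral_A g dmu = integral_A f dmu
        assert abs(sum(g[w] * mu[w] for w in A) - sum(f[w] * mu[w] for w in A)) < 1e-12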

Score

s(theta | y) = dl(theta | y) / dtheta




column vector

Information matrix

E[s(theta | y) s(theta | y)']




-E[d2l(theta | y) / dtheta dtheta']
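
As a minimal Python check (a single Bernoulli(theta) observation with an arbitrarily chosen theta), the two expressions coincide: E[s s'] = -E[d2l / dtheta dtheta'] = 1 / (theta (1 - theta)):

    theta = 0.3                             # arbitrary true parameter

    def score(y):                           # dl/dtheta for l = y log(theta) + (1 - y) log(1 - theta)
        return y / theta - (1 - y) / (1 - theta)

    def hessian(y):                         # d2l/dtheta2
        return -y / theta**2 - (1 - y) / (1 - theta)**2

    probs = {0: 1 - theta, 1: theta}
    outer = sum(p * score(y)**2 for y, p in probs.items())      # E[s s']
    neg_hess = -sum(p * hessian(y) for y, p in probs.items())   # -E[d2l/dtheta dtheta']
    assert abs(outer - neg_hess) < 1e-12
    assert abs(outer - 1 / (theta * (1 - theta))) < 1e-12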

MLE asymptotic

sqrt(n) (thetahat_n - theta_0) -> d N(0, I(theta_0)^-1)

Cramer Rao lower bound

If T(y) is an unbiased estimator of theta, then Var_theta(T(y)) >= I(theta)^-1.

Identification (MLE)

theta is identified if there is no other thetatilde with L(thetatilde | y) = L(theta | y) for all y.

Neyman-Pearson Lemma

beta(delta, theta): probability that the test delta rejects when theta is the true parameter. At theta_0 this is the probability of rejecting a true hypothesis (type-I error / size, e.g. 5% significance); at theta_1 it is the power.




Consider testing theta = theta_0 vs. theta = theta_1. Given Psi, let delta(y) be some test with size equal to or smaller than the size of the likelihood-ratio test delta_Psi^(LR). Then its power at the alternative is also no larger:




beta(delta, theta_0) <= beta(delta_Psi^(LR), theta_0) implies beta(delta, theta_1) <= beta(delta_Psi^(LR), theta_1)




In words: tests delta that are at least as "careful" in avoiding type-I errors as a likelihood-ratio test will make at least as many type-II errors as a likelihood-ratio test.

Wald test

H_0: a(theta_0) = 0




A(theta) = da(theta) / dtheta


A = A(theta_0)


I = I(theta_0)




W = n a(thetahat)' (A(thetahat) Ihat^-1 A(thetahat)')^-1 a(thetahat) -> d chisq_k

Lagrange multiplier/score test

H_0: a(theta_0) = 0




I = I(theta_0)




LM = n s(thetahat_c)' Ihat^-1 s(thetahat_c) -> d chisq_k

Likelihood ratio test

LR = 2n (l(thetahat) - l(thetahat_c)) -> d chisq_k
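
As a minimal Python sketch (with made-up data and theta_0), all three statistics for an i.i.d. Bernoulli(theta) sample with H_0: theta = theta_0, so that a(theta) = theta - theta_0, A(theta) = 1, and k = 1:

    import math

    y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # hypothetical data
    theta0 = 0.5
    n = len(y)
    theta_hat = sum(y) / n                      # unrestricted MLE
    theta_c = theta0                            # restricted MLE under H_0

    def avg_loglik(t):                          # l(theta), average log-likelihood
        return sum(yi * math.log(t) + (1 - yi) * math.log(1 - t) for yi in y) / n

    def avg_score(t):                           # s(theta) = dl/dtheta
        return sum(yi / t - (1 - yi) / (1 - t) for yi in y) / n

    def info(t):                                # I(theta) for one Bernoulli draw
        return 1.0 / (t * (1.0 - t))

    W = n * (theta_hat - theta0)**2 * info(theta_hat)
    LM = n * avg_score(theta_c)**2 / info(theta_c)
    LR = 2 * n * (avg_loglik(theta_hat) - avg_loglik(theta_c))
    print(W, LM, LR)                            # each is asymptotically chi-squared with 1 df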

GMM maximization

Q_n(theta) = -1/2 g_n(theta)' What_n g_n(theta)




g_n(theta) = 1/n sum g(y_j; theta)

Extremum consistency

Identification: theta_0 = argmax Q_0(theta) is unique




Uniform convergence: Q_n converges uniformly in probability to Q_0, i.e.:




sup |Q_n(theta) - Q_0(theta)| -> p 0

M-Estimator asymptotics

sqrt(n) (thetahat_n - theta_0) -> d N(0, Psi^-1 Sigma Psi^-1)




Psi = -E_theta_0[H(y; theta_0)]


Sigma = sum Gamma_k


Gamma_k = E[s(y_j; theta_0) s(y_j+k; theta_0)']

GMM asymptotics

sqrt(n) (thetahat_n - theta_0) -> d N(0, Psi^-1 Sigma Psi^-1)




Psi = G'WG


G = dg_n(theta) / dtheta


Sigma = G'WSWG


S = sum Gamma_k


Gamma_k = E[g(y_j; theta_0) g(y_j+k; theta_0)']

IV and GMM

E[Z_t (y_t - X_t beta)] = 0




Generalized:




g([X_t, Z_t]; theta) = Z_t' f(X_t; theta), with E[g([X_t, Z_t]; theta)] = 0
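
As a minimal Python/NumPy sketch (simulated data with a made-up true beta = 2): with as many instruments as regressors, the GMM estimator sets the sample moment g_n(beta) = 1/n sum Z_t'(y_t - X_t beta) exactly to zero, giving betahat = (Z'X)^-1 Z'y:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    z = rng.normal(size=n)                      # instrument
    u = rng.normal(size=n)                      # structural error
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # regressor, correlated with u
    y = 2.0 * x + u                             # true beta = 2

    X = np.column_stack([np.ones(n), x])        # include a constant
    Z = np.column_stack([np.ones(n), z])

    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y) # zeros the sample moment condition
    g_n = Z.T @ (y - X @ beta_iv) / n
    print(beta_iv)                              # approximately [0, 2]
    print(np.max(np.abs(g_n)))                  # numerically zero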

Bayesian posterior

Prior: pi(theta)


Posterior: pi(theta|x)




pi(theta|x) = L(theta|x) pi(theta) / [integral Theta L(theta|x) pi(theta) mu(dtheta)]
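
As a minimal Python sketch (made-up Bernoulli data and a Beta(a, b) prior), the posterior formula evaluated on a grid gives a posterior mean matching the conjugate closed form (a + sum y) / (a + b + n):

    y = [1, 0, 1, 1, 0, 1]                      # hypothetical data
    a, b = 2.0, 2.0                             # Beta(a, b) prior hyperparameters
    s, n = sum(y), len(y)

    grid = [i / 1000 for i in range(1, 1000)]
    unnorm = [t**(a - 1 + s) * (1 - t)**(b - 1 + n - s) for t in grid]   # L(theta|x) * pi(theta)
    norm = sum(unnorm) * 0.001                  # approximates the normalizing integral
    posterior = [u / norm for u in unnorm]      # pi(theta|x) on the grid

    mean_grid = sum(t * p for t, p in zip(grid, posterior)) * 0.001
    print(mean_grid, (a + s) / (a + b + n))     # both approximately 0.6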

Posterior expected loss

Decision: delta(x)




rho(pi, delta(x)) = E_pi[L(theta, delta(x))] = integral Theta L(theta, delta(x)) pi(theta|x) dtheta

Integrated risk

Decision: delta(x)


Marginal distribution: m(x) = integral Theta L(theta | x) pi(theta) mu(dtheta)




r(pi, delta) = E_pi[R(theta, delta)]




= integral Theta integral X L(theta, delta(x)) f(x | theta) pi(theta) dx dtheta




= integral X rho(pi, delta(x)) m(x) dx

Admissibility



Frequentist expected loss: R(theta, delta) = integral X L(theta, delta(x)) f(x | theta) dx




An estimator delta_0 is admissible if there is no estimator delta_1 which dominates delta_0, i.e. which satisfies




R(theta, delta_0) >= R(theta, delta_1) for all theta




and ">" for at least one value of theta_0.

Bayes estimator

delta^pi(x) = arg min_d rho(pi, d | x)

Bayes risk

r(pi) = r(pi, delta^pi)

Bayes estimators are admissible

If pi is strictly positive on Theta with finite Bayes risk and the risk function R(theta, delta) is a continuous function of theta for every delta, then the Bayes estimator delta^pi is admissible.




If the Bayes estimator associated with a prior pi is unique, it is admissible.

Admissible estimators are Bayes estimators

Suppose Theta is compact and R is convex. If all estimators have a continuous risk function, then for every non-Bayes estimator delta' there is a Bayes estimator delta^pi for some pi which dominates delta'.




Under some mild conditions, all admissible estimators are limits of sequences of Bayes estimators.

Sufficiency

A function ("statistic") T of x is sufficient, if the distribution of x conditional on T(x) does not depend in theta.

Sufficiency principle

Two observations x, y which lead to the same value of a sufficient statistic T, T(x) = T(y), shall lead to the same inference regarding theta.

Conditionality Principle

If two experiments on theta are available, and if exactly one of these experiments is carried out with some probability p, then the resulting inference on theta should only depend on the selected experiment and the resulting observation.

Likelihood principle

The information brought about by an observation x about theta is entirely contained in the likelihood function L(theta | x).




If two observations x1 and x2 lead to proportional likelihood functions, L(theta | x1) = c L(theta | x2) for some c > 0,




then they shall lead to the same inference regarding theta.

Exponential families

If there are real-valued functions c1, ..., ck and d of theta, real-valued functions T1, ..., Tk and S on R^n, and a set A in R^n such that




f(x|theta) = exp(sum i = 1 to k ci(theta) Ti(x) + d(theta) + S(x)) 1_A(x)




for all theta, then {f(.|theta)|theta} is called a k-parameter exponential family
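
For illustration: the Bernoulli family f(x|theta) = theta^x (1-theta)^(1-x) on A = {0, 1} can be written as exp(x log(theta/(1-theta)) + log(1-theta)), so it is a 1-parameter exponential family with c1(theta) = log(theta/(1-theta)), T1(x) = x, d(theta) = log(1-theta), and S(x) = 0.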

Natural sufficient statistic

T(x) = (T1(x), ..., Tk(x)) is sufficient

Conjugacy

If the prior pi is a member of a parametric family of distributions so that the posterior pi(theta|x) also belongs to that family, then this family is called conjugate to {f(.|theta)|theta}

Conjugacy for exponential families

The (k+1)-parameter exponential family



pi(theta; (t1, ..., tk+1)) = exp(sum j=1 to k cj(theta) tj + tk+1 d(theta) - log omega(t1, ..., tk+1))



Is conjugate to the exponential family {f(.|theta)|theta} defined above. The posterior is given by



pi(theta | x) = pi(theta; (t1 + T1(x), ..., tk + Tk(x), tk+1 + 1))
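
Continuing the Bernoulli illustration: with c1(theta) = log(theta/(1-theta)) and d(theta) = log(1-theta), the conjugate family is pi(theta; (t1, t2)) proportional to exp(t1 log(theta/(1-theta)) + t2 log(1-theta)) = theta^t1 (1-theta)^(t2 - t1), i.e. a Beta(t1 + 1, t2 - t1 + 1) prior, and observing x updates (t1, t2) to (t1 + x, t2 + 1).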

Jeffreys prior

Proportional to square root of determinant of information matrix




I(theta) = E_theta[dlog(f(x|theta))/dtheta dlog(f(x|theta))/dtheta']




Flat if f(x|theta) is N(theta, sigma^2) with sigma^2 known
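
For illustration: for Bernoulli(theta), I(theta) = 1/(theta(1-theta)), so the Jeffreys prior is proportional to theta^(-1/2) (1-theta)^(-1/2), i.e. Beta(1/2, 1/2). For f(x|theta) = N(theta, sigma^2) with sigma^2 known, I(theta) = 1/sigma^2 does not depend on theta, which is why the Jeffreys prior is flat.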