# Advantages Of Distance Metric Learning

A Structured Regularization Approach

Report

Zhe Cheng

Instructor: Dr. Ikhlas Abdel-Qader

Contents

1. Understanding Dropout

2. DML Using Dropout

3. Applying Dropout to Distance Metric

4. Applying Dropout to Training Data

5. Conclusion

1. Understanding Dropout

Dropout prevents overfitting and provides a way of approximately combining exponentially many different neural network architectures efficiently. The term "dropout" refers to dropping out units (hidden and visible) in a neural network. By dropping out a unit, we mean temporarily removing it from the network, along with all of its incoming and outgoing connections, as shown in Figure 1, in which the units to drop are selected at random. In the simplest case, each unit is retained with a fixed probability p.
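The mechanism above can be sketched in a few lines of numpy. This is a minimal illustration of standard (inverted) dropout, not the report's exact formulation: each unit is kept with probability `p_keep` and the survivors are rescaled by `1/p_keep` so the expected activation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_keep=0.5, train=True):
    """Inverted dropout: keep each unit with probability p_keep, rescale survivors."""
    if not train:
        return activations  # at test time the full network is used
    mask = rng.random(activations.shape) < p_keep  # one Bernoulli draw per unit
    return activations * mask / p_keep  # rescaling keeps the expectation unchanged

h = np.ones(10_000)
h_drop = dropout(h, p_keep=0.8)
# Roughly 80% of units survive, and the mean stays close to the original value 1.
print(abs(h_drop.mean() - 1.0) < 0.05)
```

Because of the rescaling, no weight adjustment is needed at test time; the network simply runs with all units active.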


We will discuss the details in the remainder of this report. In particular, we will discuss two different applications of dropout: applying dropout to the learned distance metric, and applying dropout to the training data, in the following two sections.

3. Applying Dropout to Distance Metric

In this section, we focus on applying dropout to the learned metric. Let M be the metric learned from the previous iteration. To apply the dropout technique, we introduce a Bernoulli random matrix in which each entry is a Bernoulli random variable. Using this random matrix, we compute the dropped-out distance metric, denoted by M̂, by masking the entries of M elementwise.
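A minimal sketch of this masking step, assuming a single keep-probability `p` for every entry (the report's elided equations define the actual per-entry probabilities) and a mask forced to be symmetric so that M̂ stays a valid symmetric metric:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout_metric(M, p=0.9):
    """Mask the learned metric M elementwise with a symmetric Bernoulli matrix."""
    d = M.shape[0]
    delta = (rng.random((d, d)) < p).astype(float)
    delta = np.triu(delta) + np.triu(delta, 1).T  # enforce delta[i, j] == delta[j, i]
    return delta * M  # M_hat[i, j] = delta[i, j] * M[i, j]

M = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.3],
              [0.1, 0.3, 1.0]])
M_hat = dropout_metric(M)
print(np.allclose(M_hat, M_hat.T))  # symmetry is preserved
```

Mirroring the upper triangle onto the lower one ensures a single Bernoulli draw governs each (i, j)/(j, i) pair, which is what keeps the masked metric symmetric.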

For i = j the entries are given accordingly, and since M is known to be a symmetric matrix, the mask is taken to be symmetric as well. With a different design of the sampling probabilities, we can apply dropout to the learned metric to simulate the effect of L1 regularization. In particular, we introduce a data-dependent dropout probability.

Now, instead of perturbing M_{t−1}, we apply dropout to M′_t, i.e., the matrix after the gradient mapping. It is easy to verify the expectation of the perturbed matrix M̂′_t.
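The ordering described here (gradient step first, dropout second) can be sketched as follows. The step size, the uniform keep-probability `p`, and the `1/p` rescaling are illustrative assumptions; with that rescaling, averaging many perturbed copies recovers M′_t, which is the expectation property the text refers to.

```python
import numpy as np

rng = np.random.default_rng(1)

def gradient_mapping(M_prev, grad, eta=0.1):
    """One gradient step on the metric (PSD projection omitted for brevity)."""
    return M_prev - eta * grad

def perturb(M_t, p=0.8):
    """Dropout applied after the gradient mapping, rescaled so E[M_hat] == M_t."""
    mask = (rng.random(M_t.shape) < p).astype(float)
    mask = np.triu(mask) + np.triu(mask, 1).T  # keep the mask symmetric
    return mask * M_t / p

M_prev = np.eye(3)
grad = 0.1 * np.ones((3, 3))
M_t = gradient_mapping(M_prev, grad)
# Averaging many perturbed copies should recover M_t in expectation.
avg = sum(perturb(M_t) for _ in range(5000)) / 5000
print(np.allclose(avg, M_t, atol=0.05))
```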


Then, the corresponding probability is

Then, we consider dropout with an element-dependent probability, as

4. Applying Dropout to Training Data

While Gaussian noise could act like the trace norm regularizer, the external noise it adds may affect the solution. Therefore, we consider dropout instead, as

where each entry is a binary-valued random variable.

Note that when we perform dropout on the training data according to this strategy, we actually drop the rows and the corresponding columns in the first component (x_i^t − x_j^t)(x_i^t − x_j^t)^T of A_t. Since the expectation of the random variables on the diagonal is the variance and it is 1 off the diagonal, the expectation of Â_t follows.
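The row/column claim above is easy to verify numerically. In this sketch the dropped coordinate is fixed rather than sampled (the actual scheme draws it at random): zeroing one coordinate of the difference vector zeroes exactly the matching row and column of the outer product, and leaves every other entry unchanged.

```python
import numpy as np

x_i = np.array([1.0, 2.0, 3.0, 4.0])
x_j = np.array([0.5, 1.0, 1.5, 2.0])
diff = x_i - x_j

# Dropout on the training data: zero out a coordinate of the difference vector.
keep = np.array([1.0, 0.0, 1.0, 1.0])  # coordinate 1 is dropped (fixed for illustration)
A = np.outer(diff, diff)               # (x_i - x_j)(x_i - x_j)^T
A_hat = np.outer(keep * diff, keep * diff)

# Dropping coordinate 1 zeroes row 1 and column 1 of the outer product...
print(bool(np.all(A_hat[1, :] == 0) and np.all(A_hat[:, 1] == 0)))
# ...while all entries not in that row or column are unchanged.
mask2d = np.outer(keep, keep).astype(bool)
print(bool(np.all(A_hat[mask2d] == A[mask2d])))
```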