How to potty train a Siamese Network

2018-04-17T15:36:18-04:00

Could you please explain the anchor = y_pred[:,0:3]? Why is anchor 3 columns? And in the previous version how many columns was anchor and why was it that many columns?
Thank you, Jon

LikeLike

Reply

2018-04-18T10:50:26-04:00

This need a little bit of explaining I agree 🙂 . So the output of the LSTM base network for anchor, negative and positive are passed as a flatened list to the loss function by the siamese network. In this instance I was using a 3D representation, so this is how I retrieved each “component” in the loss function. The other thing to notice is the passing of a “whole” column of them, as the loss function is called on mini-batches, so it is not only one example at a time, but a batch of them. I must say that my loss function handling changed a lot in later iterations, might come back to write about it later…
So to summarize, Anchor, Positive and Negative are output from the same LSTM base network which encodes the traffic pattern on a 3D space (a little like word embedding if you want); this is why it is 3 columns. In previous version it was buggy… hence the update.

LikeLike

Reply

2018-06-29T07:25:55-04:00

Hi,
I read your amazing article on triplet loss. I am performing a similar experiment which I believe that triplet loss will have. But I don’t understand why it is not working out for me. Instead of going to lossless triplet loss. I thought to make triplet loss work first to some extent.
Now the task is to classify similar and not similar items from a vector space and then do the clustering using annoy with euclidean, which will end in making clusters of similar items.

The entire process –
I have 300D dimensional word vectors, ran on the universal encyclopedia, and there could be huge number of classes to classify, like, animal, vegetable, fruits, etc.
Now, what I did was to create the training set and testing set, the training set contains anchor, positive and negative vectors.
My implementation of Triplet Loss –

def triplet_loss(y_true, y_pred):
anchor = y_pred[:, 0:300]
positive = y_pred[:, 300:600]
negative = y_pred[:, 600:900]

# distance between the anchor and the positive
pos_dist = K.sum(K.square(anchor – positive), axis=1)

# distance between the anchor and the negative
neg_dist = K.sum(K.square(anchor – negative), axis=1)

# compute loss
alpha = 0.5
basic_loss = pos_dist – neg_dist + alpha
loss = K.mean(K.maximum(basic_loss, 0.0))

return loss

My network –

def create_base_network(input_dim=300):
t = ‘tanh’
model = Sequential()
model.add(Dense(input_dim, input_shape=(input_dim, ), activation=t))
model.add(Dense(600, activation=t))
model.add(Dense(input_dim, activation=t))

return model

def init_model(self):
base_network = create_base_network()

anchor_in = Input(shape=(300,))
positive_in = Input(shape=(300,))
negative_in = Input(shape=(300,))

anchor_out = base_network(anchor_in)
positive_out = base_network(positive_in)
negative_out = base_network(negative_in)

merged_vector = concatenate([anchor_out, positive_out, negative_out], axis=1)

model = Model(inputs=[anchor_in, positive_in, negative_in], outputs=merged_vector)

return model

Training –
model.fit(train_data, np.zeros(),
epochs=100, shuffle=True, steps_per_epoch=None, batch_size=128,
)

Total Number of training samples is ~7k

Note – the labels i am passing in the model.fit is zeroes, because it doesn’t matter for the triplet loss.

When training the model, the loss converges to 0 after 10 epochs on an average. But the actual separation is not even affected.

The test is being performed in the following ways –
To test the separation between two similar vectors and non similar vectors, you just need two vectors. We can not use this exact model, so what I did was extracted the weights of the base network and in a different script made the exact same network so that I can set the learned weights. My expectation here is that the learned weights should know what separation to make between the two input vectors.

But, it turns out that I does not make any separation.

For measuring the separation I am using a metric as Cohen’ D value.

= {\displaystyle {\frac {\text{mean difference}}{\text{standard deviation}}}}
Image result for cohen’s d

In separation between the samples in the input vector space is 1, ideally after getting transformed this value should become 4, if only taking the -2-sigma, +2-sigma solution space.
But, while testing the model to see the separation, the value resulted is 1.
If you don’t even train the model, still the value is 1, which is not expected, because the weights will be completely randomised.

I have got no sense left to reason, what is happening with triplet loss, why is not able to perform the job.

I have been dealing with this situation from a long time. Please share your thoughts on this. Any help is very much appreciated.

Thank You,
Shivam Srivastava

Hi,
I read your amazing article on triplet loss. I am performing a similar experiment which I believe that triplet loss will have. But I don’t understand why it is not working out for me. Instead of going to lossless triplet loss. I thought to make triplet loss work first to some extent.
Now the task is to classify similar and not similar items from a vector space and then do the clustering using annoy with euclidean, which will end in making clusters of similar items.

The entire process –
I have 300D dimensional word vectors, ran on the universal encyclopedia, and there could be huge number of classes to classify, like, animal, vegetable, fruits, etc.
Now, what I did was to create the training set and testing set, the training set contains anchor, positive and negative vectors.
My implementation of Triplet Loss –

def triplet_loss(y_true, y_pred):
anchor = y_pred[:, 0:300]
positive = y_pred[:, 300:600]
negative = y_pred[:, 600:900]

# distance between the anchor and the positive
pos_dist = K.sum(K.square(anchor – positive), axis=1)

# distance between the anchor and the negative
neg_dist = K.sum(K.square(anchor – negative), axis=1)

# compute loss
alpha = 0.5
basic_loss = pos_dist – neg_dist + alpha
loss = K.mean(K.maximum(basic_loss, 0.0))

return loss

My network –

def create_base_network(input_dim=300):
t = ‘tanh’
model = Sequential()
model.add(Dense(input_dim, input_shape=(input_dim, ), activation=t))
model.add(Dense(600, activation=t))
model.add(Dense(input_dim, activation=t))

return model

def init_model(self):
base_network = create_base_network()

anchor_in = Input(shape=(300,))
positive_in = Input(shape=(300,))
negative_in = Input(shape=(300,))

anchor_out = base_network(anchor_in)
positive_out = base_network(positive_in)
negative_out = base_network(negative_in)

merged_vector = concatenate([anchor_out, positive_out, negative_out], axis=1)

model = Model(inputs=[anchor_in, positive_in, negative_in], outputs=merged_vector)

return model

Training –
model.fit(train_data, np.zeros(),
epochs=100, shuffle=True, steps_per_epoch=None, batch_size=128,
)

Total Number of training samples is ~7k

Note – the labels i am passing in the model.fit is zeroes, because it doesn’t matter for the triplet loss.

When training the model, the loss converges to 0 after 10 epochs on an average. But the actual separation is not even affected.

The test is being performed in the following ways –
To test the separation between two similar vectors and non similar vectors, you just need two vectors. We can not use this exact model, so what I did was extracted the weights of the base network and in a different script made the exact same network so that I can set the learned weights. My expectation here is that the learned weights should know what separation to make between the two input vectors.

But, it turns out that I does not make any separation.

For measuring the separation I am using a metric as Cohen’ D value.

= {\displaystyle {\frac {\text{mean difference}}{\text{standard deviation}}}}
Image result for cohen’s d

In separation between the samples in the input vector space is 1, ideally after getting transformed this value should become 4, if only taking the -2-sigma, +2-sigma solution space.
But, while testing the model to see the separation, the value resulted is 1.
If you don’t even train the model, still the value is 1, which is not expected, because the weights will be completely randomised.

I have got no sense left to reason, what is happening with triplet loss, why is not able to perform the job.

I have been dealing with this situation from a long time. Please share your thoughts on this. Any help is very much appreciated.

Thank You,
Shivam Srivastava

LikeLike

Reply

2018-10-07T13:29:36-04:00

Alpha value of 0.5 could be the culprit…I ain’t expert…But trying bigger values like 100 can help …
Think from number of dimensions and give a distance for each component and work out a reasonable value for Alpha

LikeLike

Reply

2018-07-02T12:54:33-04:00

My experience up to now is that the most probable culprit is the loss function. I suggest debugging it individually and make sure what you get from the Keras framework is really what you think you get… Basically, getting the concatenation right, then de-concatenating it can be tricky. Once you get it right, make sure your tensorflow (through Keras in your case “K.”) are operating on the right dimensions and performing the calculation you want (because what you get in the loss function is a batch of examples, not only a single one). That would be my 2 cents of advice. Hope it can help!

LikeLike

Reply

2018-10-07T13:18:10-04:00

I had batchnorm issue while training GAN…. The problem is that batchnorm depends on batch image characteristic which can be very different during test times…Also, batch size also varies in test time and this can wreak havoc… You need to know the train stats and use that batchnorm during testing….

LikeLike

Reply

2018-10-09T13:40:41-04:00

Makes sense, what I read was also leading me toward storing the train parameter and re-applying them during testing. Just didn’t got the time to try it out.

LikeLike

Reply

2018-10-31T04:33:16-04:00

How would you define an epoch in Siamese.
Say I have 50 images for each of the 100 people i have.
Now given a person, his image can be potentially compared with 4950 images. For 100 such people, that would be 495000 comparisons. Will this constitute an epoch. If i have a batch of 100, then 4950 batched would make 1 epoch. Is that right!?

LikeLike

Reply

2018-10-31T09:08:21-04:00

An epoch is anyway an arbitrary value… It’s even more clear in this type of networks. For myself, I used an arbitrary (fixed) number of triplets for an batch/epoch, let say 100 batches of 1k randomly selected triplets per epochs. It doesn’t need to be 1to1 mapping to all possibilities. In facts, you better train (if possible) with triplets where the positive examples looks as different as possible to the anchor and the negative examples look as much as possible as the anchor.

LikeLike

Reply

2019-02-11T03:33:07-05:00

Peobably very late to the party, but… regarding BatchNormalization… check this out: https://github.com/keras-team/keras/pull/9965

LikeLike

Reply

	def triplet_loss(y_true, y_pred, alpha = ALPHA):
	"""
	Implementation of the triplet loss function

	Arguments:
	y_true — true labels, required when you define a loss in Keras, you don't need it in this function.
	y_pred — python list containing three objects:
	anchor — the encodings for the anchor data
	positive — the encodings for the positive data (similar to anchor)
	negative — the encodings for the negative data (different from anchor)

	Returns:
	loss — real number, value of the loss

	"""

	anchor = y_pred[:,0:3]
	positive = y_pred[:,3:6]
	negative = y_pred[:,6:9]

	# distance between the anchor and the positive
	pos_dist = K.sum(K.square(anchor-positive),axis=1)

	# distance between the anchor and the negative
	neg_dist = K.sum(K.square(anchor-negative),axis=1)

	# compute loss
	basic_loss = pos_dist-neg_dist+alpha
	loss = K.maximum(basic_loss,0.0)

	return loss

	TheLoneNut on Gen X Quiet Quitting in T…
	skyrtech on Gen X Quiet Quitting in T…
	TheLoneNut on Gen X Quiet Quitting in T…
	TheLoneNut on Gen X Quiet Quitting in T…
	Heraldo Sales Cavalc… on Gen X Quiet Quitting in T…

How to potty train a Siamese Network

Published by TheLoneNut

10 thoughts on “How to potty train a Siamese Network”

Leave a comment Cancel reply

Share this:

Published by TheLoneNut

10 thoughts on “How to potty train a Siamese Network”

Leave a comment Cancel reply