What Siamese Dreams are made of…

In my last post I wrote a high-level description of a One-Shot learning approach we developed for telecommunication network fault identification through traffic analysis. The One-Shot learning approach is implemented using a Siamese Deep Neural Network. In this post I will describe in more detail how this can be achieved with the use of Keras and TensorFlow. As I said in the previous post, this is early work and subject to a lot of change, but if it can help someone else alleviate some of the pain of building such a network, let it be!

The first step is probably to understand what a Siamese network is and how it works. What we want our network to produce is a representation of the data we feed it, i.e. a vector representing the input data, like word embeddings, but in this case for telecom network traffic data. At the end of the day, this representation vector should have small distances for similar traffic and larger distances for dissimilar traffic. Hence, when the network is properly trained we can use those distances to determine which known network traffic is the closest and thus the most representative. But how do we implement it?
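
To make the idea concrete, here is a toy sketch (pure NumPy, with made-up three-dimensional vectors) of the property we want from such a representation:

import numpy as np

# Made-up encodings: two similar traffic captures and one dissimilar capture
enc_normal_a = np.array([0.9, 0.1, 0.0])
enc_normal_b = np.array([0.8, 0.2, 0.1])
enc_faulty = np.array([0.1, 0.9, 0.7])

# Similar traffic should yield a small L2 distance, dissimilar a large one
print(np.linalg.norm(enc_normal_a - enc_normal_b))  # ~0.17
print(np.linalg.norm(enc_normal_a - enc_faulty))    # ~1.33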

For that, let’s look at the cute kitten image I have put on this and the previous post. The cute crème-colored one hiding at the bottom is Aristotle. The other crème-colored one is Peter Pan and the black one is Napoleon. Aristotle is our Anchor, the kitten we want to compare to. If another kitten is similar, let’s say Peter Pan, then the vector representing Peter Pan should be close in distance to the vector representing Aristotle. This is our Positive example. Similarly, when a kitten is different from Aristotle, let’s say Napoleon, we want the vector representing it to be far in distance from Aristotle. This is our Negative example.

Simplifying things, training a deep neural network consists of predicting a result from a training example; finding out how far we are from the expected value using a loss function; and then correcting the weights of the deep neural network based on that error, so next time we are a bit closer. Here we do not know the expected value for our training examples, but we know that whatever that value is, it should be close in distance to the Anchor if we present the Positive example, and far in distance if we present the Negative example. Thus, we will build our loss function in that way. It receives the encodings of the Anchor, the Positive example and the Negative example, concatenated together, through y_pred. Then it computes the distance between the Anchor and the Positive (AP), and between the Anchor and the Negative (AN). As we said, AP should get close to 0 while AN should get large. For this exercise, let’s set “large” to mean at least 0.2 away, i.e. we want AP = 0 and AN - 0.2 >= 0. Ideally, we want both of those to hold at once, hence we minimize the loss where loss = AP - (AN - 0.2) = AP - AN + 0.2, clamped at zero so that triplets which already satisfy the margin contribute nothing. That being explained, below is the loss function we defined.


import tensorflow as tf


def triplet_loss(y_true, y_pred, alpha=0.2):
    """
    Implementation of the triplet loss function.

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, not used in this function.
    y_pred -- tensor of shape (batch, 3 * encoding_size), the concatenation of:
            anchor:   the encodings for the anchor data
            positive: the encodings for the positive data (similar to anchor)
            negative: the encodings for the negative data (different from anchor)

    Returns:
    loss -- real number, value of the loss
    """
    # Split the merged vector back into the three encodings
    encoding_size = int(y_pred.shape[-1]) // 3
    anchor = y_pred[:, :encoding_size]
    positive = y_pred[:, encoding_size:2 * encoding_size]
    negative = y_pred[:, 2 * encoding_size:]
    # squared L2 distance between the anchor and the positive, per example
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    # squared L2 distance between the anchor and the negative, per example
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # hinge on the margin: only triplets violating AN > AP + alpha contribute
    basic_loss = pos_dist - neg_dist + alpha
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0))
    return loss
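
As a quick smoke test of that loss, here is a minimal sketch (assuming TensorFlow 2.x eager execution and made-up encodings):

import numpy as np

# One triplet of made-up 2-dimensional encodings, concatenated like merged_vector
anchor = np.array([[1.0, 0.0]], dtype='float32')
positive = np.array([[0.9, 0.1]], dtype='float32')  # close to the anchor
negative = np.array([[0.0, 1.0]], dtype='float32')  # far from the anchor
y_pred = np.concatenate([anchor, positive, negative], axis=-1)

print(triplet_loss(None, y_pred).numpy())  # 0.0, since the 0.2 margin is already satisfied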


Now that we have a loss function to train with, we need to define the network itself. The network should receive our network traffic information as input and output a vector representation of it. I already described the network before, so here is the function that creates it using the Keras Sequential model.


from keras.models import Sequential
from keras.layers import BatchNormalization, Dense, LSTM


def create_base_network(in_dims, out_dims):
    """
    Base network to be shared.
    """
    model = Sequential()
    model.add(BatchNormalization(input_shape=in_dims))
    model.add(LSTM(512, return_sequences=True, dropout=0.2, recurrent_dropout=0.2, implementation=2))
    model.add(LSTM(512, return_sequences=False, dropout=0.2, recurrent_dropout=0.2, implementation=2))
    model.add(BatchNormalization())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(out_dims, activation='relu'))
    model.add(BatchNormalization())
    return model
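
For instance, with hypothetical dimensions (60 minutes of traffic, 130 features per minute, and a 32-dimensional encoding), a quick sanity check of the output shape could look like this:

base = create_base_network((60, 130), 32)
base.summary()  # the last layer should report an output shape of (None, 32)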


Now that we have that base model, we need to embed it within a Siamese “framework”. After all, that base network simply computes one vector representation for a specific network traffic sample, and the loss function we defined calls for three of those representations, i.e. the anchor, the positive and the negative. So, what we will do is define three inputs which will be evaluated through the SAME base network, hence the name Siamese network. The outputs of that Siamese network are then simply concatenated into a single merged vector, which is what we ask our loss function to evaluate. Note that at this point we define the input and output dimensions. The inputs will be in the shape of N_MINS minutes of network traffic characterization (60 minutes for now), where each minute is characterized by n_feat features (the 130 or so features I mentioned in my previous post).


import tensorflow as tf
from keras.layers import Input, concatenate
from keras.models import Model
from keras.optimizers import Adam

in_dims = (N_MINS, n_feat)
out_dims = N_FACTORS
# Network definition
with tf.device(tf_device):
    # Create the 3 inputs
    anchor_in = Input(shape=in_dims)
    pos_in = Input(shape=in_dims)
    neg_in = Input(shape=in_dims)
    # Share the base network with the 3 inputs
    base_network = create_base_network(in_dims, out_dims)
    anchor_out = base_network(anchor_in)
    pos_out = base_network(pos_in)
    neg_out = base_network(neg_in)
    merged_vector = concatenate([anchor_out, pos_out, neg_out], axis=-1)
    # Define the trainable model
    model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=merged_vector)
    model.compile(optimizer=Adam(),
                  loss=triplet_loss)
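
Optionally, a quick look at the model summary can confirm the wiring: three input layers feeding a single shared base network, whose three output encodings are concatenated into one merged vector.

model.summary()  # expect three Input layers, one shared sequential model, and a Concatenate layer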

Everything is now in place to train the base model through the Siamese “framework” using our defined loss function. Note that the y values we pass to the fit method are dummy values, since our loss function does not care for the real targets (which we do not know).


# Training the model; train_data is the list [anchors, positives, negatives]
# and y_dummie is ignored by triplet_loss (only its shape matters to Keras)
model.fit(train_data, y_dummie, batch_size=256, epochs=10)
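
In case it helps, here is a hedged sketch of what train_data and y_dummie could look like; anchors, positives and negatives are hypothetical NumPy arrays of shape (n_samples, N_MINS, n_feat) coming from whatever triplet mining you perform on your own data:

import numpy as np

# The three inputs of the Siamese model, in the same order as its Input layers
train_data = [anchors, positives, negatives]
# Any target with the right shape works, since triplet_loss ignores y_true
y_dummie = np.zeros((anchors.shape[0], 3 * N_FACTORS))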


Now we could save the model (really, just the base model is needed here). But more importantly, we can use the base model to compute the vector representation of some traffic. For me, this was the part that was unclear from other tutorials: you simply perform a predict on the base model and do not care anymore about the Siamese “framework”. You kind of throw it away.


import numpy as np


def traffic_to_encoding(x, model):
    return model.predict(np.array([x]))
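
For example, a quick sanity check (with hypothetical anchor_sample, positive_sample and negative_sample drawn from your own data) can verify that similar traffic indeed lands closer than dissimilar traffic, and saving only the base model is straightforward:

enc_a = traffic_to_encoding(anchor_sample, base_network)
enc_p = traffic_to_encoding(positive_sample, base_network)
enc_n = traffic_to_encoding(negative_sample, base_network)
print(np.linalg.norm(enc_a - enc_p))  # expected: small
print(np.linalg.norm(enc_a - enc_n))  # expected: noticeably larger
base_network.save('base_network.h5')  # hypothetical filename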


For completeness’ sake, since what we want to do is find the “closest” vector representation among the trained faults we want to detect, we can create a method such as the following to identify the traffic case.


def identify_traffic(x, database, model):
    """
    Implements traffic recognition.

    Arguments:
    x -- the traffic to identify
    database -- database containing recognized traffic encodings
    model -- the encoding model

    Returns:
    min_dist -- the minimum distance between the traffic encoding and the encodings from the database
    identity -- string, the traffic prediction name
    """
    # Compute the target "encoding" for the traffic.
    encoding = traffic_to_encoding(x, model)
    # Find the closest encoding
    min_dist = 100
    identity = 'unknown'
    for (name, db_enc) in database.items():
        # Compute the L2 distance between the target "encoding" and the current "db_enc" from the database.
        dist = np.linalg.norm(db_enc - encoding)
        # If this distance is less than min_dist, then set min_dist to dist, and identity to name.
        if dist < min_dist:
            min_dist = dist
            identity = name
    return min_dist, identity


Assuming proper training of our Siamese network on our training data, we can use the above to create a database of the different traffic conditions we can identify in a specific network (as traffic patterns can change from network to network, but hopefully not the way to represent them), and then identify the current traffic using the function we just created.


# Build the reference database of known traffic conditions
database = {}
database['normal'] = traffic_to_encoding(get_example_label(train_cases_df, df_lens, 0), base_network)
database['error2'] = traffic_to_encoding(get_example_label(train_cases_df, df_lens, 1), base_network)
# Prediction on traffic
min_dist, identity = identify_traffic(x, database, base_network)


Et voilà, you should now have all the pieces to properly use Aristotle, Peter Pan and Napoleon to train a Siamese network, and then sadly throw them away when you do not need them anymore… This metaphor of Siamese cats is heartbreakingly getting closer and closer to reality… Nevertheless, I hope it can help you out there creating all sorts of Siamese networks!

5 thoughts on “What Siamese Dreams are made of…”

  1. Very cool application and solution. My question is: how is the merged_vector passed to the triplet_loss function?
    I see the output of the model is merged_vector, and in the compile loss=triplet_loss, but how do the merged_vector values get into triplet_loss? Thanks, Jon

    # Define the trainable model
    model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=merged_vector)
    model.compile(optimizer=Adam(),
    loss=triplet_loss)

  2. So first, take a look at my answer on the “How to potty train a Siamese network” post. The loss function here was buggy… that can explain the difficulty in understanding it 🙂 . Next, we don’t use the y_true in this version as we define the loss only on the y_pred, in fact on pairs of y_pred. So we need to split the y_pred into the components anchor, positive and negative, and then we want to maximize the distance between anchor and negative while minimizing the distance from anchor to positive. The model compile defines the usage of that loss function, and model.fit will pass the example predictions to the loss function in batches; that is Keras mechanics.

  3. Hi, I am trying to train a Siamese network for signature verification. When I use loss functions which find the distance between image encodings (e.g. triplet loss and contrastive loss), the model learns to encode every image into a vector of zeros, and I don’t know why.

    But when I use ‘binary_crossentropy’ it works fine.
