In Part 1: A Gentle Introduction to Positional Encoding in Transformer Models, we discussed the positional encoding layer of the transformer model. We also showed how you can implement this layer and its functions yourself in Python. In this tutorial, we'll implement the positional encoding layer in Keras and TensorFlow. You can then use this layer in a complete transformer model.
After completing this tutorial, you will know:
- Text vectorization in Keras
- Embedding layer in Keras
- How to subclass the embedding layer and write your own positional encoding layer
Let's get started.

The Transformer Positional Encoding Layer in Keras, Part 2
Photo by Ijaz Rafi. Some rights reserved.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Text vectorization and embedding layer in Keras
- Writing your own positional encoding layer in Keras
- Randomly initialized and tunable embeddings
- Fixed weight embeddings from Attention Is All You Need
- Graphical view of the output of the positional encoding layer
The Import Section
First, let's write the section to import all the required libraries:
import tensorflow as tf
from tensorflow import convert_to_tensor, string
from tensorflow.keras.layers import TextVectorization, Embedding, Layer
from tensorflow.data import Dataset
import numpy as np
import matplotlib.pyplot as plt
The Text Vectorization Layer
We will start with a set of English sentences that are already preprocessed and cleaned. The text vectorization layer creates a dictionary of words and replaces each word with its corresponding index in the dictionary. Let's see how to map these two sentences using the text vectorization layer:
- I am a robot
- you too robot
Note that the text has already been converted to lowercase, with all the punctuation and noise removed. Next, we'll convert these two phrases to vectors of a fixed length 5. The TextVectorization layer of Keras requires a maximum vocabulary size and the required length of the output sequence for initialization. The output of the layer is a tensor of shape:
(number of sentences, output sequence length)
The following code snippet uses the adapt
method to generate a vocabulary. It then creates a vectorized representation of the text.
output_sequence_length = 5
vocab_size = 10
sentences = [["I am a robot"], ["you too robot"]]
sentence_data = Dataset.from_tensor_slices(sentences)
# Create the TextVectorization layer
vectorize_layer = TextVectorization(
                  output_sequence_length=output_sequence_length,
                  max_tokens=vocab_size)
# Train the layer to create a dictionary
vectorize_layer.adapt(sentence_data)
# Convert all sentences to tensors
word_tensors = convert_to_tensor(sentences, dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_words = vectorize_layer(word_tensors)
print("Vocabulary: ", vectorize_layer.get_vocabulary())
print("Vectorized words: ", vectorized_words)
Vocabulary:  ['', '[UNK]', 'robot', 'you', 'too', 'i', 'am', 'a']
Vectorized words:  tf.Tensor(
[[5 6 7 2 0]
 [3 4 2 0 0]], shape=(2, 5), dtype=int64)
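As a small hedged sketch (not part of the original listing), you can pass a sentence the layer has never seen to confirm two details of the output: out-of-vocabulary words map to index 1, the '[UNK]' token, and shorter sentences are padded with index 0 up to the output sequence length.

# Assumed illustration, reusing vectorize_layer from above:
# unseen words become 1 ('[UNK]') and the rest is zero-padded.
new_sentence = convert_to_tensor([["you are a robot"]], dtype=tf.string)
print(vectorize_layer(new_sentence))
# Expect something like [[3 1 7 2 0]] given the vocabulary printed above.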
The Embedding Layer
The Keras Embedding
layer converts integers to dense vectors. This layer maps the integers to random numbers, which are later tuned during the training phase. However, you also have the option to set the mapping to some predefined weight values (shown later). To initialize this layer, you need to specify the maximum value of an integer to map, along with the length of the output sequence.
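As a minimal sketch of this behavior (assumed, not from the original tutorial), an Embedding layer is essentially a lookup table of shape (input_dim, output_dim); an integer index simply selects one row of randomly initialized values:

# Assumed illustrative example: a standalone Embedding layer as a lookup table.
demo_layer = Embedding(input_dim=10, output_dim=4)
print(demo_layer(tf.constant([3])))        # one integer -> one 4-dimensional vector
print(demo_layer.get_weights()[0].shape)   # the full lookup table: (10, 4)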
The Word Embeddings
Let's see how the layer converts our vectorized_words
to tensors.
output_length = 6
word_embedding_layer = Embedding(vocab_size, output_length)
embedded_words = word_embedding_layer(vectorized_words)
print(embedded_words)
I've annotated the output with my comments, as shown below. Note that you will see a different output every time you run this code because the weights are initialized randomly.

Word Embeddings. This output will be different every time you run the code because of the random numbers involved.
The Position Embeddings
We also need the embeddings for the corresponding positions. The maximum positions correspond to the output sequence length of the TextVectorization layer.
position_embedding_layer = Embedding(output_sequence_length, output_length)
position_indices = tf.range(output_sequence_length)
embedded_indices = position_embedding_layer(position_indices)
print(embedded_indices)
The output is shown below:
The Output of Positional Encoding Layer in Transformers
In a transformer model, the final output is the sum of both the word embeddings and the position embeddings. Hence, when you set up both embedding layers, you need to make sure that the output_length
is the same for both.
final_output_embedding = embedded_words + embedded_indices
print("Final output: ", final_output_embedding)
The output is shown below, annotated with my comments. Again, this will be different from your run of the code because of the random weight initialization.
Subclassing the Keras Embedding Layer
When implementing a transformer model, you will have to write your own position encoding layer. This is quite simple, as the basic functionality is already provided for you. This Keras example shows how you can subclass the Embedding
layer to implement your own functionality. You can add more methods to it as you require.
class PositionEmbeddingLayer(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingLayer, self).__init__(**kwargs)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
Let’s run this layer.
my_embedding_layer = PositionEmbeddingLayer(output_sequence_length,
                                            vocab_size, output_length)
embedded_layer_output = my_embedding_layer(vectorized_words)
print("Output from my_embedded_layer: ", embedded_layer_output)
Output from my_embedded_layer:  tf.Tensor(
[[[ 0.06798736 -0.02821309  0.00571618  0.00314623 -0.03060734  0.01111387]
  [-0.06097465  0.03966043 -0.05164248  0.06578685  0.03638128 -0.03397174]
  [ 0.06715029 -0.02453769  0.02205854  0.01110986  0.02345785  0.05879898]
  [-0.04625867  0.07500569 -0.05690887 -0.07615659  0.01962536  0.00035865]
  [ 0.01423577 -0.03938593 -0.08625181  0.04841495  0.06951572  0.08811047]]

 [[ 0.0163899   0.06895607 -0.01131684  0.01810524 -0.05857501  0.01811318]
  [ 0.01915303 -0.0163289  -0.04133433  0.06810946  0.03736673  0.04218033]
  [ 0.00795418 -0.00143972 -0.01627307 -0.00300788 -0.02759011  0.09251165]
  [ 0.0028762   0.04526488 -0.05222676 -0.02007698  0.07879823  0.00541583]
  [ 0.01423577 -0.03938593 -0.08625181  0.04841495  0.06951572  0.08811047]]], shape=(2, 5, 6), dtype=float32)
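One detail worth making explicit (a hedged sketch, not in the original listing): inside call(), the position embeddings have shape (sequence_length, output_dim) and broadcast over the batch dimension when added to the word embeddings of shape (batch, sequence_length, output_dim):

# Assumed shape check on the layer created above.
print(my_embedding_layer.word_embedding_layer(vectorized_words).shape)   # (2, 5, 6)
print(my_embedding_layer.position_embedding_layer(tf.range(5)).shape)    # (5, 6)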
Positional Encoding in Transformers: Attention Is All You Need
The Attention Is All You Need paper uses fixed sinusoidal weights for the positional encodings, where k is the position, d is the dimension of the embedding space, i indexes the even and odd columns of the encoding matrix, and n is a user-defined scalar set to 10,000 by the authors of the paper:

\begin{eqnarray}
P(k, 2i) &=& \sin\Big(\frac{k}{n^{2i/d}}\Big) \\
P(k, 2i+1) &=& \cos\Big(\frac{k}{n^{2i/d}}\Big)
\end{eqnarray}
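As a quick, hedged numeric check of the formula (not from the original post), you can evaluate it directly for a single position with a small embedding dimension:

# Assumed example: the encoding vector for position k=2 with d=4, n=10000.
k, d, n = 2, 4, 10000
row = []
for i in range(d // 2):
    denominator = np.power(n, 2 * i / d)
    row += [np.sin(k / denominator), np.cos(k / denominator)]
print(np.round(row, 4))  # roughly [0.9093 -0.4161 0.02 0.9998]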
To use this scheme, when setting up the Embedding
layer you need to provide the positional encoding matrix as weights along with trainable=False
. Let's create another positional embedding class that does exactly this.

class PositionEmbeddingFixedWeights(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingFixedWeights, self).__init__(**kwargs)
        word_embedding_matrix = self.get_position_encoding(vocab_size, output_dim)
        position_embedding_matrix = self.get_position_encoding(sequence_length, output_dim)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim,
            weights=[word_embedding_matrix],
            trainable=False
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim,
            weights=[position_embedding_matrix],
            trainable=False
        )

    def get_position_encoding(self, seq_len, d, n=10000):
        P = np.zeros((seq_len, d))
        for k in range(seq_len):
            for i in np.arange(int(d/2)):
                denominator = np.power(n, 2*i/d)
                P[k, 2*i] = np.sin(k/denominator)
                P[k, 2*i+1] = np.cos(k/denominator)
        return P

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
Next, we set up everything to run this layer.
attnisallyouneed_embedding = PositionEmbeddingFixedWeights(output_sequence_length,
                                                           vocab_size, output_length)
attnisallyouneed_output = attnisallyouneed_embedding(vectorized_words)
print("Output from my_embedded_layer: ", attnisallyouneed_output)
Output from my_embedded_layer:  tf.Tensor(
[[[-0.9589243   1.2836622   0.23000172  1.9731903   0.01077196  1.9999421 ]
  [ 0.56205547  1.5004725   0.3213085   1.9603932   0.01508068  1.9999142 ]
  [ 1.566284    0.3377554   0.41192317  1.9433732   0.01938933  1.999877  ]
  [ 1.0504174  -1.4061394   0.2314966   1.9860148   0.01077211  1.9999698 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]

 [[ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [ 0.08466846 -0.11334133  0.23099795  1.9817369   0.01077207  1.9999605 ]
  [ 1.8185948  -0.8322937   0.185397    1.9913884   0.00861771  1.9999814 ]
  [ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]], shape=(2, 5, 6), dtype=float32)
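As a hedged sanity check (not part of the original tutorial), you can confirm that passing trainable=False left the layer with no trainable parameters, so these sinusoidal embeddings will not change during training:

# Assumed check on the layer built above.
print(len(attnisallyouneed_embedding.trainable_weights))      # expected: 0
print(len(attnisallyouneed_embedding.non_trainable_weights))  # expected: 2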
Visualizing the Final Embedding
In order to visualize the embeddings, let's take two bigger sentences: one technical and the other one just a quote. We'll set up the TextVectorization
layer along with the positional encoding layer and see what the final output looks like.
technical_phrase = "to understand machine learning algorithms you need" +\
                   " to understand concepts such as gradient of a function "+\
                   "Hessians of a matrix and optimization etc"
wise_phrase = "patrick henry said give me liberty or give me death "+\
              "when he addressed the second virginia convention in march"

total_vocabulary = 200
sequence_length = 20
final_output_len = 50

phrase_vectorization_layer = TextVectorization(
                  output_sequence_length=sequence_length,
                  max_tokens=total_vocabulary)
# Learn the dictionary
phrase_vectorization_layer.adapt([technical_phrase, wise_phrase])
# Convert all sentences to tensors
phrase_tensors = convert_to_tensor([technical_phrase, wise_phrase],
                                   dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_phrases = phrase_vectorization_layer(phrase_tensors)

random_weights_embedding_layer = PositionEmbeddingLayer(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
fixed_weights_embedding_layer = PositionEmbeddingFixedWeights(sequence_length,
                                                              total_vocabulary,
                                                              final_output_len)
random_embedding = random_weights_embedding_layer(vectorized_phrases)
fixed_embedding = fixed_weights_embedding_layer(vectorized_phrases)
Now let's see what the random embeddings look like for both phrases.
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(random_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Random Embedding")
plt.show()
The embeddings from the fixed weights layer are visualized below.
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(fixed_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Fixed Weight Embedding from Attention Is All You Need")
plt.show()
We can see that the embedding layer initialized with the default parameters outputs random values. On the other hand, the fixed weights generated using sinusoids create a unique signature for every phrase, with information on each word's position encoded within it.
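If you want to look at the positional signature on its own, a small additional sketch (not in the original post) is to plot just the sinusoidal position-encoding matrix, without any word embeddings added to it:

# Assumed extra visualization: only the sinusoidal position encodings.
pos_only = fixed_weights_embedding_layer.get_position_encoding(sequence_length,
                                                               final_output_len)
cax = plt.matshow(pos_only)
plt.colorbar(cax)
plt.title("Sinusoidal position encodings only", y=1.2)
plt.show()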
You can experiment with either tunable or fixed-weight implementations for your particular application.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Transformers for Natural Language Processing, by Denis Rothman
Papers
- Attention Is All You Need, 2017
Articles
- The Transformer Attention Mechanism
- The Transformer Model
- Transformer Model for Language Understanding
- Using Pre-Trained Word Embeddings in a Keras Model
- English-to-Spanish translation with a sequence-to-sequence Transformer
- A Gentle Introduction to Positional Encoding in Transformer Models, Part 1
Summary
In this tutorial, you discovered the implementation of the positional encoding layer in Keras.
Specifically, you learned:
- Text vectorization layer in Keras
- Positional encoding layer in Keras
- Creating your own class for positional encoding
- Setting your own weights for the positional encoding layer in Keras
Do you have any questions about positional encoding discussed in this post? Ask your questions in the comments below, and I will do my best to answer.