
- Introduction to Seq2Seq Models
- Seq2Seq Architecture and Applications
- Text Summarization Using an Encoder-Decoder Sequence-to-Sequence Model
- Step 1 – Importing the Dataset
- Step 2 – Cleaning the Data
- Step 3 – Determining the Maximum Permissible Sequence Lengths
- Step 4 – Selecting Plausible Texts and Summaries
- Step 5 – Tokenizing the Text
- Step 6 – Removing Empty Text and Summaries
This tutorial continues our series on encoder-decoder sequence-to-sequence RNNs. Here we will build, train, and test a seq2seq model for text summarization in Keras.
Let’s proceed!
Prerequisites
To effectively engage with this article, you should have familiarity with Python and a basic grasp of Deep Learning concepts. We assume that readers are equipped with adequately powerful machines to execute the provided code.
If GPU access is unavailable, consider utilizing cloud options.
If you are new to Python, we recommend working through a beginner tutorial first to get your environment set up.
Step 7: Creating the Model
First, ensure all essential libraries are imported.
import numpy as np

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, Concatenate, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
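Note that the model definition below reuses several objects built in Steps 3–6 of this series: the maximum sequence lengths, the fitted tokenizers, and the padded training and validation arrays. The following sketch simply lists what is assumed to exist at this point; the vocabulary-size expressions are one common convention and should be adjusted to match how your tokenizers were configured in Step 5.
# Assumed to exist from the earlier steps of this series:
#   max_text_len, max_summary_len  -- maximum sequence lengths chosen in Step 3
#   x_tokenizer, y_tokenizer       -- Tokenizers fitted on the texts/summaries in Step 5
#   x_tr, y_tr, x_val, y_val       -- padded integer sequences produced in Steps 5-6
# One common convention for the vocabulary sizes used by the Embedding and Dense layers:
x_voc = len(x_tokenizer.word_index) + 1
y_voc = len(y_tokenizer.word_index) + 1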
Next, define the Encoder and Decoder networks.
Encoder
The encoder accepts input whose length equals the maximum text length determined in Step 3. The input is passed through an Embedding layer whose input dimension is the size of the text vocabulary. Three stacked LSTM layers follow, each returning its output sequence along with its hidden and cell states.
Decoder
In the decoder, an embedding layer is defined and connected to an LSTM network. The LSTM's initial state is set to the final hidden and cell states of the encoder. The LSTM output is then fed into a TimeDistributed Dense layer with a softmax activation function.
Overall, the model takes the encoder (text) input together with the decoder (summary) input and outputs the predicted summary. The summary is generated word by word, with each word predicted from the preceding word of the summary.
To define your neural network architecture, incorporate the following code.
latent_dim = 300
embedding_dim = 200
# Encoder
encoder_inputs = Input(shape=(max_text_len,))
# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = LSTM(latent_dim, return_sequences=True, return_state=True, dropout=0.4, recurrent_dropout=0.4)
encoder_output1, state_h1, state_c1 = encoder_lstm1(enc_emb)
# Encoder LSTM 2
encoder_lstm2 = LSTM(latent_dim, return_sequences=True, return_state=True, dropout=0.4, recurrent_dropout=0.4)
encoder_output2, state_h2, state_c2 = encoder_lstm2(encoder_output1)
# Encoder LSTM 3
encoder_lstm3 = LSTM(latent_dim, return_state=True, return_sequences=True, dropout=0.4, recurrent_dropout=0.4)
encoder_outputs, state_h, state_c = encoder_lstm3(encoder_output2)
# Establishing the decoder, utilizing encoder states as the initial state
decoder_inputs = Input(shape=(None,))
# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
# Decoder LSTM
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True, dropout=0.4, recurrent_dropout=0.2)
decoder_outputs, decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Defining the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
Step 8: Training the Model
In this step, compile the model and set up EarlyStopping to halt training once the validation loss stops decreasing.
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
Then, use the model.fit() method to fit the training data with a batch size of 128. Pass the texts and the summaries (excluding the last word of each summary) as inputs, and a reshaped summary tensor containing every word from the second word onwards as the target. Also provide validation data so the validation loss can be monitored during training.
history = model.fit(
    [x_tr, y_tr[:, :-1]],
    y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
    epochs=50,
    callbacks=[es],
    batch_size=128,
    validation_data=([x_val, y_val[:, :-1]],
                     y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:])
)
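To make the slicing above concrete, here is a small, self-contained illustration with made-up token ids showing how a padded summary is split into the decoder input and the shifted, reshaped target:
import numpy as np

# One hypothetical padded summary: [sostok, w1, w2, w3, eostok, pad, pad]
y_example = np.array([[2, 45, 120, 8, 3, 0, 0]])

decoder_input = y_example[:, :-1]                      # every word except the last
decoder_target = y_example.reshape(
    y_example.shape[0], y_example.shape[1], 1)[:, 1:]  # every word from the second onwards, as a 3-D tensor

print(decoder_input.shape, decoder_target.shape)       # (1, 6) (1, 6, 1)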
Plot the training and validation loss metrics observed throughout the training.
from matplotlib import pyplot

pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
Step 9: Generating Predictions
With the model trained, we can generate summaries for the given texts. First, build reverse mappings from indices back to words using the tokenizers fitted in Step 5 (where texts_to_sequences was applied). Also keep the word-to-index mapping of the summaries tokenizer so that the start and end tokens of a sequence can be identified.
reverse_target_word_index = y_tokenizer.index_word
reverse_source_word_index = x_tokenizer.index_word
target_word_index = y_tokenizer.word_index
Next, define encoder and decoder inference models to generate predictions. The encoder inference model processes the input text and returns the output of the stacked LSTM encoder together with its final hidden and cell states. The decoder inference model starts from the start-of-sequence token (sostok) and predicts the next word step by step, gradually building up the complete summary.
# Inference Models

# Encode the input sequence to obtain the feature vector
encoder_model = Model(inputs=encoder_inputs, outputs=[encoder_outputs, state_h, state_c])
# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_hidden_state_input = Input(shape=(max_text_len, latent_dim))
# Acquiring the embeddings of the decoder sequence
dec_emb2 = dec_emb_layer(decoder_inputs)
# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2,
initial_state=[decoder_state_input_h, decoder_state_input_c])
# Dense softmax layer to generate probability distribution over the target vocabulary
decoder_outputs2 = decoder_dense(decoder_outputs2)
# Final decoder model
decoder_model = Model([decoder_inputs] + [decoder_hidden_state_input, decoder_state_input_h, decoder_state_input_c],
[decoder_outputs2] + [state_h2, state_c2])
Define a function decode_sequence() that accepts the input text and generates the predicted summary. Start with sostok and continue until eostok is encountered or the maximum summary length is reached. At each step, the next word is predicted by selecting the word with the highest probability, and the internal states of the decoder are updated accordingly.
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    e_out, e_h, e_c = encoder_model.predict(input_seq)

    # Generate an empty target sequence of length 1
    target_seq = np.zeros((1, 1))

    # Populate the first word of the target sequence with the start word.
    target_seq[0, 0] = target_word_index['sostok']

    stop_condition = False
    decoded_sentence = ''

    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + [e_out, e_h, e_c])

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_token = reverse_target_word_index[sampled_token_index]

        if sampled_token != 'eostok':
            decoded_sentence += ' ' + sampled_token

        # Exit condition: either reach the max length or find the stop word.
        if sampled_token == 'eostok' or len(decoded_sentence.split()) >= max_summary_len - 1:
            stop_condition = True

        # Update the target sequence (length 1)
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update internal states
        e_h, e_c = h, c

    return decoded_sentence
Also, define two functions, seq2summary() and seq2text(), which convert the numeric representations of a summary and a text back into strings.
# To convert a sequence to a summary
def seq2summary(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0 and i != target_word_index['sostok'] and i != target_word_index['eostok']:
            newString += reverse_target_word_index[i] + ' '
    return newString

# To convert a sequence to a text
def seq2text(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0:
            newString += reverse_source_word_index[i] + ' '
    return newString
Finally, generate the predictions by supplying the text.
for i in range(0, 19):
    print('Review:', seq2text(x_tr[i]))
    print('Original summary:', seq2summary(y_tr[i]))
    print('Predicted summary:', decode_sequence(x_tr[i].reshape(1, max_text_len)))
    print('')
Here are several notable summaries generated by the RNN model.
Review: us president donald trump on wednesday said that north korea has returned the remains of 200 us troops missing from the korean war although there was no official confirmation from military authorities north korean leader kim jong un had agreed to return the remains during his summit with trump about 700 us troops remain unaccounted from the 1950 1953 korean war
Original summary: start n korea has returned remains of 200 us war dead trump end
Predicted summary: start n korea has lost an war against us trump end
Review: pope francis has said that history will judge those who refuse to accept the science of climate change if someone is doubtful that climate change is true they should ask scientists the pope added notably us president donald trump who believes global warming is chinese conspiracy withdrew the country from the paris climate agreement
Original summary: start history will judge those denying climate change pope end
Predicted summary: start pope francis will be in paris climate deal prez end
Review: the enforcement directorate ed has attached assets worth over ₹33 500 crore in the over three year tenure of its chief karnal singh who retires sunday officials said the agency filed around 390 in connection with its money laundering probes during the period the government on saturday appointed indian revenue service irs officer sanjay kumar mishra as interim ed chief
Original summary: start enforcement attached assets worth ₹33 500 cr in yrs end
Predicted summary: start ed attaches assets worth 100 crore in india in days end
Review: lok janshakti party president ram vilas paswan daughter asha has said she will contest elections against him from constituency if given ticket from lalu prasad yadav rjd she accused him of neglecting her and promoting his son chirag asha is paswan daughter from his first wife while chirag is his son from his second wife
Original summary: start will contest against father ram vilas from daughter end
Predicted summary: start lalu son tej pratap to contest his daughter in 2019 end
Review: irish deputy prime minister frances fitzgerald announced her resignation on tuesday in bid to avoid the collapse of the government and potential snap election she quit hours before no confidence motion was to be proposed against her by the main opposition party the political crisis began over fitzgerald role in police whistleblower scandal
Original summary: start irish deputy prime minister resigns to avoid govt collapse end
Predicted summary: start pmo resigns from punjab to join nda end
Review: rr wicketkeeper batsman jos buttler slammed his fifth straight fifty in ipl 2018 on sunday to equal former indian cricketer virender sehwag record of most straight 50 scores in the ipl sehwag had achieved the feat while representing dd in the ipl 2012 buttler is also only the second batsman after shane watson to hit two successive 90 scores in ipl
Original summary: start buttler equals sehwag record of most straight 50s in ipl end
Predicted summary: start sehwag slams sixes in an ipl over 100 times in ipl end
Review: maruti suzuki india on wednesday said it is recalling 640 units of its super carry mini trucks sold in the domestic market over possible defect in fuel pump supply the recall covers super carry units manufactured between january 20 and july 14 2018 the faulty parts in the affected vehicles will be replaced free of cost the automaker said n
Original summary: start maruti recalls its mini trucks over fuel pump issue in india end
Predicted summary: start maruti suzuki recalls india over ₹3 crore end
Review: the arrested lashkar e taiba let terrorist aamir ben has confessed to the national investigation agency that pakistani army provided him cover firing to infiltrate into india he further revealed that hafiz organisation ud dawah arranged for his training and that he was sent across india to carry out subversive activities in and outside kashmir
Original summary: start pak helped me enter india arrested let terrorist to nia end
Predicted summary: start pak man who killed indian soldiers to enter kashmir end
Review: the 23 richest indians in the 500 member bloomberg billionaires index saw wealth erosion of 21 billion this year lakshmi mittal who controls the world largest steelmaker arcelormittal lost 5 6 billion or 29 of his net worth followed by sun pharma founder dilip shanghvi whose wealth declined 4 6 billion asia richest person mukesh ambani added 4 billion to his fortune
Original summary: start lakshmi mittal lost 10 bn in 2018 ambani added 4 bn end
Predicted summary: start india richest man lost billion in wealth in 2017 end
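Beyond the training samples shown above, the same decoding pipeline can also be applied to a new, unseen piece of text: clean it the way the training texts were cleaned in Step 2, convert it to a padded sequence with the tokenizer from Step 5, and pass it to decode_sequence(). Here is a rough sketch; the sample sentence is made up, and the padding argument should match whatever was used in Step 5.
# A new (already cleaned) text to summarize -- purely illustrative
new_text = 'maruti suzuki india said it is recalling its mini trucks over a possible fuel pump defect'

# Convert it to a padded integer sequence using the tokenizer fitted in Step 5
seq = x_tokenizer.texts_to_sequences([new_text])
seq = pad_sequences(seq, maxlen=max_text_len, padding='post')

print('Predicted summary:', decode_sequence(seq.reshape(1, max_text_len)))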
Conclusion
The LSTM-based encoder-decoder sequence-to-sequence model we developed generated reasonable summaries from the texts it was trained on. Although after 50 epochs the predicted summaries do not align perfectly with the expected ones (the model is nowhere near human-level intelligence!), the progress it makes is commendable.
To get better accuracy from this model, consider enlarging the dataset, tuning the hyperparameters, increasing the network size, and training for more epochs.
In this tutorial, you learned to train an encoder-decoder sequence-to-sequence model for text summarization. In the next article, we will explore attention mechanisms in detail. Until then, happy learning!
Reference: Sandeep Bhogaraju