Google recently released a text summarization model built with TensorFlow. This guide will help you get it set up on your machine.
Note: The dataset used to train the NN (Gigaword) is proprietary and requires a license. In this guide we use the example data that comes with the repo and (maybe) some alternative dataset.
1) Clone the repo:
git clone https://github.com/tensorflow/models.git
2) Install TensorFlow (How to install TF) and Bazel.
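Before moving on, it can help to confirm that the tools are reachable from your shell. The snippet below is a minimal sketch: `check_tool` is a hypothetical helper name (it does not ship with TensorFlow or Bazel), and it only checks that a command is on your PATH, not that its version is compatible.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool bazel
check_tool python
```

If both are reported as found, you can also try importing TensorFlow from Python to make sure the install itself works.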
3) Go into the cloned repo folder (models), make a new folder to train your model, and change into it. Copy the textsum folder into your new folder. Create a WORKSPACE file and build. If you are like me, you might not have CUDA, so just remove --config=cuda from the build command.
mkdir textsumtrain
cd textsumtrain
cp -r ../textsum ./
touch WORKSPACE
bazel build -c opt --config=cuda textsum/...
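If you are not sure whether to keep the CUDA flag, the choice can be sketched as a small shell function. This is only an illustration: `textsum_build_cmd` is a hypothetical name, and it assumes that finding `nvcc` on your PATH is a good-enough sign that CUDA is installed, which may not hold on every machine.

```shell
# Hypothetical helper: print the build command, adding --config=cuda
# only when nvcc (the CUDA compiler) is found on PATH.
# Assumption: nvcc on PATH is a reasonable proxy for a working CUDA setup.
textsum_build_cmd() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "bazel build -c opt --config=cuda textsum/..."
  else
    echo "bazel build -c opt textsum/..."
  fi
}

textsum_build_cmd
```

It only prints the command, so you can read it first and then run it yourself from the folder that contains WORKSPACE.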
4) Add the data to your folder. For example, if you use the example data:
mv textsum/data ./
5) Now we can run the training using the training data. Pay attention to the --data_path and --vocab_path flags. If you followed all the steps, you can just copy and paste the command below.
bazel-bin/textsum/seq2seq_attention \
--mode=train \
--article_key=article \
--abstract_key=abstract \
--data_path=data/data \
--vocab_path=data/vocab \
--log_root=textsum/log_root \
--train_dir=textsum/log_root/train
Shortly after, the training output should start appearing.
Yay! Go make a cup of tea and wait, because it might take a while…
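If the run stops right away instead, one thing worth checking is whether the paths passed to the training command actually exist. Here is a minimal sketch; `check_inputs` is a hypothetical helper, not part of textsum, and the example paths are simply the ones used in step 5.

```shell
# Hypothetical helper: return non-zero if any of the given paths is missing.
check_inputs() {
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      return 1
    fi
  done
  echo "all inputs found"
}

# Example, run from the training folder (paths taken from step 5):
#   check_inputs data/data data/vocab bazel-bin/textsum/seq2seq_attention
```

The function stops at the first missing path and names it, which usually tells you whether the data was moved (step 4) or the build (step 3) did not finish.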
# References: