Adversarial Learning for Neural Dialogue Generation
Idea
The authors formulate dialogue generation as a reinforcement learning problem, using a Generative Adversarial Network. The discriminator's objective is that of a Turing-test judge, i.e. it classifies whether a dialogue response is human- or machine-generated. The goal is to improve the generator to the point where the discriminator has trouble distinguishing between human and machine-generated responses.
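Written as a standard GAN minimax objective (my own summary in conventional notation, not necessarily the paper's exact formulation), with $D(\{x, y\})$ the probability the discriminator assigns to the dialogue $\{x, y\}$ being human-generated:

$$\min_G \max_D \;\; \mathbb{E}_{\{x, y\} \sim \text{human}}\big[\log D(\{x, y\})\big] + \mathbb{E}_{y \sim G(\cdot \mid x)}\big[\log\big(1 - D(\{x, y\})\big)\big]$$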
Method
- The generator network is a neural seq2seq model, and the discriminator is similar to a Turing test evaluator.
- The generation task is not formulated as an NMT task. Instead, it maximizes the likelihood of a response $y = \{y_1, y_2, \dots, y_T\}$ given a history of previous sentences $x$.
- The generator defines the policy by which each word of the output sentence $y$ is generated, via a softmax over the vocabulary.
- The discriminator uses a hierarchical neural autoencoder to produce a vector representation of the entire conversation sequence, i.e. $\{x, y\}$. This representation is then fed into a binary classifier that predicts whether the sentences were human- or machine-generated.
- The generator is trained to maximize the expected reward of the generated utterance using the REINFORCE algorithm (see the sketch after this list).
- The vanilla REINFORCE model doesn't assign a separate reward to each generated word; instead, it assigns the same reward to every token in the predicted sequence.
- To reward intermediate steps, however, the discriminator must also be able to classify partially decoded sequences. Two methods are proposed for this:
- Using a Monte Carlo search to decode the top $N (= 5)$ candidate sentences given a partial sequence, and using the discriminator's average score over the 5 complete sequences as the prediction for the partial sequence.
- Training the discriminator to directly classify partial sequences as well.
- The Monte Carlo search strategy was found to be more effective.
- Teacher forcing is used to stabilize training: the generator is periodically updated directly on the human-generated response (effectively a reward of 1), giving it access to the true sequence instead of relying solely on discriminator rewards (see the teacher-forcing sketch after this list).
- For pre-training, the generative model is trained as a standard seq2seq model with an attention mechanism. The discriminator is then pre-trained on part of the training data, with machine responses generated by beam search and sampling serving as negative examples.
- Intuitively, low accuracy for a reasonably well-trained discriminator implies that the quality of the generated sentences has improved significantly.
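A minimal sketch of the core update, i.e. REINFORCE with Monte Carlo rollouts for partial-sequence rewards. This is an illustration under stated assumptions, not the authors' implementation: the toy `SimpleGenerator` and `SimpleDiscriminator` modules, the PyTorch framing, and all hyperparameters are mine, and the attention mechanism, reward baseline, and the discriminator's own update are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MAX_LEN, N_ROLLOUTS = 1000, 64, 10, 5  # illustrative values

class SimpleGenerator(nn.Module):
    """Toy autoregressive generator: a policy over the vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def step(self, token, h):
        h = self.rnn(self.embed(token), h)
        return F.log_softmax(self.out(h), dim=-1), h

class SimpleDiscriminator(nn.Module):
    """Toy discriminator: scores a (partial or complete) sequence as human."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.clf = nn.Linear(HIDDEN, 1)

    def forward(self, seq):                       # seq: (batch, t)
        _, h = self.rnn(self.embed(seq))
        return torch.sigmoid(self.clf(h[-1]))     # P(human-generated)

def rollout_reward(gen, disc, prefix, h):
    """Reward a partial sequence by completing it N_ROLLOUTS times with the
    generator and averaging the discriminator's scores on the completions."""
    rewards = []
    for _ in range(N_ROLLOUTS):
        tokens, state = list(prefix), h
        tok = torch.tensor([tokens[-1]])
        for _ in range(MAX_LEN - len(tokens)):
            logp, state = gen.step(tok, state)
            tok = torch.multinomial(logp.exp(), 1).squeeze(1)
            tokens.append(tok.item())
        rewards.append(disc(torch.tensor([tokens])).item())
    return sum(rewards) / N_ROLLOUTS

def reinforce_step(gen, disc, opt, bos=1):
    """Sample a response, reward each prefix via rollouts, apply REINFORCE."""
    h, tok = torch.zeros(1, HIDDEN), torch.tensor([bos])
    tokens, logps, rewards = [bos], [], []
    for _ in range(MAX_LEN - 1):
        logp, h = gen.step(tok, h)
        tok = torch.multinomial(logp.exp(), 1).squeeze(1)  # sample next word
        logps.append(logp[0, tok])
        tokens.append(tok.item())
        with torch.no_grad():                     # rewards are not backprop'd
            rewards.append(rollout_reward(gen, disc, tokens, h))
    # REINFORCE: ascend sum_t reward_t * log pi(y_t) (no baseline, for brevity)
    loss = -sum(r * lp for r, lp in zip(rewards, logps))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

gen, disc = SimpleGenerator(), SimpleDiscriminator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
print(reinforce_step(gen, disc, opt))
```

Note how each prefix $y_{1:t}$ gets its own reward from the averaged rollouts, in contrast to vanilla REINFORCE, which would spread a single sequence-level reward over every token.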
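The teacher-forcing step can then be sketched as a plain maximum-likelihood update on the human response (reusing `gen`, `opt`, and `HIDDEN` from the sketch above; again an assumption-laden illustration, not the paper's code). Feeding the true tokens back in is equivalent to assigning the human sequence a reward of 1:

```python
import torch

def teacher_forcing_step(gen, opt, human_response, bos=1):
    """One maximum-likelihood update on the true (human) response: each step
    is conditioned on the ground-truth previous token, not on a sample."""
    h, tok = torch.zeros(1, HIDDEN), torch.tensor([bos])
    loss = torch.zeros(1)
    for target in human_response:
        logp, h = gen.step(tok, h)
        loss = loss - logp[0, target]  # NLL of the true next token
        tok = torch.tensor([target])   # feed the true token back in
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# hypothetical token ids standing in for a human response
print(teacher_forcing_step(gen, opt, human_response=[5, 42, 7]))
```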
Observations
- The authors report that the responses generated by their system are more interactive, interesting, and non-repetitive. It'd be interesting to see how they quantify this. UPDATE: the source for this claim is human evaluation, which, of course, can be subjective.
- It's also observed that the system yielded better results when the context, i.e. the preceding utterances $x$, was limited to 2.
- The hierarchical neural model is the architecture of choice for the discriminator (evaluator).