Addressing the Rare Word Problem in Neural Machine Translation

Thang Luong
ACL 2015
Joint work with: Ilya Sutskever, Quoc Le, Oriol Vinyals, & Wojciech Zaremba.
Standard Machine Translation (MT)
[Figure: phrase-by-phrase alignment of “Cindy loves cute cats” to “Cindy aime les chats mignons”: Cindy→Cindy, loves→aime, cute cats→les chats mignons.]
• Translate locally, phrase by phrase:
– Good progress: Moses (Koehn et al., 2007) among many others.
– Many subcomponents need to be tuned separately.
• Hybrid systems with neural components:
– Language model: (Schwenk et al., 2006), (Vaswani et al., 2013).
– Translation model: (Schwenk, 2012), (Devlin et al., 2014).
– Complex pipeline.
• Desire: a simple system that translates globally.
Neural Machine Translation (NMT)
[Figure: encoder-decoder architecture of (Sutskever et al., 2014): the source sentence “A B C D” is encoded into a fixed vector, from which the target sentence “X Y Z” is generated one token at a time.]
• Encoder-decoder: first proposed at Google & Montreal.
• Advantages:
– Minimal domain knowledge.
– Dimensionality reduction: up to 100-gram source-conditioned LMs.
• No gigantic phrase tables or LMs.
– Simple beam-search decoder.
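As a concrete illustration of that last point, here is a minimal beam-search sketch. The `score_next` function, which returns log-probabilities over target tokens given a partial translation, stands in for one decoder step of the trained model and is an assumption for illustration:

```python
import heapq

def beam_search(score_next, beam_size=12, max_len=50, eos="</s>"):
    """Minimal beam search over an autoregressive scoring function.

    `score_next(prefix)` is assumed to return a dict {token: log_prob}
    for the next target token given the partial translation `prefix`;
    in a real NMT system this is one decoder step conditioned on the
    encoded source sentence.
    """
    beams = [(0.0, [])]          # (cumulative log-prob, partial translation)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, prefix in beams:
            if prefix and prefix[-1] == eos:
                finished.append((logp, prefix))  # hypothesis is complete
                continue
            for tok, tok_logp in score_next(prefix).items():
                candidates.append((logp + tok_logp, prefix + [tok]))
        if not candidates:       # every hypothesis has ended with </s>
            break
        # Keep only the beam_size highest-scoring partial translations.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished.extend(b for b in beams if b[1] and b[1][-1] == eos)
    best = max(finished or beams, key=lambda c: c[0])
    return best[1]
```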
Existing NMT Work
System                                        Encoder                          Decoder
(Kalchbrenner & Blunsom, 2013)                Convolutional net                RNN
(Sutskever et al., 2014)                      Long short-term memory (LSTM)    LSTM
(Cho et al., 2014), (Bahdanau et al., 2015)   Gated recurrent unit (GRU)       GRU
• All decoders use recurrent networks.
• All* NMT work uses a fixed, modest-size vocabulary
– <unk> represents all out-of-vocabulary (OOV) words.
– Translations with <unk> are troublesome!
*Except the very recent work of (Jean et al., 2015), which scales to a large vocabulary.
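A minimal sketch of this fixed-vocabulary setup, building the vocabulary from the most frequent training words and mapping everything else to <unk> (the corpus and vocabulary here are toy stand-ins):

```python
from collections import Counter

def build_vocab(corpus, size=40000):
    """Keep the `size` most frequent word types; the rest become OOV."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {w for w, _ in counts.most_common(size)}

def apply_vocab(tokens, vocab, unk="<unk>"):
    """Map every out-of-vocabulary token to the <unk> symbol."""
    return [tok if tok in vocab else unk for tok in tokens]

# Toy illustration with a tiny vocabulary:
vocab = {"The", "portico", "in"}
print(apply_vocab("The ecotax portico in Pont-de-Buis".split(), vocab))
# -> ['The', '<unk>', 'portico', 'in', '<unk>']
```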
The Rare Word Problem
Original:     The ecotax portico in Pont-de-Buis
              Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk>
              Le <unk> <unk> de <unk>
• NMT systems translate sentences with rare words poorly.
[Plot: BLEU (25-40) vs. sentences ordered by average word-frequency rank; reference lines: Durrani et al. (37.0), Sutskever et al. (34.8).]
Our approach
Original:     The ecotax portico in Pont-de-Buis
              Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk>
              Le unk1 unk-1 de unk1
• Idea: track where each target <unk> comes from
– Annotate training data: unsupervised word alignments & relative indices.
– Post-process test translations: dictionary translation or identity copy.
• “Attention” for rare words (Bahdanau et al., 2015).
Treat any neural MT system as a black box: annotate training data & post-process translations.
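To make the black-box recipe concrete, here is a minimal sketch of both steps. The function names, the unk{d} token format, and the offset convention (d = source index minus target index, so an identity copy gets d = 0) are illustrative assumptions rather than the authors' exact implementation; the word alignments would come from an unsupervised aligner (e.g., the Berkeley aligner):

```python
import re

def annotate_target(src_tokens, tgt_tokens, alignment, vocab, max_d=7):
    """Step 1: replace OOV target words with positional unknowns.

    `alignment` maps target index j -> source index i (from an
    unsupervised word aligner).  An OOV target word aligned to source
    position i becomes "unk{d}" with d = i - j clipped to [-max_d, max_d];
    an unaligned OOV word becomes a plain "<unk>".
    """
    out = []
    for j, w in enumerate(tgt_tokens):
        if w in vocab:
            out.append(w)
        elif j in alignment:
            d = max(-max_d, min(max_d, alignment[j] - j))
            out.append(f"unk{d}")
        else:
            out.append("<unk>")
    return out

def postprocess(src_tokens, trans_tokens, dictionary):
    """Step 2: resolve each positional unknown in a test translation.

    A token "unk{d}" at target position j points to source position
    j + d; the pointed-to source word is translated with a word
    dictionary when possible and copied verbatim (identity) otherwise.
    """
    out = []
    for j, w in enumerate(trans_tokens):
        m = re.fullmatch(r"unk(-?\d+)", w)
        if m is None:
            out.append(w)
            continue
        i = j + int(m.group(1))
        if 0 <= i < len(src_tokens):
            src_word = src_tokens[i]
            out.append(dictionary.get(src_word, src_word))
        else:
            out.append("<unk>")  # pointer fell outside the source sentence
    return out

# The slides' example (the final copy gets d = 0 under this convention):
src = "The ecotax portico in Pont-de-Buis".split()
hyp = "Le unk1 unk-1 de unk0".split()
print(" ".join(postprocess(src, hyp, {"portico": "portique", "ecotax": "écotaxe"})))
# -> Le portique écotaxe de Pont-de-Buis
```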
Experiments
• WMT’14 English-French
– Hyper-parameters tuned on newstest2012+2013.
– BLEU reported on newstest2014.
• Setup: similar to (Sutskever et al., 2014)
– Stacked LSTMs: 1000 cells, 1000-dim embeddings.
– Reversed source sentences.
Results
Systems                                    BLEU
SOTA in WMT’14 (Durrani et al., 2014)      37.0
Our NMT systems (40K target vocab)
  Single 6-layer LSTM                      30.4
  Single 6-layer LSTM + Our technique      32.7 (+2.3)
  Ensemble of 8 LSTMs                      34.1
  Ensemble of 8 LSTMs + Our technique      36.9 (+2.8)
Our NMT systems (80K target vocab)
  Single 6-layer LSTM                      31.5
  Single 6-layer LSTM + Our technique      33.1 (+1.6)
  Ensemble of 8 LSTMs                      35.6
  Ensemble of 8 LSTMs + Our technique      37.5 (+1.9)
• Better models: better gains with our technique.
• Naïve approach (monotonic alignments of <unk>): only a +0.8 BLEU gain.
• New SOTA: about a +2.0 BLEU gain with our technique.
Existing Work
Systems                                                             Vocab   BLEU
Ensemble of 8 LSTMs (This work)                                     80K     37.5
SOTA in WMT’14 (Durrani et al., 2014)                               All     37.0
Standard MT + neural components:
  Neural Language Model (Schwenk, 2014)                             All     33.3
  Phrase table neural features (Cho et al., 2014)                   All     34.5
  Ensemble of 5 LSTMs, rerank n-best lists (Sutskever et al., 2014) All     36.5
End-to-end NMT systems:
  Ensemble of 5 LSTMs (Sutskever et al., 2014)                      80K     34.8
  Single RNNsearch (Bahdanau et al., 2015)                          30K     28.5
  Ensemble of 8 RNNsearch + unknown replacement (Jean et al., 2015) 500K    37.2
• Still the SOTA at the time of this talk!
– We obtained 37.7 after the ACL camera-ready version.
Effects of Translating Rare Words
[Plot: BLEU (25-40) vs. sentences ordered by average word-frequency rank; curves: This work (37.5), Durrani et al. (37.0), Sutskever et al. (34.8).]
• Better than existing SOTA on both frequent and rare words.
Effects of Network Depths
[Bar chart: BLEU (20-32) before vs. after <unk> replacement for depth-3, depth-4, and depth-6 models; gains of +1.9 to +2.2 BLEU, growing with depth.]
• Each layer gives on average about +1 BLEU gain.
• More accurate models: better gains with our technique.
Perplexity vs. BLEU
[Scatter plot: BLEU (23-27) vs. perplexity (5.5-7).]
• Training objective: perplexity.
• Strong correlation: a 0.5 reduction in perplexity gives about +1.0 BLEU.
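Since the training objective is perplexity, a quick reminder of how it is computed from per-token log-likelihoods (the numbers below are toy values, not from the experiments):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per target token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning probability 0.2 to each of 5 reference tokens has
# perplexity 5; sharpening every token to 0.25 lowers it to 4.
print(perplexity([math.log(0.2)] * 5))   # 5.0
print(perplexity([math.log(0.25)] * 5))  # 4.0
```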
Sample translations
src:       An additional 2600 operations including orthopedic and cataract surgery will help clear a backlog .
ref:       2600 opérations supplémentaires , notamment dans le domaine de la chirurgie orthopédique et de la cataracte , aideront à rattraper le retard .
trans:     En outre , unk1 opérations supplémentaires , dont la chirurgie unk5 et la unk6 , permettront de résorber l' arriéré .
trans+unk: En outre , 2600 opérations supplémentaires , dont la chirurgie orthopédiques et la cataracte , permettront de résorber l' arriéré .
• Predicts long-distance alignments well.
Sample translations
src:       This trader , Richard Usher , left RBS in 2010 and is understood to have been given leave from his current position as European head of forex spot trading at JPMorgan .
ref:       Ce trader , Richard Usher , a quitté RBS en 2010 et aurait été mis suspendu de son poste de responsable européen du trading au comptant pour les devises chez JPMorgan .
trans:     Ce unk0 , Richard unk0 , a quitté unk1 en 2010 et a compris qu' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au unk5 .
trans+unk: Ce négociateur , Richard Usher , a quitté RBS en 2010 et a compris qu' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au JPMorgan .
• Translates long sentences well.
Sample translations
src:       But concerns have grown after Mr Mazanga was quoted as saying Renamo was abandoning the 1992 peace accord .
ref:       Mais l' inquiétude a grandi après que M. Mazanga a déclaré que la Renamo abandonnait l' accord de paix de 1992 .
trans:     Mais les inquiétudes se sont accrues après que M. unkpos3 a déclaré que la unk3 unk3 l' accord de paix de 1992 .
trans+unk: Mais les inquiétudes se sont accrues après que M. Mazanga a déclaré que la Renamo était l' accord de paix de 1992 .
• Incorrect alignment prediction: “was” aligned to “était” instead of “abandonnait”.
Conclusion
• Simple technique to tackle rare words:
– Applicable to any NMT system (+2.0 BLEU improvement).
– State-of-the-art result on WMT’14 English-French.
• Future work:
– More challenging language pairs.
Thank you!