Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

Summary

Authors: Biao Zhang; Ivan Titov; Rico Sennrich

Journal title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Journal number: 12

Journal publisher: ACL

Published year: 2019

DOI identifier: 10.5167/uzh-176330