Amazon researchers warn that large language model training must beware of data pitfalls

Researchers at Amazon warn that the training process for large language models must guard against data traps, TechRadar reports. They point out that a large amount of content on the web is generated by machine translation, and that this low-quality content can undermine the training process.

The researchers found that a large amount of web content is translated into multiple languages, much of it the product of machine translation. This is especially prevalent in lower-resource languages, where a significant portion of web content originates from machine translation.

Training on such data may result in large language models that produce more disfluent and hallucinated text. In addition, selection bias means the content chosen for mass translation may be of lower quality even before machine-translation errors are taken into account.

As a result, the Amazon researchers caution that special attention should be paid to the quality and provenance of data when training large language models. They suggest using more accurate data-screening methods and emphasize the importance of properly cleaning and preprocessing data before training. These measures can improve the quality and accuracy of large language models, and in turn the services and experiences they provide to users.
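One screening signal the article alludes to is that machine-translated content often appears as near-identical copies across many languages. The sketch below is a hypothetical illustration of that idea, not the researchers' actual method: it fingerprints each document by a language-agnostic structural signature and flags documents whose fingerprint shows up in more than a couple of languages. All names and thresholds are illustrative assumptions.

```python
# Hypothetical heuristic: flag web documents that appear with near-identical
# structure in many languages, a signal of machine-translated content.
# This is an illustrative sketch, not Amazon's published filtering method.
from collections import defaultdict


def content_fingerprint(text: str) -> tuple:
    # Crude language-agnostic fingerprint: number of sentences plus each
    # sentence's length rounded to the nearest ten characters.
    sentences = [s for s in text.split(".") if s.strip()]
    return (len(sentences), tuple(round(len(s), -1) for s in sentences))


def flag_likely_machine_translated(docs, max_languages=2):
    """docs: iterable of (doc_id, language, text) triples.

    Returns the set of doc_ids whose structural fingerprint occurs in
    more than `max_languages` distinct languages.
    """
    langs_by_fp = defaultdict(set)
    ids_by_fp = defaultdict(list)
    for doc_id, lang, text in docs:
        fp = content_fingerprint(text)
        langs_by_fp[fp].add(lang)
        ids_by_fp[fp].append(doc_id)

    flagged = set()
    for fp, langs in langs_by_fp.items():
        if len(langs) > max_languages:
            flagged.update(ids_by_fp[fp])
    return flagged
```

A real pipeline would pair a signal like this with quality classifiers and deduplication; the point here is only that multi-way parallelism across languages is cheap to detect and correlates with machine-translated, lower-quality text.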
