A team of researchers from the Singapore University of Technology and Design is developing TinyLlama, a small but powerful AI model being trained on 3 trillion tokens in just 90 days

Researchers at the Singapore University of Technology and Design (SUTD) are developing TinyLlama, a small but powerful AI model with a compact design that takes up only 550MB of memory. Remarkably, the model is scheduled to complete training on a massive 3-trillion-token dataset in just 90 days.

TinyLlama is designed for memory-constrained edge devices, bringing high-performance artificial intelligence (AI) to hardware with limited memory and computing power. Interest in building smaller AI models is growing among developers, since models with fewer parameters are better suited to such devices. Smaller models can also assist in decoding larger models, as Andrej Karpathy, former senior director of AI at Tesla, points out.

The TinyLlama project, led by a research assistant at SUTD, aims to pre-train a 1.1-billion-parameter Llama model on a 3-trillion-token dataset. Occupying only about 550MB of memory, the team believes its compactness will suit a wide range of applications with tight compute and memory-footprint constraints, such as real-time machine translation.
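A rough back-of-the-envelope check makes the 550MB figure plausible if it refers to a 4-bit quantized copy of the weights; that interpretation is an assumption, not something stated in the article.

```python
# Sketch: estimated on-disk/in-memory size of a 1.1B-parameter model,
# assuming 4-bit quantization (0.5 bytes per parameter) - an assumption.
params = 1.1e9           # 1.1 billion parameters
bytes_per_param = 0.5    # 4-bit quantization = half a byte per parameter
size_mb = params * bytes_per_param / 1e6
print(f"~{size_mb:.0f} MB")  # prints ~550 MB
```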

TinyLlama's training began on September 1 on 16 A100-40G GPUs, and the team plans to finish in just 90 days. So far, the model has been trained on 105 billion tokens.
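Taking the article's numbers at face value, the schedule implies a sustained per-GPU throughput on the order of 24,000 tokens per second; the sketch below simply divides the token budget by the total GPU time.

```python
# Implied training throughput, assuming the full 3-trillion-token run
# finishes in exactly 90 days on all 16 GPUs (figures from the article).
tokens = 3e12
gpus = 16
seconds = 90 * 24 * 3600
per_gpu_throughput = tokens / (gpus * seconds)
print(f"~{per_gpu_throughput:,.0f} tokens/s per GPU")  # roughly 24,000
```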

The model's builders say they are using "exactly the same architecture and tokenizer" that Meta used for Llama 2, which makes it easy to plug TinyLlama into open-source projects built on Llama.
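Because the architecture and tokenizer match Llama 2, the standard Hugging Face Llama classes should load the checkpoints unchanged. This is a minimal sketch; the repository id below is an assumption, so substitute whichever intermediate or final checkpoint the team actually publishes.

```python
# Minimal sketch of loading a Llama-compatible TinyLlama checkpoint with
# Hugging Face transformers. The model id is an assumed/hypothetical name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Edge devices can run small language models because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```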

The TinyLlama team is training on a roughly three-trillion-token dataset that combines Cerebras Systems' SlimPajama with the data used to train the StarCoder code-generation model.

Once complete, TinyLlama will join a growing number of smaller language models that developers use to build a variety of applications, alongside efforts such as EleutherAI's Pythia-1b and MPT-1b from MosaicML (now part of Databricks).
