Author(s): Towards AI Editorial Team
Originally published on Towards AI.

What happened this week in AI by Louie

We are glad to say this was a week for open-source AI and small LLMs, with Meta's release of Llama 3 and Microsoft's announcement of Phi-3.

Llama 3 is a big win for open source and for cheap, fast smaller models, but it has some limitations: the company chose to focus the model on text only, the English language, and a shorter (8k) context window. Architecturally, Llama 3 is very similar to Llama 2; the key differences are a smarter and more aggressive training-data filter (including the use of Llama 2 as a data-quality classifier), 7x more data (now a massive 15 trillion tokens), and improved, scaled-up use of human feedback in fine-tuning. The breakthrough is a huge jump in capabilities and benchmark scores at small model sizes (8B and 70B parameters), and with it a huge jump in the capabilities of the best open-source models. The speed advantage of these smaller models will be particularly important for agent workflows, where latency per call can stack up.

Llama 3 8B and 70B can be run at home or fine-tuned to specific use cases (a minimal inference sketch follows below). They can also be accessed in the cloud, for example on Together.ai, at $0.20 and $0.90 per million tokens respectively, versus blended averages (assuming a 3:1 ratio of input to output tokens) of $0.75 for GPT-3.5-Turbo and $15 for GPT-4-Turbo; the first sketch below works through this arithmetic. Groq also serves Llama 3, with the 70B model at a blended $0.64 per million tokens and faster inference speed.

With Llama 3, we think the biggest gains relative to existing models likely come from better training-data filtering. Meta also chose to push hard on training-data quantity relative to model parameter count. This is a sub-optimal choice for training cost versus intelligence: it is very far from Chinchilla optimal, and more intelligence per unit of training compute would have come from extra parameters rather than extra training tokens (the second sketch below puts numbers on this). However, the choice is geared towards improved inference costs, creating a smarter small model that is cheaper to run.
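To make the blended pricing above concrete, here is a quick back-of-envelope sketch. The list prices are the per-million-token rates current as of publication and may have changed since; the 3:1 input-to-output token ratio is the assumption used for the averages above.

```python
# Blended price per million tokens, weighting input and output list prices
# by an assumed 3:1 ratio of input tokens to output tokens.
def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    return (ratio * input_price + output_price) / (ratio + 1)

# List prices in USD per million tokens at the time of writing.
print(blended_price(10.0, 30.0))  # GPT-4-Turbo ($10 in / $30 out) -> 15.0
print(blended_price(0.5, 1.5))    # GPT-3.5-Turbo ($0.50 in / $1.50 out) -> 0.75
```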
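The Chinchilla point is also easy to quantify. Hoffmann et al. (2022) found roughly 20 training tokens per parameter to be compute-optimal; a minimal sketch of where Llama 3 lands under that heuristic:

```python
# Tokens-per-parameter ratio for Llama 3 versus the ~20:1
# compute-optimal heuristic from the Chinchilla paper (Hoffmann et al., 2022).
CHINCHILLA_TOKENS_PER_PARAM = 20
training_tokens = 15e12  # 15T tokens, per Meta's announcement

for name, params in [("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9)]:
    ratio = training_tokens / params
    print(f"{name}: {ratio:,.0f} tokens/param, "
          f"~{ratio / CHINCHILLA_TOKENS_PER_PARAM:.0f}x past compute-optimal")
# Llama 3 8B: 1,875 tokens/param, ~94x past compute-optimal
# Llama 3 70B: 214 tokens/param, ~11x past compute-optimal
```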
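And for readers who want to try the "run it at home" claim, a minimal inference sketch with Hugging Face transformers is below. It assumes you have been granted access to the gated meta-llama/Meta-Llama-3-8B-Instruct checkpoint on the Hub and have a GPU with enough memory (roughly 16GB for the 8B weights in bf16).

```python
# Minimal local-inference sketch for Llama 3 8B Instruct.
# Assumes access to the gated checkpoint has been granted on the Hub.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

# Format a single chat turn with the model's own chat template.
prompt = pipe.tokenizer.apply_chat_template(
    [{"role": "user", "content": "In two sentences, what changed between Llama 2 and Llama 3?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(pipe(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"])
```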
Microsoft's release of Phi-3 in 3.8B, 7B, and 14B sizes posts even more impressive benchmark scores relative to model size. The models were trained on highly filtered web data and synthetic data (3.3T to 4.8T tokens), going even further down the path of prioritizing data quality. We await more details on the model release, real-world testing, and confirmation of whether it is fully open source.

[Chart: Current costs and key KPIs of leading LLMs. Source: Towards AI, company websites.]

Why should you care?

When choosing the best LLM for your application, there are many trade-offs and priorities to weigh. Superior affordability and response speed generally come with smaller models, while intelligence, coding skill, multi-modality, and longer context windows are usually things you pay more for with larger models. We think Llama 3 and Phi-3 will change the game for smaller, faster, cheaper models and will be a great choice for many LLM use cases, particularly since Llama 3 is open source and flexible: it can be fine-tuned and tailored to specific use cases.

It is incredible how far we have come with LLMs in less than two years! In August 2022, the best model available was OpenAI's davinci-002 at $60 per million tokens, scoring 60% on the MMLU test (roughly 16k questions across 57 tasks, with human experts at 89.8%). Now, Llama 3 8B costs an average of $0.20, 300x cheaper, while scoring 68.4% on MMLU. Meanwhile, the most capable models (GPT-4 and Claude 3 Opus) are at 86.8% on MMLU, are multimodal, and have 50–100x larger context lengths. There is now a large number of models that are competitive for particular use cases, and we expect this to accelerate innovation and adoption of LLMs even further.

– Louie Peters, Towards AI Co-founder and CEO

Hottest News

1. FineWeb: 15 Trillion Tokens of High-Quality Web Data
The FineWeb dataset consists of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl snapshots between 2013 and 2024. Models trained on FineWeb outperform those trained on RefinedWeb, C4, Dolma v1.6, The Pile, and SlimPajama. It is accessible on Hugging Face (a minimal loading sketch appears at the end of this issue).

2. Meta Introduced Meta Llama 3
Meta has launched Llama 3, the newest addition to its Llama series, accessible on Hugging Face. It is available in 8B and 70B versions, each with base and instruction-tuned variants featuring enhanced multilingual tokenization. Llama 3 is designed for easy deployment on platforms like Google Cloud and Amazon SageMaker.

3. Mistral AI Launched Mixtral 8x22B
Mistral unveiled Mixtral 8x22B, an efficient sparse Mixture-of-Experts model with 39B active parameters out of 141B total. It specializes in multilingual communication, coding, and mathematics and excels in reasoning and knowledge tasks. The model has a 64K-token context window, is compatible with multiple platforms, and is available under the open-source Apache 2.0 license.

4. Adobe To Add AI Video Generators Sora, Runway, and Pika to Premiere Pro
Adobe announced that it aims to update Premiere Pro with plug-ins for emerging third-party AI video-generation models, including OpenAI's Sora, Runway ML's Gen-2, and Pika 1.0. With this addition, Premiere Pro users would be able to edit live-action video captured on traditional cameras alongside and intermixed with AI footage.

5. Google's New Chips Look To Challenge Nvidia, Microsoft, and Amazon
Google has unveiled the Cloud TPU v5p, an AI chip that delivers nearly triple the training speed of its predecessor, the TPU v4, reinforcing its position in AI services and hardware. Additionally, Google introduced the Axion CPU, an Arm-based processor that competes with similar offerings from Microsoft and Amazon, boasting a 30% performance improvement and better energy efficiency.

Five 5-minute reads/videos to keep you learning

1. OpenAI or DIY? Unveiling the True Cost of Self-Hosting LLMs
The article examines the financial considerations of using OpenAI's API versus self-hosting LLMs. It highlights the trade-off between the greater control over data achieved through self-hosting, which comes with higher costs for fine-tuning and maintenance, and the […]
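For those who want to inspect FineWeb (Hottest News, item 1) without downloading the full multi-terabyte dataset, a minimal streaming sketch is below. The repo id and the "sample-10BT" config name are taken from the dataset card at the time of writing; check the card on Hugging Face if they have since changed.

```python
# Stream a few documents from the FineWeb sample config on the Hub.
from datasets import load_dataset

fw = load_dataset(
    "HuggingFaceFW/fineweb",  # repo id per the dataset card
    name="sample-10BT",       # 10B-token sample; full snapshots are also available
    split="train",
    streaming=True,           # iterate lazily, no full download
)
for i, doc in enumerate(fw):
    print(doc["text"][:200].replace("\n", " "))
    if i >= 2:
        break
```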