Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

This was a huge week for new model releases, with AlphaFold 3 and GPT-4o both unlocking many new capabilities and AI use cases in very different domains.

DeepMind's AlphaFold 3 is a significant update to the AlphaFold series of models, which have already been used by 1.8 million scientists. The new model builds on its protein capabilities and can now predict folding patterns and chemical structures across proteins, DNA, RNA, ligands (often used as small-molecule drugs), ions, and antibodies. It can also model their interactions with each other, which is key in biomedical research tasks such as modeling whether drug candidates bind to target proteins. The model is a "Pairformer" (a custom transformer with triangular attention) combined with a new diffusion model, a further step in the dominance of transformer and diffusion architectures in machine learning.

We were also very impressed with the release of GPT-4o ("omni") by OpenAI. While it is not yet the eagerly anticipated GPT-5, OpenAI instead chose to first release a faster, cheaper (and likely much smaller, at least in terms of active parameters), natively multimodal model. GPT-4o can directly input, reason over, and output speech (previously, it relied on separate models for each of these steps, which added significant latency) and can now interact with speech, images, and video in real time. This unlocks many new use cases, such as real-time translation and much more natural speech, including recognizing and conveying emotions. Unlike speech, vision capabilities were already integrated directly into GPT-4V; however, they were still not "native," as they were rumored to have been added to the foundation model via fine-tuning. Now, we assume all modalities for 4o are learned during pre-training.

The model makes good progress on the state of the art on many benchmarks, particularly multimodal ones. Despite now being the most performant model, it has been made available for free within ChatGPT (replacing GPT-3.5 Turbo) and via API at half the cost of GPT-4 Turbo. OpenAI noted double the response speed and 5x the rate limits vs. GPT-4 Turbo; however, many users are seeing response speeds closer to 5x faster, and we think the model likely costs OpenAI significantly less than half the compute. The full set of new multimodal features is not yet available. (A minimal sketch of calling GPT-4o via the API follows this editorial.)

Why should you care?

AI use in drug development and drug target identification has shown very positive early results. It can already cut development time to roughly 30 months from initiation to a phase 1 trial (vs. 60 months for conventional drugs), and a recent study measured an 80–90% phase 1 success rate for AI-discovered drugs (vs. 40–65% for conventional drugs). Phase 2 data is still limited, and the success rate there was roughly in line with the historical norm at about 40%. Despite these positive results, there are still only ~70 AI-discovered drugs in clinical trials relative to many thousands overall, and none has yet passed phase 3.

While AlphaFold 3 alone won't find a new drug (separate models are needed to identify new drug targets, for example, more data needs to be collected for many areas of biology, and many lab experiments are still needed to verify and iterate on predictions), we think it could be the catalyst for a "ChatGPT moment" for AI's use in drug design. AI tools are now much more accessible, and we hope many more biology "foundation models" will be developed and made available. A limited version of AlphaFold 3 is accessible for free, but the full model weights are expected to be released in the next six months.

– Louie Peters, Towards AI Co-founder and CEO
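For readers who want to try GPT-4o through the API, here is a minimal sketch using the OpenAI Python SDK (v1.x). It assumes an OPENAI_API_KEY environment variable is set, and the image URL is only a placeholder; audio and video inputs are not shown since those API features are not yet available.

```python
# Minimal sketch (illustrative, not official OpenAI documentation) of calling
# GPT-4o with the OpenAI Python SDK v1.x. Assumes OPENAI_API_KEY is set in the
# environment; the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same request shape works for text-only prompts; switching an existing GPT-4 Turbo integration to GPT-4o is mostly a matter of changing the model name.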
Hottest News

1. OpenAI Introduced GPT-4o

OpenAI announced GPT-4o, its latest AI model with text, vision, and audio capabilities. GPT-4o, with an "o" for "omni," will be accessible to all ChatGPT users, including those on the free version. It matches GPT-4 Turbo performance on English text and code, with significant improvement on text in non-English languages. It is also much faster and 50% cheaper in the API.

2. DeepMind Releases AlphaFold 3

AlphaFold 3 is an advanced AI model by Google DeepMind and Isomorphic Labs capable of accurately predicting biomolecular structures and interactions. It is a significant advancement over prior models and enhances scientific research and drug development. It is available globally through the AlphaFold Server.

3. Microsoft Allegedly Developing MAI-1, a Competing Model to OpenAI's GPT-4

Microsoft is reportedly working on MAI-1, a 500-billion-parameter AI model, aiming for a competitive edge in the AI industry and moving toward greater independence in AI development.

4. Google I/O 2024 Will Be All About AI Again

Google is preparing to hold its annual Google I/O developer conference today, and naturally, it will be all about AI. Much of the keynote will likely cover how Google is fusing Search and generative AI. The company has been testing new search features like AI conversation practice and image generation for shopping and virtual try-ons. You can catch it on Google's site.

5. gpt2-Chatbot Confirmed As OpenAI

The gpt2-chatbot that appeared in the LMSYS arena was confirmed to be an OpenAI test model after a 429 rate-limit error revealed its connection to OpenAI's API. Now renamed im-also-a-good-gpt2-chatbot, it can only be accessed randomly in "Arena (battle)" mode rather than "Direct Chat."

Five 5-minute reads/videos to keep you learning

1. The Next Big Programming Language Is English

GitHub Copilot Workspace offers an AI-powered coding platform that enables users to write code using conversational English, streamlining the process, particularly for straightforward tasks. In this article, the author tests Copilot Workspace and implements the code.

2. Everything About Long Context Fine-tuning

This article examines the difficulties of fine-tuning large language models for extended contexts over 32,000 tokens, such as high memory utilization and processing inefficiencies. It presents solutions like gradient checkpointing, LoRA, and Flash Attention to mitigate these issues and improve computational efficiency (a rough configuration sketch appears after this list).

3. How LLMs Know When to Stop Generating?

This article explains how LLMs know […]
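As a companion to the long-context fine-tuning read above, here is a rough sketch (ours, not code from the linked article) of how gradient checkpointing, LoRA, and FlashAttention-2 are commonly combined in Hugging Face Transformers. The checkpoint name, LoRA hyperparameters, and target modules are illustrative assumptions.

```python
# Rough illustrative sketch: combining gradient checkpointing, LoRA (via peft),
# and FlashAttention-2 for long-context fine-tuning with Hugging Face
# Transformers. The model name, LoRA rank, and target modules are assumptions
# for illustration only.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",             # assumed example checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
model.gradient_checkpointing_enable()         # trade extra compute for activation memory

lora_config = LoraConfig(
    r=16,                                     # low-rank adapter dimension (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)    # only the small adapter weights are trained
model.print_trainable_parameters()
```

Together, these three pieces target the main long-context bottlenecks the article describes: activation memory (checkpointing), optimizer and gradient memory (LoRA), and attention compute and memory at long sequence lengths (FlashAttention).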