Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

After a year of near-weekly significant model releases and rapid progress in AI capabilities and adoption, the year finished with a focus on the legal consequences of that adoption. The New York Times has sued Microsoft and OpenAI for copyright infringement, alleging that the companies are liable for substantial financial damages, potentially in the billions of dollars. The Times is seeking compensation and the destruction of any chatbot models and data that may have used its copyrighted material. This has quickly become the most prominent legal case in the ongoing debate over AI technology and intellectual property rights.

Major AI breakthroughs over the past two years have been driven, in particular, by transformer-based large language models, diffusion models, and, more recently, graph neural networks. Training these models is extremely data-heavy and generally uses huge datasets scraped from the internet, often containing copyrighted content. While LLMs generally don’t memorize their training set in full, in some instances commonly repeated content is memorized verbatim. The models’ outputs are also very open-ended and can be used to create content that repeats copyrighted content in full or is inspired by it.

The existing legal system was not designed for Generative AI content, and opinions vary strongly on how it should be adapted. Training and inference of LLMs do not copy and reuse content in a traditional sense, but neither are they equivalent to a human reading and gaining “inspiration” from other people’s content. Content on the internet has also generally been available to big tech companies’ web crawlers for indexing and search purposes; however, use for model training is clearly not an equivalent use case.
Different people can interpret these differences differently, which leads to intense debate, and we expect regulations and legal decisions to vary by jurisdiction. For example, earlier this year, Japan confirmed it would not enforce copyrights on data used in AI training.

Why should you care?

The resolution of these AI copyright questions could have major consequences for LLMs’ quality, cost, and pace of progress, as well as for the livelihoods of existing copyright owners and human content creators. We think it is important that content that took time, money, and expertise to produce can still be rewarded in a world of Generative AI content. But there are also many ways new copyright laws and legal interpretations could be overly cumbersome and hold back AI capabilities unnecessarily. The difficulty is getting the balance right. In 2024, we should start to see how this plays out.

– Louie Peters, Towards AI Co-founder and CEO

Hottest News

1. NY Times Sues OpenAI, Microsoft for Infringing Copyrighted Works
The New York Times sued OpenAI and Microsoft, accusing them of using millions of the newspaper’s articles without permission to help train chatbots to provide information to readers. The newspaper’s complaint accused OpenAI and Microsoft of trying to “free-ride on The Times’s massive investment in its journalism.”

2. Japan Goes All In: Copyright Doesn’t Apply to AI Training
Japan’s government recently reaffirmed that it will not enforce copyrights on data used in AI training. The policy allows AI to use any data “regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise.”

3. AI-Created “Virtual Influencers” Are Stealing Business From Humans
Virtual influencers are becoming popular in marketing for their controlled branding and predictability.
Despite ethical and transparency concerns, including issues of sexualization similar to those facing human influencers, their distinct narratives drive brand partnerships as the industry navigates consumer trust and ethical standards.

4. Nvidia Releases Slower, Less Powerful AI Chip for China
Nvidia has unveiled a new gaming processor that can be sold in China while complying with US export rules. However, the new version delivers around 11% less performance than the original 4090 chip, first released in late 2022. It also has fewer of the processing subunits that help accelerate AI workloads.

5. GPT and Other AI Models Can’t Analyze an SEC Filing, Researchers Find
Researchers from the startup Patronus AI found that large language models, such as those behind ChatGPT, frequently fail to answer questions derived from SEC filings. Patronus AI reports accuracy of only 79% even in the best-performing setup it tested, with models often giving wrong answers or declining to answer. To help address this, the company created FinanceBench, a test set of questions drawn from SEC filings, for evaluating AI performance in the financial sector.

Five 5-minute reads/videos to keep you learning

1. AI in 2023, in 13 Minutes
A lot happened in 2023 in big tech and the AI research community. This video is a recap of the year covering developments like GPT-4 and GPT-4 Vision from OpenAI, Meta’s Llama 2, Google’s Bard and Gemini, Stability AI’s Stable Video Diffusion, Grok, and more.

2. Flash Attention: Underlying Principles Explained
Flash Attention improves Transformer efficiency by optimizing computation and memory use, promising faster processing with reduced memory needs. This article explains the underlying principles of Flash Attention, illustrating how it achieves accelerated computation and memory savings without compromising the accuracy of attention.

3. LangChain State of AI 2023
LangChain’s analysis reveals growing use of retrieval in LLM applications, with OpenAI and Hugging Face leading the field.
It highlights the significance of specialized databases and embedding generation, underscoring the industry’s evolving preferences and technological advancements.

4. 2023, the Year of Open LLMs
2023 saw increased interest in open LLMs, with a shift toward smaller, more efficient models such as LLaMA. The year marked the prevalence of decoder-only architectures and conversational AI, with fine-tuning methods like Instruction Fine-Tuning and RLHF becoming standard for model customization.

5. An AI Haunted World
Recent advances in AI have enabled the use of sophisticated ChatGPT-like models on personal devices. Companies such as Mistral are creating open-source AI that can be tailored to specific user needs, which democratizes AI technology beyond large tech firms.

Repositories & […]