🔥 Advances in AI Video, Product Launches, and Open Source LLMs
PLUS: The Music Industry Battles AI Music
AstroFeather AI Newsletter Issue #5
Welcome to the 5th issue of the AstroFeather AI newsletter!
This week was full of attention-grabbing headlines about AI-generated songs from some of the world's best-selling music artists, research advances in computer vision and AI video creation, product launches, and the latest efforts by the open-source community to make state-of-the-art foundation large language models fully open and available to all.
Be sure to check out astrofeather.com (the companion site to this newsletter) for daily trending news and updates! I hope you enjoy reading this week’s updates and if you have any helpful feedback, feel free to respond to this email or contact me directly on LinkedIn - here.
Thanks - Adides Williams, Founder @ AstroFeather
In today’s recap (10 min read time):
The Music Industry vs. AI-Generated Music.
Exciting Research in AI Video Creation.
Announcements and Product Launches.
Open-Source Models Have the Potential to Challenge Commercial Models.
Must-Read News Articles and Updates
1. The Music Industry vs. AI-Generated Music.
Universal Globe. Image: Michael Buckner for Variety
The world's largest music company, Universal Music Group (UMG), has taken a stand against AI-generated music, broadly referring to it as "deep fakes, fraud, and denying artists their due compensation." The harsh words from UMG, which owns labels such as Interscope, Motown, EMI, and Virgin Records, came in direct response to a series of viral songs created using AI voice clones of some of its best-selling artists, which were used to recite AI-generated lyrics in the style of those artists or cover popular songs by other music acts.
Rise in Popularity of AI Music: AI-generated songs featuring uncanny voice clones of several current (and former) UMG-affiliated musical artists, including Oasis, Kendrick Lamar, Rihanna, and a collaboration between The Weeknd and Drake, have garnered millions of streams since their public release on the Internet.
While fans have mostly embraced the realistic-sounding knockoffs, artists have expressed concern that their sound and style can be easily reproduced and shared without permission, with Drake most recently expressing his dissatisfaction with an AI-generated song circulating on Instagram, calling it "the final straw."
Heart on My Sleeve
Drake and The Weeknd on Stage. Image: Getty
With more than 20 million streams on TikTok, Twitter, Spotify, and Apple Music, Heart on My Sleeve, which features the AI-cloned voices of Drake and The Weeknd exchanging lyrics about actress Selena Gomez, is perhaps the most shared piece of AI-generated music. In response, UMG has asked Spotify and Apple Music to block the use of copyrighted songs for AI training, including the creation of copycat versions of music artists. The company has also reportedly issued takedown requests, resulting in the removal of several versions of the song from all known streaming sites (though some versions can still be found online).
How it works: While the exact generative AI platforms used to create the songs are not known in all cases, many point to a combination of OpenAI's ChatGPT and GPT-4 to generate lyrics and either ElevenLabs' Prime Voice AI text-to-speech platform or SoftVC VITS Singing Voice Conversion (SVC) models to clone voices. The resulting AI-generated songs can then be played and shared by millions of users on popular streaming and social networking sites.
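To make the lyric-generation half of that pipeline concrete, here is a minimal sketch using the OpenAI Python library as it existed at the time of writing (the pre-1.0 ChatCompletion interface). The prompt and genre are hypothetical, and the voice-cloning step is a separate system that is not shown here.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumes you have an OpenAI API key

# Ask GPT-4 for original lyrics in a broad genre (hypothetical prompt).
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a songwriter."},
        {"role": "user", "content": "Write an original verse and chorus for an R&B song about late-night phone calls."},
    ],
)

print(response.choices[0].message.content)
```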
Of course, one of the most important questions moving forward is: how will copyright law deal with the rise of AI-generated music? Meanwhile, a coalition of musicians and artists has launched a "Human Artistry Campaign" to advocate for AI best practices that protect human creativity. The group has outlined seven principles, emphasizing that copyright protection should only be granted to music created by humans.
Additional Links for “The Music Industry vs. AI-Generated Music”:
AstroFeather AI Music News Headlines - Read summaries.
AstroFeather AI Copyright News Headlines - Read summaries.
AstroFeather AI Ethics News Headlines - Read summaries.
2. Exciting Research in AI Video Creation.
NeRF 3D Scene. Image: Barron et al. (2023), Zip-NeRF
Converting 2D images to 3D scenes with Zip-NeRF: Imagine being able to stitch together a collection of 2D images (or photographs) to create a photorealistic 3D scene that you can "fly" through as if watching drone footage. Such a technology has the potential to revolutionize filmmaking, for example, by helping users create realistic 3D scenes quickly and efficiently.
The AI-based technology that makes this possible is called Neural Radiance Fields, or NeRFs. Until recently, it was possible to create 3D NeRF scenes, but with some trade-offs: high-resolution scenes required hours of training time; alternatively, faster turnaround times often resulted in low-quality renderings.
To overcome these challenges, Google researchers combined two techniques (mip-NeRF for high-quality images and a grid-based method called Instant-NGP for faster training times) to create a new model, Zip-NeRF, that enables rapid development of high-quality photorealistic 3D scenes.
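For the technically curious, here is a rough sketch of the volume-rendering step that NeRF-style methods share: given densities and colors predicted along a camera ray, they composite them into a single pixel color. This illustrates the generic formulation from the original NeRF line of work, not the specific anti-aliasing and grid tricks that Zip-NeRF adds, and the sample values below are made up.

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite one pixel from samples along a single camera ray.
    densities: (N,) predicted volume density at each sample
    colors:    (N, 3) predicted RGB at each sample
    deltas:    (N,) distance between consecutive samples"""
    alphas = 1.0 - np.exp(-densities * deltas)                 # opacity of each segment
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # light surviving to each sample
    weights = transmittance * alphas
    return (weights[:, None] * colors).sum(axis=0)             # final pixel color

# Toy example: 64 random samples along one ray.
n = 64
pixel = render_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.05))
print(pixel)
```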
A Storm Trooper Vacuuming the Beach. Image: Nvidia
Creating High Quality AI Videos with Video LDMs: Text-to-video generators are quickly emerging as the next major advance in generative AI, with companies like Nvidia, Google, and Runway leading the way. These generators can be used to create videos that enhance social media, educational, and scientific content.
Recently, Nvidia researchers unveiled methods for creating video latent diffusion models (video LDMs) that can convert text descriptions into high-resolution personalized videos. Interestingly, the research team claimed that the methods described in the paper could be used to turn an "off-the-shelf" image generator into a video generator.
They then applied their video LDM approach to the popular image generator Stable Diffusion and successfully converted it into a text-to-video model capable of producing 4.7-second video clips at resolutions up to 1280×2048 at 24 frames per second (FPS).
[Paper | Video Demos]
DINOv2 Generates Higher-quality Segmentation vs DINO
Advances in Computer Vision with DINOv2: Meta AI has released DINOv2, an open-source AI method for training high-performance computer vision models using self-supervised learning. DINOv2 uses massive datasets to generate general-purpose visual features that perform well across domains without fine-tuning, eliminating the need for large amounts of labeled data.
Pre-trained on 142 million unlabeled, unannotated photos, the model produces high-quality visual features that can be used for image classification, instance retrieval, depth estimation, and more.
The adaptability of DINOv2, including its ability to "learn" directly from images without relying on text descriptions, allows it to be used in various computer vision applications, making it a powerful and versatile way to train AI models.
[Paper | GitHub | Interactive Demos]
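As a rough illustration of how those pre-trained features can be used, here is a minimal sketch that loads a DINOv2 backbone through the torch.hub entry point published in the GitHub repo and extracts an embedding for a single image. The file name and preprocessing choices are assumptions for the example, not part of Meta's announcement.

```python
import torch
from PIL import Image
from torchvision import transforms

# Load a pretrained DINOv2 ViT-S/14 backbone via torch.hub (entry point from the DINOv2 repo).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# DINOv2 uses a patch size of 14, so crop to a multiple of 14 (224 = 16 x 14).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)  # hypothetical input file
with torch.no_grad():
    features = model(image)  # one general-purpose embedding per image

print(features.shape)  # e.g. torch.Size([1, 384]) for the ViT-S/14 backbone
```

An embedding like this can then be fed to a simple linear classifier or a nearest-neighbor index for retrieval, which is the "no fine-tuning" usage pattern described above.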
Additional Links for “Exciting Research in AI Video Creation”:
Corridor Crew: “Why THIS is the Future of Imagery (and Nobody Knows it Yet)” - Watch video
AstroFeather AI Research Coverage - Read summaries.
3. Announcements and Product Launches.
Humane AI Assistant Demo. Image: TED
Humane: Stealth hardware and software startup Humane has unveiled its standalone wearable device with AI-powered features. The device can act as a personal assistant, respond to voice or gesture commands, and project information onto nearby surfaces. During a TED Talk presentation, company co-founder Imran Chaudhri demonstrated the device's ability to identify objects in the world around it, offer dietary advice, translate languages, and access emails and calendar invites. While the screenless device aims for a "seamless" experience, it remains unclear how it will perform in public situations compared to traditional smartphones.
AI Sundar Pichai Playing Poker. Image: Midjourney prompted by THE DECODER
Google DeepMind: Google Brain and DeepMind have merged to form Google DeepMind, with a focus on developing large-scale, multimodal models. The two teams previously developed influential AI technologies such as the Transformer architecture and deep reinforcement learning, though many of their scientists have since left to form their own startups. Demis Hassabis, the CEO of DeepMind, will lead the new company, while Jeff Dean, the former head of Google Brain, will become its chief scientist. The move is part of a broader realignment at Google to focus on scale and multimodality in the development of artificial intelligence.
Vector Recoloring Demo in Adobe Firefly. Image: Adobe
Adobe: Adobe Firefly has recently been updated with a new AI feature called Illustrator Vector Recoloring, which allows users to quickly adjust the color scheme of vector-based images. The new feature generates different color and palette variations from uploaded Scalable Vector Graphics (SVG) files; users can enter a text description or choose from a list of sample prompts. According to the company, Illustrator Vector Recoloring does not create new images, but simply provides new ways to modify existing images.
Adobe also announced that Firefly, its AI image generator tool, has been updated with video editing capabilities. Using AI, Firefly allows users to create color corrections and enhancements, animated text and motion graphics, matching B-roll footage, broadcast-ready sound effects, custom music, and pre-visualization before shooting. Users can describe the desired look through text prompts, and the AI instantly generates the desired results. Firefly can also analyze a script to automate the process of finding appropriate B-roll footage.
Buildbox AI. Image: Buildbox
Buildbox: The popular no-code game engine, Buildbox, has been enhanced with generative AI (GenAI) capabilities, called Buildbox AI, to increase developer productivity and simplify game development. With Buildbox AI, creators can type in a game idea and the GenAI platform will generate unique characters, game objects, and environments for them. The generated assets can then be used directly in Buildbox as well as in other game engines such as Unity and Unreal.
Snapchat AI Chatbot. Image: Snap (modified by TechCrunch)
Snap: Snapchat's AI chatbot, My AI, is now open to a global audience of more than 300 million users for free. The chatbot, which is powered by OpenAI's GPT technology, will be able to offer suggestions for places on Snap Map and Lenses, as well as birthday gift ideas, hikes, dinner recipes, and more.
Snapchat was recently upgraded with "a new generation of lenses powered by generative AI." Called "Cosmic Lens," these new AR lenses can transform a user and their surroundings into an animated sci-fi scene. Snapchat also announced improvements to its lens carousel ranking algorithms and AR bar, making it easier for users to find relevant lenses. Snapchat joins other social media platforms that are using AI to enhance their AR offerings, such as TikTok, which recently launched its hyper-realistic AI-powered "Bold Glamour" filter.
Additional Links for “Announcements and Product Launches”:
AstroFeather Product Launch Coverage - Read summaries.
4. Open-Source Models Have the Potential to Challenge Commercial Models.
Training Cost for LLMs in Millions of US Dollars. Image: 2023 AI Index Report
Developing and running state-of-the-art large language models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, is an expensive and time-consuming process. Depending on several factors, including the number of LLM parameters, the target GPU platform, and the size of the engineering team, the cost of training an LLM can range from $8 million to $12 million. As a result, only deep-pocketed research labs and enterprises can afford to develop leading LLMs, and the current competitive landscape continues to discourage them from open-sourcing their models. The most powerful platforms therefore remain closed behind commercial APIs, which limits customization and use with sensitive data.
Fully open-source models could ease these limitations if the open-source community can raise the output quality of open models to match that of their closed counterparts. Thankfully, there has been some progress on this front with the release of the following fully open and semi-open models:
StableLM: Stability AI, the company behind the Stable Diffusion image generator, has released its open-source family of language models called StableLM, which can generate text and code. The current models, which come in 3-billion- and 7-billion-parameter sizes, are now available on GitHub for developers to use and customize. The models were trained on a larger version of the open-source Pile dataset, which includes information from a variety of sources, including Wikipedia, Stack Exchange, and PubMed. [HuggingFace | GitHub]
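For developers who want to experiment, here is a minimal sketch that loads one of these checkpoints with the Hugging Face transformers library. The exact checkpoint name (stabilityai/stablelm-tuned-alpha-7b) and the generation settings are assumptions based on Stability AI's Hugging Face releases, not something specified in this issue.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is an assumption based on Stability AI's Hugging Face releases.
model_id = "stabilityai/stablelm-tuned-alpha-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires the accelerate package

prompt = "Write a short poem about open-source language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 64 new tokens and decode the result.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```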
RedPajama: Menlo Park-based startup Together recently announced a collaborative project called RedPajama that's focused on creating a reproducible, fully open, state-of-the-art language model. In a recent blog post, Together listed three key components of the project: high-quality pre-training data, base models to be trained (using the pre-training data), and instruction tuning to ensure that the base models are safe and commercially viable. So far, Together has released the first component, a 1.2-trillion-token pre-training dataset following the LLaMA recipe. According to a recent tweet thread from Together, the plan is to release a suite of trained LLMs "in the coming weeks under the Apache 2.0 license". [HuggingFace | GitHub]
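To get a feel for the released pre-training data, here is a minimal sketch that streams a few records with the Hugging Face datasets library. The dataset ID (togethercomputer/RedPajama-Data-1T-Sample, a small sample of the full corpus) and the "text" field name are assumptions based on Together's Hugging Face release.

```python
from datasets import load_dataset

# Stream the sample dataset so nothing close to 1.2T tokens is downloaded up front.
# Dataset ID and the "text" field are assumptions based on Together's release.
dataset = load_dataset(
    "togethercomputer/RedPajama-Data-1T-Sample",
    split="train",
    streaming=True,
)

# Print the start of the first three documents.
for i, example in enumerate(dataset):
    print(example["text"][:200].replace("\n", " "))
    if i == 2:
        break
```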
OpenAssistant: OpenAssistant, an open-source alternative to ChatGPT, has been released, including models and training data. The OpenAssistant team spent months collecting a "human-generated, human-annotated, assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages, annotated with 461,292 quality ratings" with the help of more than 13,500 volunteers, before using this data to refine their large language models (LLMs). The largest is based on Meta's LLaMA model and has 30 billion parameters. While these models suffer from problems common to large language models, such as hallucinations, studies with human volunteers show that they approach the performance of ChatGPT's gpt-3.5-turbo model. [HuggingFace | GitHub | YouTube]
MiniGPT-4: MiniGPT-4 is an open-source large language model (LLM) that can accept both images and text as input. It can describe images or answer questions about their contents, generate web page designs from handwritten instructions, and even generate image descriptions for the visually impaired. MiniGPT-4 uses Vicuna, an advanced LLM based on Meta's LLaMA that follows the Alpaca formula, to handle the generation of detailed image descriptions and the creation of web pages from handwritten designs. The model requires only 10 hours of training on 4 A100 GPUs, and the code, demos, and training instructions are available on GitHub. [HuggingFace | GitHub | YouTube | Demo]
Additional Links for “Open-Source Models Have the Potential to Challenge Commercial Models”:
AstroFeather Large Language Models (LLMs) News Headlines - Read summaries.
Thanks for reading this issue of the AstroFeather newsletter!
Be sure to check out the AstroFeather site for daily AI news updates and roundups. There, you'll be able to discover high-quality news articles from a curated list of publishers (ranging from well-known organizations like Ars Technica and The New York Times to authoritative blogs like Microsoft's AI Blog) and get recommendations for additional news articles, topics, and feeds you might enjoy.
See you in the next issue!
Adides Williams, Founder @ AstroFeather (astrofeather.com)