AI Voice Clones Everywhere

PLUS: Exciting Research, Product Launches, and Big Company Announcements

AstroFeather Issue #6

Welcome to the 6th issue of the AstroFeather AI newsletter!

This week was an eventful one, with headlines about voice clones (voice clones...everywhere), product launches (including Runway's iOS app that lets you turn videos into Claymation art and other styles), big corporate announcements (including PricewaterhouseCoopers' $1 billion investment in AI), and an open source project that lets you edit video objects in real time!

I hope you enjoy reading this week's updates. If you have any helpful feedback, feel free to respond to this email or contact me directly through my LinkedIn profile.

Thanks - Adides Williams, Founder @ AstroFeather

In today’s recap (10 min read time):

  • Attack of the (AI) Voice Clones.

  • Exciting Research and Open-Source Developments (HuggingChat, Bark, and TAM).

  • Product Previews and Launches (Apple, Runway, Microsoft, Yelp).

  • Company Announcements (OpenAI, Replit, Pinecone, PwC).

Must-Read News Articles and Updates

Update #1. Attack of the (AI) Voice Clones.

The “Kempelen” speaking machine. Image: Google - Arts and Culture.

With all the recent talk about AI-generated voice clones, you might be tempted to think that speech synthesis machines capable of generating artificial voices and human speech are new inventions. However, the history of speech synthesis can be traced back to what is widely considered to be the first speaking machine, invented by Wolfgang von Kempelen around 1770 to 1780. Kempelen's Speaking Machine was a manually operated box-like device that used bellows, pipes, and a rubber mouth and nose to simulate a few recognizably human utterances.

Since then, and with advances in generative AI (GenAI), voice synthesis has evolved from vague human utterances and speech-like sounds to voice clones that are often indistinguishable from the real voices they're modeled after. Headline-grabbing platforms such as ElevenLabs' Prime Voice AI and Microsoft's VALL-E can create audio of anyone saying anything while retaining the speaker's emotional tone, intonation, pitch, and other speech characteristics.

While the advances in AI voice technology are exciting, it's worth examining how the growing popularity, use, and proliferation of voice clones are beginning to impact some industries and society at large (see below):

ArtemisDiana. Image: iStock / Getty Images

Industry: Fraud Detection and Prevention

Observation - Increased Number of Scams: There has been a recent increase in the number of voice clone scams, with the Federal Trade Commission (FTC) issuing a consumer alert urging people to be wary of phone calls using AI-generated voice clones.

Societal Impact: In one case reported by local news station Arizona's Family (AZFamily), a mother received a call from an unknown number in which a crying voice that sounded like her 15-year-old daughter claimed she had been kidnapped.

In another incident, the Washington Post reported that an elderly Canadian couple was tricked into sending ~$15,000 USD to scammers who used an AI voice clone of the couple's son to ask for help.

Finally, NBC News highlighted an incident in which a father (similarly) received a call from an unknown number with the distressed voice of his daughter asking for help, followed by a person demanding a ransom in exchange for his daughter.

Guidance: The FTC has provided guidance to help people understand, prepare for, and respond to these enhanced “family emergency schemes.”

Grimes Performing Live. Image: Matt Cowan/Getty Images for Coachella

Industry: Music

Observation - The Rise in Popularity of AI-generated Music: In AstroFeather Newsletter Issue #5, I reviewed the recent rise in popularity of AI-generated music using cloned voices of top-selling artists. Examples such as "Heart on My Sleeve" (featuring cloned voices of Drake and The Weeknd), an AI-generated cover of Beyoncé's latest hit "Cuff It" (featuring cloned vocals of Rihanna), an entire AI-generated album called "AISIS - The Lost Tapes" (in the style of Oasis and frontman Liam Gallagher), and several others have collectively garnered millions of views and streams on various online platforms.

Societal Impact: Perhaps unsurprisingly, the music community's reaction to AI-generated songs has been mixed. In general, music fans have been supportive of the realistic-sounding AI songs. However, the world's largest music company, Universal Music Group (UMG), has taken a harder stance, calling AI-generated music fraudulent.

As for music artists, Drake recently expressed his dissatisfaction with an AI-generated song, but Grimes has publicly offered a 50% royalty split on any AI-generated song featuring her voice, and Liam Gallagher has cheered on his AI clone (and the AI Oasis album) by stating, "I sound mega."

Minare Koda from Wave, Listen to Me! Image: Funimation via Kotaku

Industry: Voice Acting and Voice-Over

Observation - Cloned Voices for Sale: The public release of ElevenLabs' Prime Voice AI platform was quickly followed by intense criticism after malicious actors took advantage of its (then) free tier to generate voice deepfakes of famous actors, as well as popular TV and anime characters, which were later traded and sold on various forums.

Societal Impact: Top voice actors including Jennifer Hale (Commander Shepard from the Mass Effect series), Steve Blum (English voice of Cowboy Bebop's Spike Spiegel), and Sean Schemmel (English voice of Dragon Ball Z's Goku) were among the first to speak out against sites that began hosting cloned voices for anyone to purchase.

In a series of Twitter posts, Hale, Blum, and Schemmel urged their followers to boycott websites that illegally copy and sell their voices, emphasizing that the voice clones were created and distributed without their consent.

In a separate incident, Remie Michelle Clarke (the Irish voice of Microsoft Bing) recently shared with the Washington Post her ongoing shock and concern that her voice is showing up on "text-to-speech websites" where customers can pay to use her voice for any purpose, including advertising and YouTube audio.

Update #2. Exciting Research and Open-Source Developments (HuggingChat, Bark, and TAM).

HuggingChat Interface. Image: Screenshot by THE DECODER

Free ChatGPT Alternative with HuggingChat: Hugging Face has launched an open source chatbot called HuggingChat, which is intended to be an alternative to OpenAI's ChatGPT. HuggingChat generates natural language text or code on demand and is based on OpenAssistant, an open-source competitor to ChatGPT. HuggingChat is free to use and its code is fully accessible, although inappropriate requests are rejected. There are (currently) no user accounts, and chat data is not stored.

It should be noted that HuggingChat is currently limited in the quality of its output, as the model has only been instruction-tuned and not improved by reinforcement learning from human feedback (RLHF). According to Hugging Face, their goal is to eventually make all high-quality chat models available through one hub.

Dog with Headphones. Image: Midjourney Prompted by THE DECODER

Turning Text into Audio with Bark: Suno AI recently released Bark, a text-to-audio model that can generate highly realistic, multilingual speech and a variety of audio types, including music and sound effects. The model is also capable of producing voice clones that preserve tone, emotion, and pitch, as well as non-verbal communication such as laughing, sighing, and crying.

Bark supports multiple languages, including English, German, Spanish, French, Japanese, and Hindi, and can automatically detect the language from input text. It should be noted that Bark's output quality is currently best in English, but other languages are expected to improve as the technology scales.

Video Object Segmentation with TAM. Image: Mingqi Gao

Select, Track, and Remove any Item from Video with Track Anything: When researchers at Meta AI released their Segment Anything Model (SAM), it was greeted with excitement for its ability to identify and "cut out" any object in any image with a single click. However, while SAM worked well for images, researchers found that it performed poorly at consistent segmentation in video.

In response, a research team from the Visual Intelligence and Perception Lab at SUSTech in China developed the Track Anything Model (TAM), an extension of Meta's SAM that allows users to interactively track video objects in real time. With TAM, users can track multiple objects in a video simultaneously, or isolate a single object to modify or remove it from the scene entirely. According to the research team, TAM can be used for:

  • Isolating and tracking video objects in real time across scene changes.

  • Editing or removing identified video objects in a scene.

  • Filling in missing regions of a video scene (inpainting).

Update #3. Product Previews and Launches (Apple, Runway, Microsoft, Yelp).

Illustration of Apple Logo. Image: The Verge

Apple Quartz (AI Health): Apple is reportedly working on an AI health coaching service called Quartz, which will use data collected from the Apple Watch to develop customized coaching programs for users. Teams from Apple's Health, Siri, and AI departments are said to be involved in the project. In addition to the new service, Apple's Health app is expected to get tools for tracking emotions and managing vision conditions like nearsightedness, while a new iPad version of the app will be released later this year. The company is also reportedly working on bringing blood pressure monitoring to the Apple Watch.

Gen-1 iOS App. Image: Runway

Runway Video Generator: AI startup Runway has launched a new iOS app that allows users to create videos from their phones using its Gen-1 video-to-video generative AI model. Users can select presets or upload images to turn their videos into claymation, watercolor art, and paper origami. The app generates four previews for users to choose from and takes between one and two minutes to produce the final product. Free and premium plans are available through in-app purchases.

Designer Interface. Image: Microsoft

Microsoft Designer: Microsoft’s latest AI graphic design tool was recently updated with new features to streamline the creation of social media posts. Designer can now generate captions and hashtags for social media posts, as well as resize designs to fit up to 20 different social media layout sizes. The app's new AI toolset also includes text-to-image capabilities that allow users to generate images using text prompts, as well as animation capabilities to apply text transitions and animated backgrounds.

Yelp App. Image: Yelp

Yelp AI: Yelp is reportedly rolling out AI-driven features such as Yelp Guaranteed, which provides insights on local businesses and helps users find the right business for their needs, and a Surprise Me button that suggests highly rated nearby restaurants. Yelp has also introduced a new feature that allows reviewers to add videos to their reviews. The high-definition, short-form videos can be up to 12 seconds long, can be uploaded on iOS and Android, and appear alongside text and photos to document the experience.

Update #4. Company Announcements (OpenAI, Replit, Pinecone, PwC).

OpenAI Illustration. Image: Justin Jay Wang / OpenAI

OpenAI - ChatGPT Data Management: In a recent blog post, OpenAI announced new controls for ChatGPT (located in the ChatGPT settings) that allow users to opt out of providing their conversation history as data for training AI models. Users can now disable chat history and export their data for local storage. Conversations held while chat history is disabled will not be used for model training, nor will they appear in the history sidebar. OpenAI will keep these conversations internally for 30 days before deleting them permanently.

Curiously, the option to opt out of model training is not enabled by default for all users. However, OpenAI also states that it is working on a "ChatGPT Business" subscription that will opt users out of model training by default, but it has not yet listed a launch date (or subscription price) for this option.

Replit LLM Performance Comparisons. Image: Replit Developer Day

Replit Funding and LLM Reveal: Replit has raised $97.4 million in funding at a $1.2 billion valuation, led by prominent venture capital firms including Andreessen Horowitz (a16z). Replit has helped users create over 200 million projects on its platform and has a user base of 22.5 million developers. The company is perhaps best known to developers for its Ghostwriter Chat programming assistant, which generates, auto-completes, or transforms code in dozens of programming languages, similar to its competitor, GitHub's Copilot.

Recently, Replit held its first "Developer Day" event and announced a LLaMA-style large language model (LLM) for code generation. Interestingly, the base model (replit-code-v1-3b) is only 2.7 billion parameters in size, was trained on 525 billion tokens of licensed code in 10 days, and supports 20 languages. Another model fine-tuned on Replit data (replit-finetuned-v1-3b) has also been shown to outperform OpenAI's Codex (which powers GitHub Copilot) on some performance benchmarks.

Image: Composite by VentureBeat using Canva Pro and Pinecone Logo

Pinecone Funding: Pinecone, the vector database company that provides long-term storage for large language models (LLMs), has raised $100 million in Series B funding led by Andreessen Horowitz (a16z) at a $750 million valuation. Pinecone's vector database service, which (for example) helps developers connect chatbots directly to enterprise data so they can provide accurate answers, has seen growth in paying customers across industries, including Gong and Zapier, and is being used to power autonomous AI agents such as Auto-GPT and BabyAGI.

PwC building. Image: Leon Neal/Getty Images

PwC's Big Investment: PricewaterhouseCoopers (PwC) is planning to invest $1 billion over the next three years in generative AI (GenAI) to automate aspects of its tax, audit, and advisory services. PwC plans to develop and integrate GenAI into its own technology stack and client service platforms, as well as advise other firms on how best to use GenAI. The investment includes funding for hiring and training, as well as targeting AI software makers for potential acquisitions.

According to PwC, the firm will use GenAI tools to write reports, prepare compliance documents, analyze business strategies, identify inefficiencies, and create marketing materials and sales campaigns.

Thanks for reading this issue of the AstroFeather newsletter!

Be sure to check out the AstroFeather site for daily AI news updates and roundups. There, you'll be able to discover high-quality news articles from a curated list of publishers (ranging from well-known organizations like Ars Technica and The New York Times to authoritative blogs like Microsoft's AI Blog) and get recommendations for additional news articles, topics, and feeds you might enjoy.

See you in the next issue!

Adides Williams, Founder @ AstroFeather (astrofeather.com)
