mamot.fr is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mamot.fr is a French-speaking Mastodon server run by La Quadrature du Net.

Server stats: 3.3K active users

#texttospeech


Here is my latest creation. I got this idea because the vocalwriter directory of the Dectalk archive contains a cover of the original "Forever Young" by Alphaville. However, there's a new version of the song by Ava Max and Alphaville, which was released late last year, and I thought: instead of having the voice of Alphaville singing, we'll have that Vocalwriter voice singing with her. #TextToSpeech #SpeechSynthesizer #PopMusic #singingSynthesizer

[Audio attachment, 02:37]

Favorite thing lately is finding an article I wish were in podcast form, saving the text to a .txt file, having TTS Util use RH Voice to convert the file into an audio reading, and listening to my own little robotic FOSS nanny read me the stories I want to hear in my headphones as I do yardwork.
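For a desktop take on that same article-to-audio workflow, here is a minimal sketch. It assumes the third-party pyttsx3 package as a stand-in for the TTS Util + RH Voice combo (those are Android tools, not what this code calls), and the chunking helper is a hypothetical addition so long articles aren't fed to the synthesizer in one giant call:

```python
# Hedged sketch of a txt-file-to-audio-reading workflow.
# pyttsx3 is an assumption standing in for TTS Util + RH Voice;
# chunk_text is a hypothetical helper, not part of any of those tools.
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into sentence-aligned chunks of at most
    roughly max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesize_file(txt_path: str, wav_path: str) -> None:
    """Read a saved article and render it to an audio file.
    Requires pyttsx3 (pip install pyttsx3) and a system voice."""
    import pyttsx3
    with open(txt_path, encoding="utf-8") as f:
        text = f.read()
    engine = pyttsx3.init()
    engine.save_to_file(" ".join(chunk_text(text)), wav_path)
    engine.runAndWait()
```

The chunking step is optional for short articles; some engines handle arbitrary-length input fine on their own.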

TTS: Computer scientist Thorsten Müller is making his AI-powered speech output "Thorsten-Voice" freely available to the public. It reads texts not only in a neutral tone, but also angrily, drunkenly, or in a Hessian dialect. A contribution to accessibility, or a risky surrender of personal identity?
#TextToSpeech #Barrierefreiheit #KünstlicheIntelligenz
netzpolitik.org/2025/text-to-s

netzpolitik.org · "Text-to-Speech: Dieser Mann hat seine Stimme verschenkt" (This man gave away his voice): Thorsten Müller has developed an AI-powered speech output that anyone may use freely. Müller's voice reads texts not only neutrally but, on request, also angrily, drunkenly, or in Hessian dialect.

Northeastern University: Northeastern researchers develop AI app to help speech-impaired users communicate more naturally. “Computer science professors Aanchan Mohan and Mirjana Prpa are developing an AI-integrated app that will give speech-impaired users access to a range of communication tools on their phones: speech recognition, text, whole-word selection, emojis and personalized […]

https://rbfirehose.com/2025/03/22/northeastern-university-northeastern-researchers-develop-ai-app-to-help-speech-impaired-users-communicate-more-naturally/


How MS Edge’s Immersive Reader Helps Me Slow Down

We all probably know the drill of a typical workday: back-to-back meetings, side conversations in team chats about some other topics, drafting & scanning emails, creating Jira issues, and juggling multiple project threads. The sheer volume of information coming in such a short time can be challenging.

Normally, this isn’t an issue for me. But sometimes I find myself struggling to read long texts in the middle of these high-intensity stretches. Not because I lack the time, but because my mind is already racing ahead to the next thing. I can’t seem to slow it down. This is annoying and, to be honest, a little frightening, because I realise that my mind is in a very short-cycle mode – clear evidence that I’m under stress.

Over time, I’ve found a simple trick that helps: Microsoft Edge’s Immersive Reader mode. I don’t just use it to declutter the page in question. I let the browser read the text out loud to me.

Yes, that’s right! I hit the play button, lean back, and keep my hands off the mouse and keyboard to avoid getting distracted by other tabs or windows.

It forces me to slow down and listen instead of skimming. It removes the temptation to jump between paragraphs or skip entire sections, and it enforces a slower pace that I have to accept. At first it was a bit of a struggle, but I’ve come to realize that it helps me calm down.

If you ever feel overwhelmed by the sheer speed of work, maybe give it a try. Sometimes, all we need is a different approach to regain control.

locked.de/how-ms-edges-immersi
#ImmersiveReader #MentalLoad #MicrosoftEdge #StressRelief #TextToSpeech #WorkStress


Here's an audio file of Vocalwriter singing yellowribbon. Accompanying it is the RVC version of Mac Fred. Dane originally made an audio file of just Mac Fred singing it. He didn't think it was the best he could've done, but I thought it was better than nothing. Luckily, I downloaded it from his Mastodon before he got banned from his account. I thought it would be great to have the RVC version of Fred singing along with the Vocalwriter synthesizer. The only thing that could've been better is if I'd added a little more reverb on Fred, but I think it's really nice. @jaybird110127 If the Dectalk archive were still a thing, I would have put this file up, because it's a remake of the original. #TextToSpeech
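Overlaying one rendered voice on another, as described above, amounts to summing the two waveforms sample by sample. A toy sketch with Python's standard wave module (the function name and equal-gain mix are illustrative assumptions, not the tool the poster actually used, and it handles only mono 16-bit PCM):

```python
# Hedged sketch: mix two mono 16-bit PCM WAV files into one track,
# e.g. a synthesizer lead plus an RVC vocal. Pure stdlib; the
# half-gain mix is a simple choice to avoid clipping, not a rule.
import struct
import wave

def mix_wavs(path_a: str, path_b: str, out_path: str) -> None:
    """Sum two mono 16-bit WAVs sample by sample at half gain each,
    clamping to the 16-bit range and truncating to the shorter file."""
    with wave.open(path_a) as wa, wave.open(path_b) as wb:
        assert wa.getnchannels() == wb.getnchannels() == 1
        assert wa.getsampwidth() == wb.getsampwidth() == 2
        n = min(wa.getnframes(), wb.getnframes())
        a = struct.unpack(f"<{n}h", wa.readframes(n))
        b = struct.unpack(f"<{n}h", wb.readframes(n))
        params = wa.getparams()
    mixed = [max(-32768, min(32767, (x + y) // 2)) for x, y in zip(a, b)]
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(struct.pack(f"<{len(mixed)}h", *mixed))
```

Effects like reverb would be applied to one track before mixing; that step is outside this sketch.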

[Audio attachment, 02:52]

Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] — arxiv.org/abs/2503.01710
#HackerNews #SparkTTS #TextToSpeech #AI #DecoupledTokens #MachineLearning

arXiv.org · Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a single-stream speech codec that decomposes speech into two complementary token types: low-bitrate semantic tokens for linguistic content and fixed-length global tokens for speaker attributes. This disentangled representation, combined with the Qwen2.5 LLM and a chain-of-thought (CoT) generation approach, enables both coarse-grained control (e.g., gender, speaking style) and fine-grained adjustments (e.g., precise pitch values, speaking rate). To facilitate research in controllable TTS, we introduce VoxBox, a meticulously curated 100,000-hour dataset with comprehensive attribute annotations. Extensive experiments demonstrate that Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis. Source code, pre-trained models, and audio samples are available at https://github.com/SparkAudio/Spark-TTS.
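The core layout the abstract describes, a single stream combining a fixed-length block of global speaker tokens with a variable-length run of semantic content tokens, can be illustrated with a toy sketch. All token values, sizes, and helper names below are invented for illustration; this is not the BiCodec implementation:

```python
# Toy illustration of a single-stream decoupled token layout:
# a fixed-length block of global (speaker-attribute) tokens sits
# alongside a variable-length run of semantic (content) tokens,
# so one decoder can consume a single sequence and still recover
# both types. Every concrete value here is invented.
N_GLOBAL = 4  # assumed fixed length for the speaker-attribute block

def pack_stream(global_tokens: list[int], semantic_tokens: list[int]) -> list[int]:
    """Concatenate the two token types into one stream."""
    assert len(global_tokens) == N_GLOBAL, "speaker block must be fixed-length"
    return global_tokens + semantic_tokens

def unpack_stream(stream: list[int]) -> tuple[list[int], list[int]]:
    """Recover both token types; possible only because the speaker
    block has a known, fixed length."""
    return stream[:N_GLOBAL], stream[N_GLOBAL:]

speaker = [7, 3, 9, 1]            # fixed-length speaker attributes
content = [42, 42, 17, 5, 5, 30]  # variable-length linguistic content
assert unpack_stream(pack_stream(speaker, content)) == (speaker, content)
```

The fixed length of the global block is what makes the disentanglement trivially invertible without any extra framing tokens.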

MIT Technology Review: A woman made her AI voice clone say “arse.” Then she got banned. “Joyce doesn’t use her voice clone all that often. She finds it impractical for everyday conversations. But she does like to hear her old voice and will use it on occasion. One such occasion was when she was waiting for her husband, Paul, to get ready to go out. Joyce typed a message for her voice […]

https://rbfirehose.com/2025/03/01/mit-technology-review-a-woman-made-her-ai-voice-clone-say-arse-then-she-got-banned/