La Quadrature du Net @LaQuadrature

49 posts9 participants0 posts today

**Bryce Belcher** @Blindy@convo.casa · 4d

Here is my latest creation. I got this idea because in the dectalk archive in the vocalwriter directory, there is a cover of the original forever Young song by Alphaville. However, there's a new song by Ava Max and Alphaville, which released late last year, and I thought, instead of having the voice of Alphaville singing, we'll have that vocalwriter voice singing with her. #TextToSpeech #SpeechSynthesizer #PopMusic #singingSynthesizer

00:00/02:37

**Red the Bean** @resl@mas.to · 6d

Red the Bean @resl@mas.to

Favorite thing lately is finding an article I wish were in podcast form, saving the text to a .txt file, then having TTS Util use RH Voice to convert the file into an audio reading, and listen to my own little robotic FOSS nanny read me the stories I want to hear in my headphones as I do yardwork.

#FOSS #TTS #TextToSpeech

Replied in thread

**Elias Probst** @eliasp@mastodon.social · Mar 31 *

Mar 31 *

Elias Probst @eliasp@mastodon.social

@ToniBarth #KDE's text editing framework had #TTS support for a long time and it was recently improved to be more accessible via the context menu:
https://invent.kde.org/frameworks/ktexteditor/-/merge_requests/797

A very powerful and versatile editor based on this framework is #Kate:
https://kate-editor.org/

#KateEditor #TextToSpeech

**JimBSR** @jimlee@sfba.social · Mar 24

Mar 24

JimBSR @jimlee@sfba.social

The quality of Text-to-Speech has been improved a lot. The voice is much realistic and conformable.
https://creators.spotify.com/pod/show/jim-bsr/episodes/Where-to-Sell-Used-GPUs-Graphics-Cards-e2jotfd

Spotify for CreatorsWhere to Sell Used GPUs Graphics Cards? by Green TechnologyIn this talk, we will discuss where to sell used graphics cards and GPUs. This will be very useful for gamers and AI startups ( small business owners), as it will teach you how to buy and sell used GPUs online. The content is based on this article Best Places to Sell GPUs, from BuySellRam.com, where you may sell used RAM as well.

#texttospeech #ai #audio

**Global Threads** @globalthreads@mastodon.social · Mar 23

Mar 23

Global Threads @globalthreads@mastodon.social

AI
OpenAI Unveils New Voice & Transcription Models

"gpt-4o-mini-tts" offers natural, expressive speech with customizable tones.
"gpt-4o-transcribe" improves on Whisper, excelling in noisy environments.
OpenAI won't release these models as open source due to size concerns.

#OpenAI #AI #SpeechToText

**Harald Klinke** @HxxxKxxx@det.social · Mar 22

Mar 22

Harald Klinke @HxxxKxxx@det.social

TTS: Informatiker Thorsten Müller stellt seine KI-gestützte Sprachausgabe “Thorsten-Voice” der Allgemeinheit kostenlos zur Verfügung. Sie liest Texte nicht nur neutral, sondern auch wütend, betrunken oder im hessischen Dialekt vor. Ein Beitrag zur Barrierefreiheit oder riskante Preisgabe persönlicher Identität?
#TextToSpeech #Barrierefreiheit #KünstlicheIntelligenz
https://netzpolitik.org/2025/text-to-speech-dieser-mann-hat-seine-stimme-verschenkt/

netzpolitik.org · Mar 21Text-to-Speech: Dieser Mann hat seine Stimme verschenktThorsten Müller hat eine KI-gestützte Sprachausgabe entwickelt, die jeder Mensch frei nutzen darf. Müllers Stimme liest Texte nicht nur neutral, sondern auf Wunsch auch wütend, betrunken oder hessisch vor.

**Jonas** @jonas99g@nrw.social · Mar 22

Mar 22

Jonas @jonas99g@nrw.social

Danke an Thorsten, dass er seine Stimme an uns alle verschenkt hat. #TTS #TextToSpeech

Sogar die low-Verwion der Piper Stimme klingt echt gut und läuft mit SherpaTTS auf meinem Handy.
https://github.com/woheller69/ttsEngine

https://netzpolitik.org/2025/text-to-speech-dieser-mann-hat-seine-stimme-verschenkt/

**ResearchBuzz: Firehose** @researchbuzz_firehose@rbfirehose.com · Mar 22

Mar 22

ResearchBuzz: Firehose @researchbuzz_firehose@rbfirehose.com

Northeastern University: Northeastern researchers develop AI app to help speech-impaired users communicate more naturally. “Computer science professors Aanchan Mohan and Mirjana Prpa are developing an AI-integrated app that will give speech-impaired users access to a range of communication tools on their phones: speech recognition, text, whole-word selection, emojis and personalized […]

https://rbfirehose.com/2025/03/22/northeastern-university-northeastern-researchers-develop-ai-app-to-help-speech-impaired-users-communicate-more-naturally/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz · Mar 22Northeastern University: Northeastern researchers develop AI app to help speech-impaired users communicate more naturally | ResearchBuzz: Firehose

More from

ResearchBuzz: Firehose

#accessibility #ai #aiassisted

**Winbuzzer** @winbuzzer@mastodon.social · Mar 20

Mar 20

Winbuzzer @winbuzzer@mastodon.social

OpenAI has upgraded its AI speech models, enhancing transcription accuracy and improving voice realism

#AI #GenAI #OpenAI #AISpeech #VoiceAI #AITranscription #TextToSpeech #SpeechToText #AIethics #SyntheticVoices

https://winbuzzer.com/2025/03/20/openai-enhances-ai-speech-models-with-more-realistic-voices-and-improved-transcription-xcxwbn/

**IT News** @itnewsbot@schleuss.online · Mar 19

Mar 19

IT News @itnewsbot@schleuss.online

“Glasses” That Transcribe Text To Audio - Glasses for the blind might sound like an odd idea, given the traditional purpose ... - https://hackaday.com/2025/03/19/glasses-that-transcribe-text-to-audio/ #opticalcharacterrecognition #speechsynthesis #wearablehacks #texttospeech #raspberrypi #glasses

Hackaday · Mar 19“Glasses” That Transcribe Text To AudioGlasses for the blind might sound like an odd idea, given the traditional purpose of glasses and the issue of vision impairment. However, eighth-grade student [Akhil Nagori] built these glasses wit…

**Winbuzzer** @winbuzzer@mastodon.social · Mar 17

Mar 17

Winbuzzer @winbuzzer@mastodon.social

Google has integrated its Chirp 3 HD voice model into Vertex AI enhancing speech synthesis capabilities with customizable and lifelike voice features

#AI #GoogleAI #VertexAI #Chirp3 #VoiceSynthesis #AIVoices #TextToSpeech #GenAI #CustomAIVoices #Alphabet

https://winbuzzer.com/2025/03/17/google-expands-vertex-ai-with-chirp-3-hd-voice-model-xcxwbn/

**The IT Blog** @blog@www.locked.de · Mar 10

Mar 10

The IT Blog @blog@www.locked.de

How MS Edge’s Immersive Reader Helps Me Slow Down

We all probably know the drill of a typical workday: back-to-back meetings, side conversations in team chats about some other topics, drafting & scanning emails, creating Jira issues, and juggling multiple project threads. The sheer volume of information coming in such a short time can be challenging.

Normally, this isn’t an issue for me. But sometimes I find myself struggling to read long texts in the middle of these high-intensity stretches. Not because I lack the time, but because my mind is already racing ahead to the next thing. I can’t seem to slow it down. This is annoying and, to be honest, a little frightening, because I realise that my mind is in a very short-cycle mode – clear evidence that I’m under stress.

Over time, I’ve found a simple trick that helps: Microsoft Edge’s Immersive Reader mode. I don’t just use it to declutter the according web page. I let the browser read the text out loud to me.

Yes, that’s right! I hit the play button, lean back, and keep my hands off the mouse and keyboard to avoid getting distracted by other tabs or windows.

It forces me to slow down and listen instead of skimming the whole page. It eliminates the temptation to jump between paragraphs or skim entire sections. It enforces a slower pace that I have to accept. At first, it’s a bit of a struggle – but I’ve come to realize: it helps me to calm down a bit.

If you ever feel overwhelmed by the sheer speed of work, maybe give it a try. Sometimes, all we need is a different approach to regain control.

https://www.locked.de/how-ms-edges-immersive-reader-helps-me-slow-down/
#ImmersiveReader #MentalLoad #MicrosoftEdge #StressRelief #TextToSpeech #WorkStress

The IT Blog · Mar 10How MS Edge’s Immersive Reader Helps Me Slow DownWe all probably know the drill of a typical workday: back-to-back meetings, side conversations in team chats about some other topics, drafting & scanning emails, creating Jira issues, and juggling multiple project threads. The sheer volume of information coming in such a short time can be challe

**Bryce Belcher** @Blindy@convo.casa · Mar 9 *

Mar 9 *

Bryce Belcher @Blindy@convo.casa

Here's an audio file of Vocalwriter singing yellowribbon. Accompanying that is the RVC version of Mac Fred. Dane originally made an audio file of just Mac Fred singing it. He didn't think it was the best he could've done, but I thought it was better than nothing. Luckily I downloaded it from his mastodon before he got banned from his account. I thought it would be great to have the RVC version of Fred singing along with the Vocalwriter synthesizer. The only thing that could've probably been better is if I added just a little more reverb on Fred, but I think it's really nice. @jaybird110127 If the Dectalk archive was still a thing, I would have put this file up because it's a remake of the original. #TextToSpeech.

00:00/02:52

**Hacker News** @h4ckernews@mastodon.social · Mar 8

Mar 8

Hacker News @h4ckernews@mastodon.social

Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] — https://arxiv.org/abs/2503.01710
#HackerNews #SparkTTS #TextToSpeech #AI #DecoupledTokens #MachineLearning

arXiv.orgSpark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech TokensRecent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a single-stream speech codec that decomposes speech into two complementary token types: low-bitrate semantic tokens for linguistic content and fixed-length global tokens for speaker attributes. This disentangled representation, combined with the Qwen2.5 LLM and a chain-of-thought (CoT) generation approach, enables both coarse-grained control (e.g., gender, speaking style) and fine-grained adjustments (e.g., precise pitch values, speaking rate). To facilitate research in controllable TTS, we introduce VoxBox, a meticulously curated 100,000-hour dataset with comprehensive attribute annotations. Extensive experiments demonstrate that Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis. Source code, pre-trained models, and audio samples are available at https://github.com/SparkAudio/Spark-TTS.

**Content King Prasenjit** @Prasenjit_Pro@mastodon.social · Mar 8

Mar 8

Content King Prasenjit @Prasenjit_Pro@mastodon.social

Try My AI Website!

Text to Voice: Convert text into speech
Text to Image: Generate AI visuals
Fast & Easy!

https://ttsimage91.blogspot.com/

#AI #TextToSpeech #TextToImage

**ResearchBuzz: Firehose** @researchbuzz_firehose@rbfirehose.com · Mar 1

Mar 1

ResearchBuzz: Firehose @researchbuzz_firehose@rbfirehose.com

MIT Technology Review: A woman made her AI voice clone say “arse.” Then she got banned.. “Joyce doesn’t use her voice clone all that often. She finds it impractical for everyday conversations. But she does like to hear her old voice and will use it on occasion. One such occasion was when she was waiting for her husband, Paul, to get ready to go out. Joyce typed a message for her voice […]

https://rbfirehose.com/2025/03/01/mit-technology-review-a-woman-made-her-ai-voice-clone-say-arse-then-she-got-banned/

#accessibility #ai #aiassisted