mamot.fr is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mamot.fr is a French-speaking Mastodon server run by La Quadrature du Net.

Server stats: 3.1K active users

#trainingdata

3 posts, 3 participants, 0 posts today

Open Web Crawl is such a security vulnerability that I don't know why it isn't at the top of the news every day.

If you turn on a general suction hose, how do you not realise there’s going to be a party of attackers right there feeding it all the #propaganda they possibly can?

How can you be so nonchalant about it? How do you not realise you created the biggest attack vector in the history of computing?

'The New York Times' takes OpenAI to court. ChatGPT's future could be on the line

A group of news organizations, led by The New York Times, took ChatGPT maker OpenAI to federal court on Tuesday in a hearing that could determine whether the tech company has to face the publishers in a high-profile copyright infringement trial.

#NYT #media #copyright #legal #ChatGPT #OpenAI #artificialintelligence #AI #LLM #data #TrainingData #technology #tech

npr.org/2025/01/14/nx-s1-52589

We've made #Swedish language training data for developing #HTR (handwritten text recognition) models available for download: riksarkivet.se/psidata/traning

This data, together with data from other archives whose training data is not ours to publish, is the basis for our HTR model Swedish Lion Libre: huggingface.co/collections/Rik

If you do use the training data, the model or, even better, you have ground-truth data you'd like to share, just get in touch!

Riksarkivet · Training data for HTR models - Riksarkivet: The dataset contains carefully and manually transcribed and segmented texts from archival documents held by Riksarkivet.

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft

Harvard University is releasing a high-quality dataset of nearly 1 million #publicdomain books that could be used by anyone to train large language models and other AI tools. It contains books scanned as part of the #GoogleBooks project that are no longer protected by copyright.

#Harvard #Microsoft #OpenAI #ArtificialIntelligence #AI #data #bigdata #trainingdata #technology #tech

wired.com/story/harvard-ai-tra

WIRED · Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft · By Kate Knibbs

Senate Bill Targets AI ‘Black Box’ Problem, Eyes Transparency in Use of Copyrighted Works

The Transparency and Responsibility for Artificial Intelligence Networks (TRAIN) Act was introduced in the Senate on Monday, the latest effort to shield songwriters, musicians and other creators from the unauthorized use of their works in training generative AI models.

#copyright #music #legal #TRAIN #ArtificialIntelligence #AI #data #bigdata #TrainingData #technology #tech

billboard.com/pro/senate-train

Billboard · Senate Bill Targets AI 'Black Box' Problem, Eyes Transparency in Use of Copyrighted Works · By Marc Schneider

Dialogue from 53,000 movies and 85,000 TV episodes is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies.

It includes writing from every film nominated for Best Picture from 1950 to 2016 and at least 616 episodes of The Simpsons.

#OpenSubtitles #hollywood #TV #movies #copyright #ArtificialIntelligence #AI #LLM #TrainingData #data #bigdata #technology #tech

theatlantic.com/technology/arc

The Atlantic · The Hollywood AI Database · By Alex Reisner