Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.
"It works by generating an endless sequences of pages, each of which with dozens of links, that simply go back into a the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."
@tante I have mixed feelings.
Crawlers should respect robots.txt….
At the same time: there is clearly an emotionally driven bias around LLMs.
I feel weird about the idea of active sabotage. Considering it only targets bad actors… and considering that robots.txt files are, in my opinion, often too restrictive… the gray areas overlap a bit.
Why should we want to actively sabotage AI development? Wouldn’t that lead to possibly catastrophic results? Who benefits from dumber AI?
@altruios Hi, author of Nepenthes here.
I respect your discomfort, but honestly I'm angry enough about their behavior I want to see them burn. There's been far too much of this:
https://mastodon.social/@khobochka/113724300122190730
I cannot trust traffic to my site to be harmless, so I don't see any reason why something connecting to every site on the internet should be able to trust the site isn't harmful.
@tante
@thibaultamartin @aaron @altruios @tante: this is awesome and has only one flaw: Nepenthes domains can easily be identified and blacklisted.
I don’t know how to really solve this.
@ploum How do you propose identifying them?
Or suppose you want them to stop training on your data. They blocklist your domain and stop crawling it. Mission accomplished?
@thibaultamartin @altruios @tante