It's hard to overstate what a scam academic and scientific publishing is. It's run by an oligopoly of wildly profitable companies that coerce academics into working for free for them, and then sell the product of their labors back to the academics' employers (often public institutions) for eye-popping sums.


Here's how that works: a publicly funded researcher (often working for a public institution) does some research. In order to progress up the career ladder and secure more funding, they need to publish their research in a prestigious journal. That journal asks other publicly funded researchers (chosen by a volunteer editorial board of publicly funded researchers) to peer-review and edit the paper.


If the paper is selected for publication, the researcher signs over their copyright in it - life plus 70 years - to the journal, for free.

Then, the sales department of the journal pays a call on the institutions that pay the salaries of the paper's authors. They offer a "subscription" to the journal - that is, access to a database that costs almost nothing to maintain - that can cost tens of thousands of dollars per year. Journals have experienced rapid, sustained price inflation for decades.


If someone at that institution were to share the paper their colleague produced in the next lab over, they'd be committing copyright infringement - because their colleague had to give their copyright away to the publisher as a condition of publication, which is, in turn, a condition of career advancement.


This is chokepoint capitalism at its finest: publishers' primary "asset" is a legally defensible barrier between academics and their career prospects, so they can coerce academics into accepting all kinds of abusive conduct.

But as bad as it is that billion-dollar multinationals are extracting huge, parasitic rents from our publicly funded knowledge-creation system, that's really just the tip of the iceberg. The real harms come from what this does to science and scholarship.


Locking up all those papers means that researchers who aren't affiliated with wealthy institutions are denied access to the raw materials of study and experimentation. (Naturally, this problem is most keenly felt in the Global South, which means that scientists and scholars in poor countries are denied access to the materials that might help them alleviate the scourges of poverty).


Just as bad is the problem of the study of knowledge itself - the kind of textual analysis that can reveal holes, biases and defects in our research programs. Large-scale text-mining is essential to this kind of work, especially when it comes to fighting corruption in science.


Back when Aaron Swartz was at Stanford, he used his access to the digital law library to do text-mining that showed that when law schools got money from fossil fuel giants, their scholars produced papers that exonerated carbon barons from liability for climate change.


No one knows what Aaron was doing when he downloaded millions of papers from JSTOR over MIT's network - an act that resulted in his malicious prosecution, leading to his eventual suicide. But many of Aaron's friends suspect he was in the early stages of a similar project.

Aaron was not the first researcher with a great idea for text-mining analysis, and he won't be the last. And the next Aaron will have a much easier time of it.


That's because Carl Malamud has just released "The General Index" - a full-text-searchable index of 100,000,000 scientific articles.

The catalog contains 355 billion words, and returns five-word snippets (firmly within fair use's boundaries) and citations in response to queries. It's publicly available for all to mine and search.
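
To make the idea concrete, here is a minimal Python sketch of how an index of five-word snippets might work. It is not the General Index's actual code or data format: the function names, the in-memory dictionary, and the two sample "articles" are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict


def ngrams(text, n=5):
    """Return every run of n consecutive words in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(articles, n=5):
    """Map each n-word snippet to the citations of the articles containing it."""
    index = defaultdict(set)
    for citation, text in articles.items():
        for snippet in ngrams(text, n):
            index[snippet].add(citation)
    return index


def search(index, term):
    """Return (snippet, citation) pairs for snippets that contain the term."""
    term = term.lower()
    return sorted(
        (snippet, citation)
        for snippet, citations in index.items()
        if term in snippet.split()
        for citation in citations
    )


# Hypothetical sample data, not real articles.
articles = {
    "Doe 2020 (hypothetical)": "carbon emissions liability remains a contested legal question",
    "Roe 2021 (hypothetical)": "funding sources shape research on carbon emissions liability",
}

index = build_index(articles)
hits = search(index, "liability")
for snippet, citation in hits:
    print(f"{snippet!r} <- {citation}")
```

A query only ever gets back short snippets and a pointer to the source, never the full text, which is the property the fair-use argument rests on. A real index at this scale would need on-disk storage and a proper tokenizer rather than a Python dictionary and `str.split`.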


It's the latest in a series of breathtaking open knowledge efforts from Malamud, who was a frequent collaborator of Aaron's.

And it's part of a wider movement to free up access to scientific and scholarly knowledge by any means necessary, from wildcat efforts like Sci Hub to the campaign to end vaccine apartheid.


how about turning that into a github copilot-like auto-completer? if microsoft can do it without bothering to abide by licenses, can't we?