Here is the first article of a series on how to build a search engine from scratch, in Rust.
Feel free to give me some feedback!
https://jdrouet.github.io/posts/202503161800-search-engine-intro/
Following the introduction, here is part 1 of my series of articles on how to build a cross-platform search engine from scratch, in #rustlang.
This part covers how we'll store the encrypted data on any platform.
Enjoy reading it, and feel free to provide some feedback, here or directly on GitHub:
https://jdrouet.github.io/posts/202503170800-search-engine-part-1/
If you enjoy it, feel free to share it on other platforms!
@jdrouet just wanted to say you started a great series. Looking forward to upcoming articles, thank you!
@jdrouet good article, looking forward to more!
I've got a question regarding `Directory::files`: when iterating over the files, you're `await`ing them one at a time – doesn't that kind of defeat the whole purpose of async? Could one instead spawn a task for each file, and then `join` all the handles?
I think an even nicer implementation could be some kind of `directory.files().filter(Path::is_file).collect()`, but that would require `AsyncIterator`, which AFAIK Rust doesn't currently have
@pmmeurcatpics thanks for your feedback!
Spawning a task for each file would require pulling in a runtime, which I decided not to do. Here we only use `futures_lite`, which doesn't provide a `join` macro.
Taking a step back, this function will almost never be called, so even if we optimised it, the gain wouldn't be noticeable.
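For readers following along, here is a rough sketch of the "spawn a task per file, then join all the handles" pattern the question describes. To keep it runnable without any async runtime, this hypothetical example uses OS threads from `std` in place of async tasks (with tokio the shape would be similar, using `tokio::spawn` plus joining the handles); the `read_all` function and its fake file contents are made up for illustration.

```rust
use std::thread;

// Hypothetical sketch: process every file concurrently by spawning one
// worker per file, then joining all the handles in order. OS threads here
// stand in for async tasks so the example needs no runtime.
fn read_all(files: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = files
        .into_iter()
        .map(|name| {
            thread::spawn(move || {
                // Stand-in for actually reading the file asynchronously.
                format!("contents of {name}")
            })
        })
        .collect();
    // Joining in spawn order preserves the original file order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The trade-off discussed above still applies: this buys concurrency at the cost of a runtime (or threads), which the series deliberately avoids for a rarely-used function.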
@jdrouet You can get some pretty big performance improvements by intersecting the binary indices on the fly.
Depending on how they are laid out, you can intersect any number of postings lists in linear to sublinear time, with zero memory overhead. This scales much better than intersecting hash tables.
"Search Engines: Information Retrieval in Practice" has a section discussing the technique in chapter 5.4.7.
@jdrouet This article discusses the technique in more detail with regards to skip lists, though it does (as noted in SeIRP) work with any sorted list.
@marginalia really interesting! I'll have a look at it. Maybe not for the next article (although its topic is optimisation). Thanks!
@jdrouet It's also entirely possible the juice isn't worth the squeeze for these kinds of optimizations at the scale you're targeting. I figured I'd share it nonetheless, as it's genuinely a very cool optimization that's pretty intuitive.
@marginalia yeah, right now the bottleneck is not here, but more at the encryption/decryption level...