Following the introduction, here is part 1 of my series of articles on how to build a cross-platform search engine from scratch, in Rust.

This section will handle how we'll store the encrypted data on any platform.

Enjoy reading it, feel free to provide some feedback, here or directly on GitHub 😉

jdrouet.github.io/posts/202503

jdrouet · Building a search engine from scratch, in Rust: part 1
Or how to write on disk efficiently in the browser or any other device.

🧵 Following part 1, here is part 2 of my series of articles on how to build a search engine from scratch, in Rust.

📰 This article is about how we go from a document to a set of structured indexes.

💬 Enjoy reading it, feel free to provide some feedback, here or directly on GitHub 😉

🔗 Here is the link: jdrouet.github.io/posts/202503

Hello · Hello
In the previous article, I explained how we'll write on disk and how we'll implement an abstraction so that it works on any device as well as in the browser. Now, it's time to start thinking about wha…

🧵 And now, part 3 of my series of articles on how to build a search engine, in Rust.

📰 This article is about how the sharding mechanism works.

💬 Enjoy reading it, feel free to provide me some feedback, here or directly on GitHub 😃

If you enjoy it, feel free to share it on other platforms!

🔗 Here is the link: jdrouet.github.io/posts/202503

jdrouet · Building a search engine from scratch, in Rust: part 3
Or how we'll implement sharding and transactions for our search engine.

🧵 And now, part 4 of my series of articles on how to build a search engine, in Rust.

📰 This article is about how we query the indexes, aggregate the results and generate some scores.

💬 Enjoy reading it, feel free to provide me some feedback, here or directly on GitHub 😃

If you enjoy it, feel free to share it on other platforms!

🔗 Here is the link: jdrouet.github.io/posts/202503

jdrouet · Building a search engine from scratch, in Rust: part 4
Or how we'll find something in all this.

@jdrouet just wanted to say you started a great series. Looking forward to upcoming articles, thank you!

@jdrouet good article, looking forward to more!

I've got a question regarding `Directory::files`: when iterating over the files, you're `await`ing them one at a time – doesn't that kind of defeat the whole purpose of async? Could one instead spawn a task for each file, and then `join` all the handles?
I think an even nicer implementation could be some kind of `directory.files().filter(Path::is_file).collect()`, but that would require `AsyncIterator`, which AFAIK Rust doesn't currently have

@pmmeurcatpics thanks for your feedback!

Spawning a task for each file would require importing a runtime, which I decided not to do. Here, we only use `futures_lite` which doesn't have this `join` macro.

Taking a step back, this function will almost never be used so even if we optimise this, it will not be visible. 😉
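For readers following the exchange above: spawning tasks does need a runtime, but several futures can also be driven concurrently from a single task by a combinator that polls each of them in turn. The sketch below illustrates that idea with the standard library only. It is not the article's `Directory::files` implementation; `JoinAll`, `YieldThen` and `block_on` are hypothetical names made up for this example, and the busy-polling `block_on` is only there to keep the demo self-contained.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

/// Hypothetical combinator: polls every pending future from a single task
/// and resolves to their outputs, in input order, once all have completed.
struct JoinAll<T> {
    pending: Vec<Option<BoxFuture<T>>>,
    done: Vec<Option<T>>,
}

impl<T> JoinAll<T> {
    fn new(futures: Vec<BoxFuture<T>>) -> Self {
        let done = futures.iter().map(|_| None).collect();
        let pending = futures.into_iter().map(Some).collect();
        Self { pending, done }
    }
}

impl<T: Unpin> Future for JoinAll<T> {
    type Output = Vec<T>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<T>> {
        let this = self.get_mut();
        for (slot, out) in this.pending.iter_mut().zip(this.done.iter_mut()) {
            let ready = match slot.as_mut() {
                Some(fut) => fut.as_mut().poll(cx),
                None => continue, // already completed on an earlier poll
            };
            if let Poll::Ready(value) = ready {
                *out = Some(value);
                *slot = None; // never poll a completed future again
            }
        }
        if this.pending.iter().all(Option::is_none) {
            Poll::Ready(this.done.iter_mut().filter_map(Option::take).collect())
        } else {
            Poll::Pending
        }
    }
}

/// Demo helper standing in for I/O: pending `remaining` times, then ready.
struct YieldThen<T> {
    remaining: u32,
    value: Option<T>,
}

impl<T: Unpin> Future for YieldThen<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        if this.remaining == 0 {
            return Poll::Ready(this.value.take().expect("polled after completion"));
        }
        this.remaining -= 1;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for demonstration only.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

Whether this would help in practice depends on whether the underlying file operations can actually make progress in parallel, and, as noted above, this code path is rarely exercised anyway.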

@jdrouet You can get some pretty big performance improvements by intersecting the binary indices on the go.

Depending on how they are laid out, you can intersect any number of postings lists in linear to sublinear time, with zero memory overhead. This scales much better than intersecting hash tables.

"Search Engines: Information Retrieval in Practice" has a section discussing the technique in chapter 5.4.7.
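To make the suggestion above concrete, here is a minimal sketch of intersecting sorted postings lists with one cursor per list. It assumes each postings list is an ascending, duplicate-free slice of `u32` document ids; that layout and the name `intersect_all` are assumptions for illustration, not the series' actual index format. Skipping ahead with a binary search (`partition_point`) is one simple way to jump over long runs of non-matching ids.

```rust
/// Intersects any number of postings lists, each sorted in ascending order
/// and free of duplicates. Apart from the output vector, the only state is
/// one cursor per list.
fn intersect_all(lists: &[&[u32]]) -> Vec<u32> {
    let mut out = Vec::new();
    if lists.is_empty() {
        return out;
    }
    let mut cursors = vec![0usize; lists.len()];
    'outer: loop {
        // The largest current head is the smallest id that can still match.
        let mut candidate = 0u32;
        for (list, &cur) in lists.iter().zip(&cursors) {
            match list.get(cur) {
                Some(&id) => candidate = candidate.max(id),
                None => break 'outer, // one list is exhausted: nothing left
            }
        }
        let mut all_match = true;
        for (list, cur) in lists.iter().zip(cursors.iter_mut()) {
            // Skip to the first entry >= candidate using a binary search.
            *cur += list[*cur..].partition_point(|&id| id < candidate);
            match list.get(*cur) {
                Some(&id) if id == candidate => {}
                Some(_) => all_match = false,
                None => break 'outer,
            }
        }
        if all_match {
            out.push(candidate);
            for cur in cursors.iter_mut() {
                *cur += 1;
            }
        }
    }
    out
}
```

Calling `intersect_all` with the lists `[1, 3, 5, 7]`, `[3, 4, 5, 8]` and `[0, 3, 5, 9]` returns `[3, 5]`. The same cursor logic could be wrapped in an iterator to avoid materialising the output at all.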

@marginalia really interesting! I'll have a look at it. Maybe not for the next article (although the topic is the optimisation). Thanks!

@jdrouet It's also fully possible the juice might not be worth the squeeze for these types of optimizations at the scale you're targeting. Though I figured I'd share it nonetheless, as it's genuinely a very cool optimization that's pretty intuitive.

@marginalia yeah, right now the bottleneck is not here, but more at the encryption/decryption level...