Creating a database from the leaked data of 533+ million users. The goal is to be able to check whether your data has been leaked, and to produce some graphs and maybe a site.

About 30 seconds to go through a database of 3,000,000 rows on my laptop... and the total will be 533,000,000... How do sites like haveibeenpwned.com get through a few billion rows so quickly?

@retiolus
- they use indexes
- they have *a lot* of RAM to serve all that.
- they are likely not using SQLite, but a real DBMS like MariaDB or PostgreSQL

@vincib Additionally, they may also split their data in chunks to be handled by different worker nodes? (just a guess)
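To make the chunking guess concrete, here is a minimal sketch of hash-based partitioning in Python. It is not how haveibeenpwned.com is known to work; `NUM_SHARDS`, `shard_for`, `insert` and `lookup` are hypothetical names, and plain dicts stand in for separate worker nodes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of worker nodes

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each dict stands in for the storage of one worker node.
shards = [dict() for _ in range(NUM_SHARDS)]

def insert(key: str, record: dict) -> None:
    """Store the record on the shard that owns this key."""
    shards[shard_for(key)][key] = record

def lookup(key: str):
    """Only the shard that owns the key is consulted, not the whole data set."""
    return shards[shard_for(key)].get(key)
```

Because the same key always hashes to the same shard, a lookup touches roughly 1/`NUM_SHARDS` of the data, and different shards can answer different queries in parallel.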

@vincib Adding an index to the SQLite database --> looking up a record at row 3,500,000 is now instant, thanks for the tip!
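For anyone wanting to try the same thing, a small sketch with Python's built-in `sqlite3` module. The table `leak`, its columns `phone` and `name`, and the index name `idx_leak_phone` are made up for illustration, since the thread does not show the real schema. `EXPLAIN QUERY PLAN` shows whether SQLite scans the whole table or searches via the index.

```python
import sqlite3

# Hypothetical schema: the real database layout is not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leak (phone TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO leak VALUES (?, ?)",
    ((f"+33{n:09d}", f"user{n}") for n in range(100_000)),
)

query = "SELECT name FROM leak WHERE phone = ?"
target = "+33000099999"

def plan(conn, query, params):
    """Return SQLite's query plan as one string (the detail column is index 3)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    return " ".join(row[3] for row in rows)

# Without an index, SQLite has to scan every row of the table.
plan_before = plan(conn, query, (target,))

# With an index on the searched column, the lookup becomes a B-tree search.
conn.execute("CREATE INDEX idx_leak_phone ON leak (phone)")
plan_after = plan(conn, query, (target,))

result = conn.execute(query, (target,)).fetchone()
print(plan_before)
print(plan_after)
print(result)
```

The first plan should mention a scan of `leak` and the second should mention `idx_leak_phone`; the exact wording differs between SQLite versions.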