Optimizing #OpenRefine is fun! Today I took it for a ride on the #sirene database (all French companies). The CSV file weighs 7.8 GB uncompressed. It took a while to import, but… tada!

The OpenRefine UI on a project with 81,692,446 rows, showing the first few rows in it… and not choking!

Apr 16, 2023, 11:26 AM · Web

6 boosts · 17 favorites

**Antonin Delpeuch** @pintoch · Apr 16, 2023

This is using the new architecture for the upcoming 4.0 version. There are still quite a few cases in which the tool will definitely choke on a dataset that big, but I am working on those one step at a time.

**Antonin Delpeuch** @pintoch · Apr 16, 2023

I have to say 81M rows is beyond the range of what I expect people to use the tool for, but the many performance issues that show up with such a dataset are often worth addressing anyway, because they will still make a difference on smaller datasets.
