Optimizing #OpenRefine is fun! Today I took it for a ride on the #sirene database (all French companies). The CSV file weighs 7.8GB uncompressed. It took a while to import it, but… tada!
This is using the new architecture for the upcoming 4.0 version. There are still quite a few cases in which the tool will definitely choke on a dataset that big, but I am working on those one step at a time.
I have to say 81M rows is beyond the range of what I expect people to use the tool for, but the many performance issues that show up with such a dataset are often worth addressing anyway, because fixing them still makes a difference on smaller datasets.