Anyone have a good (open) corpus or generator of human names that covers a good amount of the different types of names people can have?

Preferably tagged with ethnicity or nationality. The names don’t have to be real, just representative.

@eumiro maybe?


@pganssle @eumiro Hi there. I crafted a corpus of 650k names with countries and ethnicities out of PubMed. HIH ! gist.github.com/mazieres/0b905


@mazieres @eumiro Very nice data set, and pretty cool analysis, though this does seem to be only surnames, and it doesn’t preserve capitalization.

@pganssle

Why don’t you look at “baby names” websites? They have all that, e.g. behindthename.com/. It’s probably not free to scrape, but they also list some sources at behindthename.com/info/copyrig

@mazieres @eumiro

@FailForward @mazieres @eumiro That is for given names. It’s not terribly difficult to find lists of given names or lists of surnames, but I’d like more variety. Many people have multiple given names, multiple last names, no last name, no given name, patronymics, etc.
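
For illustration, the kinds of variety listed above could be modelled roughly as follows. This is only a sketch under stated assumptions: the `PersonName` class, the `SHAPES` table, the `make_name` helper, the tiny name pools, and the crude "sson" patronymic suffix are all made up here and do not come from any of the data sets mentioned in the thread.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Tiny placeholder pools, invented for illustration only; a real corpus
# would supply these, ideally tagged with nationality or ethnicity.
GIVEN = ["Ana", "Li", "Omar", "Sigrid", "Kwame"]
FAMILY = ["Garcia", "Nguyen", "Okafor", "Lindqvist", "Haddad"]


@dataclass
class PersonName:
    given: List[str] = field(default_factory=list)   # zero or more given names
    family: List[str] = field(default_factory=list)  # zero or more family names
    patronymic: Optional[str] = None                 # optional patronymic

    def display(self) -> str:
        # Join whichever parts exist: given names, patronymic, family names.
        middle = [self.patronymic] if self.patronymic else []
        return " ".join(self.given + middle + self.family)


# Each shape is (number of given names, number of family names, has patronymic).
SHAPES = [
    (1, 1, False),  # one given name, one family name
    (2, 1, False),  # multiple given names
    (1, 2, False),  # multiple family names
    (1, 0, False),  # no family name
    (0, 1, False),  # no given name
    (1, 0, True),   # given name plus patronymic, no family name
]


def make_name(rng: random.Random) -> PersonName:
    """Pick a name shape at random and fill it from the placeholder pools."""
    n_given, n_family, has_patronymic = rng.choice(SHAPES)
    # Hypothetical patronymic rule: a parent's given name plus a fixed suffix.
    patronymic = rng.choice(GIVEN) + "sson" if has_patronymic else None
    return PersonName(
        given=rng.sample(GIVEN, n_given),
        family=rng.sample(FAMILY, n_family),
        patronymic=patronymic,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(make_name(rng).display())
```

Keeping the parts in separate lists, rather than in a single "first name, last name" pair, is what lets the same structure hold names with no family name or with several.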
