This is a new version of the genderBR package that includes a new function: get_gender_nn(), which uses a character-level neural network to predict gender from Brazilian first names. This model can generalise to names not present in the IBGE census dataset, so it can be used as a complement to the existing functionality in the package. The release also includes some improvements, tests, and documentation updates.

- get_gender_nn() is a new exported function that uses a character-level neural network to predict gender from Brazilian first names. Unlike get_gender(), this function can generalise to names not present in the IBGE census dataset.
- Added clear_nn_cache() to manage the in-memory model cache.
- Added download_gender_model(), an internal function that downloads and caches the neural network model weights and vocabulary from Hugging Face.
- Replaced iconv() with chartr() for stripping accents during name cleaning. The previous approach relied on iconv(name, to = "ASCII//TRANSLIT"), which is platform-dependent and returns NA on macOS for accented names (e.g., "joão"). The encoding argument in get_gender, get_gender_nn, and map_gender is now deprecated and will be removed in a future version.
- Added torch to Imports, and luz and httr2 to Suggests.
- Updated the internal data used by get_gender: nomes now includes probabilities for 2010 and 2022 (prob_fem10, prob_fem22) and is used when internal = TRUE. This dataset covers 141,742 unique Brazilian first names.
- Replaced %>% with the base |> operator, removing the magrittr dependency (requires R 4.1.0 or higher).
- Switched to data.table for faster joins and removed the dplyr and tibble dependencies.

In this version, a few improvements and bug fixes were introduced. Most importantly, connection errors now return informative messages to users.

- map_gender and get_gender now return informative error messages when a request times out.
- The get_gender function now handles non-ASCII characters better.

In this minor release, the genderBR package was improved in two ways. First, bugs and other minor issues were fixed, making the package's functions more stable. Second, the package now contains an internal dataset with all the names reported in the IBGE Census, which the get_gender function uses to predict gender from Brazilian first names. As a result, classifying a vector of more than 1,000 names now takes only a few seconds. Overall, these are the improvements:

- Added a NEWS.md file to track changes to the package.
- get_gender function.
- round_guess function.
- Updated the get_gender function to work with internal data.