The gender commonly associated with a name changes over time. In the chart below, a name was thought of as female in the years when it appears near the top, and as male in the years when it appears near the bottom. You can change which names are displayed.
When you enter your first name and birth year, this application looks your name up in the Social Security Administration’s dataset of baby names. It calculates the proportions of male and female births recorded under that name in the year you were born, and guesses your gender based on those proportions.
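The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code (which is written in R). The function name `guess_gender`, the table `TOY_COUNTS`, and every count in it are hypothetical and invented for the example; they are not real SSA figures. Treating an exact tie as female is an arbitrary choice made only for the sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical counts, invented for illustration only; NOT real SSA figures.
# Keys are (lowercased name, birth year); values are (female births, male births).
TOY_COUNTS: Dict[Tuple[str, int], Tuple[int, int]] = {
    ("leslie", 1900): (20, 80),
    ("leslie", 2000): (90, 10),
}


def guess_gender(
    name: str,
    year: int,
    counts: Dict[Tuple[str, int], Tuple[int, int]] = TOY_COUNTS,
) -> Optional[Tuple[str, float]]:
    """Return (guessed gender, proportion female), or None if the name is unknown."""
    key = (name.strip().lower(), year)
    if key not in counts:
        return None  # no births recorded under this name for this year
    female, male = counts[key]
    total = female + male
    if total == 0:
        return None  # avoid dividing by zero on an empty record
    proportion_female = female / total
    # Assumption for this sketch: a tie (exactly 0.5) is labeled female.
    label = "female" if proportion_female >= 0.5 else "male"
    return label, proportion_female


print(guess_gender("Leslie", 1900))
print(guess_gender("Leslie", 2000))
```

With these made-up counts, the same name gets a different guess depending on the birth year, which is the point of looking the name up by year rather than in a single modern list.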
This application is based on the gender package for R, which does this prediction in a more sophisticated way. For example, the gender package makes corrections to biases in the SSA dataset, and lets you predict gender based on a range of birth years. It also lets you use other datasets from other times and places. Currently the gender package has two main datasets: the SSA dataset for the twentieth-century United States and a Census dataset for the nineteenth-century United States.
A common problem in historical research is that one has a list of names and wishes to infer additional information from it. Think, for example, of these kinds of lists: company payrolls, military rosters, passenger bookings, records of correspondence, or lists of published works. Some of these records might contain associated information such as a person’s address, age, or rank, while others contain little more than a list of personal names. A common question in the history of American religion, for example, is whether more women than men participated in a church or religious group. A simple membership roll might contain names, but it is unlikely to identify the gender of the members. One might also ask a question about changing patterns in male and female authorship of books. Library catalogs contain enormous amounts of data about publication, but they do not record the gender of authors.
If a historian tried to guess the gender of historical names using contemporary data, then he or she would likely make errors when naming conventions changed. Predicting gender from first names requires a fundamentally historical method.
Sorry. The percentages just might not represent you. And if you weren’t born in the United States, then the data is far less likely to represent you accurately. And of course the application can only be as good as the data that is supplied to it. In particular, it has to use the same categories as that data.
For performance reasons, I'm only using names that have been used at least 5,000 times for people born in the United States in the past 130 years. That's about 3,500 names. The complete SSA dataset has roughly 91,000 names.
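A filter like the one described above might look something like this. It is only a sketch in Python under assumptions: the row layout `(name, year, count)`, the function name `frequent_names`, and the sample rows are all hypothetical, and the counts are invented rather than taken from the SSA data. Only the 5,000 threshold comes from the text.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical rows of (name, birth year, number of births); invented values.
TOY_ROWS = [
    ("mary", 1900, 4000),
    ("mary", 1901, 1500),
    ("zelpha", 1900, 12),
]


def frequent_names(
    rows: Iterable[Tuple[str, int, int]], min_total: int = 5000
) -> Set[str]:
    """Keep only names whose total uses across all years reach min_total."""
    totals: Counter = Counter()
    for name, _year, count in rows:
        totals[name] += count  # sum each name's uses over every year
    return {name for name, total in totals.items() if total >= min_total}


print(frequent_names(TOY_ROWS))
```

In this toy input, "mary" totals 5,500 uses and is kept, while "zelpha" totals 12 and is dropped.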
I made this to experiment with the Shiny web application framework for R by RStudio. (Turns out that Shiny is pretty awesome.) But it is also intended as a demonstration of the basics of the gender package’s historical method.