A Visual Exploration of Immigration throughout the Century

Leah Bevis - Spring 2009

Appendix

This data was downloaded from the historical census data in NHGIS, in the “Background/Ancestry” section of “General Population Statistics.”  The table name changed over the years, but was usually called something akin to “White Foreign-Born Population by Country of Birth” (in the early years) or “Foreign-Born Population by Country of Birth.”  The countries and regions listed differed greatly from year to year, as summarized in this table: [LINK TO CensusForeignListingsByYear.xls] 

The first step of my analysis was aggregating these countries and regions into my own, new regions.  I did this based on four factors: the quantity of immigrants who came from a country, the political importance of a country’s immigrant group, the consistency with which a country’s immigrants were listed, and the changing borders of particular countries.  I spoke with a professor in the Middlebury History Department, who specialized in Eastern European history, in order to best fit countries to regions.  Having aggregated census countries/regions into my own regions, I also created two larger, aggregated regions: Europe (from ‘Western Europe,’ ‘Scandinavia’ and ‘Britain’) and Asia (from ‘East Asia’ and ‘South Asia and the Middle East’).
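
As a rough illustration of the last step, the two larger regions can be built by summing their component regions.  The sketch below is in Python; the function name and the counts are hypothetical placeholders rather than census figures, and only the region names come from the description above.

```python
# Component regions for the two larger aggregates, as named in the text.
AGGREGATES = {
    "Europe": ["Western Europe", "Scandinavia", "Britain"],
    "Asia": ["East Asia", "South Asia and the Middle East"],
}

def add_aggregates(region_counts):
    """Return a copy of region_counts with the aggregate regions added.

    region_counts maps a region name to an immigrant count for one
    county and one census year. Missing component regions count as 0.
    """
    out = dict(region_counts)
    for aggregate, parts in AGGREGATES.items():
        out[aggregate] = sum(region_counts.get(part, 0) for part in parts)
    return out

# Placeholder counts for a single county-year (not real data).
example = {
    "Western Europe": 120,
    "Scandinavia": 30,
    "Britain": 50,
    "East Asia": 10,
    "South Asia and the Middle East": 5,
}
with_aggregates = add_aggregates(example)
```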

The second step of my analysis was using historical US county data compiled by Caitlin Sargent to fit my data into year 2000 counties.  As Caitlin Sargent had used Cascading Density Weighting to estimate the foreign-born counts in county atoms (intersections between historical county definitions for 1900 and every census year onward), I simply assumed that the within-county distribution of each region’s immigrants would be equivalent to the within-county distribution of all foreign-born.  This allowed me to calculate each atom’s immigrants, for each regional group and for each year, by the equation:

(Atom’s FB/Atom’s County’s FB) = (Atom’s Region-FB/Atom’s County’s Region-FB)

After finding each atom’s immigrant numbers for each year, I summed the atoms by 2000 counties, to complete the process of interpolation. 
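
Solving the equation above for the unknown gives Atom’s Region-FB = (Atom’s FB / Atom’s County’s FB) x (Atom’s County’s Region-FB).  A minimal Python sketch of that allocation, followed by the sum over year 2000 counties, is shown here.  The field names, the function name, and the example numbers are assumptions made for illustration, and the handling of a historical county with zero foreign-born (its atoms receive zero) is also an assumption not stated in the text.

```python
from collections import defaultdict

def interpolate_region(atoms, county_fb, county_region_fb):
    """Allocate one region's immigrants to atoms, then sum by 2000 county.

    atoms: list of dicts with keys
        'hist_county'  historical county the atom belongs to
        'county_2000'  year 2000 county the atom belongs to
        'fb'           estimated foreign-born count in the atom
    county_fb: historical county -> total foreign-born
    county_region_fb: historical county -> foreign-born from the region
    """
    totals = defaultdict(float)
    for atom in atoms:
        total_fb = county_fb[atom["hist_county"]]
        # Assumption: a county with no foreign-born contributes nothing.
        if total_fb == 0:
            continue
        # Atom's Region-FB = County's Region-FB * (Atom's FB / County's FB)
        atom_region_fb = (
            county_region_fb[atom["hist_county"]] * atom["fb"] / total_fb
        )
        totals[atom["county_2000"]] += atom_region_fb
    return dict(totals)

# Placeholder example: historical county A is split between 2000 counties
# X and Y, while historical county B falls entirely inside Y.
atoms = [
    {"hist_county": "A", "county_2000": "X", "fb": 75},
    {"hist_county": "A", "county_2000": "Y", "fb": 25},
    {"hist_county": "B", "county_2000": "Y", "fb": 50},
]
county_fb = {"A": 100, "B": 50}
county_region_fb = {"A": 40, "B": 10}
by_2000_county = interpolate_region(atoms, county_fb, county_region_fb)
```

In this example, county X receives three quarters of A’s 40 regional immigrants, and county Y receives the remaining quarter plus all 10 from B, so the regional total of 50 is preserved.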

A couple of years had particular problems with their data, which I ‘fixed’ in particular ways.  My data for 1960 showed immigrant numbers that were far, far too high.  After analyzing the numbers for immigrants in 1950, 1960 and 1970, I concluded that what the data was actually showing was not “Foreign-born by Country of Origin,” but “Native and Foreign-born by Country of Origin.”  This hypothesis was supported by the fact that the 1960 variable I was using is called “Population by Country of Origin,” which is clearly not specific to the foreign-born population.  Also, through 1950 there was no accounting for “Native Country of Origin,” but from 1970 onwards both native and foreign-born country of origin were accounted for.  It is my guess that 1960 was a sort of ‘mess up’ year, where the government decided that it would be useful to account for native residents’ country of origin, but for some reason, perhaps by accident, the census did not separate the native and foreign-born country of origin information. 

I ‘fixed’ this 1960 data problem by finding, for each county, the 1970 ratio of residents listed under ‘Native Country of Origin’ to residents listed under ‘Foreign-born Country of Origin.’  I then applied this ratio to the 1960 data, for each immigrant region, so as to estimate the number of immigrants from each region for each county.  When I compared the state and country-level totals of my 1960 estimated immigrant numbers to the 1950 and 1970 census immigrant numbers, the estimates seemed quite accurate.
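
The exact arithmetic is not spelled out above, so the following Python sketch shows only one plausible reading.  It assumes that each 1960 figure is a combined native plus foreign-born count, and that the 1970 native-to-foreign-born relationship for the county holds in 1960.  With r = native / foreign-born in 1970, the foreign-born estimate is combined / (1 + r), which is the same as multiplying the combined count by the 1970 foreign-born share.  The function name and all numbers are hypothetical.

```python
def estimate_1960_foreign_born(native_1970, foreign_1970, combined_1960):
    """Estimate 1960 foreign-born counts by region for one county.

    native_1970, foreign_1970: the county's 1970 counts under the native
        and foreign-born country-of-origin tables.
    combined_1960: region -> 1960 count, assumed to mix native and
        foreign-born residents.
    """
    total_1970 = native_1970 + foreign_1970
    # Assumption: with no 1970 data for the county, estimate zero.
    if total_1970 == 0:
        return {region: 0.0 for region in combined_1960}
    # Foreign-born share of the 1970 total; equals 1 / (1 + r)
    # where r = native_1970 / foreign_1970.
    foreign_share = foreign_1970 / total_1970
    return {
        region: count * foreign_share
        for region, count in combined_1960.items()
    }

# Placeholder county: 300 native and 100 foreign-born in 1970 (r = 3).
estimates = estimate_1960_foreign_born(
    native_1970=300,
    foreign_1970=100,
    combined_1960={"Europe": 800, "Asia": 40},
)
```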

Data for 1950 was also problematic.  Many counties, especially in the South, had no immigrants listed, while other counties within the same state would have numbers far too high.  The state totals, however, seemed accurate when compared with the 1940, 1970, and new 1960 totals.  Therefore, I applied the 1940 proportions of county to state immigrant numbers to my 1950 state totals.  When mapped alongside the county data for 1940 and 1960, this newly calculated county data seemed fairly accurate.
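
In other words, each county’s 1950 estimate is the 1950 state total multiplied by that county’s share of the 1940 state total.  A small Python sketch of this reallocation follows.  The function name and the counts are placeholders, and the use of the sum of the 1940 county figures as the 1940 state total is an assumption.

```python
def reallocate_1950(county_1940, state_total_1950):
    """Spread a 1950 state total across counties using 1940 shares.

    county_1940: county -> 1940 immigrant count for one state and region.
    state_total_1950: the state's 1950 immigrant count for that region.
    """
    # Assumption: the 1940 state total is the sum of its county counts.
    state_total_1940 = sum(county_1940.values())
    if state_total_1940 == 0:
        return {county: 0.0 for county in county_1940}
    return {
        county: state_total_1950 * count / state_total_1940
        for county, count in county_1940.items()
    }

# Placeholder state with two counties holding 60% and 40% in 1940.
county_1950 = reallocate_1950({"A": 60, "B": 40}, state_total_1950=200)
```

By construction the county estimates sum to the 1950 state total, so this approach keeps the state-level figures that seemed accurate while discarding the unreliable 1950 county pattern.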

There are also problems with the data that I never solved.  For instance, it is evident in my cluster trend charts that the Latin America data for 1930, and perhaps some of the other data for 1930, is off. (Two clusters show Latin American immigrant numbers dipping unbelievably low in 1930.)  The data for 1970 may also be off slightly in some way (which would, in turn, make my 1960 ‘fix’ less than perfect).  A few of the cluster charts show a small spike in 1970 immigrant numbers, but I am not sure whether this spike reflects reality or a data error.  I am inclined, however, to think it reflects an actual spike in immigrant numbers. 
