The CDC collects mortality statistics at the state and national level based on death certificates issued by local coroners. These death certificates include personal information about the decedent, such as country of birth, and the circumstances surrounding the death.
Unlike the homicide data from the FBI, the death certificate data collected by the CDC does not suffer from missing data problems. It also does not suffer from the racial misclassification problems that plague FBI data [1] (no one calls coroners racist if too many Hispanics or Blacks are the victims of homicides, and the people certifying deaths are arguably more careful about filling out forms). This is the same data that is used to study the Hispanic mortality paradox.
Example death certificate from Wikipedia
Disparities in homicide rates by country of birth have been widely studied, and a vast empirical literature exists (though somehow economists never seem to have heard of these studies; I wonder why).
The results show that foreign-born persons differ in their risks of violent death vis-à-vis the native-born population by the amount of the time they have lived in the USA. In particular, recent immigrants (less than 15 years) display higher risks from homicide. (Differential mortality risks from violent causes for foreign- and native-born residents of the USA)
—Explaining the Mexican‐American Health Paradox Using Selectivity Effects [2]
and also
The age-adjusted mortality rates due to homicide are exceptionally high among males in all three migrant populations, with the Puerto Rican rate equal to that of Blacks. Rates among the Cuban and Mexican born, however, are also high
—Mortality Differentials among Persons Born in Cuba, Mexico, and Puerto Rico Residing in the United States, 1979-81 [3]
Hispanics’ immigrant-to nonimmigrant homicide differential became statistically significant in 1980, and immigrants remained at higher risk through 1992.
—Homicide Risk among Immigrants in California, 1970 through 1992 [4]
The results indicate that the foreign-born population exhibits a roughly equal risk of death from all combined violent causes in comparison to the native-born population. Yet, when these causes of violent death are broken down into homicide, suicide, car accidents, and other accidents, a different picture emerges. Specifically, immigrants who have been in the USA for less than 15 years are at much greater risk of homicide mortality, yet experience much lower risks from suicide and other accident mortality in comparison to the native-born population
—Differential mortality risks from violent causes for foreign and native-born residents of the USA [5]
I couldn’t let the opportunity pass and decided to download the CDC death certificate data. Fortunately, country of birth of the deceased is available in the multiple cause of death microdata from 1979 onward; unfortunately, they stopped making country of birth information available to the public after 2004:
No geographic identifiers are included in the files for 2005-on due to a restriction imposed by the States. Information on applying for state- and county-identified data from NCHS is available.
—Mortality Data — Vital Statistics NCHS’ Multiple Cause of Death Data, 1959-2016
Here’s what the data for 2004 looks like:
| Country of birth | Year | Population | Homicides | Rate per 100,000 |
|---|---|---:|---:|---:|
| Mexico | 2004 | 10,511,711 | 1,166 | 11.1 |
| Puerto Rico | 2004 | 1,472,217 | 129 | 8.8 |
| Cuba | 2004 | 956,352 | 62 | 6.5 |
| US | 2004 | 254,619,910 | 15,067 | 5.9 |
| Canada | 2004 | 819,256 | 26 | 3.2 |
| Unknown | 2004 | NA | 214 | NA |
| Island Territories | 2004 | NA | 10 | NA |
| Remainder of the World | 2004 | NA | 1,167 | NA |
The breakdown by country is pretty rough, since it only includes Mexico, Puerto Rico, Cuba, Canada and a catch-all category called Remainder of the World. Those born in the US include people of all races/ethnic origins, obviously, and the population data was interpolated with monotonic splines from the IPUMS decennial census samples.
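The rate column is just homicides divided by population, scaled to 100,000 residents. As a quick sanity check (a sketch, not the code used to build the table), the helper below, with the made-up name `rate_per_100k`, reproduces the listed rates when Mexico's population is taken as 10,511,711:

```shell
# Rate per 100,000 = homicides / population * 100000, rounded to one decimal.
# rate_per_100k is a hypothetical helper name used only for this check.
rate_per_100k() {
    awk -v h="$1" -v p="$2" 'BEGIN { printf "%.1f\n", h / p * 100000 }'
}

# 2004 homicide counts and populations for the rows with known populations.
rate_per_100k 1166 10511711     # Mexico
rate_per_100k 129 1472217       # Puerto Rico
rate_per_100k 62 956352         # Cuba
rate_per_100k 15067 254619910   # US
rate_per_100k 26 819256         # Canada
```

The rows with NA population (Unknown, Island Territories, Remainder of the World) have no denominator, so no rate can be computed for them.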
Here’s a chart of the same data:
Since the data is available from 1979 it is probably a good idea to plot it all:
The homicide rates of Hispanic immigrants were stratospheric back in the 80’s, but they’ve been going down ever since. I wonder why this happened?
The Cuban data looks a little weird, with a big increase in 1980. This is the same year as the Mariel Boatlift. If you’ve seen Scarface you know that Fidel Castro transported many prisoners to the port of Mariel so he could get rid of them. This is certainly an interesting result and one that merits more research (especially given that when I interpolated the population numbers from the 1980 and 1990 censuses, I took the increases as constant, with no Mariel Boatlift pulse).
Since immigrants tend to skew male and males are over-represented among victims of homicide, it’s probably a good idea to plot only male homicide rates. In 1980 the Black male homicide rate was 65.6 (38.1 for both sexes combined), so, measured by victimization, foreign-born Hispanics had even higher homicide rates than Blacks back in the 80’s, but by 2004 their rates were lower.
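To see why a smooth interpolation cannot show a one-year jump, here is a minimal sketch that uses plain linear interpolation as a simpler stand-in for the monotonic splines. The function name `interp_linear` and the counts 100000 and 200000 are placeholders for illustration, not census figures:

```shell
# Linear interpolation between two census counts (a stand-in for the
# monotonic splines; interp_linear is a hypothetical helper).
# Arguments: year0 count0 year1 count1 target_year
interp_linear() {
    awk -v y0="$1" -v p0="$2" -v y1="$3" -v p1="$4" -v y="$5" \
        'BEGIN { printf "%.0f\n", p0 + (p1 - p0) * (y - y0) / (y1 - y0) }'
}

# Placeholder counts, NOT real census data: growth is spread evenly over
# the decade, so a sudden arrival in a single year leaves no visible step.
for year in 1980 1985 1990; do
    echo "$year $(interp_linear 1980 100000 1990 200000 "$year")"
done
```

Any population that actually arrived in one burst would be smoothed across all ten years of denominators, which is worth keeping in mind when reading the rates around 1980.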
To download the NBER mortality files you can use this bash script (you’ll need to install csvkit):

```shell
#!/bin/bash
DATADIR=data
mkdir -p "$DATADIR"
for i in {1979..2004}
do
    if [ ! -f "$DATADIR/mort$i-homicide.csv.xz" ]; then
        # mktemp replaces the Debian-only, deprecated tempfile command
        TEMPFILE=$(mktemp)
        wget -O "$TEMPFILE" "http://www.nber.org/mortality/$i/mort$i.csv.zip"
        unzip -p "$TEMPFILE" > "$TEMPFILE".csv
        # the file has a column named ucod with the underlying cause of
        # death (ICD-9 through 1998, ICD-10 from 1999); we need to find
        # its number to be able to filter
        col_number=$(head -n 1 "$TEMPFILE".csv | tr -d '"\r' | tr ',' '\n' | grep -n -x ucod | cut -d: -f1)
        # filter to include only homicides and non-terrorism
        # operations of war/legal interventions ICD codes
        if [ "$i" -le 1978 ]; then
            # never reached for 1979-2004; kept for earlier years
            codes=icd8-homicide-codes.txt
        elif [ "$i" -le 1998 ]; then
            codes=icd9-homicide-codes.txt
        else
            codes=icd10-homicide-codes.txt
        fi
        csvgrep -c "$col_number" -f "$codes" "$TEMPFILE".csv > "$DATADIR/mort$i-homicide.csv"
        xz -9 "$DATADIR/mort$i-homicide.csv"
        rm -f "$TEMPFILE" "$TEMPFILE".csv
    fi
done
```
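The column lookup is the fiddliest step in the script, so it may help to try it on its own. The sketch below wraps it in a function with the made-up name `find_col` and runs it on a made-up three-column sample; the real NBER files have many more columns, and the sketch assumes header names contain no embedded commas:

```shell
# Print the 1-based position of a named column in a CSV header.
# find_col is a hypothetical helper; it strips quotes and carriage returns,
# splits the header on commas, and matches the whole name exactly.
find_col() {
    head -n 1 "$2" | tr -d '"\r' | tr ',' '\n' | grep -n -x -F "$1" | cut -d: -f1
}

# Hypothetical sample file, only to exercise the lookup.
sample=$(mktemp)
printf '"sex","age","ucod"\n"1","30","X95"\n' > "$sample"
find_col ucod "$sample"   # prints 3
rm -f "$sample"
```

As far as I know, csvkit can also do this for you: `csvcut -n` lists column names with their indices, and `csvgrep -c` accepts a column name as well as a number, so passing `-c ucod` directly is probably enough.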
Because the CDC’s estimates of legal intervention deaths are unreliable, I include them in the homicide totals. These are the codes I used to filter homicides (the ones starting with '#' were excluded):
icd10-homicide-codes.txt
```
#U011 (Terrorism involving destruction of aircraft)
#U012 (Terrorism involving other explosions and fragments)
#U014 (Terrorism involving firearms)
X85
X86
X87
X88
X89
X90
X91
X92
X93
X94
X95
X96
X97
X98
X99
Y00
Y01
Y02
Y03
Y04
Y05
Y060
Y061
Y062
Y068
Y069
Y070
Y071
Y072
Y073
Y078
Y079
Y08
Y09
Y350
Y351
Y352
Y353
Y355
Y356
Y357
Y362
Y364
Y366
Y367
Y368
Y369
Y871
Y890
#Y891 (Sequelae of war operations)
```
icd9-homicide-codes.txt
```
9600
9601
961
9620
9621
9622
9629
963
964
9650
9651
9652
9653
9654
9655
9656
9657
9658
9659
966
9670
9671
9679
9680
9681
9682
9683
9684
9688
9689
969
970
971
972
973
974
975
976
977
#978 "Legal execution"
9912
9919
993
995
996
9970
#9979 "Unspecified form of unconventional warfare"
#999 "Late effect of injury due to war operations"
```
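The '#' lines are excluded only because a line like `#978 "Legal execution"` can never exactly equal a cause-of-death cell, assuming `csvgrep -f` does exact matching against each line of the file. To list the codes that can actually match, a small sketch (the function name `active_codes` is made up for illustration):

```shell
# Print only the codes that can match a cell: drop lines starting with '#'.
# active_codes is a hypothetical helper, not part of the download script.
active_codes() {
    grep -v '^#' "$1"
}

# Small excerpt of the ICD-10 list above, written to a temporary file.
codes=$(mktemp)
printf '%s\n' \
    '#U011 (Terrorism involving destruction of aircraft)' \
    'X85' \
    'X86' \
    '#Y891 (Sequelae of war operations)' > "$codes"
active_codes "$codes"   # prints X85 and X86
rm -f "$codes"
```

Running `active_codes icd10-homicide-codes.txt | wc -l` on the full files is a cheap way to confirm how many codes are in play for each ICD revision.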
1. The Validity of Race and Hispanic-origin Reporting on Death Certificates in the United States: An Update. https://www.ncbi.nlm.nih.gov/pubmed/28436642
2. Explaining the Mexican‐American Health Paradox Using Selectivity Effects. https://link.springer.com/article/10.1023/A:1006318101290
3. Mortality Differentials among Persons Born in Cuba, Mexico, and Puerto Rico Residing in the United States, 1979-81. https://ajph.aphapublications.org/doi/abs/10.2105/AJPH.77.5.603
4. Homicide Risk among Immigrants in California, 1970 through 1992. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380372/
5. Differential mortality risks from violent causes for foreign and native-born residents of the USA. https://link.springer.com/article/10.1023/A:1006318101290