This visualization nerd-sniped me two weeks ago (see original Reddit thread). It is based on the Special Eurobarometer 2015 (EBM) and it suggests all kinds of horribleness about my home country:

(Heatmap of Europe, colored by responses to the Special Eurobarometer 2015 question “Would you feel comfortable if one of your children was in a relationship with an X person?”) For my non-European friends, my home country is the red one in the center-left.

This looks all sorts of unpleasant. More to the point, it looks wrong. Czechs think of themselves as the cool stoner uncle of the region, secular and colorblind and live-and-let-live. Could we really be this bad? More importantly - sure, we’ve had some issues recently, but how could Czechland become more xenophobic than Poland? (Humorous tongue-in-cheek nationalist chauvinism is to Europe as baseball is to the US. You can see how this could lead to problems.)

So, is it a survey artifact or is it real? Can we perhaps dismiss it as noise driven by a mistranslation? In terms of mechanism, can the dataset suggest any? Could this, for instance, be driven by the urban-rural gap?

Luckily, this was data from the Special Eurobarometer 2015, so it’s publicly accessible along with complete documentation. That means we can poke it. I kept this RMarkdown notebook to retrace my steps, so that you can poke it too. (You can download it from the top-right button.) Note that I usually haven’t gone back to re-visualize early graphs on the basis of later explorations.

This is equal parts dataset exploration and data-tidying exercise, so if you’re here to talk about why all of this is bullcrap, feel free to skip the parts with R code and go straight to the discussion of validity.

1 Getting the data

Eurobarometer releases its full dataset, which is also indexed by social-science aggregators like ICPSR and GESIS. Using the gesis package, extracting the desired Eurobarometer data is easy. (The only difficult part was determining which Eurobarometer to download because GESIS follows the original EBM numbering convention of (survey round #).(wave #), whereas the EBM reports usually go by a single-integer ID. Here, we knew what year we wanted, so finding the GESIS equivalent was a matter of one Google search.)

You will have to register for free and use those credentials to download the dataset.

setwd("~/Coding/ebm2015/")
library(gesis)
library(tidyverse)
library(haven)
library(ggplot2)
library(scales)
library(ggalt)
library(countrycode)
library(sjmisc)
library(cowplot)
library(printr)
library(stringr)
library(forcats)
library(RColorBrewer)
theme_set(theme_bw())
# Look at availability
study_types <- gesis::get_study_groups()
datasets <-    gesis::get_datasets("0008") # EB - Standard and Special Eurobarometer
gesis_login <- gesis::login(username = "simon.podhajsky@gmail.com",
                            password = Sys.getenv('GESIS_PWD')) # Saved in .Renviron
# If you're not planning to share your code, you could type in your password
# above, but you should really get used to storing your secrets elsewhere.

gesis::download_dataset(s = gesis_login, doi = 6595) # Eurobarometer 83.4 (2015)
# past:
# - 2012: 77.4 (DOI 5613)
# - 2009: 71.2 (DOI 4972)
# - 2008: 69.1 (DOI 4743)
# - 2007: 65.4 (DOI 4508)
# - 2002: 57.0 (DOI 3638)
# gesis::download_codebook(doi = 6595) # errors out, for some reason

Downloading the codebook from the GESIS API hasn’t worked for me, so you can get it from ICPSR instead.
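If you'd rather fail early than send an empty password to GESIS, a small guard around `Sys.getenv` helps. This is only a sketch in base R: `get_secret` is a hypothetical helper of my own naming, not part of the gesis package, and the variable in the demo is a throwaway.

```r
# Hypothetical helper (not part of the gesis package): read a secret from an
# environment variable and stop with a readable message if it is missing.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set; add it to ~/.Renviron ",
         "and restart R.", call. = FALSE)
  }
  value
}

# Demo with a throwaway variable so the example runs anywhere
Sys.setenv(GESIS_PWD_DEMO = "not-a-real-password")
get_secret("GESIS_PWD_DEMO")
```

In the login call, `get_secret("GESIS_PWD")` could then stand in for the bare `Sys.getenv('GESIS_PWD')`.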

# There's no control over the naming of the saved file, so you'll have
# to take a look to make sure you're loading the right one - but GESIS
# will save each survey under a consistent filename, so at least you've
# got that
eb15 <- read_dta("ZA6595_v2-0-0.dta")
dim(eb15)
[1] 27718   512

That’s a lot of Eurobarometer data! But even without downloading the codebook, the structure is well-documented in the data frame attributes. This means that we can take a look-see.

# A lot of well-documented variables -- so many that it overwhelms the R console,
# so we'll save the output into a file and look it over in a text editor or
# work it with grep
sink("labels.txt")
(eb15_labels <- sapply(names(eb15), function(x) attributes(eb15[[x]])))
sink()
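
If you'd rather stay in R than grep a text file, the same labels can be searched directly. The sketch below runs on a toy data frame with made-up labels, because I can't ship the real dataset; `find_labels` is a hypothetical helper. It only assumes what the code above relies on, namely that each column carries its question text in a "label" attribute.

```r
# Toy stand-in for the Eurobarometer data: each column carries its question
# text in a "label" attribute. (Columns and labels here are made up.)
toy <- data.frame(qc13_1 = c(1, 10), qc14_1 = c(3, 7), p6cz = c(1, 2))
attr(toy$qc13_1, "label") <- "COLLEAGUES AT WORK: BLACK PERSON"
attr(toy$qc14_1, "label") <- "LOVE RELATIONSHIP OF CHILD: BLACK PERSON"
attr(toy$p6cz, "label") <- "SIZE OF COMMUNITY - CZECH REPUBLIC"

# Hypothetical helper: return the labels of all columns whose label matches
# a pattern, as a named character vector (names are the column names).
find_labels <- function(df, pattern) {
  labels <- vapply(df, function(col) {
    label <- attr(col, "label")
    if (is.null(label)) NA_character_ else label
  }, character(1))
  labels[!is.na(labels) & grepl(pattern, labels, ignore.case = TRUE)]
}

find_labels(toy, "child")
```

The names of the returned vector are the column names, which is what you need for a later `select()`.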

1.1 Basic properties

Each row is a record of one interview. Grouping by country reveals that each has a sample of about a thousand, except for Montenegro, Northern Ireland, Luxembourg, and former East Germany:

# Look over sample size per country
eb15 %>% group_by(isocntry) %>% summarize(n())

Other identifying info I care about is a listed nationality (which is not the same as country of interview - Czechs can be interviewed abroad, dual citizenship is a thing, …), size of community the respondent lives in (stored separately for each country in p6* columns, but same for the V4 countries), and region of origin at NUTS-2 level. (Each region in each country is sampled roughly equally. I don’t know exactly how the authors of the survey do their stratified sampling, so it’s possible that I’m missing some re-weighting.)

The measures I care about the most were “comfort with X coworker” and “comfort with X child-in-law”. For each value of X, these are stored in columns qc13_* and qc14_*, respectively. There’s a lot of other interesting goodies, but these will do for now.

2 A first look: responses versus community size

To make our analysis easier, we extract columns of interest and then convert the attitude questions from wide to long. We’ll also label some of the values to make visualization easier.

# See raw Rmd original or the codebook for description of question codes
# select countries of the V4 + relevant cols
eb15_v4 <- eb15 %>% 
  select(isocntry, q1_17, q1_19, q1_23, q1_24, uniqid, # country + citizenship
         starts_with("qc13"), starts_with("qc14"), 
         starts_with("p6"), starts_with("p7"),
         ends_with("cz"), ends_with("sk"), 
         ends_with("pl"), ends_with("hu")) %>%
  filter(isocntry %in% c("CZ", "SK", "PL", "HU"))
# Data reshaping:
# - merge "size of community" for CZ, SK, PL and HU
# - make long all presently loaded survey questions (qc*)
eb15_v4_long <- eb15_v4 %>% 
  mutate(community_size = coalesce(p6pl, p6cz, p6sk, p6hu)) %>% 
  select(-starts_with("p6")) %>%
  gather(question, response, starts_with("qc13"), starts_with("qc14")) %>%
  mutate(response = factor(response, levels = 1:13, 
                           labels = c("Not at all comfortable", 2:9, 
                                      "Totally comfortable", 
                                      "Indifferent", "It depends", "Don't know")),
         community_size = factor(community_size, 
                                 labels = c("1" = "Rural area",
                                            "2" = "Towns and suburbs",
                                            "3" = "Cities")))

(Sidenote: here, I convert data into factors manually. This is because when I did this two weeks ago, I did not think to dig through sjmisc, a package that is excellent about digging out and applying the metadata from .dta files. I do use it later.)
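
The merging of the per-country p6* columns is the least obvious step above, so here is what it does on toy data. This is a base-R sketch with invented rows; `coalesce_cols` is a hypothetical stand-in for `dplyr::coalesce`, and it assumes (as for the V4 countries) that the community-size codes mean the same thing in every column.

```r
# Toy data: each respondent has a community-size code only in the column for
# the country they were interviewed in; the other columns are NA.
toy <- data.frame(
  isocntry = c("CZ", "PL", "SK"),
  p6cz = c(1, NA, NA),
  p6pl = c(NA, 3, NA),
  p6sk = c(NA, NA, 2)
)

# Base-R stand-in for dplyr::coalesce(): take the first non-missing value
# across the supplied vectors, element by element.
coalesce_cols <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

toy$community_size <- factor(
  coalesce_cols(toy$p6cz, toy$p6pl, toy$p6sk),
  levels = 1:3,
  labels = c("Rural area", "Towns and suburbs", "Cities")
)
toy[, c("isocntry", "community_size")]
```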

Remember the metadata we stored separately? We can create a lookup table for facet_wrap in ggplot2 in order to make our labels more informative.

# Clean up the question codes
question_names <- sapply(eb15_labels, function(x) x$label)
question_names <- gsub("COLLEAGUES AT WORK: ", "", question_names)
question_names["qc13_11"] <- "Trans* person" # Too long for facet labels
question_names <- gsub("LOVE RELATIONSSHIP OF CHILD: ", "", question_names)
question_names["qc14_11"] <- "Trans* person"
question_names <- sapply(question_names, 
                         function(x) R.utils::capitalize(stringr::str_to_lower(x)))
# We'll also create a named vector of country codes for later use as a lookup 
# table
ebs_countries <- unique(eb15$isocntry)
# setNames is a way to get names for an unnamed vector
ebs_country_dict <- setNames(
  countrycode(ebs_countries, 
              "iso2c", "country.name", 
              custom_match = c("GB-NIR" = "GB (Northern Ireland)",
                               "GB-GBN" = "Great Britain",
                               "DE-E" = "Germany (East)",
                               "DE-W" = "Germany (West)")),
  ebs_countries)
# Some of the `isocntry` labels are not actually in the ISO standard, but we
# still want to label them, so we provide a custom_match dictionary
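
If named vectors as lookup tables are new to you, here is the pattern in isolation. The codes and names are toy values that I typed in by hand rather than output from countrycode.

```r
codes <- c("CZ", "DE-E", "SK")
# setNames() attaches names to a vector; the names act as lookup keys
country_dict <- setNames(c("Czechia", "Germany (East)", "Slovakia"), codes)

# Look up several keys at once; unname() drops the keys from the result
unname(country_dict[c("SK", "CZ")])

# Unknown keys come back as NA instead of raising an error, so it pays to
# check for them explicitly
keys <- c("CZ", "XX")
missing_keys <- keys[is.na(country_dict[keys])]
missing_keys
```

The silent NA is the reason for the custom_match entries above: codes like "DE-E" would otherwise quietly end up without a label.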

Initially, I visualized all categories of interest in the same graph, but that turned out to be a little overwhelming. I split the attitudes into three categories and ordered them so that the most “popular” subgroup is at the top. (If you’re dealing with a lot of factors, forcats has methods to do this automatically.)

# Reorder the facets
ethnicities = paste0("qc14_", c(4, 3:1))
religions = paste0("qc14_", c(8:9, 5, 7, 6))
sexdisability = paste0("qc14_", 10:12)
ordered_questions = question_names[c(ethnicities, religions, sexdisability)]
# Other than reordering the factors, this also gives them the names from
# the codebook
eb15_v4_long <- eb15_v4_long  %>%
  mutate(question = fct_relevel(as.factor(question),
                                names(ordered_questions)))

We’ll make three graphs that we’ll then combine with the cowplot package, which means that we’ll be reusing a lot of ggplot2 layers. Luckily, we can prepare those ahead of time in a list. (The following assumes ggplot2 knowledge; the short summary is that it implements a “grammar of graphics”, which lets you assign variables to different graphical representations and handles the imperative work that used to take up the most time. Harvard Data Science Services has a good intro.)

## Prepare layers for reuse - we'll be graphing a lot of bar charts! (How can we
# use variables in here before they're ever defined? It is the magic of
# non-standard evaluation! You'll want to read Advanced R by Hadley Wickham, but
# for now, just settle for the explanation of "arguments in ggplot2 are only
# evaluated when it's time to draw the graphs.")
layers <- list(
  # separate graph for each country and each attitude
  facet_grid(question ~ isocntry,
             labeller = labeller(question = as_labeller(question_names),
                                 isocntry = ebs_country_dict)),
  # separate the 1-10 comfort rating from indifference/IDK
  geom_vline(xintercept = 10.5), 
  theme(legend.position = "top", strip.text = element_text(size = 11),
        axis.text.x = element_text(angle = 60, hjust = 1, size = 11)),
  xlab(""), 
  ylab("Response count"),
  ylim(0, 800)
  # highlight community size of respondents to shine some initial light on
  # the urban-rural question
)
makeMainTitle <- function(subset_description = "all", countries = "CZ, HU, PL & SK") {
  ggtitle(paste0("How comfortable would you feel if one of your children was in ",
                 "a love relationship with ___?"),
          subtitle = paste0("Eurobarometer 83.4 (2015) via GESIS, ", 
                            subset_description, " questions, ", countries, 
                            " answers"))
}

Now, we prepare the graphs and put them together with cowplot::plot_grid.

v4_love_ethnicities <- eb15_v4_long %>%
  filter(question %in% ethnicities) %>%
  ggplot() + makeMainTitle("ethnicity") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers
  
v4_love_religions <- eb15_v4_long %>%
  filter(question %in% religions) %>%
  ggplot() + makeMainTitle("religious-affiliation") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers
v4_love_sexdisability <- eb15_v4_long %>%
  filter(question %in% sexdisability) %>%
  ggplot() + makeMainTitle("LGBTQ/disability") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers
v4_love_all <- cowplot::plot_grid(v4_love_ethnicities + makeMainTitle(), 
          v4_love_religions + ggtitle("Subsection: Religions") + 
            theme(legend.position = "none"),
          v4_love_sexdisability + ggtitle("Subsection: LGBTQ and disability") +
            theme(legend.position = "bottom"), 
          ncol = 1, rel_heights = c(4.5, 5, 4))
v4_love_all

This isn’t bad as a first pass. Here’s a couple of things of note:

  1. Different community sizes weren’t sampled evenly, so there’s fewer rural respondents – and they aren’t all conservative. They don’t drive the trends. (A mosaic plot could tell us more about the within-rural split, but doing it with categorical variables would require an edge version of ggmosaic and ggplot2, which has been breaking things for me.)
  2. Holy shit we all suck, but Poland and Hungary have a bimodal distribution of discomfort (a solid second-place finish for “Totally comfortable”), whereas Czech and Slovak respondents differ mostly on the degree of discomfort. This is odd.
  3. Most Czech and Slovak response distributions look oddly similar.

We’ll get to the weirdness in a bit.

2.1 Collapsing the scales

Visual comparisons with a 10-point Likert scale are difficult, so let’s collapse it. This frees up space to print percentages on top. We’ll do the collapsing with forcats::fct_collapse; for now, we’ll stick with my initial choice of grouping the top four response levels, the bottom four, and everything in between. (This is not a neutral choice, of course - more about this in the next section. Do note that in the original EBM report, “Indifferent” was grouped with the most positive responses.)
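
Stripped of the plotting, the collapsing step boils down to the following. It's a base-R sketch on made-up responses, assuming the coding used above (1 to 10 for the scale, 11 to 13 for "Indifferent", "It depends" and "Don't know"); `collapse_scale` is a hypothetical helper, not the forcats call I actually use.

```r
# Toy responses on the survey's coding: 1-10 is the comfort scale,
# 11 = "Indifferent", 12 = "It depends", 13 = "Don't know"
responses <- c(1, 2, 4, 5, 6, 7, 10, 10, 11, 13)

# Map each raw code to one of three bins (bottom four, middle, top four).
# The non-scale answers go to the middle bin, which is a judgment call.
collapse_scale <- function(x) {
  bins <- ifelse(x <= 4, "Uncomfortable",
          ifelse(x >= 7 & x <= 10, "Comfortable", "Meh"))
  factor(bins, levels = c("Uncomfortable", "Meh", "Comfortable"))
}

collapsed <- collapse_scale(responses)
# Percentage of respondents in each bin
round(100 * prop.table(table(collapsed)), 1)
```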

data_collapsed <- eb15_v4_long %>%
  filter(question %in% c(ethnicities, religions, sexdisability)) %>%
  mutate(response_detailed = fct_relevel(response, # to prettify the legend order
                                         "Don't know", "Indifferent", 
                                         "It depends", after = 5),
         response = 
           fct_collapse(response,
                        # bottom four levels, as described in the text above
                        "Uncomfortable" = c("Not at all comfortable", 2:4),
                        "Meh" = c(5:6, "Indifferent", 
                                  "It depends", "Don't know"),
                        "Comfortable" = c(7:9, "Totally comfortable")) %>%
           fct_drop())
# precalculated percentages!
# Doing this dance to get the total number of respondents that's different 
# for every country (and possibly every question, though hopefully that does
# not change)
# 
# Note: if not filtered before use, the inclusion of this in a ggplot2 layer 
#   will create additional facets.
data_collapsed_percentages_temp <- data_collapsed %>%
  group_by(isocntry, question, response) %>%
  summarize(count = n())
data_collapsed_percentages <- data_collapsed_percentages_temp %>%
  summarize(total_count = sum(count)) %>%
  merge(data_collapsed_percentages_temp) %>%
  mutate(perc = round(100*count/total_count, 1))
# replace the settings that wouldn't work here
layers_collapsed <- layers
layers_collapsed[[2]] <- scale_fill_manual(
  values = colorRampPalette(brewer.pal(11, "RdYlGn"))(13),
  guide = guide_legend(nrow = 1, label.position = "bottom", title = NULL))
layers_collapsed[[3]] <- theme(panel.grid = element_blank(), 
                               axis.text.x = element_text(hjust = 1, angle = 25),
                               legend.position = "top", strip.text = element_text(size = 11))
layers_collapsed[[6]] <- ylim(0, 1000)
layers_collapsed[[7]] <- geom_bar(aes(x = response, fill = response_detailed), width = 0.8)
collapsed_ethnicities <- ggplot(data_collapsed %>% filter(question %in% ethnicities)) + 
  makeMainTitle("ethnicities") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% ethnicities), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
collapsed_religions <- ggplot(data_collapsed %>% filter(question %in% religions)) + 
  makeMainTitle("religious-affiliation") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% religions), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
collapsed_sexdisability <- ggplot(data_collapsed %>% filter(question %in% sexdisability)) + 
  makeMainTitle("LGBTQ/disability") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% sexdisability), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
collapsed_all <- plot_grid(collapsed_ethnicities + makeMainTitle(), 
          collapsed_religions + ggtitle("Subsection: Religions") + 
            theme(legend.position = "none"),
          collapsed_sexdisability + ggtitle("Subsection: LGBTQ and disability") +
            theme(legend.position = "none"), 
          ncol = 1, rel_heights = c(4.5, 5, 4))
collapsed_all

So that looks pretty bad. Can we blame this on methodology?

3 Methodological criticisms

3.1 The issue of mistranslation

The most common criticism is that the Slovak and Czech surveys were translated sloppily, which makes the data ineligible for cross-border comparisons or, for people who like to jump to conclusions, disqualifies this survey in particular and Eurobarometer in general. For the most part, I think that this line of inquiry overreaches.

(You can download each translation from GESIS. The Czech one is here.)

3.1.1 What’s “comfortable”?

Jan Kulveit argues that EBM shat the bed on translating “comfortable” to Czech and Slovak. If you speak Czech, go read it. Here are the four main arguments (some of which appear in the comments):

  1. “Comfortable,” the operative response keyword, is not translated into Czech and Slovak in a way that guarantees one-to-one equivalence of connotations.
  2. The Czech translation of the question is at odds with its colloquial usage, which primes respondents negatively. “How comfortable would you feel if X” is usually used with toddlers who are still working on their theory of mind; “X” is usually “that other kid took your toys away.”
  3. The Czech and Slovak translation of the top response option is awkward. The Czech question asks (approximately) “how great would you feel,” and the most-positive label, “totally great,” is a very peculiar translation of “totally comfortable.” Some argue that it comes off as “extremely overjoyed,” since a common colloquial usage is understating your chagrin by saying that you’re “not extremely overjoyed”.
  4. Although this isn’t a flaw shared by all translations, the translations differ from each other - the top-most response ranges from “wouldn’t mind at all” (Polish) to “would be totally cool with it” (Hungarian & most everything else). Some make positive responses more natural than others. Consequently, cross-national comparisons are right out.

The author then goes on to (accurately) rip the media misrepresentation of the survey and concludes by dismissing the survey as a whole (which is a step too far).

I’m grateful that Jan looked into the nitty-gritty of survey wording. Jan’s hypothesis also explains some of the weirdness I’ve noted - the uncanny Czech/Slovak similarity could be due to the shared mistranslation, and the comparative lack of positive-response spike makes plausible the unpalatability of the positive-most response.

So why shouldn’t we bin the study?

  1. The translation is close enough; one-to-one connotational equivalence is an unreasonably high standard. It’s great if it can happen! It should happen whenever it can! But for a lot of our vocabulary, especially the vocabulary that describes culturally shaped subjective experience, some disparity will necessarily exist. It’s the price we pay for comparative research. We should absolutely keep track of differences and should call bullshit when the disparity grows too large.

But in this case, I don’t think the translations are so different as to measure a different construct. “Zcela příjemné” is awkward but not insanely off. Given EBM’s wealth of questions, we can actually check that empirically:

  2. The translation hasn’t precluded Czechs and Slovaks from using the full scale almost as often as other nations when it came to attitudes to a potential white inlaw. Look at the graph from earlier and focus on the “white people” plots in the top row:
v4_love_ethnicities

Jan presents this as conclusive evidence of negative skew / overly positive translation - in the lily-white Czechland and Slovakia, surely you’d expect more than ~70% of people to be “totally comfortable” with a white inlaw, right? I concur that positive responses are probably bleeding into the “indifferent” column - which, for the record, the original EBM report counted among “comfortable” responses. But the fact that the majority of people did use the “totally comfortable” response for a white inlaw also means that the top-most response isn’t off the menu by virtue of catastrophic mistranslation. You can’t shrug off the non-selection of this option for any other demographic subgroup by appealing to mild semantic differences.

The mid-scale bump provides an upper bound on the suggested error. Mostly, it’s pretty tiny. All V4 countries have it to some extent, as do Britain, Germany, and many others. All of this suggests that positive attitudes aren’t getting drastically underreported there. (Both German samples have a higher rate of “Indifferent” responses, though, which is consistent with translation variation redirecting responses there - but again, since “Indifferent” was interpreted as a positive response, this wouldn’t matter for the original EBM report.)

# All countries' attitudes to potential white inlaw
eb15_graph_white <- eb15 %>% select(isocntry, qc14_4) %>% 
  mutate(qc14_4 = to_label(qc14_4)) %>%
  ggplot(aes(x = qc14_4)) + 
  facet_wrap(~ isocntry, ncol = 5, labeller = labeller(isocntry = ebs_country_dict)) + 
  stat_count(mapping = aes(y = ..prop.., group = 1), width = 0.7) +
  geom_rect(data = data.frame(isocntry = c("CZ", "HU", "PL", "SK")), inherit.aes = FALSE, 
            fill = "blue", alpha = 0.1, xmin = -Inf, ymin = -Inf, xmax = Inf, ymax = Inf) +
  scale_y_continuous(labels = percent, sec.axis = dup_axis()) +
  ggtitle(paste0("How comfortable would you feel if one of your children was in \n",
                 "a love relationship with a white person?")) + labs(x = "", y = "") +
  geom_vline(xintercept = 10.5) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
eb15_graph_white

  3. Even if the most positive label was problematic, the most negative level is not. “I wouldn’t feel great at all [about having X as an inlaw]” is roughly equivalent to “I wouldn’t feel at all comfortable”. Both express the kind of fear and apprehension that the study is after. At worst, the peculiar translation of the positive label introduces some noise into the top of the scale for Czech and Slovak respondents; it doesn’t invalidate it altogether, and it doesn’t invalidate the negative responses at all.

In summary, the critique properly cautions about transnational response-level percentage comparisons, especially at the positive end of the scale, and about the possible disparate floor/ceiling levels between languages. But it overplays its hand.

(For further empirical testing, see the section below that compares the Eurobarometer results with those from the European Social Survey.)

3.2 Social desirability bias and actual discrimination

Three other criticisms I’ve seen floating around:

  • Czechs’ and Slovaks’ answers are not as positive as other nations’ because we’re honest. We’re not as subject to social desirability bias as other countries. If Czechs and Slovaks don’t feel a particular taboo as strongly as other Europeans, isn’t that an interesting finding in its own right? Not to mention that what people feel comfortable saying is an important element of your everyday experience in a society.
  • Due to social desirability bias, positive answers in other countries don’t necessarily translate into tolerant behavior. True. Do note, however, that the converse does not apply: since there’s no social force forcing the negative responses, there’s no reason to think that negative responses don’t predict intolerance (or that the negative respondents are secretly super-tolerant). If people say they’re uncomfortable about minorities, we should probably believe them.
  • The survey cannot sniff out hidden discriminatory behavior. Absolutely. Unlike other measures, it isn’t designed to do that. This doesn’t mean it’s worthless.

To be clear, there’s no way to determine the amount of social desirability bias in each country. You can claim that the differences between countries estimate the difference in social desirability bias rather than the difference in level of (dis)comfort with different people. That’s fine. But let’s not criticize a puzzle piece for failing to contain the whole picture.

So Eurobarometer is not the end-all and be-all of surveys, but its data retain meaning. What better approaches can we take to reporting and visualizing it?

4 Alternative 1: Make conservative estimates of self-reported intolerance

Let’s assume that Jan is correct that many a 5 response in Czech and Slovak is the nonchalant colorblind tolerance that respondents in other languages would put down as 10. If we move response levels 3 and 4 from “Uncomfortable” into “Meh”, we’ll get an adequate count of otherness-based discomfort for Czechs and Slovaks.

Of course, if Jan is right, then the “Meh” and “Comfortable” responses aren’t meaningful. Let’s remove them from the visualization in the best Tuftean tradition. (Which makes it a glorified table, but tables are great. Fancy geometries are for suckers.)

(Under these assumptions, we’re undercounting bigoted Poles and Hungarians, but not by a lot – the distributions cluster around the extremes and not that many people were picking 3 and 4 on the scale anyway.)
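
To see how much the choice of cutoff can matter, it helps to compute the "uncomfortable" share under both. A base-R sketch on invented responses (not survey data); `share_at_or_below` is a hypothetical helper:

```r
# Toy responses for one country and one question (1-10 comfort scale only)
responses <- c(1, 1, 2, 3, 4, 4, 5, 8, 10, 10)

# Hypothetical helper: percentage of respondents at or below a cutoff
share_at_or_below <- function(x, cutoff) {
  round(100 * mean(x <= cutoff), 1)
}

# Compare the original cutoff (4) with the conservative one (2)
sapply(c(original = 4, conservative = 2),
       function(cutoff) share_at_or_below(responses, cutoff))
```

If the responses cluster at the extremes, the two numbers stay close; if they pile up at 3 and 4, the conservative estimate drops sharply.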

It turns out that things don’t get much better.

# We can take the previous data and reshape it
data_collapsed_12 <- data_collapsed %>%
  mutate(response = 
           fct_collapse(response_detailed,
                        "Uncomfortable" = c("Not at all comfortable", 2),
                        "Meh" = c(3:6, "Indifferent", 
                                  "It depends", "Don't know"),
                        "Comfortable" = c(7:9, "Totally comfortable")) %>%
           fct_drop())
data_collapsed_percentages_temp <- data_collapsed_12 %>%
  group_by(isocntry, question, response) %>%
  summarize(count = n())
data_collapsed_12_percentages <- data_collapsed_percentages_temp %>%
  summarize(total_count = sum(count)) %>%
  merge(data_collapsed_percentages_temp) %>%
  mutate(perc = round(100*count/total_count, 1))
# ...but now, we'll remove everything but the "Uncomfortable" factor
data_collapsed_12 <- filter(data_collapsed_12, response == "Uncomfortable")
data_collapsed_12_percentages <- filter(data_collapsed_12_percentages, 
                                        response == "Uncomfortable")
ggplot(data_collapsed_12_percentages, aes(x = isocntry, y = fct_rev(question), fill = perc)) +
  geom_tile() + geom_label(aes(label = paste0(perc, "%")), color = "black") +
  scale_fill_distiller(limits = c(0, 100), palette = "Reds", direction = 1) +
  scale_x_discrete(position = "top", labels = ebs_country_dict) +
  scale_y_discrete(labels = question_names) + ylab("") + xlab("") +
  ggtitle(paste0("How comfortable would you feel if one of your children was in \n",
                 "a love relationship with ___?"),
          subtitle = "Percentage of respondents who rated their comfort below 2 out of 10\n(inclusive), where 1 is 'not at all comfortable'") +
  theme_minimal() +
  theme(legend.position = "none", axis.ticks.x = element_blank(), 
        axis.text = element_text(size = 11))

5 Alternative 2: Measure relative discomfort

The other way to avoid the semantics debate: we can look at respondents’ comfort relative to the comfort with the dominant majority group in the country. (For V4, that’s pretty clearly white people.) This seems like a better measure in a number of ways: for one, it works even if response item 10 in Polish doesn’t translate to response item 10 in Czech. For another, the majority-minority differential is arguably what we mean when we talk about discrimination.

(Alternatively, we could norm relative to the respondent’s own most favored group. I don’t have a clear idea of what we’d be gaining/missing out on.)

Of course, there’s a reason why the EBM people aren’t doing this. The most important one that I can think of is that the non-responses – “Don’t know”, “Indifferent”, and “It depends” – don’t fall neatly on the response scale, but excluding them misses out on real information. (We’d miss out on the entire response set of respondents who selected one of these responses for their reference attitude.) Placing them on the scale, however, gives them meaning they might not have. (The decision to place them in the “Meh” category above was only okay because I had no plans to make any inferences about “Meh”.)
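
To make the trade-off concrete, here is roughly what the per-respondent calculation could look like. This is a sketch of one possible definition, not something I've run on the real data: the ratings are invented, `relative_comfort` is a hypothetical helper, and it takes the "exclude non-responses" route, so you can see how many respondents that costs.

```r
# Toy ratings from five respondents; codes 11-13 are the non-scale answers
# ("Indifferent", "It depends", "Don't know")
white_person <- c(10, 10, 8, 11, 5)
roma_person  <- c(10,  3, 13, 4, 5)

# Hypothetical helper: difference between the rating of a target group and
# the rating of the reference group. Non-scale answers become NA, so a
# respondent drops out if either rating is off the 1-10 scale.
relative_comfort <- function(target, reference) {
  on_scale <- function(x) ifelse(x >= 1 & x <= 10, x, NA)
  on_scale(target) - on_scale(reference)
}

gap <- relative_comfort(roma_person, white_person)
gap                      # negative: less comfortable than with the reference
mean(gap, na.rm = TRUE)  # average gap among respondents who stayed on the scale
sum(is.na(gap))          # respondents lost to non-scale answers
```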

This seems like a fun visualization to think about; I hope to get to it later.

6 Alternative 3: Abandon cross-country comparisons

It’s really tempting to compare my tribe to yours. (Admittedly, this is how I fell into this rabbit hole.) But there’s plenty to be gleaned from within-country comparisons, too! So far, the visualizations I’ve been using have invited cross-country comparison - but I don’t actually need to see Poland’s share of uncomfortable attitudes toward the Romani to say that the Czech ones are horribly high.

As a bonus, this also lets me display several of the situations that respondents were asked to rate their comfort with (a coworker, a child’s partner, the top elected politician) side by side.

# Let's start again and shape the data better
binary_measures <- vars(starts_with("qc2"), -qc2t, starts_with("qc3"))
comfort_measures <- vars(starts_with("qc4"),
                         starts_with("qc13"), starts_with("qc14"), starts_with("qc18"))
eb15_all <-  select(eb15, isocntry, uniqid,
                    starts_with("qc"), starts_with("sd"), starts_with("d")) %>%
  mutate_all(to_label) %>%
  mutate_at(binary_measures, function(x) x != "Not mentioned") %>% # Binary measures
  mutate_at(comfort_measures, function(x) {
    fct_recode(x, 
               "Not at all comfortable" = "1 Not at all comfortable",
               "Totally comfortable" = "10 Totally comfortable",
               "Indifferent" = "Indifferent (SPONTANEOUS)",
               "Don't know" = "DK")
  })
# question_names_valid <- setNames(names(question_names), make.names(question_names))
# question_names_valid <- question_names_valid[question_names_valid %in% names(eb15_comfort)]
question_names_original <- sapply(eb15_labels, function(x) x$label)
eb15_comfort_wide <- select(eb15_all, isocntry, uniqid, !!! comfort_measures) %>%
  gather(question, response, !!! comfort_measures, -isocntry, -uniqid)
# Warning: attributes are not identical across measure variables; they will be dropped
eb15_comfort_wide$question <- question_names_original[eb15_comfort_wide$question]
eb15_comfort_wide <- separate(eb15_comfort_wide, question, 
                              into = c("question_kind", "target"), sep = ": ") %>%
  mutate_at(vars(question_kind, target), str_to_lower)
country_counts <- eb15 %>% group_by(isocntry) %>% summarize(total = n())
replication_base <- eb15_comfort_wide %>%
  mutate(target = fct_relevel(target, "different ethnic origin", "white person", "asian person", "black person", "roma person", "different religion", "christian person", "atheist person", "jewish person", "buddhist person", "muslim person", "person under 25 years", "aged under 30", "person over 60 years", "aged over 75")) %>% 
  group_by(isocntry, question_kind, target, response)
# creating a map with measures we're more certain about
uncomfortable_map <- replication_base %>%
  summarize(count = n()) %>% filter(response %in% c("Not at all comfortable", "2")) %>%
  summarize(selected_count = sum(count)) %>%
  merge(country_counts) %>%
  mutate(negative2 = round(100 * selected_count / total, 2))
question_kind_dict = c("colleagues at work" = "How comfortable would you feel if one of your colleagues at work was a ___?",
                       "love relationsship of child" = "How comfortable would you feel if one of your children was in a love relationship with ___?",
                       "elected politician" = "How comfortable would you feel if the politician in the highest elected office was a ___?")
uncomfortable_map %>% filter(isocntry %in% c("CZ"), question_kind != "showing affection in public") %>%
  ggplot(aes(y = question_kind, x = target, fill = negative2)) +
  facet_wrap(~ question_kind, ncol = 1, scales = "free", 
             labeller = labeller(question_kind = question_kind_dict)) + 
  geom_tile() +
  theme_minimal(base_size = 15) +
  theme(axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 15, hjust = 1, size = 14),
        legend.position = "none",
        plot.margin = unit(c(1, 1, 1, 3), "cm")) +
  scale_fill_distiller(limits = c(0, 100), palette = "Reds", direction = 1) +
  geom_label(aes(label = paste0(negative2, "%")), color = "black") +
  ggtitle("Czech discomfort",
          subtitle = "Percentage of respondents who rated their comfort at 1 or 2 out of 10") +
  labs(x = "", y = "")

I don’t like this visualization - there’s almost certainly a better way to go about doing this. It does get the point across, though: whatever their standing relative to the rest of Europe, there certainly are groups of people that Czechs openly admit to being much less comfortable with than others.

7 Alternative 4: Use a better survey

“The European Social Survey measures the same thing better, so why do people keep using the Eurobarometer?” Indeed it does! Sort of. It asks the question in terms of “minding,” which some claim makes the Eurobarometer results different, and it only asks about immigrants of a different race or ethnicity. Here’s an overview for all surveyed countries.

# integrated data file for ESS 2014 - we're looking for "imdetmr"
ess14 <- read_dta("ESS7e02_1.stata/ESS7e02_1.dta")
ess14 %>% group_by(cntry) %>% summarize(n())
library(ggalt)
library(countrycode)
library(sjmisc)
ess14_sub <- ess14 %>% select(cntry, imdetbs, imdetmr, 
                              dweight, pweight, pspwght) %>% # weighting variables
  mutate_at(vars(imdetbs, imdetmr), 
            sjmisc::to_label) %>% 
  filter(cntry != "AT") # apparently, the question was improperly administered in Austria
ess14_overview <- ggplot(ess14_sub, aes(x = imdetmr)) +
  stat_count(mapping = aes(x = imdetmr, y = ..prop.., group = 1), width = 0.7) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ cntry, ncol = 4, 
             labeller = labeller(cntry = function(x) countrycode(x, "iso2c", "country.name", 
                                                                 custom_match = c("GB" = "Great Britain")))) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  geom_vline(xintercept = 11.5) +
  labs(title = paste0("How much you would mind or not mind if an immigrant of ",
                      "a different race \nor ethnic group married a close relative of yours?"),
       subtitle = "European Social Survey 2014, without weighting",
       x = "", y = "") +
  geom_rect(data = data.frame(cntry = c("CZ", "HU", "PL")), inherit.aes = FALSE, 
            fill = "blue", alpha = 0.1, xmin = -Inf, ymin = -Inf, 
            xmax = Inf, ymax = Inf)
ess14_overview

There are a bunch of properties that make ESS a clumsy comparison. For one, it’s just not as detailed as Eurobarometer: it asks two questions about discomfort/minding and both are couched in general terms (“immigrant of a different race or ethnic group”), so we can neither do the “comfort with white person” sanity check nor match the question directly to any one EBM item. ESS also hasn’t included Slovakia, the one other country we know to show a similar pattern potentially related to mistranslation, so we can’t diagnose that either. Finally, ESS has 11 response-scale points, contra EBM’s 10 plus the “Indifferent” option.

Plus, y’know, different survey taken at a different time with a different methodology asking a different question.

But this entire exercise is about making do with what we have, so let’s compare with the closest EBM equivalent: attitudes towards potential black and Asian in-laws for the subset of countries that were surveyed in both.

(Note that the positive-to-negative direction is reversed in ESS – I didn’t want to mess with reverse scoring. I also drew the intersecting set of ESS/EBM countries manually, so I might have missed some.)
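For anyone who does want to flip the ESS direction, or to compute the country overlap rather than eyeball it, here is a minimal base-R sketch on toy data. The vectors below are made up for illustration. On the real objects the inputs would be `ess14_sub$imdetmr`, `eb15$isocntry`, and `ess14_sub$cntry`, and treating split samples such as `DE-W` and `DE-E` as plain `DE` is an assumption.

```r
# Toy ESS-style factor: levels run from most positive to most negative
ess_toy <- factor(c("Not mind at all", "5", "Mind a lot", "5"),
                  levels = c("Not mind at all", "5", "Mind a lot"))
# Reverse scoring = reversing the level order; the observations stay the same
ess_reversed <- factor(ess_toy, levels = rev(levels(ess_toy)))
levels(ess_reversed)

# Toy country codes; EBM splits some countries into sub-samples such as "DE-W"
ebm_codes <- c("CZ", "DE-W", "DE-E", "SK", "FR")
ess_codes <- c("CZ", "DE", "FR", "NO")
# Strip the sub-sample suffix before intersecting (an assumption, see above)
common <- intersect(unique(sub("-.*$", "", ebm_codes)), ess_codes)
common
```

Reversing the levels only changes the plotting order, so the counts per category are untouched.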

eb15_black_asian <- eb15 %>% select(isocntry, qc14_2, qc14_3) %>% 
  mutate_at(vars(qc14_2, qc14_3), to_label) %>%
  rename(black = qc14_2, asian = qc14_3) %>%
  gather(inlaw, response, black, asian) %>%
  mutate(response = fct_relevel(factor(response), "10 Totally comfortable", after = 9))
ebs_ess_countries <- c("CZ", "HU", "PL", "NL", "BE", "FR", "DK")
ebs_comparison <- eb15_black_asian %>% filter(isocntry %in% ebs_ess_countries)
ess_comparison <- ess14_sub %>% filter(cntry %in% ebs_ess_countries)
ebs_graph <- ggplot(ebs_comparison, aes(x = response)) + 
  facet_wrap(~ isocntry, ncol = 1, labeller = labeller(isocntry = ebs_country_dict)) + 
  stat_count(mapping = aes(y = ..prop.., fill = inlaw, group = inlaw), 
             position = position_dodge(), width = 0.7) +
  labs(x = "", y = "") +
  geom_vline(xintercept = 10.5) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), legend.position = "bottom") +
  scale_y_continuous(labels = percent, limits = c(0, .75)) +
  ggtitle("Eurobarometer 2015", 
          subtitle = paste0("How comfortable you would feel if one of your children was in \n",
                 "a love relationship with a black or an Asian person?")) 
ess_graph <- ggplot(ess_comparison, aes(x = imdetmr)) +
  stat_count(mapping = aes(x = imdetmr, y = ..prop.., group = 1), width = 0.7) +
  scale_y_continuous(labels = percent, limits = c(0, .75)) +
  facet_wrap(~ cntry, ncol = 1, 
             labeller = labeller(cntry = function(x) countrycode(x, "iso2c", "country.name"))) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  geom_vline(xintercept = 11.5) +
  xlab("") + ylab("") +
  ggtitle("European Social Survey 2014", 
          subtitle = paste0("How much you would mind or not mind if an immigrant of ",
                 "a different race \nor ethnic group married a close relative of yours?"))
combined_ebs_ess <- cowplot::plot_grid(ess_graph, ebs_graph, axis = "tblr", align = "h")
combined_ebs_ess

Most countries have a similar response distribution for both surveys, but the Czech one in ESS is noticeably flatter. Let’s look at the extreme ends of each survey’s scale to see how much the two surveys differed:

# Difference: compute percentages, rename levels ("most negative", "most positive"), merge, and make a lollipop chart of differences
ebm_minmax <- ebs_comparison %>% filter(inlaw == "black") %>% 
  group_by(isocntry, response) %>%
  summarize(response_count = n()) %>% 
  mutate(response_perc = response_count / sum(response_count)) %>%
  filter(response %in% c("1 Not at all comfortable", "10 Totally comfortable")) %>%
  ungroup() %>%
  transmute(cntry = isocntry, response_perc,
            response_kind = 
              fct_recode(response,
                         "Negative" = "1 Not at all comfortable",
                         "Positive" = "10 Totally comfortable") %>%
              as.character())
ess_minmax <- ess_comparison %>%  group_by(cntry, imdetmr) %>%
  summarize(response_count = n()) %>% 
  mutate(response_perc = response_count / sum(response_count)) %>%
  filter(imdetmr %in% c("Not mind at all", "Mind a lot")) %>%
  ungroup() %>%
  transmute(cntry, response_perc,
            response_kind = 
              fct_recode(imdetmr,
                         "Negative" = "Mind a lot",
                         "Positive" = "Not mind at all") %>%
              as.character())
ess_ebm_minmax_difference <- inner_join(ebm_minmax, ess_minmax, 
                                        by = c("cntry", "response_kind"), 
                                        suffix = c(".ebm", ".ess")) %>%
  mutate(ess_ebm_delta = response_perc.ess - response_perc.ebm)
# Warning: Column `cntry` has different attributes on LHS and RHS of join
library(ggalt)
ggplot(data = ess_ebm_minmax_difference, aes(x = 100 * response_perc.ebm, 
                                             xend = 100 * response_perc.ess,
                                             y = cntry)) +
  geom_dumbbell(size_x = 1.8, size_xend = 2, 
                colour_x = "grey12", colour_xend = "dodgerblue2") + 
  facet_grid(. ~ response_kind, 
             labeller = labeller(response_kind = 
                                   c("Negative" = "Mind a lot (ESS) / Not comfortable at all (EBM)",
                                     "Positive" = "Don't mind at all (ESS) / Totally comfortable (EBM)"))) +
  # Unfortunately, since geom_dumbbell cannot create labels for us, we need to
  # draw them manually
  geom_text(data_frame(response_kind = c("Negative", "Negative", 
                                         "Positive", "Positive"), # for faceting
                       x_position = c(2.1, 19.3, 28.5, 38.2),
                       survey_name = c("ESS", "EBM", "ESS", "EBM")),
            inherit.aes = FALSE,
            mapping = aes(y = "PL", x = x_position, 
                          label = survey_name, color = survey_name),
            size = 2.5, fontface = "bold", nudge_y = 0.02) + 
  scale_color_manual(values = c("grey12", "dodgerblue2"), guide = "none") +
  scale_x_continuous(limits = c(0, 60), breaks = c(0, 15, 30, 45, 60)) +
  labs(x = "Percentage points", y = "",
       title = "Fight! European Social Survey 2014 vs. Eurobarometer 2015",
       subtitle = "(using question about potential black inlaws as proxy for the more general question in ESS)",
       caption = "Positive scores include the 4 most positive responses; in EBM, it also includes the 'Indifferent' answer.") +
  theme(plot.caption = element_text(color = "grey40", size = 8))

Bearing in mind the limitations of this comparison (different year, sample, and question mean that we cannot determine how much of the difference is due to measurement error and how much due to genuine attitude difference), allow me to draw your attention to a couple of things:

  • The Czech measure that jumped a lot is not the potentially mistranslated positive one; it’s the negative one. In other words, if the original map had been based on ESS, it would have looked very similar.
  • It’s worth noting, though, that the Czech sample is the only one in which the positive rating decreased from ESS to EBM. (Well, technically, the Belgian sample saw a decrease as well, but one that was well within the margin of error.) This is weak evidence in support of the mistranslation hypothesis - arguendo, the mistranslation didn’t make a difference when it should have, thus exaggerating the true cross-country differences.
    • French was one of the original languages of the Eurobarometer survey, so - by definition - it could not have been mistranslated. Nonetheless, the 22-point rise in positive ratings from ESS to EBM is the largest of any changes - is it because that’s the “natural” result of the reframing? Or perhaps because there’s a larger difference in perception between “immigrant” and “black person” due to the number of French citizens from the overseas territories, so the compared question is not really the same? (Again, I find myself wishing that ESS asked the same questions as EBM.)
  • But there are a lot of country samples that were invariant to the different question and the different response scale, lending support to the idea that the measured construct - explicit admission of discomfort - is robust to different question phrasings in many European countries.

Again, due to the difference between the two surveys, we have to take the differences with a grain of salt. I really wish ESS did ask the same question.
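For a rough sense of what “within the margin of error” means when two surveys are compared, this sketch computes the textbook 95% margin for a difference between two independent sample proportions. The proportions and sample sizes are invented, and `moe_difference` is a hypothetical helper. Both surveys use weights and more complex sampling, which this ignores.

```r
# 95% margin of error (in percentage points) for the difference between
# two independent sample proportions, assuming simple random sampling
moe_difference <- function(p1, n1, p2, n2, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  100 * z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
}

# Hypothetical example: 30% vs. 27% "positive", about a thousand respondents each
observed_gap <- 100 * (0.30 - 0.27)
margin <- moe_difference(0.30, 1000, 0.27, 1000)
observed_gap < margin  # TRUE: a gap this small could be sampling noise alone
```

With samples of this size, a gap has to reach roughly four percentage points before sampling error alone stops being a plausible explanation.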

7.1 Are there other ways to (in)validate the EBM results?

My cousin Jan suggested the racial-differences Implicit Association Test, which has some unfortunate issues. Perhaps someone has done the resume methodology? I will be grateful for suggestions – like I said, this is neither my circus nor my monkeys.

8 Making new maps

My original goal was to remake and play around with the little map. This took me the longest and I went through a bunch of attempts, which will in time get their own article.

## Setup
library(tmap)
library(tmaptools)
data(Europe)
convertEBMCountryCodes <- function(isocntry_column) {
  countrycode::countrycode(isocntry_column, "iso2c", "iso3c", 
                           custom_match = c("GB-NIR" = "GBR",
                                            "GB-GBN" = "GBR",
                                            "DE-E" = "DEU",
                                            "DE-W" = "DEU"))
}
# This is a different reshaping of the data
mappable_ebm <- eb15_comfort_wide %>%
  mutate(iso_a3 = convertEBMCountryCodes(isocntry))
ebm_country_counts <- eb15 %>% 
  mutate(iso_a3 = convertEBMCountryCodes(isocntry)) %>%
  group_by(iso_a3) %>%
  summarize(total_respondents = n())
mappable_prepped <- mappable_ebm %>% 
  mutate(response = fct_collapse(as.factor(response), 
                                 "Uncomfortable" = c("Not at all comfortable",
                                                     "2"),
                                 "Comfortable" = c("7", "8", "9", "Totally comfortable", "Indifferent"))) %>%
  mutate_at(vars(iso_a3, question_kind, target), as.factor) %>%
  group_by(iso_a3, question_kind, target, response) %>%
  summarize(response_count = n())
mappable_uncomfortable <- mappable_prepped %>%
  filter(response == "Uncomfortable") %>%
  merge(ebm_country_counts) %>%
  mutate(uncomfortable_percent = 100 * response_count / total_respondents) %>%
  select(iso_a3, question_kind, target, uncomfortable_percent)
mappable_comfortable <- mappable_prepped %>%
  filter(response == "Comfortable") %>%
  merge(ebm_country_counts) %>%
  mutate(comfortable_percent = 100 * response_count / total_respondents) %>%
  select(iso_a3, question_kind, target, comfortable_percent)
mappable_both <- full_join(mappable_comfortable, mappable_uncomfortable,
                           by = c("iso_a3", "question_kind", "target"))
# and reshape the data from ESS to be plottable alongside
mappable_ess <- ess14_sub %>%
  mutate(iso_a3 = countrycode::countrycode(cntry, "iso2c", "iso3c")) %>%
  mutate(imdetmr = fct_collapse(imdetmr, 
                                "Comfortable" = c("Not mind at all", "1", "2", "3"),
                                "Uncomfortable" = c("9", "Mind a lot"))) %>%
  group_by(iso_a3, imdetmr) %>% summarize(response_count = n()) %>%
  mutate(response_percentage = 100 * (response_count / sum(response_count))) %>%
  select(-response_count) %>%
  filter(imdetmr %in% c("Comfortable", "Uncomfortable")) %>%
  spread(imdetmr, response_percentage) %>% ungroup()
## Workhorse function for map display
# expects a long dataset that is ready for a merge by `iso_a3` column
mapLoveComfort <- function(dataset, target_name, column_of_interest = "comfortable_percent",
                           column_title = "% totally comfortable\n(7-10 on scale or answered 'Indifferent')", 
                           palette = "RdYlGn") {
  append_data(Europe, dataset %>%
                filter(question_kind == "love relationsship of child",
                       target == target_name),
              "iso_a3", "iso_a3", ignore.na = TRUE, 
              ignore.duplicates = TRUE) %>%
    tm_shape() +
    tm_polygons(column_of_interest,
                title = column_title,
                palette = palette,
                auto.palette.mapping = FALSE,
                breaks = seq.int(0, 100, 10), 
                legend.format = function(x) paste0(x, "%")) +
    tm_layout(legend.outside = F, legend.position = c("RIGHT", "TOP"),
              legend.frame = T, 
              main.title = paste0("How comfortable would you feel if your child\nwas in ",
                                  "a love relationship with a ", R.utils::capitalize(target_name), "?"),
              main.title.size = 1)
}
targets <- paste0(c("black", "asian", "muslim", "jewish"), " person")

8.1 Replicating the original map

This is, sort of, the original map. The main difference is that it uses yellow midpoints for the transition between red and green.

original_comfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "comfortable_percent"), simplify = FALSE)
do.call(tmap_arrange, original_comfort_maps)

This is an improvement on the original map’s color scheme, I think. There’s still an implication that the green areas are “correct”, though, which misleadingly suggests a normative criterion at >50% “comfortable” population, so…

8.1.1 The original map with a better color scale

…we can make a single-color map, which also has the benefit of not being an asshole to people with red-green colorblindness.

original_comfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "comfortable_percent", palette = "-Reds"), simplify = FALSE)
do.call(tmap_arrange, original_comfort_maps)

8.2 The European Social Survey “comfort” map

As a relevant comparison, we can, of course, take a look at the European Social Survey’s most positive responses. For clarity’s sake, I show them using both palettes.

ess_common_map <- append_data(Europe, mappable_ess,
                              "iso_a3", "iso_a3", ignore.na = TRUE, 
                              ignore.duplicates = TRUE) %>%
  tm_shape() +
  tm_layout(legend.outside = F, legend.position = c("RIGHT", "TOP"),
            legend.frame = T, 
            main.title = paste0("How comfortable would you feel if your child\nwas in ",
                                "a love relationship with an immigrant?"),
            main.title.size = 1)
ess_single_palette_map <- ess_common_map + tm_polygons("Comfortable",
              title = "% comfortable ('Not mind at all' to 3 on the ESS scale)",
              palette = "-Reds",
              auto.palette.mapping = FALSE,
              breaks = seq.int(0, 100, 10), 
              legend.format = function(x) paste0(x, "%"))
ess_original_palette_map <- ess_common_map + tm_polygons("Comfortable",
              title = "% comfortable ('Not mind at all' to 3 on the ESS scale)",
              palette = "RdYlGn",
              auto.palette.mapping = FALSE,
              breaks = seq.int(0, 100, 10), 
              legend.format = function(x) paste0(x, "%"))
tmap_arrange(ess_single_palette_map, ess_original_palette_map)

8.3 The conservative “discomfort” map

As noted in the Methodological criticisms section, we have a bunch of reasons to think that the extreme negative end of the scale might be more informative than the positive end.

In this case, we’ve applied stricter inclusion criteria – we only take the lowest two points rather than the lowest four.

original_discomfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "uncomfortable_percent", column_title = "% not at all comfortable\n(1 or 2 on scale)",  palette = "Reds"), simplify = FALSE)
do.call(tmap_arrange, original_discomfort_maps)

This makes my homeland look much less anti-Semitic, so hurray!

8.3.1 Conservative “discomfort” measure with the original palette

Finally, just because we can, here’s the same map with the original colors.

ebm_discomfort <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "uncomfortable_percent", column_title = "% not at all comfortable\n(1 or 2 on scale)",  palette = "-RdYlGn"), simplify = FALSE)
do.call(tmap_arrange, ebm_discomfort)

9 Motivations and limitations

Since my motivations for writing this are going to come up, I thought I’d supply a couple of answers. (This is, of course, to divert the pure of heart away from the (((TrUtH))). You caught me.)

9.1 Who’s paying you?

Not George Soros, although back in ninth grade, I really wanted him to. Over the years, I’m afraid I actually spent much more time on Charles Koch’s dime. I was a Bakala Scholar, which would ~exPlain tHings~, but also a Kellner Family Scholar, which… wouldn’t. Hilariously enough, to the best of my knowledge, I have never received an EU grant.

Right now, the only institution that’s paying me is Yale, but not for this.

If you’re seeking to impute impure incentives to me, my list of past funding sources might prove confusing. It might almost be simpler to imagine that some people might hold beliefs that are independent of their funding.

To suggest an alternative to the monetary motive, I’d like to share three things about myself:

  1. I like detailed open data and the level of Eurobarometer’s coverage was delightful. Dismissing it out of hand seems wasteful.
  2. I like butting into arguments that allow for empirical counter-claims.
  3. I get stuck on interesting problems, especially when people are eager to argue about them.

9.2 You’re just protecting a study that confirms your beliefs!

You’re just dismissing a study that questions your beliefs! See how this is not a useful criticism? Nobody is free of cognitive biases.

But we can take a look at the data, make our best arguments, and see if others can tear them down in a way that makes us discard them. I might not change your mind and you might not change mine (though I’ll try!), but we don’t just do it for ourselves; we do it so that the people who aren’t as invested in the question can make a better decision. My epistemology need not be flawless to benefit the community epistemics.

Honestly, what animates me the most in writing this post is the blithe dismissal of a survey that is carefully documented, its raw data publicly available, and its scope narrow enough that it can ask questions at a very granular level. It collected a large sample using in-person interviews, taking care to obtain geographical variability within each country. It actually did try to guard against mistranslation by implementing a back-translation step. In short, the people behind Eurobarometer did a lot of things right.

Some have called #OpenScience mere methodological terrorism. As that line of argument goes, we’re just mean bullies seeking to find a couple of flaws in order to discredit a whole research program. If we’ve become large, the critique goes, it is because we’re standing on the corpses of well-meaning colleagues. My preferred response to that criticism is giving credit where it’s due.

I care about the quality of science. I care about the quality of science a lot. In the past couple of years, this has often meant cutting away sloppy and/or underpowered research in order to reduce false positives. But it’s important to remember that we’re trying to reach a balance between Type I and Type II errors here: there’s an amount of error that’s sufficient to dismiss a dataset, but it’s not just any amount.

(In other words, it’s that sweet, sweet Arnold Foundation cash I’m after.)

Which doesn’t actually mean that I’m letting Eurobarometer off the hook.

9.3 These are not the limitations you are looking for

In a real sense, much of the criticism misses the forest for the trees. It’s nitpicking, whether or not the nits are there, because nit-sized problems we can deal with.

Do any of the critiques I addressed matter? Even if the survey translation achieved perfect fidelity across languages and the original questions were phrased so that they avoided biasing the respondents, it would still be a social-science instrument which, along with everything, is fucked.

In one sense, this survey is fine. It attempted to obtain an accurate estimate of the response distribution, sampling the population evenly; within its constraints, it succeeded. Interpretations and inferences are our problem, as is overcoming the inevitable conceptual challenges on the way. You wouldn’t blame the Apollo Command Module for its inability to land on the Moon. It isn’t a problem that we need the wider tapestry of research findings to embed this one in a context.

It’s important to note that the survey doesn’t go for any underlying constructs. Nobody claims that the questions predict particular behaviors. We do with this survey what we do with most surveys: describe it thoroughly, then wave it suggestively.

But then there’s a list of potential methodological issues that is longer than my arm. The thing with that list is that you can’t win. Doing in-person interviews? Some people will give answers to please the recruiter, not accurate self-report. Sending out an internet survey? You probably don’t have a representative sample. If your question doesn’t bias the respondent negatively, it’s probably just because it’s biasing them in the opposite direction. Should you count the 11% of people reporting the belief that the country is or might be run by lizardmen? Include them and your aggregate will get a lot of noise; omit them and you’re undersampling the conspiracy theorists.

(If you want to read a thorough takedown of “asking people things is useful,” Bertrand and Mullainathan (2001) provide a particularly vicious lashing, which nonetheless includes a measurement error perspective. Most of their examples and logic come from Plous (1993).)

Each decision in designing a study has to find the lesser evils. I can’t think of a design that is beyond reproach. Everywhere you look, you have to deal with imperfection.

This is what makes social science hard.

9.4 Quantifying the error

The original Eurobarometer Report’s Technical Appendix provides a table that gives the percentage-point margin of error. I assume that this margin accounts for sampling error. With the data and the criticisms we have, we can enlarge the margin by each of the following:

  • semantic shifts between translations (5-10 percentage points, based on the analysis above)
  • respondent error (3-5 percentage points, based on the analysis above)
  • people trolling (3-5 percentage points)

(I’m not including the social desirability bias because I think that’s a part of the construct under measurement. If you wish to use this study to estimate possible racist behavior, feel free to tack on that error, too.)

Am I pulling these out of my butt? You bet, but I think these are pretty pessimistic numbers, and it’s not like you don’t have them. For example, if you’re dismissing the study outright, then the error bars you assign must be unable to distinguish between 0% and 100% response. Since surveys will always contain errors, I’d say this framework is more useful than a blanket rejection – or a blanket acceptance – of a survey. For one, the framework also forces you to be explicit about the things that you consider well done.

Let’s assume the worst-case scenario: each of the errors is independent of others and all are on the higher end of the range. If we sum them up, we end up with a margin of error of twenty-something percentage points. With these error bars, the cross-country comparison is much rougher – but we can still say with some confidence that former Czechoslovakia has a larger explicit anti-Roma attitude problem than most of Europe.
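To make that arithmetic explicit, here is a rough sketch of the error budget. The extra-error figures are the upper ends of the ranges in the list above. The sampling margin is an assumption (the textbook 95% formula for a proportion near 50% with about a thousand respondents), not the value from the Technical Appendix. Straight summation treats every error as pushing in the same direction at once. If the errors really were independent, the usual root-sum-of-squares combination would give a smaller figure, so both are shown.

```r
# Upper ends of the ranges listed above, in percentage points
extra_error <- c(translation = 10, respondent = 5, trolling = 5)

# Assumption: textbook 95% sampling margin for p = 0.5 and n = 1000
sampling_moe <- 100 * qnorm(0.975) * sqrt(0.5 * 0.5 / 1000)

# Most pessimistic combination: every error points the same way
worst_case <- sampling_moe + sum(extra_error)

# Combination for independent errors: root sum of squares
independent <- sqrt(sampling_moe^2 + sum(extra_error^2))

round(c(sampling = sampling_moe, worst_case = worst_case, independent = independent), 1)
```

Under these assumptions the summed margin lands in the low twenties, which matches the “twenty-something” above, while the independent-errors version comes out at roughly half that.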

9.5 …so what you’re saying is that social science is useless?

I prefer the term differently useful. No. I’m saying that if you’re going to dismiss a social-science result, you should do that on the basis of problems it actually has, and only to the extent that those problems warrant. The measurement crisis is upon us; let’s cut through the bullshit and talk about the problems that actually matter.

If that makes you not at all comfortable, good; me too.

10 References

Bertrand, Marianne, and Sendhil Mullainathan. 2001. “Do People Mean What They Say? Implications for Subjective Survey Data.” papers.ssrn.com. doi:10.2139/ssrn.260131.

Plous, Scott. 1993. “The Effects of Question Wording and Framing.” In The Psychology of Judgment and Decision Making, 65–76. McGraw-Hill Book Company. http://psycnet.apa.org/psycinfo/1993-97429-000.

---
title: "How prejudiced are we really? One more look at that 2015 Eurobarometer"
author: "Simon Podhajsky"
date: "27 August 2017"
output:
  html_notebook:
    code_folding: hide
    fig_height: 6
    fig_width: 8
    number_sections: yes
    theme: cosmo
    toc: yes
    toc_float: yes
  html_document:
    code_folding: hide
    fig_height: 6
    fig_width: 8
    number_sections: yes
    theme: cosmo
    toc: yes
    toc_float: yes
bibliography: survey_limitations.bib
---

This visualization nerd-sniped me ~~today~~ two weeks ago ([see original Reddit thread](https://www.reddit.com/r/MapPorn/comments/6t41vm/eu_would_you_feel_comfortable_if_your_child_was/)). It is based on the Special Eurobarometer 2015 (EBM) and it suggests all kinds of horribleness about my home country:

![_(Heatmap of Europe, colored by responses to the Special Eurobarometer 2015 question "Would you feel comfortable if one of your children was in a relationship with an X person?") For my non-European friends, my home country is the red one in the center-left._](https://i.redd.it/bfg2ndh786fz.png)

This looks all sorts of unpleasant. More to the point, it looks _wrong_. Czechs think of themselves as the cool stoner uncle of the region, secular and colorblind and live-and-let-live. Could we really be this bad? More importantly - sure, we've had some issues recently, but how could Czechland become more xenophobic than _Poland_? (Humorous tongue-in-cheek nationalist chauvinism is to Europe as baseball is to the US. You can see how this could lead to problems.)

So, is it a survey artifact or is it real? Can we perhaps dismiss it as noise driven by a mistranslation? In terms of mechanism, can the dataset suggest any? Could this, for instance, be driven by the urban-rural gap? 

Luckily, this was data from the Special Eurobarometer 2015, so it's publicly accessible along with complete documentation. That means we can poke it. I kept this RMarkdown notebook to retrace my steps, so that you can poke it too. (You can download it from the top-right button.) Note that I usually haven't gone back to re-visualize early graphs on the basis of later explorations.

This is equal parts a dataset exploration and data-tidying exercise so if you're here to talk about why all of this is bullcrap, feel free to skip the parts with R code and go straight to [the discussion of validity](#methodological-criticisms).

# Getting the data

Eurobarometer releases its full dataset, which is also indexed by social-science aggregators like [ICPSR](https://www.icpsr.umich.edu/icpsrweb/) and [GESIS](https://www.gesis.org/home/). Using the `gesis` package, extracting the desired Eurobarometer data is easy. (The only difficult part was determining which Eurobarometer to download because GESIS follows the original EBM numbering convention of `(survey round #).(wave #)`, whereas the EBM reports usually go by a single-integer ID. Here, we knew what year we wanted, so finding the GESIS equivalent was a matter of one Google search.)

You will have to register for free and use those credentials to download the dataset.

```{r, message=FALSE, warning=FALSE}
setwd("~/Coding/ebm2015/")
library(gesis)
library(tidyverse)
library(haven)
library(ggplot2)
library(scales)
library(ggalt)
library(countrycode)
library(sjmisc)
library(cowplot)
library(printr)
library(stringr)
library(forcats)
library(RColorBrewer)
theme_set(theme_bw())
```
```{r, eval=FALSE, message=FALSE, warning=FALSE, cache=TRUE}
# Look at availability
study_types <- gesis::get_study_groups()
datasets <-    gesis::get_datasets("0008") # EB - Standard and Special Eurobarometer
gesis_login <- gesis::login(username = "simon.podhajsky@gmail.com",
                            password = Sys.getenv('GESIS_PWD')) # Saved in .Renviron
# If you're not planning to share your code, you could type in your password
# above, but you should really get used to storing your secrets elsewhere.

gesis::download_dataset(s = gesis_login, doi = 6595) # Eurobarometer 83.4 (2015)
# past:
# - 2012: 77.4 (DOI 5613)
# - 2009: 71.2 (DOI 4972)
# - 2008: 69.1 (DOI 4743)
# - 2007: 65.4 (DOI 4508)
# - 2002: 57.0 (DOI 3638)
# gesis::download_codebook(doi = 6595) # errors out, for some reason
```
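
If you haven't kept secrets in environment variables before, here is a minimal sketch of the pattern in base R. The helper `get_secret()` and the variable name `GESIS_PWD_DEMO` are made up for illustration; they are not part of the `gesis` package. In real use you would put a line like `GESIS_PWD=...` into `~/.Renviron` and restart R rather than call `Sys.setenv()`.

```r
# Hypothetical helper (not part of the gesis package): read a secret from an
# environment variable and fail loudly if it is missing.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set; add it to ~/.Renviron")
  }
  value
}

# Demo with a throwaway variable so that the sketch runs anywhere
Sys.setenv(GESIS_PWD_DEMO = "not-a-real-password")
demo_pwd <- get_secret("GESIS_PWD_DEMO")
```

Failing early like this beats passing an empty string to the login call and having to decipher whatever error comes back from the server.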

Downloading the codebook from the GESIS API hasn't worked for me, so you can [get it from ICPSR instead](http://www.icpsr.umich.edu/cgi-bin/file?comp=none&study=36403&ds=1&file_id=1217673&path=ICPSR).

```{r}
# There's no control over the naming of the saved file, so you'll have
# to take a look to make sure you're loading the right one - but GESIS
# will save each survey under a consistent filename, so at least you've
# got that
eb15 <- read_dta("ZA6595_v2-0-0.dta")
dim(eb15)
```

That's a lot of Eurobarometer data! But even without [downloading the codebook](http://www.icpsr.umich.edu/cgi-bin/file?comp=none&study=36403&ds=1&file_id=1217673&path=ICPSR), the structure is well-documented in the data frame attributes. This means that we can take a look-see.

```{r, results="hide"}
# A lot of well-documented variables -- so many that it overwhelms R console,
# so we'll save the output into a file and look it over in a text editor or
# work it with grep
sink("labels.txt")
(eb15_labels <- sapply(names(eb15), function(x) attributes(eb15[[x]])))
sink()
```
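
To see what is going on without the full dataset, here is a small base-R sketch of how Stata-style metadata rides along as column attributes. The toy data frame and its label texts are invented for illustration; only the attribute names (`label` for the question text, `labels` for the value codes) mirror what `haven` attaches when it reads a `.dta` file.

```r
# Toy stand-in for a labelled survey data frame (values and labels are made up)
toy <- data.frame(qc14_4 = c(1, 10, 11), d10 = c(1, 2, 2))
attr(toy$qc14_4, "label")  <- "TOY QUESTION ABOUT CHILD: WHITE PERSON"
attr(toy$qc14_4, "labels") <- c("Not at all comfortable" = 1,
                                "Totally comfortable" = 10,
                                "Indifferent" = 11)
attr(toy$d10, "label")  <- "GENDER"
attr(toy$d10, "labels") <- c(Man = 1, Woman = 2)

# One question label per column, as a named character vector
var_labels <- sapply(toy, function(x) attr(x, "label"))

# Instead of reading a dump in a text editor, search the labels directly
hits <- names(var_labels)[grepl("CHILD", var_labels)]
```

Searching the label vector with `grepl()` does the same job as dumping everything to a file and grepping it, just without leaving R.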

## Basic properties

Each row is a record of one interview. Grouping by country reveals that each has a sample of about a thousand, except for Montenegro, Northern Ireland, Luxembourg, and former East Germany:

```{r}
# Look over sample size per country
eb15 %>% group_by(isocntry) %>% summarize(n())
```

Other identifying info I care about is a listed nationality (which is not the same as country of interview - Czechs can be interviewed abroad, dual citizenship is a thing, ...), size of community the respondent lives in (stored separately for each country in `p6*` columns, but same for the V4 countries), and region of origin at NUTS-2 level. (Each region in each country is sampled roughly equally. I don't know exactly how the authors of the survey do their stratified sampling, so it's possible that I'm missing some re-weighting.)

The measures I care about the most are "comfort with X coworker" and "comfort with X child-in-law". For each value of X, these are stored in columns `qc13_*` and `qc14_*`, respectively. There are a lot of other interesting goodies, but these will do for now.

# A first look: responses versus community size

To make our analysis easier, we extract columns of interest and then convert the attitude questions from wide to long. We'll also label some of the values to make visualization easier.

```{r, eval=FALSE, include=FALSE}
# Identifiers: isocntry, q1_17 = Czech nationality, q1_24 = Slovak nationality
# Other: p6cz (size of community in CZ), p7cz (NUTS 2 regional code)
# qc1_*: perception of discrimination on the basis of X in country
# qc2_*: experienced discrimination on the basis of X
# qc3_*: job candidate is at a disadvantage because of X
# qc4_*: comfortable with highest elected politician who is X
# qc5_*: support vs. oppose diversity measures at work
# qc6: rating of effectiveness of efforts to fight discrimination
# qc7_*: enough being done to promote diversity
# qc12_*: diversity sufficiently reflected in the media
# qc13_*: comfort with colleague at work who is X
# qc16_*: equal-rights statements for gay/trans ppl
# qc18_*: comfort with X people showing affection in public
# sd1_*: had contact with minority X
# sd2_*: consider yourself a minority X
# sd3: your religion
# d1: left-right political placement
# d10: gender
# d11: age
# d8: stopped studying at age
# d25: community kind/size
# d60: economic difficulty (paying the bills)
# d63: SES self-placement
# d70: satisfied with life you lead
# d77: trying to persuade friends
# d72_* my voice counts in (1) EU (2) my country
# d78: EU attitude
# d73_*: EU/my country in right/wrong direction
#
# margin of error between 1.5 and 3 percentage points for this sample size
```
```{r}
# See raw Rmd original or the codebook for description of question codes
# select countries of the V4 + relevant cols
eb15_v4 <- eb15 %>% 
  select(isocntry, q1_17, q1_19, q1_23, q1_24, uniqid, # country + citizenship
         starts_with("qc13"), starts_with("qc14"), 
         starts_with("p6"), starts_with("p7"),
         ends_with("cz"), ends_with("sk"), 
         ends_with("pl"), ends_with("hu")) %>%
  filter(isocntry %in% c("CZ", "SK", "PL", "HU"))
```
```{r, message=FALSE, warning=FALSE}
# Data reshaping:
# - merge "size of community" for CZ, SK, PL and HU
# - make long all presently loaded survey questions (qc*)
eb15_v4_long <- eb15_v4 %>% 
  mutate(community_size = coalesce(p6pl, p6cz, p6sk, p6hu)) %>% 
  select(-starts_with("p6")) %>%
  gather(question, response, starts_with("qc13"), starts_with("qc14")) %>%
  mutate(response = factor(response, levels = 1:13, 
                           labels = c("Not at all comfortable", 2:9, 
                                      "Totally comfortable", 
                                      "Indifferent", "It depends", "Don't know")),
         community_size = factor(community_size, 
                                 labels = c("1" = "Rural area",
                                            "2" = "Towns and suburbs",
                                            "3" = "Cities")))

```

(Sidenote: here, I convert data into factors manually. This is because when I did this two weeks ago, I did not think to dig through `sjmisc`, a package that is excellent about digging out and applying the metadata from `.dta` files. I do use it later.)
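
The reshaping boils down to two moves: collapse the country-specific columns into one, then go from wide to long. Here is a minimal sketch on invented toy data, assuming reasonably recent `dplyr` and `tidyr`. It uses `pivot_longer()`, the newer sibling of `gather()`; that is a swap on my part, not what the notebook above runs.

```r
library(dplyr)
library(tidyr)

# Toy data: each respondent has a community size only in their own country's
# column, plus two attitude questions in wide format
toy <- tibble(
  uniqid   = 1:4,
  isocntry = c("CZ", "CZ", "SK", "PL"),
  p6cz     = c(1, 3, NA, NA),
  p6sk     = c(NA, NA, 2, NA),
  p6pl     = c(NA, NA, NA, 1),
  qc14_1   = c(1, 10, 5, 11),
  qc14_2   = c(2, 9, 12, 10)
)

toy_long <- toy %>%
  # coalesce() takes the first non-missing value across the columns
  mutate(community_size = coalesce(p6cz, p6sk, p6pl)) %>%
  select(-starts_with("p6")) %>%
  # one row per respondent per question
  pivot_longer(starts_with("qc14"),
               names_to = "question", values_to = "response")
```

Each respondent now takes up one row per question, which is the shape that faceting by question needs.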

Remember the metadata we stored separately? We can create a lookup table for `facet_wrap` in ggplot2 in order to make our labels more informative.

```{r}
# Clean up the question codes
question_names <- sapply(eb15_labels, function(x) x$label)
question_names <- gsub("COLLEAGUES AT WORK: ", "", question_names)
question_names["qc13_11"] <- "Trans* person" # Too long for facet labels
question_names <- gsub("LOVE RELATIONSSHIP OF CHILD: ", "", question_names)
question_names["qc14_11"] <- "Trans* person"
question_names <- sapply(question_names, 
                         function(x) R.utils::capitalize(stringr::str_to_lower(x)))
```
```{r}
# We'll also create a named vector of country codes for later use as a lookup 
# table
ebs_countries <- unique(eb15$isocntry)
# setNames is a way to get names for an unnamed vector
ebs_country_dict <- setNames(
  countrycode(ebs_countries, 
              "iso2c", "country.name", 
              custom_match = c("GB-NIR" = "GB (Northern Ireland)",
                               "GB-GBN" = "Great Britain",
                               "DE-E" = "Germany (East)",
                               "DE-W" = "Germany (West)")),
  ebs_countries)
# Some of the `isocntry` labels are not actually in the ISO standard, but we
# still want to label them, so we provide a custom_match dictionary

```
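
In case the named-vector-as-dictionary idiom is new to you, here is a minimal base-R sketch with made-up entries. Indexing a named vector by a character vector looks up every element at once, which is all a lookup table needs to do.

```r
# Made-up dictionary: names are the codes, values are the display labels
codes <- c("CZ", "SK", "DE-E")
dict  <- setNames(c("Czech Republic", "Slovakia", "Germany (East)"), codes)

# Look up several codes at once; repeated codes are fine
survey_codes <- c("SK", "CZ", "CZ", "DE-E")
pretty <- unname(dict[survey_codes])
```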

Initially, I visualized all categories of interest in the same graph, but that turned out to be a little overwhelming. I split the attitudes into three categories and ordered them so that the most "popular" subgroup is at the top. (If you're dealing with a lot of factors, `forcats` has methods to do this automatically.)

```{r}
# Reorder the facets
ethnicities = paste0("qc14_", c(4, 3:1))
religions = paste0("qc14_", c(8:9, 5, 7, 6))
sexdisability = paste0("qc14_", 10:12)
ordered_questions = question_names[c(ethnicities, religions, sexdisability)]
# Other than reordering the factors, this also gives them the names from
# the codebook
eb15_v4_long <- eb15_v4_long  %>%
  mutate(question = fct_relevel(as.factor(question),
                                names(ordered_questions)))
```
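
For comparison, here is a small sketch of both approaches on invented toy ratings: manual ordering with `fct_relevel()` (what I did above) and automatic ordering with `fct_reorder()`, which sorts the levels by a summary of another variable. Ordering by mean rating is my assumption about what "most popular" should mean here.

```r
library(forcats)

# Toy data: three questions with made-up comfort ratings
question <- factor(c("a", "a", "b", "b", "c", "c"))
rating   <- c(2, 4, 9, 10, 5, 6)

# Manual: name the levels that should come first, in order
manual <- fct_relevel(question, "b", "c")

# Automatic: order levels by mean rating, highest first
automatic <- fct_reorder(question, rating, .fun = mean, .desc = TRUE)
```

Both give the level order `b`, `c`, `a` here; the automatic version keeps working when the data change.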

We'll make three graphs that we'll then combine with the `cowplot` package, which means that we'll be reusing a lot of ggplot2 layers. Luckily, we can prepare those ahead of time in a list. (The following assumes `ggplot2` knowledge; the short summary is that it implements a "grammar of graphics", which lets you assign variables to different graphical representations and handles the imperative work that used to take up the most time. [Harvard Data Science Services](http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html) has a good intro.)

```{r}
## Prepare layers for reuse - we'll be graphing a lot of bar charts! (How can we
# use variables in here before they're ever defined? It is the magic of
# non-standard evaluation! You'll want to read Advanced R by Hadley Wickham, but
# for now, just settle for the explanation of "arguments in ggplot2 are only
# evaluated when it's time to draw the graphs.")
layers <- list(
  # separate graph for each country and each attitude
  facet_grid(question ~ isocntry,
             labeller = labeller(question = as_labeller(question_names),
                                 isocntry = ebs_country_dict)),
  # separate the 1-10 comfort rating from indifference/IDK
  geom_vline(xintercept = 10.5), 
  theme(legend.position = "top", strip.text = element_text(size = 11),
        axis.text.x = element_text(angle = 60, hjust = 1, size = 11)),
  xlab(""), 
  ylab("Response count"),
  ylim(0, 800)
  # (the geom_bar layer that each plot adds separately fills bars by the
  # respondents' community size to shine some initial light on the
  # urban-rural question)
)
makeMainTitle <- function(subset_description = "all", countries = "CZ, HU, PL & SK") {
  ggtitle(paste0("How comfortable would you feel if one of your children was in ",
                 "a love relationship with ___?"),
          subtitle = paste0("Eurobarometer 83.4 (2015) via GESIS, ", 
                            subset_description, " questions, ", countries, 
                            " answers"))
}
```
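
If the list-of-layers trick looks like magic, here is a stripped-down sketch using the built-in `mtcars` data rather than the survey. Adding a plain `list()` to a ggplot adds each of its elements in turn, so the shared styling lives in one place.

```r
library(ggplot2)

# Layers and settings that several plots have in common
shared_layers <- list(
  geom_vline(xintercept = 2.5),
  theme(legend.position = "top"),
  xlab(""),
  ylab("Count")
)

# Each plot adds its own geom, then the shared list
p_cyl  <- ggplot(mtcars) + geom_bar(aes(x = factor(cyl)))  + shared_layers
p_gear <- ggplot(mtcars) + geom_bar(aes(x = factor(gear))) + shared_layers
```

Change the theme once in `shared_layers` and every plot built from it picks up the change the next time it is constructed.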

Now, we prepare the graphs and put them together with `cowplot::plot_grid`.

```{r}
v4_love_ethnicities <- eb15_v4_long %>%
  filter(question %in% ethnicities) %>%
  ggplot() + makeMainTitle("ethnicity") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers
  
v4_love_religions <- eb15_v4_long %>%
  filter(question %in% religions) %>%
  ggplot() + makeMainTitle("religious-affiliation") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers
v4_love_sexdisability <- eb15_v4_long %>%
  filter(question %in% sexdisability) %>%
  ggplot() + makeMainTitle("LGBTQ/disability") +
  geom_bar(aes(x = response, fill = community_size), width = 0.8) +
  layers

```
```{r, fig.width = 10, fig.height = 24, message = FALSE}
v4_love_all <- cowplot::plot_grid(v4_love_ethnicities + makeMainTitle(), 
          v4_love_religions + ggtitle("Subsection: Religions") + 
            theme(legend.position = "none"),
          v4_love_sexdisability + ggtitle("Subsection: LGBTQ and disability") +
            theme(legend.position = "bottom"), 
          ncol = 1, rel_heights = c(4.5, 5, 4))
v4_love_all
```


```{r, eval=FALSE, include=FALSE, fig.width = 8, fig.height = 10}
ggsave("love_ethnicities.png", 
       v4_love_ethnicities, 
       width = 10, height = 9)
ggsave("love_religions.png", 
       v4_love_religions, 
       width = 10, height = 10)
ggsave("love_sexdisability.png", 
       v4_love_sexdisability, 
       width = 10, height = 8)
```
```{r, eval=FALSE, fig.height=24, fig.width=10, include=FALSE}
ggsave("v4_love_all.png", v4_love_all, width = 10, height = 24)
```

This isn't bad as a first pass. Here's a couple of things of note:

1. Different community sizes weren't sampled evenly, so there's fewer rural respondents -- and they aren't all conservative. They don't drive the trends. (A mosaic plot could tell us more about the within-rural split, but doing it with categorical variables would require an edge version of `ggmosaic` and `ggplot2`, which has been [breaking things for me.](https://github.com/haleyjeppson/ggmosaic/issues/11))
2. Holy shit we all suck, but Poland and Hungary have a bimodal distribution of discomfort (a solid second-place finish for "Totally comfortable"), whereas Czech and Slovak respondents differ mostly on the degree of discomfort. This is odd.
3. Most Czech and Slovak response distributions look oddly similar.

We'll get to the weirdness in a bit.

## Collapsing the scales

Visual comparisons with a 10-point Likert scale are difficult, so let's collapse it. This frees up space to print percentages on top. We'll do the collapsing with `forcats::fct_collapse`; for now, we'll stick with my initial choice of grouping the top four response levels, the bottom three, and everything in between. (This is not a neutral choice, of course - more about this in the next section. Do note that in the original EBM report, "Indifferent" was grouped with the most positive responses.)

```{r, fig.height=24, fig.width=8, message=FALSE, warning=FALSE}
data_collapsed <- eb15_v4_long %>%
  filter(question %in% c(ethnicities, religions, sexdisability)) %>%
  mutate(response_detailed = fct_relevel(response, # to prettify the legend order
                                         "Don't know", "Indifferent", 
                                         "It depends", after = 5),
         response = 
           fct_collapse(response,
                        "Uncomfortable" = c("Not at all comfortable", 2:3),
                        "Meh" = c(4:6, "Indifferent", 
                                  "It depends", "Don't know"),
                        "Comfortable" = c(7:9, "Totally comfortable")) %>%
           fct_drop())

# precalculated percentages!
# Doing this dance to get the total number of respondents that's different 
# for every country (and possibly every question, though hopefully that does
# not change)
# 
# Note: if not filtered before use, the inclusion of this in a ggplot2 layer 
#   will create additional facets.
data_collapsed_percentages_temp <- data_collapsed %>%
  group_by(isocntry, question, response) %>%
  summarize(count = n())
data_collapsed_percentages <- data_collapsed_percentages_temp %>%
  summarize(total_count = sum(count)) %>%
  merge(data_collapsed_percentages_temp) %>%
  mutate(perc = round(100*count/total_count, 1))

# replace the settings that wouldn't work here
layers_collapsed <- layers
layers_collapsed[[2]] <- scale_fill_manual(
  values = colorRampPalette(brewer.pal(11, "RdYlGn"))(13),
  guide = guide_legend(nrow = 1, label.position = "bottom", title = NULL))
layers_collapsed[[3]] <- theme(panel.grid = element_blank(), 
                               axis.text.x = element_text(hjust = 1, angle = 25),
                               legend.position = "top", strip.text = element_text(size = 11))
layers_collapsed[[6]] <- ylim(0, 1000)
layers_collapsed[[7]] <- geom_bar(aes(x = response, fill = response_detailed), width = 0.8)
```
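
Here is the collapse-then-percentage step in miniature, on invented toy responses. This is a simplified sketch: it uses `count()` plus a grouped `mutate()` instead of the summarize-and-merge dance above, and it leaves out the off-scale answers ("Indifferent" and friends) entirely.

```r
library(dplyr)
library(forcats)

# Toy data: five made-up ratings per country on the 1-10 scale
toy <- tibble(
  isocntry = rep(c("CZ", "PL"), each = 5),
  response = factor(c(1, 2, 5, 9, 10,
                      1, 10, 10, 10, 6), levels = 1:10)
)

toy_pct <- toy %>%
  # same cut points as in the chunk above: 1-3, 4-6, 7-10
  mutate(bucket = fct_collapse(response,
                               Uncomfortable = as.character(1:3),
                               Meh           = as.character(4:6),
                               Comfortable   = as.character(7:10))) %>%
  count(isocntry, bucket) %>%
  group_by(isocntry) %>%
  # the denominator is the country's own respondent count
  mutate(perc = round(100 * n / sum(n), 1)) %>%
  ungroup()
```

The grouped `mutate()` is what keeps each country's percentages adding up to 100 on their own.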

```{r, fig.width = 8, fig.height = 8}
collapsed_ethnicities <- ggplot(data_collapsed %>% filter(question %in% ethnicities)) + 
  makeMainTitle("ethnicities") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% ethnicities), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
```
```{r, fig.height=9, fig.width=8}
collapsed_religions <- ggplot(data_collapsed %>% filter(question %in% religions)) + 
  makeMainTitle("religious-affiliation") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% religions), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
```

```{r, fig.width = 8, fig.height = 7}
collapsed_sexdisability <- ggplot(data_collapsed %>% filter(question %in% sexdisability)) + 
  makeMainTitle("LGBTQ/disability") +
  geom_text(data = data_collapsed_percentages %>% filter(question %in% sexdisability), 
            size = 3.5,
            aes(x = response, y = count + 65, label = paste0(perc, "%"))) +
  layers_collapsed
```

```{r, fig.width = 8, fig.height = 24}
collapsed_all <- plot_grid(collapsed_ethnicities + makeMainTitle(), 
          collapsed_religions + ggtitle("Subsection: Religions") + 
            theme(legend.position = "none"),
          collapsed_sexdisability + ggtitle("Subsection: LGBTQ and disability") +
            theme(legend.position = "none"), 
          ncol = 1, rel_heights = c(4.5, 5, 4))
collapsed_all
```
```{r, eval=FALSE, include=FALSE}
ggsave("love_ethnicities_perc.png", 
       collapsed_ethnicities, 
       width = 8, height = 8)
ggsave("love_religions_perc.png", 
       collapsed_religions, 
       width = 8, height = 9)
ggsave("love_sexdisability_perc.png", 
       collapsed_sexdisability, 
       width = 8, height = 7)
ggsave("v4_love_all_perc.png", collapsed_all, width = 8, height = 24)
```

So that looks pretty bad. Can we blame this on methodology?

# Methodological criticisms

## The issue of mistranslation

The most common criticism is that the Slovak and Czech surveys have been translated sloppily, which makes the data ineligible for cross-border comparisons or, for people who like to make quick conclusions, disqualifies this survey in particular and Eurobarometer in general. For the most part, I think that this line of inquiry overreaches.

([You can download each translation from GESIS.](https://dbk.gesis.org/dbksearch/SDesc2.asp?no=6595&ll=10&af=&nf=1&db=e&search=&search2=&notabs=1&l=p&p=1) [The Czech one is here.](https://dbk.gesis.org/dbksearch/download.asp?db=E&id=58348))

### What's "comfortable"?

[Jan Kulveit argues that EBM shat the bed on translating "comfortable" to Czech and Slovak.](https://www.facebook.com/notes/jan-kulveit/netolerantn%C3%AD-%C4%8De%C5%A1i-a-slov%C3%A1ci-aneb-jak-probl%C3%A9m-zatemnit-statistikou/10153200911660108/) If you speak Czech, go read it. Here are the four main arguments (some of which appear in the comments): 

1. "Comfortable," the operative response keyword, is not translated into Czech and Slovak in a way that guarantees one-to-one equivalence of connotations.
2. The Czech translation of the question is at odds with its colloquial usage, which primes respondents negatively. "How comfortable would you feel if X" is usually used with toddlers who are still working on their theory of mind; "X" is usually "that other kid took _your_ toys away."
3. The Czech and Slovak translation of the top response option is awkward. The Czech question asks (approximately) "how great would you feel," and the most-positive label, "totally great," is a very peculiar translation of "totally comfortable." Some argue that it comes off as "extremely overjoyed," since a common colloquial usage is understating your chagrin by saying that you're "not extremely overjoyed". 
4. Although this isn't a flaw shared by all translations, the translations differ among each other - the top-most response ranges from "wouldn't mind at all" (Polish) to "would be totally cool with it" (Hungarian & most everything else). Some make positive responses more natural than others. Consequently, cross-national comparisons are right out.

The author then goes on to (accurately) rip the media misrepresentation of the survey and concludes by dismissing the survey as a whole (which is a step too far).

I'm grateful that Jan looked into the nitty-gritty of survey wording. Jan's hypothesis also explains some of the weirdness I've noted - the uncanny Czech/Slovak similarity could be due to the shared mistranslation, and the comparative lack of positive-response spike makes plausible the unpalatability of the positive-most response.

So why shouldn't we bin the study?

1. **The translation is close enough; one-to-one connotational equivalence is an unreasonably high standard.** It's great if it can happen! It should happen whenever it can! But for a lot of our vocabulary, especially the vocabulary that describes culturally shaped subjective experience, some disparity will necessarily exist. It's the price we pay for comparative research. We should absolutely keep track of differences and should call bullshit when the disparity grows too large.

But in this case, I don't think the translations are so different as to measure a different construct. "Zcela příjemné" is awkward but not insanely off. Given EBM's wealth of questions, we can actually check that empirically:

2. **The translation hasn't precluded Czechs and Slovaks from using the full scale almost as often as other nations when it came to attitudes to a potential white inlaw**. Look at the graph from earlier and focus on the "white people" plots in the top row:

```{r, fig.width = 8, fig.height = 8}
v4_love_ethnicities
```

Jan presents this as conclusive evidence of negative skew / overly positive translation - in the lily-white Czechland and Slovakia, surely you'd expect more than ~70% of people to be "totally comfortable" with a white inlaw, right? I concur that positive responses are probably bleeding into the "indifferent" column - which, for the record, [the original EBM report](http://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/ResultDoc/download/DocumentKy/68004) counted among "comfortable" responses. But the fact that the majority of people _did_ use the "totally comfortable" response for a white inlaw also means that the top-most response isn't off the menu by virtue of catastrophic mistranslation. **You can't shrug off the non-selection of this option for any other demographic subgroup by appealing to mild semantic differences**.

The mid-scale bump provides an upper bound on the suggested error. Mostly, it's pretty tiny. All V4 countries have it to some extent, as do Britain, Germany, and many others. All of this suggests that positive attitudes aren't getting drastically underreported there. (Both German samples have a higher rate of "Indifferent" responses, though, which is consistent with translation variation redirecting responses there - but again, since "Indifferent" was interpreted as a positive response, this wouldn't matter for the original EBM report.)

```{r, fig.width = 10, fig.height = 10}
# All countries' attitudes to potential white inlaw
eb15_graph_white <- eb15 %>% select(isocntry, qc14_4) %>% 
  mutate(qc14_4 = to_label(qc14_4)) %>%
  ggplot(aes(x = qc14_4)) + 
  facet_wrap(~ isocntry, ncol = 5, labeller = labeller(isocntry = ebs_country_dict)) + 
  stat_count(mapping = aes(y = ..prop.., group = 1), width = 0.7) +
  geom_rect(data = data.frame(isocntry = c("CZ", "HU", "PL", "SK")), inherit.aes = FALSE, 
            fill = "blue", alpha = 0.1, xmin = -Inf, ymin = -Inf, xmax = Inf, ymax = Inf) +
  scale_y_continuous(labels = percent, sec.axis = dup_axis()) +
  ggtitle(paste0("How comfortable would you feel if one of your children was in \n",
                 "a love relationship with a white person?")) + labs(x = "", y = "") +
  geom_vline(xintercept = 10.5) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
eb15_graph_white
```

```{r, eval = FALSE, include = FALSE}
ggsave("eb15_graph_white.png", eb15_graph_white, width = 10, height = 10)
```

3. **Even if the most positive label was problematic, the most negative level is not.** "I wouldn't feel great at all [about having X as an inlaw]" **is** roughly equivalent to "I wouldn't feel at all comfortable". Both express the kind of fear and apprehension that the study is after. At worst, the peculiar translation of the positive label introduces some noise into the top of the scale for Czech and Slovak respondents; it doesn't invalidate it altogether, and it doesn't invalidate the negative responses at all.

In summary, the critique properly cautions about transnational response-level percentage comparisons, especially at the positive end of the scale, and about the possible disparate floor/ceiling levels between languages. But it overplays its hand.

[(For further empirical testing, see the section below that compares the Eurobarometer results with those from the European Social Survey.)](#solution-4-use-a-better-survey)

## Social desirability bias and _actual_ discrimination

Three other criticisms I've seen floating around:

- **Czechs' and Slovaks' answers are not as positive as other nations' because we're honest. We're not as subject to [social desirability bias](https://en.wikipedia.org/wiki/Social_desirability_bias) as other countries.** If Czechs and Slovaks don't feel a particular taboo as strongly as other Europeans, isn't that an interesting finding in its own right? Not to mention that what people feel comfortable saying is an important element of your everyday experience in a society.
- **Due to social desirability bias, positive answers in other countries don't necessarily translate into tolerant behavior.** True. Do note, however, that the converse does not apply: since there's no social force forcing the negative responses, there's no reason to think that negative responses don't predict intolerance (or that the negative respondents are secretly super-tolerant). **If people say they're uncomfortable about minorities, we should probably believe them.**
- **The survey cannot sniff out hidden discriminatory behavior.** Absolutely. Unlike other measures, it isn't designed to do that. This doesn't mean it's worthless.

To be clear, there's no way to determine the amount of social desirability bias in each country. You can claim that the differences between countries estimate the difference in social desirability bias rather than the difference in level of (dis)comfort with different people. That's fine. But let's not criticize a puzzle piece for failing to contain the whole picture.

So Eurobarometer is not the be-all and end-all of surveys, but its data retain meaning. What better approaches can we take to reporting and visualizing it?

# Alternative 1: Make conservative estimates of self-reported intolerance

Let's assume that Jan is correct that many a 5 response in Czech and Slovak is the nonchalant colorblind tolerance that respondents in other languages would put down as 10. If we move response level 3 from "Uncomfortable" into "Meh", leaving only 1 and 2, we'll get a conservative count of otherness-based discomfort for Czechs and Slovaks.

Of course, if Jan is right, then the "Meh" and "Comfortable" responses aren't meaningful. Let's remove them from the visualization in the best Tuftean tradition. (Which makes it a glorified table, but tables are great. Fancy geometries are for suckers.)

(Under these assumptions, we're undercounting bigoted Poles and Hungarians, but not by a lot -- the distributions cluster around the extremes and not that many people were picking 3 and 4 on the scale anyway.)

It turns out that things don't get much better.

```{r, fig.width = 10, fig.height = 5}
# We can take the previous data and reshape it
data_collapsed_12 <- data_collapsed %>%
  mutate(response = 
           fct_collapse(response_detailed,
                        "Uncomfortable" = c("Not at all comfortable", 2),
                        "Meh" = c(3:6, "Indifferent", 
                                  "It depends", "Don't know"),
                        "Comfortable" = c(7:9, "Totally comfortable")) %>%
           fct_drop())

data_collapsed_percentages_temp <- data_collapsed_12 %>%
  group_by(isocntry, question, response) %>%
  summarize(count = n())
data_collapsed_12_percentages <- data_collapsed_percentages_temp %>%
  summarize(total_count = sum(count)) %>%
  merge(data_collapsed_percentages_temp) %>%
  mutate(perc = round(100*count/total_count, 1))

# ...but now, we'll remove everything but the "Uncomfortable" factor
data_collapsed_12 <- filter(data_collapsed_12, response == "Uncomfortable")
data_collapsed_12_percentages <- filter(data_collapsed_12_percentages, 
                                        response == "Uncomfortable")

ggplot(data_collapsed_12_percentages, aes(x = isocntry, y = fct_rev(question), fill = perc)) +
  geom_tile() + geom_label(aes(label = paste0(perc, "%")), color = "black") +
  scale_fill_distiller(limits = c(0, 100), palette = "Reds", direction = 1) +
  scale_x_discrete(position = "top", labels = ebs_country_dict) +
  scale_y_discrete(labels = question_names) + ylab("") + xlab("") +
  ggtitle(paste0("How comfortable would you feel if one of your children was in \n",
                 "a love relationship with ___?"),
          subtitle = "Percentage of respondents who rated their comfort at 1 or 2 out of 10,\nwhere 1 is 'not at all comfortable'") +
  theme_minimal() +
  theme(legend.position = "none", axis.ticks.x = element_blank(), 
        axis.text = element_text(size = 11))
```
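
To get a feel for how much the choice of cutoff matters, here is a tiny base-R sketch on invented ratings. The helper `share_uncomfortable()` is hypothetical. As in the percentages above, the denominator is all respondents, including off-scale answers such as "Indifferent" (coded 11 here).

```r
# Ten made-up answers; 11 stands for an off-scale "Indifferent"
ratings <- c(1, 1, 2, 3, 3, 5, 8, 10, 10, 11)

# Hypothetical helper: percentage of all respondents who answered at or
# below the cutoff on the 1-10 scale
share_uncomfortable <- function(x, cutoff) {
  round(100 * mean(x %in% seq_len(cutoff)), 1)
}

cutoffs <- 2:4
shares <- sapply(cutoffs, function(k) share_uncomfortable(ratings, k))
names(shares) <- paste0("<=", cutoffs)
```

If the share barely moves between neighboring cutoffs, the conclusion doesn't hinge on where the line was drawn; if it jumps, that's worth saying out loud next to the graph.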


# Alternative 2: Measure relative discomfort

The other way to avoid the semantics debate: we can look at respondents' comfort _relative to the comfort with the dominant majority group in the country_. (For V4, that's pretty clearly white people.) This seems like a better measure in a number of ways: for one, it works even if response item 10 in Polish doesn't translate to response item 10 in Czech. For another, the majority-minority differential is arguably what we mean when we talk about discrimination.

(Alternatively, we could norm _relative to the respondent's own most favored group_. I don't have a clear idea of what we'd be gaining/missing out on.)

Of course, there's a reason why the EBM people aren't doing this. The most important one that I can think of is that the non-responses -- "Don't know", "Indifferent", and "It depends" -- don't fall neatly on the response scale, but excluding them misses out on real information. (We'd miss out on the entire response set of respondents who selected one of these responses for their reference attitude.) Placing them on the scale, however, gives them meaning they might not have. (The decision to place them in the "Meh" category above was only okay because I had no plans to make any inferences about "Meh".)

This seems like a fun visualization to think about; I hope to get to it later.
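
In the meantime, here is a rough sketch of what the computation could look like, on invented toy data. Everything about it is an assumption: `qc14_4` stands in for the reference (white) group as in the plots above, `qc14_other` is a hypothetical comparison column, and off-scale answers (codes 11 to 13) are simply dropped, which is the crude option with exactly the information loss described above.

```r
library(dplyr)

# Toy data: made-up ratings for the reference group and one comparison group
toy <- tibble(
  uniqid     = 1:5,
  isocntry   = c("CZ", "CZ", "CZ", "PL", "PL"),
  qc14_4     = c(10, 8, 11, 10, 10),  # reference group; 11 = "Indifferent"
  qc14_other = c(3, 8, 2, 10, 13)     # comparison group; 13 = "Don't know"
)

# Keep only answers on the 1-10 scale; everything else becomes NA
on_scale <- function(x) ifelse(x %in% 1:10, x, NA_real_)

relative <- toy %>%
  mutate(ref   = on_scale(qc14_4),
         other = on_scale(qc14_other),
         gap   = other - ref)  # negative = less comfortable than with reference

gap_by_country <- relative %>%
  group_by(isocntry) %>%
  summarize(n_usable = sum(!is.na(gap)),
            mean_gap = mean(gap, na.rm = TRUE))
```

Reporting `n_usable` next to the mean makes it visible how many respondents fell out because one of their two answers was off the scale.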

# Alternative 3: Abandon cross-country comparisons

It's really tempting to compare my tribe to yours. (Admittedly, this is how I fell into this rabbit hole.) But there's plenty to be gleaned from within-country comparisons, too! So far, the visualizations I've been using have invited cross-country comparison - but I don't actually need to see Poland's share of uncomfortable attitudes toward the Romani to say that the Czech ones are horribly high.

As a bonus, this also frees up room to show all the roles that respondents were asked about: a colleague at work, a child's partner, and the politician in the highest elected office.

```{r}
# Let's start again and shape the data better
binary_measures <- vars(starts_with("qc2"), -qc2t, starts_with("qc3"))
comfort_measures <- vars(starts_with("qc4"),
                         starts_with("qc13"), starts_with("qc14"), starts_with("qc18"))
eb15_all <-  select(eb15, isocntry, uniqid,
                    starts_with("qc"), starts_with("sd"), starts_with("d")) %>%
  mutate_all(to_label) %>%
  mutate_at(binary_measures, function(x) x != "Not mentioned") %>% # Binary measures
  mutate_at(comfort_measures, function(x) {
    fct_recode(x, 
               "Not at all comfortable" = "1 Not at all comfortable",
               "Totally comfortable" = "10 Totally comfortable",
               "Indifferent" = "Indifferent (SPONTANEOUS)",
               "Don't know" = "DK")
  })
# question_names_valid <- setNames(names(question_names), make.names(question_names))
# question_names_valid <- question_names_valid[question_names_valid %in% names(eb15_comfort)]
question_names_original <- sapply(eb15_labels, function(x) x$label)
eb15_comfort_wide <- select(eb15_all, isocntry, uniqid, !!! comfort_measures) %>%
  gather(question, response, !!! comfort_measures, -isocntry, -uniqid)
eb15_comfort_wide$question <- question_names_original[eb15_comfort_wide$question]
eb15_comfort_wide <- separate(eb15_comfort_wide, question, 
                              into = c("question_kind", "target"), sep = ": ") %>%
  mutate_at(vars(question_kind, target), str_to_lower) # funs() is deprecated; pass the function directly
```
```{r, fig.width = 10, fig.height = 6}
country_counts <- eb15 %>% group_by(isocntry) %>% summarize(total = n())
replication_base <- eb15_comfort_wide %>%
  mutate(target = fct_relevel(target,
                              "different ethnic origin", "white person", "asian person",
                              "black person", "roma person",
                              "different religion", "christian person", "atheist person",
                              "jewish person", "buddhist person", "muslim person",
                              "person under 25 years", "aged under 30",
                              "person over 60 years", "aged over 75")) %>% 
  group_by(isocntry, question_kind, target, response)
# creating a map with measures we're more certain about
uncomfortable_map <- replication_base %>%
  summarize(count = n()) %>% filter(response %in% c("Not at all comfortable", "2")) %>%
  summarize(selected_count = sum(count)) %>%
  merge(country_counts) %>%
  mutate(negative2 = round(100 * selected_count / total, 2))
question_kind_dict <- c("colleagues at work" = "How comfortable would you feel if one of your colleagues at work was a ___?",
                        "love relationsship of child" = "How comfortable would you feel if one of your children was in a love relationship with ___?",
                        "elected politician" = "How comfortable would you feel if the politician in the highest elected office was a ___?")
uncomfortable_map %>% filter(isocntry %in% c("CZ"), question_kind != "showing affection in public") %>%
  ggplot(aes(y = question_kind, x = target, fill = negative2)) +
  facet_wrap(~ question_kind, ncol = 1, scales = "free", 
             labeller = labeller(question_kind = question_kind_dict)) + 
  geom_tile() +
  theme_minimal(base_size = 15) +
  theme(axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 15, hjust = 1, size = 14),
        legend.position = "none",
        plot.margin = unit(c(1, 1, 1, 3), "cm")) +
  scale_fill_distiller(limits = c(0, 100), palette = "Reds", direction = 1) +
  geom_label(aes(label = paste0(negative2, "%")), color = "black") +
  ggtitle("Czech discomfort",
          subtitle = "Percentage of respondents who rated their comfort at 1 or 2 out of 10") +
  labs(x = "", y = "")
```

I don't like this visualization - there's almost certainly a better way to go about doing this. It does get the point across, though: whatever their standing relative to the rest of Europe, there certainly are groups of people that Czechs openly admit to being much less comfortable with than others.

# Alternative 4: Use a better survey

"The European Social Survey measures the same thing better, so why do people keep using the Eurobarometer?" [Indeed it does!](http://nesstar.ess.nsd.uib.no/webview/index.jsp?v=2&submode=variable&study=http%3A%2F%2F129.177.90.83%3A-1%2Fobj%2FfStudy%2FESS7e02.1&gs=undefined&variable=http%3A%2F%2F129.177.90.83%3A80%2Fobj%2FfVariable%2FESS7e02.1_V181&mode=documentation&top=yes) Sort of. It asks the question in terms of "minding," which some claim makes the Eurobarometer results different, and it only asks about immigrants of a different race or ethnicity. Here's an overview for all surveyed countries.

```{r}
# integrated data file for ESS 2014 - we're looking for "imdetmr"
ess14 <- read_dta("ESS7e02_1.stata/ESS7e02_1.dta")
```
```{r}
ess14 %>% group_by(cntry) %>% summarize(n())
```
```{r, fig.width=8, fig.height=8}
library(ggalt)
library(countrycode)
library(sjmisc)
ess14_sub <- ess14 %>% select(cntry, imdetbs, imdetmr, 
                              dweight, pweight, pspwght) %>% # weighting variables
  mutate_at(vars(imdetbs, imdetmr), 
            sjmisc::to_label) %>% 
  filter(cntry != "AT") # apparently, the question was improperly administered in Austria

ess14_overview <- ggplot(ess14_sub, aes(x = imdetmr)) +
  stat_count(mapping = aes(x = imdetmr, y = ..prop.., group = 1), width = 0.7) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ cntry, ncol = 4, 
             labeller = labeller(cntry = function(x) countrycode(x, "iso2c", "country.name", 
                                                                 custom_match = c("GB" = "Great Britain")))) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  geom_vline(xintercept = 11.5) +
  labs(title = paste0("How much you would mind or not mind if an immigrant of ",
                      "a different race \nor ethnic group married a close relative of yours?"),
       subtitle = "European Social Survey 2014, without weighting",
       x = "", y = "") +
  geom_rect(data = data.frame(cntry = c("CZ", "HU", "PL")), inherit.aes = FALSE, 
            fill = "blue", alpha = 0.1, xmin = -Inf, ymin = -Inf, 
            xmax = Inf, ymax = Inf)
ess14_overview
```

There are a bunch of properties that make ESS a clumsy comparison. For one, it's just not as detailed as Eurobarometer: it asks two questions about discomfort/minding and both are couched in general terms ("immigrant of a different race or ethnic group"), so we can neither do the "comfort with white person" sanity check nor match the question directly to any one EBM item. ESS also hasn't included Slovakia, the one other country we know to show a similar pattern potentially related to mistranslation, so we can't diagnose that either. Finally, ESS has 11 response-scale points, contra EBM's 10 plus the "Indifferent" option.

Plus, y'know, different survey taken at a different time with a different methodology asking a different question.

But this entire exercise is about making do with what we have, so let's compare with the closest EBM equivalent: attitudes towards potential black and Asian in-laws for the subset of countries that were surveyed in both.

(Note that the positive-to-negative direction is reversed in ESS -- I didn't want to mess with reverse scoring. I also drew the intersecting set of ESS/EBM countries manually, so I might have missed some.)
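For anyone who does want to reverse-score, a minimal sketch follows. It assumes the ESS item is coded 0 ("Not mind at all") to 10 ("Mind a lot") and the EBM item 1 ("Not at all comfortable") to 10 ("Totally comfortable"), with non-responses already set to `NA`; the two helper functions are hypothetical and not part of this notebook.

```r
# Reverse-score the ESS item so that higher always means "more accepting",
# then put both scales on a common 0-1 range.
# Assumed coding: ESS 0 ("Not mind at all") .. 10 ("Mind a lot"),
#                 EBM 1 ("Not at all comfortable") .. 10 ("Totally comfortable").
rescale_ess <- function(x) (10 - x) / 10
rescale_ebm <- function(x) (x - 1) / 9

ess_rescaled <- rescale_ess(c(0, 5, 10))   # endpoints and midpoint of ESS
ebm_rescaled <- rescale_ebm(c(1, 5.5, 10)) # endpoints and midpoint of EBM

ess_rescaled
ebm_rescaled
```

This only aligns the direction and the endpoints. It does nothing about the 11-versus-10-point difference in how respondents use the scales, so the plots below keep the original codings.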

```{r, fig.height=10, fig.width=8, message=FALSE, warning=FALSE}
eb15_black_asian <- eb15 %>% select(isocntry, qc14_2, qc14_3) %>% 
  mutate_at(vars(qc14_2, qc14_3), to_label) %>%
  rename(black = qc14_2, asian = qc14_3) %>%
  gather(inlaw, response, black, asian) %>%
  mutate(response = fct_relevel(factor(response), "10 Totally comfortable", after = 9))
ebs_ess_countries <- c("CZ", "HU", "PL", "NL", "BE", "FR", "DK")
ebs_comparison <- eb15_black_asian %>% filter(isocntry %in% ebs_ess_countries)
ess_comparison <- ess14_sub %>% filter(cntry %in% ebs_ess_countries)
ebs_graph <- ggplot(ebs_comparison, aes(x = response)) + 
  facet_wrap(~ isocntry, ncol = 1, labeller = labeller(isocntry = ebs_country_dict)) + 
  stat_count(mapping = aes(y = ..prop.., fill = inlaw, group = inlaw), 
             position = position_dodge(), width = 0.7) +
  labs(x = "", y = "") +
  geom_vline(xintercept = 10.5) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), legend.position = "bottom") +
  scale_y_continuous(labels = percent, limits = c(0, .75)) +
  ggtitle("Eurobarometer 2015", 
          subtitle = paste0("How comfortable you would feel if one of your children was in \n",
                 "a love relationship with a black or an Asian person?")) 

ess_graph <- ggplot(ess_comparison, aes(x = imdetmr)) +
  stat_count(mapping = aes(x = imdetmr, y = ..prop.., group = 1), width = 0.7) +
  scale_y_continuous(labels = percent, limits = c(0, .75)) +
  facet_wrap(~ cntry, ncol = 1, 
             labeller = labeller(cntry = function(x) countrycode(x, "iso2c", "country.name"))) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  geom_vline(xintercept = 11.5) +
  xlab("") + ylab("") +
  ggtitle("European Social Survey 2014", 
          subtitle = paste0("How much you would mind or not mind if an immigrant of ",
                 "a different race \nor ethnic group married a close relative of yours?"))
combined_ebs_ess <- cowplot::plot_grid(ess_graph, ebs_graph, axis = "tblr", align = "h")
combined_ebs_ess
```
```{r, eval = FALSE, include = FALSE}
ggsave("combined_ebs_ess.png", combined_ebs_ess, width = 8, height = 10)
```

Most countries have a similar response distribution for both surveys, but the Czech one in ESS is noticeably flatter. Let's look at the extreme ends of each survey's scale to see how much the two surveys differed:

```{r}
# Difference: compute percentages, rename levels ("Negative", "Positive"), join, and make a dumbbell chart of differences
ebm_minmax <- ebs_comparison %>% filter(inlaw == "black") %>% 
  group_by(isocntry, response) %>%
  summarize(response_count = n()) %>% 
  mutate(response_perc = response_count / sum(response_count)) %>%
  filter(response %in% c("1 Not at all comfortable", "10 Totally comfortable")) %>%
  ungroup() %>%
  transmute(cntry = isocntry, response_perc,
            response_kind = 
              fct_recode(response,
                         "Negative" = "1 Not at all comfortable",
                         "Positive" = "10 Totally comfortable") %>%
              as.character())
ess_minmax <- ess_comparison %>%  group_by(cntry, imdetmr) %>%
  summarize(response_count = n()) %>% 
  mutate(response_perc = response_count / sum(response_count)) %>%
  filter(imdetmr %in% c("Not mind at all", "Mind a lot")) %>%
  ungroup() %>%
  transmute(cntry, response_perc,
            response_kind = 
              fct_recode(imdetmr,
                         "Negative" = "Mind a lot",
                         "Positive" = "Not mind at all") %>%
              as.character())
ess_ebm_minmax_difference <- inner_join(ebm_minmax, ess_minmax, 
                                        by = c("cntry", "response_kind"), 
                                        suffix = c(".ebm", ".ess")) %>%
  mutate(ess_ebm_delta = response_perc.ess - response_perc.ebm)
```
```{r, fig.width = 10, fig.height = 4}
library(ggalt)
ggplot(data = ess_ebm_minmax_difference, aes(x = 100 * response_perc.ebm, 
                                             xend = 100 * response_perc.ess,
                                             y = cntry)) +
  geom_dumbbell(size_x = 1.8, size_xend = 2, 
                colour_x = "grey12", colour_xend = "dodgerblue2") + 
  facet_grid(. ~ response_kind, 
             labeller = labeller(response_kind = 
                                   c("Negative" = "Mind a lot (ESS) / Not comfortable at all (EBM)",
                                     "Positive" = "Don't mind at all (ESS) / Totally comfortable (EBM)"))) +
  # Unfortunately, since geom_dumbbell cannot create labels for us, we need to
  # draw them manually
  geom_text(data = data.frame(response_kind = c("Negative", "Negative", 
                                                "Positive", "Positive"), # for faceting
                              x_position = c(2.1, 19.3, 28.5, 38.2),
                              survey_name = c("ESS", "EBM", "ESS", "EBM")),
            inherit.aes = FALSE,
            mapping = aes(y = "PL", x = x_position, 
                          label = survey_name, color = survey_name),
            size = 2.5, fontface = "bold", nudge_y = 0.02) + 
  scale_color_manual(values = c("grey12", "dodgerblue2"), guide = "none") +
  scale_x_continuous(limits = c(0, 60), breaks = c(0, 15, 30, 45, 60)) +
  labs(x = "Percentage of respondents", y = "",
       title = "Fight! European Social Survey 2014 vs. Eurobarometer 2015",
       subtitle = "(using question about potential black in-laws as proxy for the more general question in ESS)",
       caption = "Only the single most extreme response at each end of each survey's scale is counted.") +
  theme(plot.caption = element_text(color = "grey40", size = 8))
```

Bearing in mind the limitations of this comparison (different year, sample, and question mean that we cannot determine how much of the difference is due to measurement error and how much due to genuine attitude difference), allow me to draw your attention to a couple of things:

* The Czech measure that jumped _a lot_ is not the potentially mistranslated positive one; it's the negative one. In other words, **if the original map was based on ESS, it would look very similar.**
* It's worth noting, though, that the Czech sample is the only one in which the positive rating _decreased_ from ESS to EBM. (Well, technically, the Belgian sample saw a decrease as well, but one that was well within the margin of error.) This is weak evidence in support of the mistranslation hypothesis - _arguendo_, the mistranslation didn't make a difference when it should have, thus exaggerating the true cross-country differences.
    * French was one of the original languages of the Eurobarometer survey, so - by definition - it could not have been mistranslated. Nonetheless, the 22-point rise in positive ratings from ESS to EBM is the largest change of all - is it because that's the "natural" result of the reframing? Or perhaps because there's a larger difference in perception between "immigrant" and "black person" due to the number of French citizens from the overseas territories, so the compared question is not really the same? (Again, I find myself wishing that ESS asked the same questions as EBM.)
* But there are a lot of country samples that were invariant to the different question and the different response scale, lending support to the idea that **the measured construct - explicit admission of discomfort - is robust to different question phrasings in many European countries.**

Again, due to the difference between the two surveys, we have to take the differences with a grain of salt. I really wish ESS did ask the _same_ question.
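For a rough sense of what "within the margin of error" means when two separate samples are compared, here is a sketch of the usual normal-approximation formula for the difference between two independent proportions. The function name and the example numbers are hypothetical, and the formula ignores survey weights and design effects, so treat it as a lower bound on the real uncertainty.

```r
# 95% margin of error (in percentage points) for the difference between
# two independent sample proportions, using the normal approximation.
# Ignores weighting and design effects, so real margins will be wider.
moe_difference <- function(p1, n1, p2, n2, z = qnorm(0.975)) {
  100 * z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
}

# Hypothetical example: 30% of 1,000 respondents vs. 33% of 1,000 respondents
example_moe <- moe_difference(0.30, 1000, 0.33, 1000)
example_moe
```

In this made-up example the margin comes out at about 4 percentage points, so a 3-point gap between the two surveys would not be distinguishable from sampling noise.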

## Are there other ways to (in)validate the EBM results?

My cousin Jan suggested the racial-differences Implicit Association Test, [which has some unfortunate issues.](http://nymag.com/scienceofus/2017/01/psychologys-racism-measuring-tool-isnt-up-to-the-job.html) Perhaps someone has done the resume methodology? I will be grateful for suggestions -- like I said, this is neither my circus nor my monkeys.

# Making new maps

My original goal was to remake and play around with the little map. This took me the longest and I went through a bunch of attempts, which will in time get their own article.

```{r}
## Setup
library(tmap)
library(tmaptools)

data(Europe)
convertEBMCountryCodes <- function(isocntry_column) {
  countrycode::countrycode(isocntry_column, "iso2c", "iso3c", 
                           custom_match = c("GB-NIR" = "GBR",
                                            "GB-GBN" = "GBR",
                                            "DE-E" = "DEU",
                                            "DE-W" = "DEU"))
}

# This is a different reshaping of the data
mappable_ebm <- eb15_comfort_wide %>%
  mutate(iso_a3 = convertEBMCountryCodes(isocntry))
ebm_country_counts <- eb15 %>% 
  mutate(iso_a3 = convertEBMCountryCodes(isocntry)) %>%
  group_by(iso_a3) %>%
  summarize(total_respondents = n())
mappable_prepped <- mappable_ebm %>% 
  mutate(response = fct_collapse(as.factor(response), 
                                 "Uncomfortable" = c("Not at all comfortable",
                                                     "2"),
                                 "Comfortable" = c("7", "8", "9", "Totally comfortable", "Indifferent"))) %>%
  mutate_at(vars(iso_a3, question_kind, target), as.factor) %>%
  group_by(iso_a3, question_kind, target, response) %>%
  summarize(response_count = n())
mappable_uncomfortable <- mappable_prepped %>%
  filter(response == "Uncomfortable") %>%
  merge(ebm_country_counts) %>%
  mutate(uncomfortable_percent = 100 * response_count / total_respondents) %>%
  select(iso_a3, question_kind, target, uncomfortable_percent)
mappable_comfortable <- mappable_prepped %>%
  filter(response == "Comfortable") %>%
  merge(ebm_country_counts) %>%
  mutate(comfortable_percent = 100 * response_count / total_respondents) %>%
  select(iso_a3, question_kind, target, comfortable_percent)
mappable_both <- full_join(mappable_comfortable, mappable_uncomfortable,
                           by = c("iso_a3", "question_kind", "target"))

# and reshape the data from ESS to be plottable alongside
mappable_ess <- ess14_sub %>%
  mutate(iso_a3 = countrycode::countrycode(cntry, "iso2c", "iso3c")) %>%
  mutate(imdetmr = fct_collapse(imdetmr, 
                                "Comfortable" = c("Not mind at all", "1", "2", "3"),
                                "Uncomfortable" = c("9", "Mind a lot"))) %>%
  group_by(iso_a3, imdetmr) %>% summarize(response_count = n()) %>%
  mutate(response_percentage = 100 * (response_count / sum(response_count))) %>%
  select(-response_count) %>%
  filter(imdetmr %in% c("Comfortable", "Uncomfortable")) %>%
  spread(imdetmr, response_percentage) %>% ungroup()
```
```{r}
## Workhorse function for map display
# expects a long dataset that is ready for a merge by `iso_a3` column
mapLoveComfort <- function(dataset, target_name, column_of_interest = "comfortable_percent",
                           column_title = "% totally comfortable\n(7-10 on scale or answered 'Indifferent')", 
                           palette = "RdYlGn") {
  append_data(Europe, dataset %>%
                filter(question_kind == "love relationsship of child",
                       target == target_name),
              "iso_a3", "iso_a3", ignore.na = TRUE, 
              ignore.duplicates = TRUE) %>%
    tm_shape() +
    tm_polygons(column_of_interest,
                title = column_title,
                palette = palette,
                auto.palette.mapping = FALSE,
                breaks = seq.int(0, 100, 10), 
                legend.format = function(x) paste0(x, "%")) +
    tm_layout(legend.outside = F, legend.position = c("RIGHT", "TOP"),
              legend.frame = T, 
              main.title = paste0("How comfortable would you feel if your child\nwas in ",
                                  "a love relationship with a ", R.utils::capitalize(target_name), "?"),
              main.title.size = 1)
}

targets <- paste0(c("black", "asian", "muslim", "jewish"), " person")
```

## Replicating the original map

This is, sort of, the original map. The main difference is that it uses yellow midpoints for the transition between red and green.

```{r, fig.height=10, fig.width=10, message=FALSE, warning=FALSE}
original_comfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "comfortable_percent"), simplify = FALSE)
do.call(tmap_arrange, original_comfort_maps)
```

This is an improvement on the original map's colorscheme, I think. There's still an implication that the green areas are "correct", though, which misleadingly suggests a normative criterion at >50% "comfortable" population, so...

### The original map with a better color scale

...we can make a single-color map, which also has the benefit of not being an asshole to people with red-green colorblindness.

```{r, fig.height = 10, fig.width = 10, message=FALSE, warning=FALSE}
original_comfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "comfortable_percent", palette = "-Reds"), simplify = FALSE)
do.call(tmap_arrange, original_comfort_maps)
```

## The European Social Survey "comfort" map

As a relevant comparison, we can, of course, take a look at the European Social Survey's most positive responses. For clarity's sake, I show them using both palettes.

```{r, fig.height = 6, fig.width = 10, message=FALSE, warning=FALSE}
ess_common_map <- append_data(Europe, mappable_ess,
                              "iso_a3", "iso_a3", ignore.na = TRUE, 
                              ignore.duplicates = TRUE) %>%
  tm_shape() +
  tm_layout(legend.outside = F, legend.position = c("RIGHT", "TOP"),
            legend.frame = T, 
            main.title = paste0("How much would you mind if an immigrant of a different race\n",
                                "or ethnic group married a close relative of yours?"),
            main.title.size = 1)

ess_single_palette_map <- ess_common_map + tm_polygons("Comfortable",
              title = "% comfortable ('Not mind at all' to 3 on scale)",
              palette = "-Reds",
              auto.palette.mapping = FALSE,
              breaks = seq.int(0, 100, 10), 
              legend.format = function(x) paste0(x, "%"))
ess_original_palette_map <- ess_common_map + tm_polygons("Comfortable",
              title = "% comfortable ('Not mind at all' to 3 on scale)",
              palette = "RdYlGn",
              auto.palette.mapping = FALSE,
              breaks = seq.int(0, 100, 10), 
              legend.format = function(x) paste0(x, "%"))
tmap_arrange(ess_single_palette_map, ess_original_palette_map)
```

## The conservative "discomfort" map

As noted in the Methodological criticisms section, we have a bunch of reasons to think that the extreme negative end of the scale might be more informative than the positive end.

In this case, we've applied stricter inclusion criteria -- we only take the lowest two points rather than the lowest four.

```{r, fig.height = 10, fig.width = 10, message=FALSE, warning=FALSE}
original_discomfort_maps <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "uncomfortable_percent", column_title = "% not at all comfortable\n(1 or 2 on scale)",  palette = "Reds"), simplify = FALSE)
do.call(tmap_arrange, original_discomfort_maps)
```

This makes my homeland look much less anti-Semitic, so hurray!

### Conservative "discomfort" measure with the original palette

Finally, just because we can, here's the same map with the original colors.

```{r, fig.height = 10, fig.width = 10, message=FALSE, warning=FALSE}
ebm_discomfort <- sapply(targets, function(x) mapLoveComfort(mappable_both, x, column_of_interest = "uncomfortable_percent", column_title = "% not at all comfortable\n(1 or 2 on scale)",  palette = "-RdYlGn"), simplify = FALSE)
do.call(tmap_arrange, ebm_discomfort)
```

# Motivations and limitations

Since my motivations for writing this are going to come up, I thought I'd supply a couple of answers. (This is, of course, to divert the pure of heart away from the `(((TrUtH)))`. You caught me.)

## Who's paying you?

Not George Soros, although back in ninth grade, [I really wanted him to](http://osf.cz/cs/jak-pomahame/stipendia/stredoskolska/). Over the years, I'm afraid I actually spent much more time on [Charles Koch's dime](https://theihs.org/). I was a Bakala Scholar, which would `~exPlain tHings~`, but also a Kellner Family Scholar, which... wouldn't. Hilariously enough, to the best of my knowledge, I have never received an EU grant.

Right now, the only institution that's paying me is Yale, but not for this.

If you're seeking to impute impure incentives to me, my list of past funding sources might prove confusing. It might almost be simpler to imagine that some people might hold beliefs that are independent of their funding.

To suggest an alternative to the monetary motive, I'd like to share three things about myself:

1. I like detailed open data and the level of Eurobarometer's coverage was delightful. Dismissing it out of hand seems wasteful.
2. I like butting into arguments that allow for empirical counter-claims.
3. I get stuck on interesting problems, especially when people are eager to argue about them.

## You're just protecting a study that confirms your beliefs!

You're just dismissing a study that questions your beliefs! See how this is not a useful criticism? Nobody is free of cognitive biases.

But we can take a look at the data, make our best arguments, and see if others can tear them down in a way that makes us discard them. I might not change your mind and you might not change mine (though I'll try!), but we don't just do it for ourselves; we do it so that the people who aren't as invested in the question can make a better decision. My epistemology need not be flawless to benefit the community epistemics.

Honestly, what animates me the most in writing this post is the blithe dismissal of a survey that is carefully documented, has publicly available raw data, and is narrow enough in scope that it can ask questions at a very granular level. It collected a large sample using in-person interviews, taking care to obtain geographical variability within each country. It actually _did_ try to guard against mistranslation by implementing a back-translation step. In short, the people behind Eurobarometer did a lot of things right.

Some have called #OpenScience mere methodological terrorism. As that line of argument goes, we're just mean bullies seeking to find a couple of flaws in order to discredit a whole research program. If we've become large, the critique goes, it is because we're standing on the corpses of well-meaning colleagues. My preferred response to that criticism is giving credit where it's due.

I care about the quality of science. I care about the quality of science _a lot_. In the past couple of years, this has often meant cutting away sloppy and/or underpowered research in order to reduce false positives. But it's important to remember that we're trying to strike a balance between Type I and Type II errors here: there's _an_ amount of error that's sufficient to dismiss a dataset, but it's not _any_ amount.

(In other words, it's that sweet, sweet [Arnold Foundation](https://www.wired.com/2017/01/john-arnold-waging-war-on-bad-science/) cash I'm after.)

Which doesn't actually mean that I'm letting Eurobarometer off the hook.

## These are not the limitations you are looking for

In a real sense, much of the criticism misses the forest for the trees. It’s nitpicking, whether or not the nits are there, because nit-sized problems we can deal with.

Do any of the critiques I addressed matter? Even if the survey translation achieved perfect fidelity across languages and the original questions were phrased so that they avoided biasing the respondents, it would still be a social-science instrument [which, along with everything, is fucked.](https://hardsci.wordpress.com/2016/08/11/everything-is-fucked-the-syllabus/) 

In one sense, this survey is fine. It attempted to obtain an accurate estimate of the response distribution, sampling the population evenly; within its constraints, it succeeded. Interpretations and inferences are our problem, as is overcoming the inevitable conceptual challenges on the way. You wouldn’t blame the Apollo Command Module for its inability to land on the Moon. It isn’t a problem that we need the wider tapestry of research findings to put this one in context.

It's important to note that the survey doesn't go for any underlying constructs. Nobody claims that the questions predict particular behaviors. We do with this survey what we do with most surveys: describe it thoroughly, then wave it suggestively.

But then there’s a list of potential methodological issues that is longer than my arm. The thing with that list is that you can’t win. Doing in-person interviews? Some people will give answers to please the recruiter rather than an accurate self-report. Sending out an internet survey? You probably don’t have a representative sample. If your question doesn’t bias the respondent negatively, it’s probably just because it’s biasing them in the opposite direction. Should you count [the 11% of people reporting the belief that the country is or might be run by lizardmen](http://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/)? Include them and your aggregate will get a lot of noise; omit them and you’re undersampling the conspiracy theorists.

(If you want to read a thorough takedown of "asking people things is useful," @Bertrand2001-tk provide a particularly vicious lashing, which nonetheless includes a measurement error perspective. Most of their examples and logic come from @Plous1993-ls.)

Each decision in designing a study has to find the lesser evils. I can’t think of a design that is beyond reproach. Everywhere you look, you have to deal with imperfection.

This is what makes social science hard.

## Quantifying the error

The original Eurobarometer Report's Technical Appendix provides a table that gives the percentage-point margin of error. I assume that this margin accounts for sampling error. With the data and the criticisms we have, we can enlarge the margin by each of the following:

* semantic shifts between translations (5-10 percentage points, based on the analysis above)
* respondent error (3-5 percentage points, based on the analysis above)
* people trolling (3-5 percentage points)

(I’m not including the social desirability bias because I think that’s a part of the construct under measurement. If you wish to use this study to estimate possible racist behavior, feel free to tack on that error, too.)

Am I pulling these out of my butt? You bet, but I think these are pretty pessimistic numbers, and it’s not like you don’t have error bars of your own. For example, if you’re dismissing the study outright, then the error bars you assign must be unable to distinguish between 0% and 100% response. Since surveys will always contain errors, I’d say this framework is more useful than a blanket rejection – or a blanket acceptance – of a survey. For one, the framework also forces you to be explicit about the things that you consider well done.

Let’s assume the worst-case scenario: none of the errors overlap or cancel each other out, and all are at the higher end of their range. If we sum them up on top of the sampling margin, we end up with a margin of error of twenty-something percentage points. With these error bars, the cross-country comparison is much rougher -- but we can **still** say with some confidence that former Czechoslovakia has a larger explicit anti-Roma attitude problem than most of Europe.
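The arithmetic behind that figure can be sketched in a few lines of base R. The sampling margin here is my own assumption (the normal-approximation value for a proportion near 50% in a sample of 1,000), not a number copied from the Technical Appendix, and the variable names are mine. Simply adding the errors is the pessimistic case where they all push the same way; if they were statistically independent they would partly cancel and combine in quadrature instead.

```r
# Upper ends of the guessed ranges above, in percentage points.
extra_errors <- c(translation = 10, respondent_error = 5, trolling = 5)

# Assumption: sampling margin from the normal approximation for a
# proportion near 50% and n = 1,000 (roughly 3 percentage points).
sampling_margin <- 100 * qnorm(0.975) * sqrt(0.5 * 0.5 / 1000)

# Pessimistic: every error pushes in the same direction, so they add up.
stacked <- sampling_margin + sum(extra_errors)

# Less pessimistic: independent errors partly cancel (sum in quadrature).
quadrature <- sqrt(sampling_margin^2 + sum(extra_errors^2))

round(c(stacked = stacked, quadrature = quadrature), 1)
```

Under these assumptions the stacked margin is about 23 points and the quadrature version about 13, so "twenty-something" really is the gloomy end of the estimate.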

## ...so what you're saying is that social science is useless?

~~I prefer the term _differently useful_.~~ No. I'm saying that if you're going to dismiss a social-science result, you should do that on the basis of problems it actually has, and only to the extent that those problems warrant. [The measurement crisis is upon us;](https://twitter.com/JkayFlake/status/892266510273699840) let's cut through the bullshit and talk about the problems that actually matter.

If that makes you _not at all comfortable_, good; me too.

# References