Tuesday, December 22, 2009

Draft Post: Facebook Diversity Debate

By: Harry Waisbren

Note: this is a draft of a post to be published on the Qworky blog. Feedback would be much appreciated!

Facebook's Data Team has released a study entitled How Diverse is Facebook? that has caused a firestorm of analysis and criticism---coalescing on Twitter through the #FBDiversity tag.

Facebook describes the study as part of its effort to be as open and connected as possible while also working to understand how different populations of users join and use the social network. However, the original question that has provoked such vehement criticism, first posed by @digitalsista, @kanter, and @womenwhotech, relates to the study's methodology and the motivations behind it. In fact, the question (first asked by @myrnatheminx) of whether the conclusions "seem self-fulfilling prophecy ish" is being increasingly assessed as the findings are further digested.

The methodology of this study is quite tricky, as Facebook does not request information on race as it does for gender. Cheri Mullins analyzed this aspect of the study in detail in her post Facebook "Diversity" Study Fact or Fiction:

The primary method of identifying users as a given ethnicity or race for the study is by a user's reported last name. This methodology is based on the correlation of last names to self-reported ethnicity or race in the US Census statistics. Short of actually asking users to self-report their data, this approach seems reasonable. (I'll say a bit more about why I favor self-reporting later.)

However, what Facebook refers to as a mixture-modeling technique seems a bit sketchy. By their definition, they "back solve" for name based on ethnicity. This is recursive: one has to know a variable (in this case, race or ethnicity) in order to use it as a given. Certainly, using this back-solving method to cross-check data is valid. If one assumes that the makeup of Facebook does, indeed, parallel the (self-reported) ethnic and racial makeup reflected in the Census statistics, then determining whether study data correlates with the Census data is a valid data point to verify the categorization assumptions of the study. However, by both reporting correlation with the Census statistics as a result and using the same statistics to "refine" the statistics, the Facebook Data team has skewed the results to be highly self-referential.

The "highly self-referential" nature of their results seems to push conclusions to the "seems self-fulfilling prophecy ish" camp. If they are using an unabashedly flawed data set, then what is the purpose of this study? The purpose is particularly questionable given the broad proclamations that Facebook draws from their flawed data set, such as:

  • They have always been diverse yet diversity has increased significantly over the past year to the point where users nearly mirror the diversity of the overall U.S. population
  • Hispanics are 80% as likely to be on Facebook as White users
  • Black users are as likely to be on Facebook as White users
  • Asian/Pacific Islanders are much more likely to be on Facebook than White users.

The motivation behind asserting such broad conclusions is all the more questionable in light of danah boyd's (@zephoria) speech during the Personal Democracy Forum discussing The Not-So-Hidden Politics of Class Online. Her research reached vastly different results from those of the internal study with regard to Facebook users versus those on MySpace, and her charges are damning to the supposedly diverse and inclusive nature of Facebook:

It wasn't just anyone who left MySpace to go to Facebook. In fact, if we want to get to the crux of what unfolded, we might as well face an uncomfortable reality...What happened was modern day "white flight." Whites were more likely to leave or choose Facebook. The educated were more likely to leave or choose Facebook. Those from wealthier backgrounds were more likely to leave or choose Facebook. Those from the suburbs were more likely to leave or choose Facebook. Those who deserted MySpace did so by "choice" but their decision to do so was wrapped up in their connections to others, in their belief that a more peaceful, quiet, less-public space would be more idyllic.

This dynamic was furthered by the press, an institution that stems from privilege and tends to reflect the lives of a more privileged class of people. They narrated MySpace as the dangerous underbelly of the Internet while Facebook was the utopian savior. And here we get back to Kat's point: MySpace has become the "ghetto" of the digital landscape. The people there are more likely to be brown or black and to have a set of values that terrifies white society. And many of us have habitually crossed the street to avoid what is seen as the riff-raff.

The fact that digital migration is revealing the same social patterns as urban white flight should send warning signals to everyone out there. And if we think back to the language used by teens who use Facebook when talking about MySpace, we should be truly alarmed.

boyd's speech paints a very different picture of Facebook than their data team's study suggests. Rather than an environment whose diversity is proportional to that of the U.S. population, it is one growing as an "idyllic community" free from the "riff raff" on MySpace, and such predispositions are being internalized by our country's youth in an alarming fashion.

In this context, it is no wonder that Shireen Mitchell (aka @digitalsista) says that she thinks that the Facebook Data Team's study has everything to do with boyd's MySpace to Facebook "white flight" theory...

Yet even given the seemingly cause-and-effect relationship between boyd's research and the Facebook Data Team's study, which asserted opposing conclusions merely months later, is it too much to charge that this was enough of an impetus for Facebook to issue a study with a self-fulfilling prophecy in mind? Moreover, even if they did, how much of an issue is it?

Despite her views on the flawed nature of their methodology, Mullins argues that there is a positive outlook to take from this study. She applauds their efforts to collect this data, and notes that "the 2010 Census data and adoption statistics that are current and more accurately reflect current Internet access capabilities and trends will provide better data against which to verify future studies."

However, if this is to be but a first step in Facebook's efforts to assert and/or achieve a diversified and inviting community, there is much more work to be done. As Mullins explains:

Ultimately, though, I wonder why Facebook does not simply add an optional (and optionally public) profile statistic for Facebook users to self-report ethnicity and race. If the options are identical to the 2010 Census options -- and identically described, one would expect to obtain results that are directly comparable to the Census statistics and therefore a better indicator of whether or not Facebook is representative of the population at large. Furthermore, Facebook could provide users with an option to allow this statistic to be used only in cumulative reporting or also in reporting in conjunction with other demographics, which would facilitate a significant depth of data for analysis not only by Facebook but by other social networking researchers. I believe Facebook has work to do here in defining exactly what the purpose of their study is and how best to collect their data.

Instituting a program to self-report race seems like a logical next step for Facebook to take if they truly want to be as open and connected as possible---quite the contrast from acting as a safe haven for those engaging in white flight to escape the riff raff from other more diversified and inclusive social networks. No matter what, though, they have more work to do, and despite any pitfalls, this study at least shows that they recognize the importance of diversity and inclusiveness within their network.

Hopefully the #FBDiversity effort alerts Facebook to the groundswell of desire for them to achieve their stated goals. Furthermore, it should act as a message to them that posturing will not suffice in a world that is rapidly becoming more connected and diversified, and that if they sincerely wish to be part of the solution rather than part of the problem, they will have strong allies in all of us!


jon said...

It's a very good draft, Harry. Excellent approach using danah's analysis as the framing. One thing I'd be careful with though is assuming motivation for the Facebook research team. It's shoddy research, but it probably is just due to lack of experience and trying to do something with the data they had available without thinking of the biases they were introducing.

Otherwise my main concern with your post is the length -- over 1300 words, and we've generally been trying to keep posts on the Qworky blog to 500 words or less. So you might want to think about subsetting it ... you're probably trying to do too much in the one post. What are the three most important points you're trying to make?


Harry Waisbren said...

Thanks for the feedback jon!

I wasn't sure about my phrasing for the motivation aspect, and although I tried to provide counterpoints the post probably does come off as overly accusatory. It's a fine line though when discussing the self-fulfilling prophecy aspect, and I do think it's an important question to assess in light of danah boyd's arguments.

I can cut some of the word count if only by summarizing rather than quoting. I recognize that I'm trying to do a lot in this post, but it is a complicated issue worth the analysis---hoping I don't have to take out too much of the meat of the piece, and that minimizing quotes and condensing/cutting out some of the questions on motivations will do the trick.

I'd say the three most important points I'm trying to make are as follows:

1. Describing the genesis and potential future of the #FBDiversity tag

2. Assessing the FB study's methodology and conclusions

3. Analyzing the conclusions in the context of boyd's speech, and asking whether it is self-fulfilling prophecy ish.

Thanks again for the feedback!


jon said...

It's all good stuff so I can see why you want to discuss it all. In terms of your three points, I'm not sure how much attention the #fbdiversity tag merits. Several of us were talking about the issue; we used a hashtag to make it easier to follow; great! But in a routine kind of a way -- the same thing happened with #hpfail and #montazeri, etc. So I don't see this as on a par with your other two.

For assessing the terminology and conclusions, this post can be very valuable by summarizing and linking out to the key points people have made. Rather than linking to Qworky's tweets, it would be better to link out to what Shireen, Beth, and Tracy tweeted and try to use their own words. And consider being more selective about what you're excerpting from Cheri's post. Do most Qworky readers care about the mixture-modeling technique? Probably not; and the ones that do will want to read her full post.

Your point about danah is also a good contribution (although the self-fulfilling prophecy has been discussed to some extent). You probably want to tighten this section up some to be clearer where you see the relationships and what the implications are.

Hope this helps!