Valuing data is a tough task: a reply to Jaron Lanier’s op-ed in the NYT

Jaron Lanier was featured in a recent New York Times op-ed explaining why people should get paid for their data. Under this scheme, he estimates that the data from a four-person household could fetch around $20,000 a year.

Let’s do the math on that.

Data from eMarketer finds that users spend about an hour and fifteen minutes per day on social media, for a total of 456.25 hours per year. Splitting Lanier’s $20,000 evenly across the four household members gives $5,000 per person, so the income from data would be about $10.96 per hour. That’s not too bad!
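
For anyone who wants to check the arithmetic, here is a quick sketch in Python. The only assumption beyond the figures quoted above is that the household total is split evenly among its four members; the variable names are mine.

```python
# Figures quoted in the text
household_value = 20_000   # Lanier's estimate for a four-person household, USD per year
household_size = 4
hours_per_day = 1.25       # eMarketer: an hour and fifteen minutes of social media daily

# Assumption: the household total is divided evenly across its members
hours_per_year = hours_per_day * 365
value_per_person = household_value / household_size
implied_hourly_rate = value_per_person / hours_per_year

print(f"{hours_per_year:.2f} hours per year")
print(f"${implied_hourly_rate:.2f} per hour of social media use")
```

Under those assumptions the implied rate comes out to just under $11 an hour.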

By any measure, however, the estimate is high. Since I have written extensively on this subject (see this, this, and this), I thought it might be helpful to explain the four general methods used to value an intangible like data. They include income methods, market rates, cost methods, and finally, shadow prices.

Most data valuations are accomplished through income derivations, often by simply dividing the total market capitalization or revenue of a firm by the total number of users. For those in finance, this method seems most logical since it is akin to an estimate of future cash flows. In its 2018 annual report, Facebook calculated that the average revenue per user was around $112 in the United States and Canada. Antonio Garcia-Martinez recently used this data point in Wired magazine to place an upper limit on the digital dividend idea from California Governor Gavin Newsom.

Similarly, when Microsoft bought LinkedIn, reports suggested that it was paying about $260 per monthly active user. A. Douglas Melamed argued in a recent Senate hearing that the upper-bound value on data should at least be cognizant of the acquisition cost for advertisements, putting the total user value at around $16.
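
To put these income-based numbers next to Lanier’s, here is a rough comparison in Python. It assumes his $20,000 household figure is split evenly across four people, and it is only a sketch: the three benchmarks measure different things (annual revenue, a one-time acquisition price, and an ad acquisition cost), so the ratios are a sense of scale rather than a like-for-like comparison.

```python
# Assumption: Lanier's household estimate is divided evenly among four people
lanier_per_person = 20_000 / 4  # USD

# Per-user figures quoted in the text (USD); the labels are mine
income_estimates = {
    "Facebook ARPU, US and Canada (2018)": 112,
    "LinkedIn price per monthly active user": 260,
    "Melamed ad acquisition cost": 16,
}

# How many times larger Lanier's per-person figure is than each benchmark
ratios = {name: lanier_per_person / usd for name, usd in income_estimates.items()}

for name, ratio in ratios.items():
    print(f"{name}: Lanier's figure is roughly {ratio:.1f} times larger")
```

Even against the most generous of the three, the LinkedIn price, Lanier’s per-person figure is more than an order of magnitude higher.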

Income-based valuations, however, are crude estimates because they do not capture a user’s marginal contribution to the platform’s revenue. The way to understand this problem is by first recognizing how the three classes of data interact online. Volunteered data is data that is both innate to an individual’s profile, such as age and gender, and information they share, such as pictures, videos, news articles, and commentary. Observed data comes as a result of user interactions with the volunteered data; it is this class of data that platforms tend to collect in data centers.

Last, inferred data is the information that comes from analysis of the first two classes, which explains how groups of individuals are interacting with different sets of digital objects.

Inferred data is the key, as it both drives advertising decisions and helps determine what content is presented to users. Thus, the value of a user’s data would combine:

The value of that user’s data in increasing all of their friends’ demand for content; and
The value of that user’s data in contributing to increases in advertising demand.

I’ve seen work suggesting that Shapley values might be used to figure out these numbers. Needless to say, income-based valuations are difficult.
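
To make the Shapley idea concrete, here is a minimal sketch in Python. The three users and the revenue table are invented purely for illustration; nothing here comes from a real platform or from the work mentioned above. A Shapley value is each user’s marginal contribution to revenue, averaged over every order in which the users could have joined.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every possible joining order."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal revenue added when p joins the current coalition
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical revenue (USD) the platform earns from each set of users.
# "cat" earns nothing alone but raises the value of everyone she joins.
REVENUE = {
    frozenset(): 0,
    frozenset({"ann"}): 10,
    frozenset({"bob"}): 10,
    frozenset({"cat"}): 0,
    frozenset({"ann", "bob"}): 30,
    frozenset({"ann", "cat"}): 20,
    frozenset({"bob", "cat"}): 20,
    frozenset({"ann", "bob", "cat"}): 60,
}

shares = shapley_values(["ann", "bob", "cat"], REVENUE.__getitem__)
for user, share in shares.items():
    print(f"{user}: ${share:.2f}")
```

With these made-up numbers, ann and bob are each credited about $23.33 and cat about $13.33, even though cat generates no revenue on her own. That is the network effect a simple revenue-per-user average misses. The catch is that the number of orderings grows factorially with the number of users, so anything at platform scale would need an approximation.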

Market prices are another method of valuing data, and they tend to place the lowest premium on data. For example:

Vice recently reported that DMVs across the US have been selling records for as little as one cent each.
Wired editor Gregory Barber sold his location data, Apple Health data, and Facebook data, and all he got was a paltry 0.3 cents.
After a breach at Facebook, Facebook logins were selling on the dark web for $2.60.
Advertisers typically pay a half a cent for profiles.
In contrast, Dutch student Shawn Buckles auctioned all his personal data and earned a grand total of €350, which is around $385 in 2014.
The College Board sells lists of high-school students’ names, ethnicities, parents’ education and approximate PSAT or SAT scores, at 47 cents a name.

As with any market, it is important to pay attention to the clearing price because not all markets clear. The bankruptcy proceedings for Caesars Entertainment, a subsidiary of the larger casino company, offer a unique example of this problem. As the assets were being priced in the selloff, the Total Rewards customer loyalty program got valued at nearly $1 billion, making it “the most valuable asset in the bitter bankruptcy feud at Caesars Entertainment Corp.” But the ombudsman’s report understood that it would be a tough sell because of the difficulties in incorporating it into another company’s loyalty program. Although it was Caesars’ priciest asset, its value to an outside party was an open question.

As I detailed earlier this year, data is often valued within a relationship, but practically valueless outside of it. There is a term of art for this phenomenon, as economist Benjamin Klein explained: “Specific assets are assets that have a significantly higher value within a particular transacting relationship than outside the relationship.” Asset specificity goes a long way to explain why there isn’t a thick market for data as Lanier would like.

Third, data might be valued using cost-based methods. But Chloe Mawer cautioned against using cost-based routes: “This method is highly imprecise for data, because data is often created as an intermediate product of other business processes.” In practice, I assume cost-based methods would probably look like Shapley values anyway.

Lastly, data can be valued through shadow prices. For those items that are rarely exchanged in a market, prices are often difficult to calculate and so other methods are used to appraise what is known as the shadow price. For example, a lake’s value might be determined by the total amount of time in lost wages and money spent by recreational users to get there. For each person, there is a shadow price for that lake.

Similarly, the value of social media can be calculated by tallying all of the forgone wages from using the site. A conservative estimate from a couple of years back suggests that users spend about 20 hours a month on Facebook. Since the current average wage is about $28 an hour, this calculation indicates that people value the site at roughly $6,700 over the entire year. A study applying similar methods to 2016 data found that American adults consumed 437 billion hours of content on ad-supported media, worth at least $7.1 trillion in terms of forgone wages.
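
The forgone-wage arithmetic is simple enough to sketch in Python. The inputs are the figures quoted above; the implied hourly value for the 2016 study is my own back-of-the-envelope division, not a number the study reports, and since the $7.1 trillion is a floor, so is the result.

```python
# Facebook example from the text
hours_per_month = 20
average_hourly_wage = 28.0   # USD per hour
annual_shadow_price = hours_per_month * 12 * average_hourly_wage

# 2016 ad-supported media study from the text
total_hours = 437e9          # hours of content consumed
total_forgone_wages = 7.1e12 # USD, stated as "at least"
# Assumption: dividing the totals gives a rough implied value per hour
implied_value_per_hour = total_forgone_wages / total_hours

print(f"Annual shadow price of Facebook per user: ${annual_shadow_price:,.0f}")
print(f"Implied value per hour in the 2016 study: ${implied_value_per_hour:.2f}")
```

The first figure lands at $6,720, which is where the rough $6,700 comes from. The second suggests the study valued an hour of attention well below the $28 average wage used in the Facebook example.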

Shadow prices can also be calculated through surveys, which is where they get controversial. Depending on how the question is worded, users’ willingness to pay for privacy can be wildly variable. Trade association NetChoice worked with Zogby Analytics to find that only 16 percent of people are willing to pay for online platform services. Strahilevitz and Kugler found that 65 percent of email users, even though they knew their email service scans emails to serve ads, wouldn’t pay for an alternative. As one seminal study noted, “most subjects happily accepted to sell their personal information even for just 25 cents.” Using differentiated smartphone apps, economists were able to estimate that consumers were willing to pay a one-time fee of $2.28 to conceal their browser history, $4.05 to conceal their list of contacts, $1.19 to conceal their location, $1.75 to conceal their phone’s identification number, and $3.58 to conceal the contents of their text messages. The average consumer was also willing to pay $2.12 to eliminate advertising.

All of this is to say that there is no one single way to estimate the value of data.

As for the Lanier piece, here are some other things to consider:

A market for data already exists. It just doesn’t include the participants Jaron wants in it: platform users.
Will users want to be data entrepreneurs, looking for the best value for their data? Probably not. At best, they will hire an intermediary to do this, which is basically the job of the platforms already.
An underlying assumption is that the value of data is greater than the value advertisers are willing to pay for a slice of your attention. I’m not sure I agree with that.
Finally, how exactly do you write these kinds of laws?

First published Sep 23, 2019