At Mashable's panel "Measuring social media – lets get serious," I had the opportunity to question Kevin Weil, Twitter's product lead for revenue, on an issue concerning Twitter's data that has been bothering me for a long time. Christina Warren, the Mashable writer who moderated the panel, went on to write a post about the questions I put to Kevin, here.
First, a bit of background. There are two ways to get data from Twitter: via the application programming interface (API), or via the "Firehose" of all tweets (or some percentage thereof). The API is free and easy to access for anyone with programming knowledge, but it is capped, so you can only get a portion of what is actually there. The Firehose is supposed to contain all tweets for a given time period and is quite expensive to access, whether directly from Twitter or through Gnip, an aggregator that helps other providers get access. Twitter has not disclosed which tools have full Firehose access.
The major problem that has arisen from widespread use of the API is the crop of social monitoring and analytics tools with slick interfaces that rely on the API instead of the Firehose, yet present themselves as appropriate for analyzing significant amounts of conversation. That may be okay for small businesses that don't have much volume. For brands with medium to large amounts of conversation, however, the data provided by the API is incomplete: the API serves data for free only until the cap is hit, and then it stops returning results.
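To make the effect of a cap concrete, here is a minimal sketch in Python. It does not call Twitter's API or reproduce its real limits; the cap value, the brand volumes, and the function names are all made up for illustration. It only shows why a fixed cap barely matters at low volume and badly distorts the picture at high volume.

```python
# Illustrative only: HYPOTHETICAL_CAP and the volumes below are invented
# numbers, not Twitter's actual limits or any brand's real mention counts.
HYPOTHETICAL_CAP = 1500


def capped_fetch(tweets, cap):
    """Return at most `cap` tweets, mimicking a source that stops serving at a limit."""
    return tweets[:cap]


def coverage(total_mentions, cap):
    """Fraction of the actual mentions that a capped source would return."""
    if total_mentions == 0:
        return 1.0
    tweets = [f"tweet-{i}" for i in range(total_mentions)]
    seen = capped_fetch(tweets, cap)
    return len(seen) / total_mentions


if __name__ == "__main__":
    examples = [("small business", 300), ("mid-size brand", 6000), ("large brand", 150000)]
    for brand, volume in examples:
        share = coverage(volume, HYPOTHETICAL_CAP)
        print(f"{brand}: {share:.0%} of mentions captured")
```

Under these assumed numbers, the small business sees everything, while the larger brands see only a fraction of their conversation, and a tool drawing graphs from that fraction has no way to show what is missing.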
This is a major problem. Bad data = bad research = bad decisions = bad results and damaged relationships with stakeholders. In turn, this results in damage to your ability to use social media to grow and prosper, whether you are at a brand or an agency. The amount of money being spent on the basis of bad research done with API data is certainly in the high tens of millions of dollars and possibly many times that amount. There is simply no way of knowing how many bad decisions this reliance on incomplete data might have caused over the past few years.
These tools routinely present themselves as real competitors to monitoring or analysis tools that do have access to all the Twitter data needed to back up the pretty graphs they offer. Unfortunately, it's impossible to know which of them are telling the truth. Twitter has not taken action to shut these tools down, to change the nature of the data it provides via the API, or to disclose which providers have full Firehose access and publish guidelines as to when the others might be appropriate.
I spoke with Kevin briefly after the session and I believe that Twitter is going to take appropriate action. While this may (and probably should) be painful for a large number of tools that currently depend on API data, ultimately the industry as a whole will benefit greatly from increased consistency and measurement accuracy.