Monitoring & Analyzing Social Media

With over 1.5 billion conversations stored, can you afford not to listen?

Category: Research

Aug 29, 2008

How many results will there be? How big an account do I need?

These questions are understandably common given that our pricing is based on the total number of results in your account. I hear them every day as I work with prospective customers who are preparing proposals for social media monitoring (yes, I do help with developing your business model around SM2; shoot me a note if you’d like to know more).

Given that even a moderately well-known brand can generate large numbers of results, it is important to have some idea of what you’re getting into before you commit to a plan. We have developed a tool to help with this.

First, let me define ‘search results’ within SM2. A search result is any mention of your keyword(s) in social media: a blog post, a Tweet, a comment on a wall, a forum post, a wiki entry, etc. However, a search result is much more than that. It is also the set of data points we collect that are associated with that result. These include any demographic info, tags and categories, Alexa, Technorati and PageRank data, geo-location, etc.: up to 35 fields, depending on what’s available in publicly accessible places (we don’t violate EULAs).

This is important because we use that data to build our analysis of your results.
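To make the idea of a result as ‘a set of data points’ concrete, here is a minimal Python sketch of what one stored record might look like. The class and field names (`SearchResult`, `author_gender`, `alexa_rank`, etc.) are hypothetical and cover only a handful of the up-to-35 fields; this is not SM2’s actual schema. Metadata fields simply stay empty when the data isn’t publicly available.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchResult:
    """One stored mention plus its metadata (hypothetical field names)."""
    url: str
    source_type: str                 # e.g. "blog", "tweet", "forum", "wiki"
    text: str
    # Metadata below is filled in only when it is publicly available.
    author_gender: Optional[str] = None
    author_age: Optional[int] = None
    location: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    pagerank: Optional[int] = None
    alexa_rank: Optional[int] = None
    technorati_authority: Optional[int] = None

    def populated_fields(self) -> int:
        """Count how many optional metadata fields were actually collected."""
        scalars = [self.author_gender, self.author_age, self.location,
                   self.pagerank, self.alexa_rank, self.technorati_authority]
        return sum(value is not None for value in scalars) + (1 if self.tags else 0)
```

A Tweet from an account with no public profile data would report zero populated fields, while a well-indexed blog post might fill most of them; that variability is why the per-result field count is an upper bound rather than a fixed number.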

Back to the estimator tool. When you’re planning a social media monitoring project, you can contact us and we’ll run your keyword phrases through the tool (it’s internal) to get you a ballpark estimate of how many results we have in our ever-expanding database, aka the ‘social media warehouse’. This gives you, and your client or team, an idea of the expense required to accurately measure the full response in social media before you commit to a plan.
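The estimator itself is internal, so what follows is only a sketch of the underlying idea, assuming simple case-insensitive substring matching: count the distinct stored results that mention at least one of your keyword phrases and, if only a sample of the warehouse was searched, scale that count up. The function names `estimate_result_count` and `extrapolate` are hypothetical. One detail worth noticing: a result mentioning two of your phrases is counted once, so adding up per-phrase counts overstates the total.

```python
from typing import Dict, List, Tuple


def estimate_result_count(phrases: List[str],
                          documents: Dict[int, str]) -> Tuple[int, Dict[str, int]]:
    """Return (distinct matching documents, per-phrase match counts)."""
    lowered = [p.lower() for p in phrases]
    per_phrase = {p: 0 for p in phrases}
    distinct = 0
    for text in documents.values():
        body = text.lower()
        hits = [p for p, lp in zip(phrases, lowered) if lp in body]
        for p in hits:
            per_phrase[p] += 1
        if hits:
            distinct += 1  # a document matching several phrases counts once
    return distinct, per_phrase


def extrapolate(sample_hits: int, sample_size: int, warehouse_size: int) -> int:
    """Scale a sample's hit count up to the full warehouse (a ballpark, not a count)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return round(sample_hits * warehouse_size / sample_size)
```

With the phrases "Civic" and "Honda", a post reading "Honda Civic vs Accord" adds one to each per-phrase count but only one to the distinct total, which is the number that would drive account size.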


Aug 21, 2008

SM2 is not a search engine

We had an interesting conversation here at Techrigy headquarters with a guest who came by to learn about our technology, and we ran into a common misconception about SM2: that it is some kind of search engine. While it has some search capabilities, search is not its primary function.

Search solves a problem: finding specific answers to queries.

SM2 solves different problems: finding references to specific terms across social media conversations and content, and understanding who is having those conversations, what they’re saying and why. Unlike search, which seeks to supply the best answer(s), SM2 returns all the possible results and then provides tools for organizing and understanding all of the relevant ones.

SM2 has two components. The first is a collection system that continuously gathers new social media results and stores them in our Social Media Warehouse. Each result, which might be a blog post, a Tweet, metadata from a YouTube video, etc., is parsed for the various data within it. This data includes public information about the person who created the result, such as location, gender and age; any tagging or categorization the user has provided; things like DNS records and IP addresses, URLs, and Alexa and Technorati data; etc. Each result can have 30 or more data fields in SM2.

The warehouse grows every day. We recently passed the 500 million result mark and will rapidly hit one billion results as social media participation explodes. Each of those billion results will have multiple data fields that SM2 users can access. Making sense of all of this is obviously a challenge, and that’s where the second component of SM2 comes in.

The SM2 application front end is a set of tools for discovering conversations and understanding them without having to go through them manually one by one. Setup lets you tailor what you monitor using keyword phrases, excludes (phrases you do not want to find), whitelists (sources you specifically wish to monitor), etc. Once you run your search, SM2 goes into the warehouse, brings back all the results that match your setup and analyzes them. After the initial search, it continues to bring back matching results as they are added to the warehouse until you terminate the search.
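Here is a minimal sketch of how keyword phrases, excludes and whitelists might combine when deciding whether a single result matches a setup. It assumes case-insensitive substring matching and treats a non-empty whitelist as restricting results to those hostnames (and their subdomains); SM2 may interpret whitelists differently, for example as sources to add rather than a filter. `MonitorSetup` and `matches` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse


@dataclass
class MonitorSetup:
    keywords: List[str]                                  # phrases to find
    excludes: List[str] = field(default_factory=list)    # phrases to reject
    whitelist: List[str] = field(default_factory=list)   # hostnames; assumed to restrict sources

    def matches(self, url: str, text: str) -> bool:
        body = text.lower()
        # Must mention at least one keyword phrase.
        if not any(k.lower() in body for k in self.keywords):
            return False
        # Any exclude phrase knocks the result out.
        if any(x.lower() in body for x in self.excludes):
            return False
        # If a whitelist is set, keep only results from those hosts or their subdomains.
        if self.whitelist:
            host = (urlparse(url).hostname or "").lower()
            return any(host == h.lower() or host.endswith("." + h.lower())
                       for h in self.whitelist)
        return True
```

In this sketch, an ongoing search is just the same `matches` check applied to each new result as it lands in the warehouse, until the search is terminated.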

The analysis tools SM2 provides are extensive and comparable in some ways to web metrics tools like Google Analytics, except that they focus on the humans behind the conversations rather than on traffic sources and patterns. They look at sentiment, gender, age, location, popularity, trends and themes, categories, etc.

For people who are used to the search engine model, these differences can be a little challenging to grasp. The key lies in understanding how social media differs from the traditional, or Web 1.0, Internet. Once you see that social media is primarily about communication, the difference becomes easier to understand.


Aug 14, 2008

Twitter backfire

I’m not a big Twitter user, nor do I follow a lot of people. However, I’m noticing that those who are out there Twittering like crazy might want to rethink things a bit. Just as Googling a person became a requirement for HR people vetting resumes a few years ago, following a prospect on Twitter is surely the next reference source.

There’s a lot of Twittering going on. As I mentioned recently, we collected over 45 million Tweets in the last few months, and those were only Tweets containing keyword phrases our SM2 users were searching on. I think a record of constant and often mindless Twittering could come back to haunt you in the future.

The important considerations are frequency (if he has time to do this all day, is he unemployed or wasting his employer’s time?) and subject matter (if he is doing this all day, is he actually doing something productive or simply killing time?). Either negative would be a red flag for me if I were considering a candidate.

Before we had monitoring and search tools like our SM2 and Twitter’s own Summize, you could Tweet along all day without considering these real-world consequences. Now, as micro-blogs start to have real-world impact and attract analytical scrutiny, we need to consider the long-term effects of telling the world that it’s Tuesday afternoon and you’re throwing them back at the neighborhood watering hole…

Social media is a public conversation that is being recorded. More on that in my next post…

Aug 11, 2008

541 million social media search results and counting

Our social media warehouse, where we store and index the collective search results brought back from SM2 searches, has reached a milestone: over half a billion results collected. Not surprisingly, blogs are the biggest source, with over 400 million blog posts collected. Twitter, though added after our launch late last year, accounts for 43 million results, dwarfing any other microblog.

We save the results of user searches in our ‘warehouse’ to create a historical reference in addition to our real-time discovery results. Our analysis tools have also indexed all of these results by the individual data fields we provide (as many as 35 per result), which include things like demographics, indications of sentiment and location, trends, authority, etc. Any SM2 user can tap into this data.

I suspect we’re going to hit the billion mark very quickly as we continually add Freemium users and as more professional power users enter the system. Anyone who thinks they can ignore the power of social media should take heed: these are just results from very specific brand- and reputation-focused searches. As such, they represent only the tip of the iceberg of social media activity.

Jul 28, 2008

What social media sources do we index?

I’ve been asked several times for a list of what SM2 covers in its social media discovery process. The problem (and it’s not really a problem) is that we are constantly adding new sources. For example, we recently added FriendFeed, Identi.ca, Pownce and Plurk.

Here’s a quick overview of what we index:

  • blogs
  • comments
  • wikis
  • forums
  • public content on social networks
  • meta-content on user-generated media like YouTube, Flickr, etc.
  • micro-blogs like Twitter and those mentioned above

In short, pretty much everything we can hook into in social media. We respect end-user license agreements (EULAs), unlike some of the aggregation sites that appear to monitor social media (I am not referring to any of our legit competitors).

We also provide analysis tools to help sort through the results including:

  • Sentiment: an indicator only, but you can drill down to read a result and mark it for accuracy
  • Gender
  • Age
  • Location
  • Trends and trend comparisons by date ranges, keywords and categories
  • Author categories: how did a social media participant categorize their conversation?
  • Themes: cool charts that show relationships between people, ideas and your brands and reputations
  • Authority Rank
  • Results from Top 100 and Top 1000 blogs

All of these can be customized with rules. We also offer extensive chart customization, exports with user-configurable fields, custom reporting via email and RSS, and more.
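The trend comparisons in the list reduce, at their simplest, to counting matching results per day and comparing totals across date ranges. This Python sketch assumes each result is a `(date, text)` pair and uses plain substring matching; `daily_counts` and `compare_ranges` are hypothetical names, and SM2’s own trend charts likely layer categories and keyword overlays on top of something like this.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def daily_counts(results: Iterable[Tuple[date, str]], keyword: str,
                 start: date, end: date) -> Counter:
    """Count results per day that mention `keyword` within [start, end] inclusive."""
    kw = keyword.lower()
    counts: Counter = Counter()
    for day, text in results:
        if start <= day <= end and kw in text.lower():
            counts[day] += 1
    return counts


def compare_ranges(results: Iterable[Tuple[date, str]], keyword: str,
                   range_a: Tuple[date, date],
                   range_b: Tuple[date, date]) -> Tuple[int, int]:
    """Total mentions of `keyword` in two date ranges, e.g. before vs. after a launch."""
    results = list(results)  # allow a generator to be scanned twice
    total_a = sum(daily_counts(results, keyword, *range_a).values())
    total_b = sum(daily_counts(results, keyword, *range_b).values())
    return total_a, total_b
```

Comparing keywords rather than date ranges is the same loop run once per keyword over a shared range.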

I’ll be updating this list frequently as we are on a constant improvement path with SM2.

Jul 15, 2008

Reach in Social Media

Reach, influence, authority… what do they mean in social media? In traditional media, reach may be the easiest of the three to quantify (add up subscriptions and pass-alongs), but reach in social media is a very tricky thing to measure. We (SM2) rank authority based on existing measurements like PageRank, Technorati and Alexa, inbound link counts, etc., but authority is different from reach and influence. A blog like Valleywag might have huge reach but very little real influence (because it’s basically a satirical source, not one people go to when making decisions). Influence is built on a strong reputation and track record, things that develop over time and are difficult to measure algorithmically.
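As a rough illustration of ranking authority from existing public measurements, the sketch below normalizes PageRank, a Technorati-style count of linking blogs, Alexa traffic rank and inbound link counts to a 0 to 1 scale and takes a weighted average. The normalization ceilings and equal weights are assumptions chosen for illustration, not SM2’s Authority Rank formula. Note what it leaves out: it says nothing about reach or influence, which are different things.

```python
import math


def authority_score(pagerank: int, technorati_authority: int, alexa_rank: int,
                    inbound_links: int, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine public ranking signals into a 0-100 score (illustrative weights)."""
    # PageRank is published on a 0-10 scale.
    pr = max(0, min(pagerank, 10)) / 10
    # Log scales, capped at 1.0: ~10,000 linking blogs or ~1,000,000 inbound links max out.
    tech = min(math.log10(1 + max(technorati_authority, 0)) / 4, 1.0)
    links = min(math.log10(1 + max(inbound_links, 0)) / 6, 1.0)
    # Alexa rank: lower is better; rank 1 -> 1.0, rank 10,000,000 or worse -> 0.0.
    alexa = 1.0 - min(math.log10(max(alexa_rank, 1)) / 7, 1.0)
    parts = (pr, tech, alexa, links)
    return round(100 * sum(w * p for w, p in zip(weights, parts)), 1)
```

Log scaling is used because these signals are heavily skewed: without it, a handful of giant sites would compress everyone else toward zero.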

Reach in social media is a pretty interesting thing because it can change so quickly. Someone inside a company who decides to blow the whistle on some bad activity via Twitter can go from no reach to huge reach in hours, and just as quickly fade when the story becomes old news and is superseded by the next big thing. The rise of a meme, or virally transmitted idea, is a unique characteristic of social media, driven by the one-to-many nature of the communication stream and the network effect. The speed at which a meme can spread becomes a very real issue for brand and reputation management.

If a measurement service claims to offer reach statistics, I’d be very wary for the reasons cited above. Unless the service has real-time access to a social media source’s traffic analytics, which much of the time it does not, measuring and reporting reach is problematic.

So where does this leave us? You could say that Authority + Reach = Influence. Since reach is variable and tough to measure, influence is equally hard to rank. Cracking the reach measurement challenge would mean a big change in our whole world; however, because of privacy issues, we may never see it.

Jul 11, 2008

Slideshare: Social Media Monitoring

Jul 11, 2008

Slideshare: SM2 Analysis Guide

Jul 11, 2008

Reputation Management with SM2, Sentiment notes and Reputation ROI

The super-hot political season we’re in is a great time to talk about reputation management in social media, as professional smear artists work overtime spreading rumors about candidates of all stripes. Obama’s campaign has aggressively responded to this with its Fight The Smears site, where it immediately publishes any new rumors and refutes them decisively in real time. I don’t know if they are using social media monitoring to follow what people are saying, but I think it’s likely. This kind of proactive reputation defense requires a combination of technology and human involvement.

For example, though we offer a sentiment indicator in our analysis tools, it is just that: an indicator. It identifies words and phrases in context that it thinks could indicate a negative or positive statement related to a keyword in that search. If it saw ‘Obama sucks’ in a blog post, it would likely flag that as negative. This is where the human beats the computer every time, however. Using our drill-down feature, you can read the ‘negative’ statement in context. Suppose it actually says:

‘Obama sucks down a frosty at a local fast food joint while talking to a smiling group of fans’

The computer thinks that’s negative; any human knows it’s positive and would correct the sentiment in SM2 accordingly. Yes, it’s labor intensive, but not as intensive as rebuilding a reputation damaged by an untruth or misconception.
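The drill-down example shows why a keyword-in-context indicator needs a human override. This deliberately naive Python sketch scores lexicon words within a few tokens of a single-word keyword; the tiny word lists, the window size and the `overrides` dictionary of human corrections are all hypothetical, not SM2’s actual model. It reproduces the mistake described above: ‘sucks’ sits right next to ‘Obama’, so the sentence is flagged negative until a person corrects it.

```python
import re
from typing import Dict

# Toy lexicons; a real system would need far larger, context-aware resources.
NEGATIVE = {"sucks", "hate", "awful", "terrible", "liar"}
POSITIVE = {"love", "great", "smiling", "awesome", "honest"}


def sentiment_indicator(text: str, keyword: str, window: int = 3) -> str:
    """Score lexicon words within `window` tokens of each keyword mention."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kw = keyword.lower()
    score = 0
    for i, token in enumerate(tokens):
        if token != kw:
            continue
        for neighbour in tokens[max(0, i - window): i + window + 1]:
            if neighbour in NEGATIVE:
                score -= 1
            elif neighbour in POSITIVE:
                score += 1
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def effective_sentiment(result_id: int, text: str, keyword: str,
                        overrides: Dict[int, str]) -> str:
    """A human correction, when one exists, always wins over the automatic indicator."""
    if result_id in overrides:
        return overrides[result_id]
    return sentiment_indicator(text, keyword)
```

Widening the window would pull in ‘smiling’ and cancel the score out to neutral, which is still wrong; tuning the algorithm shifts the errors around rather than removing them, which is the argument for keeping a person in the loop.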

Sentiment and Accuracy Claims

Semantic search promises the Holy Grail of search: search that understands natural language queries such as:

‘which dealer in Rochester has a blue Civic in stock?’

The number of things a search engine would have to understand to return an accurate answer to this question is mind-boggling. It would have to know that ‘Civic’ and ‘dealer’ in the same sentence probably mean a car is involved, that ‘in stock’ is an inventory query and that blue is an attribute. Then it has to know that we’re only interested in Rochester dealers.

This kind of thing is why we have to be very wary of accuracy claims in sentiment analysis. Unless a service has actual humans reading every result, you can only use sentiment as a guide to the general direction of the discussion.

Reputation also varies with demographics, and you can see some of this in SM2. If SM2 shows that a majority of males aged 34-50 in the Midwest think Obama is a Muslim (he is not!), then your monitoring has identified a particular demographic in social media that requires your attention and some remedial action.
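Spotting a segment like that amounts to grouping results by demographic fields and checking what share carry a given label. This sketch assumes each result is a dict with `gender`, `age`, `region` and `sentiment` keys (hypothetical names), uses age bands chosen to match the example, and skips results without public demographic data rather than guessing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def age_band(age: int) -> str:
    """Bucket an age into coarse bands (the band edges are an assumption)."""
    if age < 18:
        return "<18"
    if age < 34:
        return "18-33"
    if age <= 50:
        return "34-50"
    return "51+"


def label_counts_by_segment(results: Iterable[dict],
                            label: str = "negative") -> Dict[Tuple[str, str, str], Tuple[int, int]]:
    """Return {(gender, age_band, region): (label_count, total)}.

    Results missing any demographic field are skipped, since much social
    media content carries no public demographic data.
    """
    counts = defaultdict(lambda: [0, 0])
    for r in results:
        gender, age, region = r.get("gender"), r.get("age"), r.get("region")
        if gender is None or age is None or region is None:
            continue
        key = (gender, age_band(age), region)
        counts[key][1] += 1
        if r.get("sentiment") == label:
            counts[key][0] += 1
    return {key: (hit, total) for key, (hit, total) in counts.items()}
```

A segment where the first number is more than half the second is the kind of majority described above, bearing in mind that it reflects only authors who publish demographic data.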

Reputation management is labor- and time-intensive. It requires real-time discovery because distortions can travel extremely fast in social media, the ultimate rumor mill. As a recent Doonesbury storyline illustrated, depicting the title character’s weary daughter relentlessly scanning the web 24/7 for Obama smears, it requires a lot of attention.

ROI for Reputation Management?

How do you measure the cost of losing swing voters in a hotly contested state? Of a false product rumor that derails sales overnight? Of not being prepared when a new market sector latches onto your product for a use you never considered? The ROI is based on risks averted, which are tough to quantify.

Jul 10, 2008

Observed vs. Acquired Research

My recent post on how monitoring changes the market research model in social media generated some really thoughtful responses from major players in social media marketing (read the comments). As I stated there, I believe market research is in transition because we can now observe and listen to market conversations without overtly influencing them before they take place. You might define this as observed vs. acquired research.

Acquired research involves building a structured environment and inviting participation. This environment might be a survey, a focus group, a social network or even a physical environment like a store. Apple built multiple prototypes of full-scale Apple Stores in a warehouse and then tested responses to them:

“One of the best pieces of advice Mickey ever gave us was to go rent a warehouse and build a prototype of a store, and not, you know, just design it, go build 20 of them, then discover it didn’t work,” says Jobs. In other words, design it as you would a product. Apple Store Version 0.0 took shape in a warehouse near the Apple campus. “Ron and I had a store all designed,” says Jobs, when they were stopped by an insight: The computer was evolving from a simple productivity tool to a “hub” for video, photography, music, information, and so forth. The sale, then, was less about the machine than what you could do with it. But looking at their store, they winced. The hardware was laid out by product category - in other words, by how the company was organized internally, not by how a customer might actually want to buy things. “We were like, ‘Oh, God, we’re screwed!’” says Jobs.

- from Kottke

The result was a radical rethinking of the entire retail experience. Staff engage customers with specific questions as soon as you enter, because Apple learned that customers have a fairly small set of reasons for coming in. Specialized employees wear different colored clothing to indicate whether they are greeters, ‘Geniuses’ or managers. Cash registers and conventional checkouts are eliminated, receipts are emailed, etc. And Apple Stores have the highest revenue per square foot of any retail chain. Acquired research works if it is well designed.

One lesson from this, also raised by the commenters on the other post, is that you can’t always predict behavior or results even in a controlled situation: people will do their own thing, and that’s where a lot of the value lies. That takes us to observed research, in this case social media conversations observed in the wild.

Monitoring enables us to build an anonymous observation post where we can listen in on conversations, track trends, gauge sentiment and demographics, and even learn how authoritative the speaker is. We can do this on a global basis across a wide variety of extremely unstructured media; some of these are ‘man on the street’ conversations. This is a form of research that simply did not exist until recently, and most of us are only at the early stages of understanding how to use it.