Every so often, we’re reminded just how embedded social media is in our lives. The 2012 Summer Games is one of those milestone events, as athletes, attendees and viewers turn to social media to post, share and discuss events as they unfold. These are the Information Summer Games, where mounds of statistics, news, content and conversations are instantly accessible on social (and mobile) platforms. In fact, the global community posted more than 9.66 million mentions on Twitter during the Opening Ceremony. That’s a lot of data.
So what can we do with all this conversational data? Pull it into Power View, of course, for a new perspective on all the Summer Games chatter.
Using SQL Server 2012, we harnessed a live stream of Summer Games-themed Tweets, and then explored the datasets with Power View in the Excel 2013 Preview. In this first of two articles, we’ll show you how we captured unstructured Twitter data and produced some interesting initial insights and visualizations with Power View.
The Setup: How to Mine a Mountain of Twitter Data
Let’s take a look at how we captured and configured Twitter data for manipulation in Excel. We invite you to play along at home and share your unique configuration tricks and tips, insights and visualizations with the community.
Here’s what we utilized:
1. Windows 7 x64 OS (compatible)
2. SQL Server 2012 (with SQL Server Management Studio tool)
3. Ruby On Rails (Ruby v1.9.3, Rails v3.2.6)
4. WAMP Server 2 & WAMP Server 2.2 - Supports PHP + SQL Server
5. Microsoft Office 2013 Preview
Step 1: Set up Databases
First things first: create databases in SQL Server 2012. We created tables using standard RDBMS principles, with the data normalized using standard techniques. We created two separate databases to keep data collection and analysis apart: one for collecting Tweets and another for performing analytics. The Tweet database included each Tweet sender’s user-profile information. Scheduled tasks migrated the data from the source database to the Power View database at 15-minute intervals.
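The migration step can be sketched as an incremental copy that remembers the last row it moved. This is a minimal Python illustration under stated assumptions, not the scheduled task used in this project: in-memory SQLite stands in for SQL Server 2012, and the `tweets` table, its columns, and the `migrate_new_tweets` function are hypothetical names.

```python
import sqlite3


def migrate_new_tweets(source, target, last_id):
    """Copy rows with id > last_id from source to target.

    Returns the new watermark (highest id copied so far).
    """
    rows = source.execute(
        "SELECT id, user_name, text FROM tweets WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    with target:  # commits on success, rolls back on error
        target.executemany(
            "INSERT INTO tweets (id, user_name, text) VALUES (?, ?, ?)", rows
        )
    return rows[-1][0] if rows else last_id


# Stand-in databases (assumption: a single, simplified tweets table).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_name TEXT, text TEXT)"
    )

source.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [(1, "fan_a", "Go Team GB!"), (2, "fan_b", "What a race")],
)
source.commit()

# First scheduled run copies everything collected so far.
watermark = migrate_new_tweets(source, target, 0)

# A new Tweet arrives; the next run copies only that row.
source.execute("INSERT INTO tweets VALUES (3, 'fan_c', 'Gymnastics final tonight')")
source.commit()
watermark = migrate_new_tweets(source, target, watermark)

migrated = target.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

Tracking a watermark keeps each 15-minute run proportional to the new rows rather than to the whole table, which matters when the source table grows by hundreds of rows per minute.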
Step 2: Set up Twitter Data Feeds
We tapped into Twitter’s global stream of Tweet data using two sets of APIs:
1. Twitter Streaming API: Working off a Ruby on Rails (ROR) script
To save large amounts of data in real time without setting up scheduled tasks, we implemented a Ruby on Rails application and a one-time trigger script to pull in public Tweets as they were posted, enabling us to capture 1-10% of total Twitter data at any given time.
We first installed the required Gems for the ROR application to enable the Twitter Streaming API, and then set up the ROR application to work with SQL Server 2012.
2. Twitter Trends API: Working off PHP scripts
To determine the top global Twitter trends, we sent a request to the Twitter Trends API. For each trend’s keyword, we did a category check through the Twitter Textwise API, limiting Tweets to those with keywords belonging to the Sports category. We then used the Twitter Search API to get the Tweets for trends that fell under Sports.
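For the streaming path in item 1, the core of an ingest loop can be sketched as follows. This is written in Python purely for illustration (the application described here was built on Ruby on Rails), and it assumes the stream arrives as one JSON message per line, with blank keep-alive lines and occasional non-Tweet messages mixed in. The `parse_stream` and `batches` helpers and the sample data are hypothetical.

```python
import json


def parse_stream(lines):
    """Yield (id, screen_name, text) for each Tweet in a line-delimited stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        message = json.loads(line)
        if "text" not in message:
            continue  # not a Tweet (for example, a delete notice)
        yield message["id"], message["user"]["screen_name"], message["text"]


def batches(records, size):
    """Group records into lists of at most `size` for bulk inserts."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical sample of what a few stream lines might look like.
sample_stream = [
    '{"id": 1, "user": {"screen_name": "fan_a"}, "text": "Opening Ceremony!"}',
    "",
    '{"delete": {"status": {"id": 99}}}',
    '{"id": 2, "user": {"screen_name": "fan_b"}, "text": "Go Team USA"}',
    '{"id": 3, "user": {"screen_name": "fan_c"}, "text": "What a final"}',
]

stream_batches = list(batches(parse_stream(sample_stream), size=2))
```

Each batch could then be written to the collection database in a single bulk insert, which is one common way to keep up with a high-volume stream.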
Step 3: Map Tweet Keywords
We mapped each Tweet to custom-defined keywords and trends by matching them against the characters in the Tweet text.
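One way to picture this mapping is a case-insensitive search for each keyword in the Tweet text. The Python sketch below is an assumption about how such matching could work, not the exact rule used here: it matches whole words only (so "basketballs" does not count as "basketball"), and `map_keywords` and the keyword list are hypothetical.

```python
import re


def map_keywords(text, keywords):
    """Return the keywords that appear in text as whole words or phrases."""
    matches = []
    for keyword in keywords:
        # Require a non-word character (or string edge) on both sides.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append(keyword)
    return matches


sport_keywords = ["gymnastics", "basketball", "soccer", "swimming"]
matched = map_keywords("Watching #Gymnastics then basketball tonight", sport_keywords)
```

Because `#` is not a word character, hashtags such as `#Gymnastics` still match their keyword.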
Step 4: Set up Sentiment Analysis
Using the Twitter Sentiment140 Analysis API, we tagged each Tweet as Positive, Neutral or Negative.
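The tagging step boils down to translating a numeric score into one of the three labels. The Python sketch below assumes the sentiment service returns a `polarity` field of 0, 2 or 4 for negative, neutral and positive; treat that encoding, the field name, and the `label_tweets` helper as assumptions to verify against the API documentation. No network call is made here; the scored Tweets are hard-coded sample data.

```python
# Assumed encoding of the polarity score returned by the sentiment service.
POLARITY_LABELS = {0: "Negative", 2: "Neutral", 4: "Positive"}


def label_tweets(scored_tweets):
    """Attach a Positive/Neutral/Negative label to each scored Tweet."""
    labeled = []
    for tweet in scored_tweets:
        # Design choice for this sketch: unknown or missing scores fall back to Neutral.
        label = POLARITY_LABELS.get(tweet.get("polarity"), "Neutral")
        labeled.append({"text": tweet["text"], "sentiment": label})
    return labeled


# Hypothetical responses after scoring.
scored = [
    {"text": "What a brilliant race!", "polarity": 4},
    {"text": "Heats start at 10am", "polarity": 2},
    {"text": "Gutted about that fall", "polarity": 0},
    {"text": "No score returned"},
]
labeled = label_tweets(scored)
```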
Step 5: Set up Twitter Data Reports and Analysis
We prepared the following charts to analyze the Tweet data collected leading up to and during the Summer Games.
- Top 10 countries by Tweet volume
- Top 10 sports mentioned in Tweets
- Top 10 trends within Sports category
- Tweets over Time
- Trends over Time
- Tweet Sentiment Analysis over Time
- Tweet word-count stats, on an hourly and daily basis
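The aggregations behind charts like these are simple counts. As a rough Python illustration (the charts in this post were built in Power View, not in code), the sketch below computes a top-N country ranking and an hourly Tweet count. The `country` and `created_at` fields, the helper names, and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def top_countries(tweets, n=10):
    """Return the n countries with the most Tweets as (country, count) pairs."""
    return Counter(t["country"] for t in tweets).most_common(n)


def tweets_per_hour(tweets):
    """Return (hour, count) pairs sorted by hour."""
    counts = Counter()
    for t in tweets:
        stamp = datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M:%S")
        counts[stamp.replace(minute=0, second=0)] += 1
    return sorted(counts.items())


# Hypothetical sample rows.
tweets = [
    {"country": "US", "created_at": "2012-07-27 20:15:00"},
    {"country": "US", "created_at": "2012-07-27 20:45:00"},
    {"country": "UK", "created_at": "2012-07-27 20:59:00"},
    {"country": "UK", "created_at": "2012-07-27 21:05:00"},
    {"country": "US", "created_at": "2012-07-27 21:30:00"},
    {"country": "CA", "created_at": "2012-07-27 22:00:00"},
]

ranking = top_countries(tweets, n=2)
hourly = tweets_per_hour(tweets)
```

The same pattern extends to the other charts by swapping the grouping key, for example sport or trend instead of country.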
Twitter Summer Games Architecture Diagram
Success! We experienced a very rapid insertion of records into the source tables, capturing an average of 875 Tweets per minute. That adds up to 16,385,884 Tweets collected since the start of the Summer Games. Now to manipulate the data using Power View in the Excel 2013 Preview and gather insights from the activity stream.
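As a quick sanity check on those two figures, dividing the total by the per-minute average gives the implied length of the collection window. The short Python calculation below assumes both numbers describe the same collection period.

```python
TWEETS_PER_MINUTE = 875      # average capture rate quoted above
TOTAL_TWEETS = 16_385_884    # total Tweets collected quoted above

minutes = TOTAL_TWEETS / TWEETS_PER_MINUTE
days = minutes / (60 * 24)

print(f"{minutes:,.0f} minutes, about {days:.1f} days")
# prints "18,727 minutes, about 13.0 days"
```

Roughly 13 days of collection lines up with the July 27th to August 8th window used in the charts that follow.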
Mapping the Global Conversation
For our first set of visualizations, we set our sights on the overall volume of conversation over time, then broke it down by country and by sport discussed.
Tweets Over Time
The media coverage of the Summer Games kicked in at the start of the week and grew noisier leading up to the Opening Ceremony. Not surprisingly, Twitter chatter roughly followed this trend. Since the US and UK generated the most Tweets, we suspect that Summer Games-themed engagement dipped and rose with peak daytime hours in those countries.
Overall Twitter volume. The X axis shows one-day increments. The Y axis shows the number of Tweets in thousands.
Top Countries Mentioning Summer Games
Where in the world are the chattiest Twitter users? By filtering by geo tag, we can learn where Summer Games-related Tweets originate.
The US and UK were the top countries generating Summer Games-themed Tweets from July 27th to August 8th.
Sports Discussed by Country
Power View also allows us to map discussions of specific sports, like gymnastics, by country.
Gymnastics was mentioned the most in US and UK Summer Games-themed Tweets from July 27th to August 8th.
We’ve only scratched the surface of the possibilities. Hopefully, we’ve inspired you to nab a Twitter stream and experiment on your own. Be sure to share your findings and links to visualizations in the comments below.
If you want to experience Power View in the Excel 2013 Preview, visit the Office Preview site and install the Office ProPlus 2013 Preview. To learn more about the enhanced self-service BI capabilities in Excel 2013, visit our detailed blog post.
In our next Summer Games-themed blog post, we’ll look at two specific events: the men’s basketball gold medal match and the women’s soccer gold medal match. Using Power View, we will compare and contrast the popularity and trends of these two major Summer Games events.