Location Analysis of Slideshare Viewers Using DSS

I've been using Slideshare quite a bit lately to share presentations. While it offers interesting statistics tools, the geographic information it surfaces is a bit light. You get a view count by country, but you can't really dig any deeper.

I was wondering how to get more detailed information about my audience. As it turns out, Slideshare lets you download a CSV list of the last 1000 viewers of your slides, along with a bit of additional information (the name of the presentation and, more importantly, the city and country each viewer is from).
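If you want to poke at the export before handing it to any tool, a few lines of Python are enough to load it. This is a minimal sketch: the column names `Presentation`, `City` and `Country` are assumptions, since the exact headers of the Slideshare export may differ.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names: adjust them to match the headers in your export.
WANTED_COLUMNS = ("Presentation", "City", "Country")


def load_viewers(source: TextIO) -> List[Dict[str, str]]:
    """Read a viewer CSV and keep only the columns needed for the analysis."""
    reader = csv.DictReader(source)
    viewers = []
    for row in reader:
        # Missing columns or short rows become empty strings instead of None.
        viewers.append({col: (row.get(col) or "").strip() for col in WANTED_COLUMNS})
    return viewers


if __name__ == "__main__":
    sample = io.StringIO(
        "Name,Presentation,City,Country\n"
        "Alice,My talk,Lyon,France\n"
        "Bob,My talk,N/A,N/A\n"
    )
    for viewer in load_viewers(sample):
        print(viewer)
```

In practice you would pass an opened file (`open("viewers.csv", newline="")`, with a file name of your choosing) instead of the in-memory sample.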

Once I had access to this data, the next step was to find a way to analyze it in more detail. To do this, I decided to use Data Science Studio (DSS), a product by French startup Dataiku. They offer a free community edition that you can install locally on your Mac. It handles up to 100,000 rows of data, which was more than enough for my needs.

After performing a local install, I realised that to do the kind of work I had in mind, I also needed to install an additional plugin: the Geoadmin plugin. I had to restart DSS after installing it; I could have saved time by installing it before running DSS for the first time. Last but not least, I also needed an API key for a mapping service. You can easily get a free one from MapQuest, which gives you up to 15,000 location lookups: again, more than enough for my needs.
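DSS makes the MapQuest calls for you, but it can help to see roughly what a single lookup involves. This sketch assumes MapQuest's Geocoding API v1 `address` endpoint and its JSON layout (`results[0].locations[0].latLng`); both are assumptions to check against MapQuest's current documentation, and the API key is whatever you received when signing up.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# Assumption: MapQuest Geocoding API v1 "address" endpoint; check the current docs.
GEOCODE_URL = "https://www.mapquestapi.com/geocoding/v1/address"


def build_geocode_url(api_key: str, location: str) -> str:
    """Build the request URL for one free-form address lookup."""
    query = urllib.parse.urlencode({"key": api_key, "location": location})
    return f"{GEOCODE_URL}?{query}"


def parse_lat_lng(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first match in a geocoding response, or None."""
    try:
        # Assumed response layout: results[0].locations[0].latLng.{lat,lng}
        lat_lng = payload["results"][0]["locations"][0]["latLng"]
    except (KeyError, IndexError, TypeError):
        return None  # no usable match for this address
    return float(lat_lng["lat"]), float(lat_lng["lng"])


def geocode(api_key: str, location: str) -> Optional[Tuple[float, float]]:
    """Perform one live lookup; each call uses up part of your quota."""
    url = build_geocode_url(api_key, location)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_lat_lng(json.load(response))
```

Keeping the URL building and response parsing separate from the network call means you can check the parsing logic without spending any of your lookups.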

Once my setup was ready, I launched DSS and headed to http://localhost:11200/ (DSS is a web application, which means it lives in your browser). Once there, the first thing I did was create a new project and import my dataset (Dataiku has several tutorials to help you get started with DSS).

A nice touch about DSS is that it always keeps your original data as is, without modifying it. This means you can always go back in time without fear of altering your data.

I clicked on the "Analyze" button to create my first analysis. The analysis tab is where you do most of the grunt work of cleaning and enriching your data so that you can actually use it. In my case, there were a couple of things that needed to be done before I could create the map I was looking for:

  1. First, I had to remove rows that lacked geographic data. I did this by clicking on the "Country" column and applying a transformation that removed all rows with useless values (such as "N/A").
  2. Next, I needed more precise input to feed MapQuest. I achieved this by concatenating the "Country" and "City" columns into a new "Address" column.
  3. Once I had the addresses, all I needed to do was run them through MapQuest using DSS' Geocode function. This gave me the latitude and longitude for each "Address" entry, and I then marked the two new columns with their respective data types.
  4. With that done, the last step was to add a "Create GeoPoint" step that took the latitude and longitude as input and created... well, you guessed it, a Geopoint based on those coordinates.
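For readers who prefer code to screenshots, the four preparation steps above roughly correspond to the following Python sketch. It is not what DSS runs internally: the column names, the placeholder values other than "N/A", the "City, Country" ordering and the `geocode` callable (for instance a wrapper around a MapQuest lookup) are all assumptions. The geopoint is written as a WKT `POINT(longitude latitude)` string, which I believe matches how DSS represents geopoints.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Placeholder values treated as "no data". "N/A" comes from the export; the others are guesses.
MISSING = {"N/A", "NA", "UNKNOWN", "-"}

# Any function mapping an address to (lat, lng), or None when the lookup fails.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def clean_value(value: Optional[str]) -> str:
    """Strip a cell and map placeholder values such as "N/A" to an empty string."""
    text = (value or "").strip()
    return "" if text.upper() in MISSING else text


def enrich(rows: List[Dict[str, str]], geocode: Geocoder) -> List[Dict[str, object]]:
    """Apply the four preparation steps, returning new rows and leaving the input untouched."""
    enriched: List[Dict[str, object]] = []
    for row in rows:
        # Step 1: drop rows with no usable country.
        country = clean_value(row.get("Country"))
        if not country:
            continue
        # Step 2: build an "Address" from city and country (city first, if known).
        city = clean_value(row.get("City"))
        address = f"{city}, {country}" if city else country
        # Step 3: geocode the address; rows that can't be located are skipped.
        coords = geocode(address)
        if coords is None:
            continue
        lat, lng = coords
        # Step 4: build a geopoint as WKT, which lists longitude before latitude.
        new_row: Dict[str, object] = dict(row)  # copy: the source rows stay as-is
        new_row.update({
            "Address": address,
            "latitude": lat,
            "longitude": lng,
            "Geopoint": f"POINT({lng} {lat})",
        })
        enriched.append(new_row)
    return enriched
```

Passing the geocoder in as a function lets you try the pipeline with a plain dictionary lookup, such as `{"Lyon, France": (45.76, 4.83)}.get`, before spending real API calls. Copying each row mirrors the DSS habit of never modifying the original dataset.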

Here is the resulting recipe, as well as a sample of its result:

Once I was done processing and enriching my data, I could at last get to the heart of the subject: creating a map of my viewers! To do this, I headed over to the "Charts" tab and followed these steps:

  1. Set the graph type to "World Map".
  2. Put "Geopoint" in the "Break down by..." section and set the granularity level to "8 (city)".
  3. Put "Count of records" in both the "Color" and "Size" fields of the "Show..." section.
  4. Waited for the map to update with my data.
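DSS does the aggregation behind the map for you. To make the idea concrete, here is a rough approximation: bucket the coordinates into grid cells and count the records in each cell, the same count that drives both the color and the size of each marker. The rounding-based grid is a stand-in of my own, not DSS' actual granularity levels; `decimals=1` (cells of roughly 11 km in latitude) and the hypothetical `marker_size` helper are arbitrary choices.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_by_cell(points: Iterable[Tuple[float, float]], decimals: int = 1) -> Counter:
    """Count (lat, lng) points per grid cell obtained by rounding both coordinates."""
    return Counter((round(lat, decimals), round(lng, decimals)) for lat, lng in points)


def marker_size(count: int, max_count: int, min_px: float = 4.0, max_px: float = 24.0) -> float:
    """Scale a cell's count linearly to a marker radius in pixels (hypothetical helper)."""
    if max_count <= 0:
        return min_px
    return min_px + (max_px - min_px) * count / max_count


if __name__ == "__main__":
    points = [(45.76, 4.83), (45.78, 4.81), (48.86, 2.34)]
    cells = count_by_cell(points)
    biggest = max(cells.values())
    for cell, count in cells.most_common():
        print(cell, count, marker_size(count, biggest))
```

A coarser grid (`decimals=0`) would merge neighbouring cities into regional blobs, while a finer one would split a single city into several markers.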

Here is the final result, centered on the US for better readability:

Great! I now have access to a visual representation of my audience, city by city, all over the world!

I've probably only scratched the surface of DSS' feature set, but I was able to quickly create a beautiful map that gave me far more information than Slideshare has to offer: exactly what I was looking for.