Research Methodology based on RSS Feeds and Social Bookmarking (Part 2, data collection)

As I described in part 1, I have been running a small study to check whether the RSS feeds generated by blog and news searches are of any worth. In the following parts I will present the final data, some analysis and conclusions. Please refer to the original post for a more detailed explanation of the methodology and its motivations.

Scope

The scope of this research is very narrow: the results apply only to the keywords selected below and should not be generalized. A more representative study would need a much larger sample of random keywords, possibly in different languages, using different search engines and more rigorous data collection. Nonetheless, the results shed some light on using RSS feeds as an additional way of gathering information, especially in the exploratory phases of research.

Data collection

As described in the introductory post, I created a series of subscriptions with some keywords for a Google blog search and a Google news search. The keywords were taken from the first part of the title of my master’s thesis, “Dissemination of Knowledge and Culture on Web 2.0: a Case for Brazilian Music”, and were combined at different levels of complexity for the different searches, as follows:

  • “dissemination knowledge culture web 2.0”
  • “dissemination knowledge web 2.0”
  • “dissemination culture web 2.0”
  • “knowledge culture web 2.0”
  • “knowledge web 2.0”
  • “culture web 2.0”

I observed each subscription on Google Reader daily, for a 30-day period, from 19 March until 17 April. For each set I considered the following variables:

  • Total Feeds: the daily number of feeds for a subscription
  • Overrides: how many repeated feeds were found within a subscription or across the whole set
  • Preselections: feeds that have been marked (or starred in Google Reader) for further reading
  • Primary leads: interesting leads that may serve as a reference derived from a preselection
  • Secondary leads: any interesting lead found within a primary lead
  • Bookmarks: effective sources found in the chain of leads that are bookmarked and tagged
  • New RSS feeds: new RSS subscriptions generated as a result of good content found within a blog or site that contains other articles of interest, and may serve as an expanding source
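As a rough illustration of how the daily observations could be recorded and summed over the 30-day period, here is a minimal Python sketch. The class and function names (`DailyObservation`, `consolidate`, `unique_entries`) are hypothetical, the sample numbers are invented for the example and are not the study data, and the rule "unique entries = total feeds minus overrides" is my assumption about how repeated items are discounted.

```python
from dataclasses import dataclass, fields


@dataclass
class DailyObservation:
    """One day's counts for a subscription (field names are illustrative)."""
    total_feeds: int = 0
    overrides: int = 0
    preselections: int = 0
    primary_leads: int = 0
    secondary_leads: int = 0
    bookmarks: int = 0
    new_rss_feeds: int = 0


def consolidate(observations):
    """Sum every variable over a list of daily observations."""
    totals = DailyObservation()
    for obs in observations:
        for f in fields(DailyObservation):
            setattr(totals, f.name, getattr(totals, f.name) + getattr(obs, f.name))
    return totals


def unique_entries(obs):
    """Assumption: unique entries are total feeds minus repeated ones."""
    return obs.total_feeds - obs.overrides


# Invented sample values, for illustration only.
sample = [
    DailyObservation(total_feeds=20, overrides=4, preselections=3, primary_leads=1),
    DailyObservation(total_feeds=15, overrides=2, preselections=2, bookmarks=1),
]
summary = consolidate(sample)
print(summary.total_feeds, unique_entries(summary))  # 35 29
```

Keeping one record per day and per subscription makes it possible to consolidate either by keyword or across the whole set without re-counting.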

Table 1: Consolidated data table for the Google Blog Search (all keywords combined)

                  Total           % (of unique items)   % (of preselections)   % (of all leads)
Total entries     642             ---                   ---                    ---
Unique entries    523 (81.5 %)    81.5 %                ---                    ---
Preselections     95              18.16 %               ---                    ---
Primary leads     25              4.78 %                26.32 %                58.14 %
Secondary leads   18              --- *                 15.13 %                41.86 %
Bookmarks         35              6.69 %                36.84 %                81.40 %
New RSS Feeds     4               0.76 %                4.21 %                 9.30 %

Table 2: Consolidated data table for the Google News Search (all keywords combined)

                  Total           % (of unique items)   % (of preselections)   % (of all leads)
Total entries     584             ---                   ---                    ---
Unique entries    332 (56.85 %)   56.85 %               ---                    ---
Preselections     30              9.04 %                ---                    ---
Primary leads     6               1.81 %                20.00 %                75.00 %
Secondary leads   2               --- *                 0.79 %                 25.00 %
Bookmarks         7               2.11 %                23.33 %                87.50 %
New RSS Feeds     0               0.00 %                0.00 %                 0.00 %
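
Most of the percentage columns in Tables 1 and 2 can be reproduced by dividing a row's count by the unique entries, by the preselections, and by the sum of primary and secondary leads. The Python sketch below shows that arithmetic; the helper names `pct` and `share_columns` are hypothetical. The secondary-leads figure in the "% (of preselections)" column does not follow this rule, so it is left out here.

```python
def pct(part, whole):
    """Return part/whole as a percentage, rounded to two decimals."""
    if whole == 0:
        return 0.0
    return round(100.0 * part / whole, 2)


def share_columns(count, unique, preselections, all_leads):
    """Percentages of unique items, of preselections and of all leads."""
    return (pct(count, unique), pct(count, preselections), pct(count, all_leads))


# Primary leads row: blog search (Table 1) and news search (Table 2).
blog = share_columns(25, unique=523, preselections=95, all_leads=25 + 18)
news = share_columns(6, unique=332, preselections=30, all_leads=6 + 2)
print(blog)  # (4.78, 26.32, 58.14)
print(news)  # (1.81, 20.0, 75.0)
```

The same function applied to the bookmark counts (35 and 7) gives the bookmark rows of both tables.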