The research methodology for my Master's thesis relies largely on RSS feeds. Good leads end up as tagged bookmarks (with del.icio.us) for future reference. The best possible outcome is not only a concrete, usable reference but also further RSS feeds that may serve as expanding sources.
The methodology
To demonstrate and test the effectiveness of the methodology, I have been running a small experiment for the past 9 days. My thesis is titled “Dissemination of Knowledge and Culture on Web 2.0: a Case for Brazilian Music”. From the first part of the title I created RSS feeds based on Google Blog Search and Google News searches. I also tried Technorati and Digg subscriptions, but they returned unreliable results*, especially in Google Reader**, my RSS reader of choice.
The queries are stripped of prepositions and conjunctions and ordered roughly from most to least complex:
- x.1 “dissemination knowledge culture web 2.0”
- x.2 “dissemination knowledge web 2.0”
- x.3 “dissemination culture web 2.0”
- x.4 “knowledge culture web 2.0”
- x.5 “knowledge web 2.0”
- x.6 “culture web 2.0”
Each search query became, in effect, two RSS feed subscriptions: one for a blog search (A) and one for a news search (B), hence A.1, A.2, B.1, B.2, etc. With the subscriptions in place, I tracked a few variables for each of them, individually, on a daily basis:
- Total entries: the daily number of entries returned by a subscription
- Overrides: how many repeated entries were found within a subscription or across the whole set
- Pre-selections: entries that have been marked (starred, in Google Reader) for further reading
- Primary leads: interesting*** leads, derived from a pre-selection, that may serve as a reference
- Secondary leads: any interesting lead found within a primary lead (e.g. a hyperlink inside a lead****)
- Bookmarks: effective sources found along the chain of leads, which are then bookmarked and tagged
- New RSS feeds: new RSS subscriptions created because a blog or site contained good content and other articles of interest, and may therefore serve as an expanding source
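The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling actually used in the experiment. It builds the A.x/B.x subscription labels from the six queries and counts repeats ("overrides") by treating two entries as the same when they share a URL. That URL-based identity rule and the helper name `split_unique_and_repeats` are hypothetical.

```python
from collections import Counter
from itertools import product

# The six query strings from the list above (x.1 to x.6).
QUERIES = [
    "dissemination knowledge culture web 2.0",
    "dissemination knowledge web 2.0",
    "dissemination culture web 2.0",
    "knowledge culture web 2.0",
    "knowledge web 2.0",
    "culture web 2.0",
]

# Subscription labels: A = blog search, B = news search, e.g. "A.1" or "B.6".
SUBSCRIPTIONS = {
    f"{source}.{i}": query
    for source, (i, query) in product("AB", enumerate(QUERIES, start=1))
}


def split_unique_and_repeats(entry_urls):
    """Split collected entries into unique URLs and a repeat count.

    Hypothetical input: a list of entry links gathered from one or more
    subscriptions (assumes an entry is identified by its URL). The first
    occurrence of a URL is unique; each further occurrence is a repeat.
    """
    counts = Counter(entry_urls)
    unique = list(counts)  # Counter keeps first-seen order of keys
    repeats = sum(c - 1 for c in counts.values())
    return unique, repeats


# Usage with made-up URLs:
urls = ["http://a/1", "http://b/2", "http://a/1", "http://c/3", "http://b/2"]
unique, repeats = split_unique_and_repeats(urls)
# unique == ["http://a/1", "http://b/2", "http://c/3"], repeats == 2
```

With this definition, repeats always equal total minus unique entries, which matches the split reported for the blog search (182 = 145 + 37).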
Preliminary results
Over the 9-day period from 19 March 2008 to 27 March 2008, the blog search (A) and the news search (B) returned 182 and 198 entries respectively across all of their subscriptions. That is 20.22 entries per day for the blog search and 22 for the news search. A closer look at the numbers shows that the news search had many more repeats, 78 entries, whereas the blog search had only 37. The blog search therefore had a higher share of unique entries, 79.67% (145 entries), against 60.61% (120 entries) for the news search.
The blog search had 27 pre-selections out of 145 unique entries, an 18.62% ratio. After reading them, I found that 6 of these 27 turned out to be primary leads (22.22%). In the news search, 18 of the 120 unique entries became pre-selections, a rate of 15%. Of these, 4 became primary leads, coincidentally also a rate of 22.22%.
To make it more readable:
(A) Blog search:
- Total entries: 182
- Repeated entries: 37 (20.33% of total entries)
- Unique entries: 145 (79.67% of total entries)
- Pre-selections: 27 (18.62% of unique entries)
- Primary leads: 6 (22.22% of pre-selections)
- Secondary leads: 4
- Bookmarks: 7 (70% of all leads combined, 25.93% of pre-selections, 4.83% of unique entries)
- New RSS Feeds: 2 (20% of all leads combined, 7.41% of pre-selections, 1.38% of unique entries)
(B) News search:
- Total entries: 198
- Repeated entries: 78 (39.39% of total entries)
- Unique entries: 120 (60.61% of total entries)
- Pre-selections: 18 (15% of unique entries)
- Primary leads: 4 (22.22% of pre-selections)
- Secondary leads: 2
- Bookmarks: 5 (83.33% of all leads combined, 27.78% of pre-selections, 4.17% of unique entries)
- New RSS Feeds: 0
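The percentages in the summary all derive from a handful of raw counts, so they can be recomputed mechanically. The Python sketch below is a hedged illustration and not part of the original study. It reads "all leads combined" as primary plus secondary leads, an interpretation consistent with 7 of 10 = 70% for the blog bookmarks. The class `SearchTally` and the functions `pct` and `report` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SearchTally:
    """Nine-day counts for one search type, as listed in the summary."""
    total: int
    repeats: int
    preselections: int
    primary_leads: int
    secondary_leads: int
    bookmarks: int
    new_feeds: int
    days: int = 9

    @property
    def unique(self) -> int:
        # Unique entries are whatever is left after removing repeats.
        return self.total - self.repeats

    @property
    def all_leads(self) -> int:
        # Assumption: "all leads combined" = primary + secondary leads.
        return self.primary_leads + self.secondary_leads


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as used in the summary."""
    return round(100 * part / whole, 2)


def report(t: SearchTally) -> dict:
    """Recompute every ratio in the summary from the raw counts."""
    return {
        "entries_per_day": round(t.total / t.days, 2),
        "repeated_pct_of_total": pct(t.repeats, t.total),
        "unique_pct_of_total": pct(t.unique, t.total),
        "preselections_pct_of_unique": pct(t.preselections, t.unique),
        "primary_pct_of_preselections": pct(t.primary_leads, t.preselections),
        "bookmarks_pct_of_leads": pct(t.bookmarks, t.all_leads),
        "bookmarks_pct_of_preselections": pct(t.bookmarks, t.preselections),
        "bookmarks_pct_of_unique": pct(t.bookmarks, t.unique),
        "new_feeds_pct_of_leads": pct(t.new_feeds, t.all_leads),
        "new_feeds_pct_of_preselections": pct(t.new_feeds, t.preselections),
        "new_feeds_pct_of_unique": pct(t.new_feeds, t.unique),
    }


blog = SearchTally(total=182, repeats=37, preselections=27,
                   primary_leads=6, secondary_leads=4, bookmarks=7, new_feeds=2)
news = SearchTally(total=198, repeats=78, preselections=18,
                   primary_leads=4, secondary_leads=2, bookmarks=5, new_feeds=0)

for name, tally in (("blog", blog), ("news", news)):
    print(name, report(tally))
```

Running it reproduces the figures above, for instance 79.67% unique blog entries against 60.61% for news, and bookmark yields of 4.83% and 4.17% of unique entries.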
Early observations
After 9 days of observation, news searches appear to have a much higher repeat rate than blog searches (39.39% against 20.33% of total entries). This may be due to the limited number of sources used by Google News search and to differences in the two search algorithms. Setting aside the subjective quality of the content (more on that later), the blog and news searches gave roughly similar results in terms of bookmarks generated: 4.83% and 4.17% of unique entries, respectively.
I will try to extend this observation further and will post the results here.
—
jp
Notes:
* Technorati and Digg did not produce a feed if the initial search returned no results.
** Perhaps I am a victim of vendor lock-in, but I have used Google Reader as my RSS reader of choice. It lacks basic sorting functions and its statistics tools cover only the last 30 days. Even so, it has served me well for the purposes of this study, and because it is online, it is accessible from anywhere.
*** “Interesting” is a very broad term here. It denotes what I personally would find fit to use as a reference for my thesis research. Naturally, other people would have pre-selected and bookmarked different entries. The purpose of this analysis is primarily to describe how RSS feeds can be used as an exploratory research method, not to assess the quality of the leads found in the results.
**** Secondary leads also include any leads found along the navigational path that starts at the primary lead. For example, a pre-selection turns up a good primary lead, say, a blog entry by Prof. John Doe. In his article, Prof. Doe mentions the work of Dr. Lorem Ipsum (a secondary lead). In this secondary lead, Dr. Ipsum's article, there is a link to another piece of information, and so it goes.