Research Methodology based on RSS Feeds and Social Bookmarking (Part 2, data collection)

As I described in Part 1, I have been running a little research project to verify whether RSS feeds generated by blog and news searches are of any worth. In the following parts I will present the final data, some analysis and conclusions. Please refer to the original post for a more detailed explanation of the methodology and its motivations.

Scope

The scope of this research is very narrow and applies only to the selected keywords below. Therefore, the results should not be generalized. A more representative study would need a much larger sample of random keywords, possibly in different languages, using different search engines and more rigorous data collection. Nonetheless, the results shed some light on using RSS feeds as an additional way of gathering information, especially in the exploratory phases of research.

Data collection

As described in the introductory post, I created a series of subscriptions for keyword searches on Google Blog Search and Google News. The keywords were taken from the first part of the title of my master’s thesis, “Dissemination of Knowledge and Culture on Web 2.0: a Case for Brazilian Music”, and combined into queries of different levels of complexity, as follows:

  • “dissemination knowledge culture web 2.0”
  • “dissemination knowledge web 2.0”
  • “dissemination culture web 2.0”
  • “knowledge culture web 2.0”
  • “knowledge web 2.0”
  • “culture web 2.0”

I observed each subscription on Google Reader daily, for a 30-day period, from 19 March until 17 April. For each set I considered the following variables:

  • Total Feeds: the daily number of feeds for a subscription
  • Overrides: how many repeated feeds were found within a subscription or across the whole set
  • Preselections: feeds that have been marked (or starred in Google Reader) for further reading
  • Primary leads: interesting leads that may serve as a reference derived from a preselection
  • Secondary leads: any interesting lead found within a primary lead
  • Bookmarks: effective sources found in the chain of leads that are bookmarked and tagged
  • New RSS feeds: new RSS subscriptions generated as a result of good content found within a blog or site that contains other articles of interest, and may serve as an expanding source
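
The daily bookkeeping described above can be sketched in Python as follows. This is only an illustrative sketch: the class name `DailyObservation`, its field names and the sample numbers are assumptions made up for the example, not part of the original study, which was tracked by hand in Google Reader.

```python
from dataclasses import dataclass, fields


@dataclass
class DailyObservation:
    """One day's counts for one subscription (hypothetical structure)."""
    total_feeds: int = 0
    overrides: int = 0
    preselections: int = 0
    primary_leads: int = 0
    secondary_leads: int = 0
    bookmarks: int = 0
    new_rss_feeds: int = 0


def consolidate(observations):
    """Sum every variable over a list of daily observations."""
    result = DailyObservation()
    for obs in observations:
        for f in fields(DailyObservation):
            setattr(result, f.name, getattr(result, f.name) + getattr(obs, f.name))
    return result


# Made-up sample values for two days, for illustration only.
sample = [
    DailyObservation(total_feeds=20, overrides=4, preselections=3,
                     primary_leads=1, bookmarks=1),
    DailyObservation(total_feeds=25, overrides=6, preselections=2,
                     secondary_leads=1, bookmarks=1),
]
totals = consolidate(sample)
unique_entries = totals.total_feeds - totals.overrides
print(totals.total_feeds, unique_entries, totals.bookmarks)  # 45 35 2
```

Keeping one record per subscription per day makes it possible to consolidate the counts either per keyword or across the whole set.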

Table 1: Consolidated data table for the Google Blog Search (all keywords combined)

                  Total          % (of unique items)   % (of preselections)   % (of all leads)
Total entries     642            ---                   ---                    ---
Unique entries    523 (81.5 %)   81.5 %                ---                    ---
Preselections     95             18.16 %               ---                    ---
Primary leads     25             4.78 %                26.32 %                58.14 %
Secondary leads   18             --- *                 15.13 %                41.86 %
Bookmarks         35             6.69 %                36.84 %                81.40 %
New RSS Feeds     4              0.76 %                4.21 %                 9.30 %

Table 2: Consolidated data table for the Google News Search (all keywords combined)

                  Total           % (of unique items)   % (of preselections)   % (of all leads)
Total entries     584             ---                   ---                    ---
Unique entries    332 (56.85 %)   56.85 %               ---                    ---
Preselections     30              9.04 %                ---                    ---
Primary leads     6               1.81 %                20.00 %                75.00 %
Secondary leads   2               --- *                 0.79 %                 25.00 %
Bookmarks         7               2.11 %                23.33 %                87.50 %
New RSS Feeds     0               0.00 %                0.00 %                 0.00 %

dissemination of knowledge
rss
social bookmarking
web 2.0


Stock and Flow

I recently stumbled upon the concept of stock and flow of information. Put simply, flow is everything that occurs within a time interval, and stock is everything that can be observed statically. For example, RSS feeds and blog entries such as this one are flow; archived blog entries and structured content meant to be found later, such as a wiki entry, are stock. In less prosaic terms, flows are the rivers and stocks are the reservoirs.

The terms, which are widely used in economics, business and accounting, are distinguished by their relation to time. Stock variables are measured at a given point in time, whereas flow variables are measured over a time interval. In system dynamics the concept was formalized by Jay Forrester, who originally referred to the terms as Level (stock) and Rate (flow). [Wikipedia]
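
The relation between the two kinds of variable can be made concrete with a few lines of Python. This is a minimal sketch with made-up numbers: the daily counts and the names `daily_flow` and `stock` are assumptions for illustration. A flow is measured per interval (new entries per day), while a stock is what has accumulated at a given moment (the size of the archive).

```python
from itertools import accumulate

# Flow: new blog entries arriving per day (hypothetical values).
daily_flow = [3, 0, 5, 2, 4]

# Stock: size of the archive at the end of each day, i.e. the running
# sum of the flow, starting from an initially empty archive.
stock = list(accumulate(daily_flow))

print(stock)      # [3, 3, 8, 10, 14]
print(stock[-1])  # 14
```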

I haven’t yet been able to find any academic studies of stock and flow related to online communication; they are mostly associated with stock markets and system dynamics. CommonCraft has an easy three-part introduction to the topic in relation to the circulation and archival of information on the internet. Although the concept seems to have been borrowed from other disciplines, its application here does not seem inappropriate.

dissemination of knowledge
rss


Overmundo

I came across an article by Paula Martini at iCommons, where she gives interesting insights into Overmundo, a Web 2.0 platform that attempts to close a gap in cultural coverage and diffusion within Brazil. If you live in Brazil outside of what is called there the “eixo Rio-São Paulo” (the Rio-São Paulo axis), you get hardly any cultural coverage in the media, and when you do, it is from the point of view of the center, not from the “periphery”. Overmundo is built entirely from user-generated contributions and is completely automated, drawing inspiration from Web 2.0 platforms such as Digg and Slashdot.

In the video below, Dr. Ronaldo Lemos, from the Center for Technology and Society of the Getúlio Vargas Foundation (FGV) in Rio, one of the founders of Overmundo and chairman of iCommons, receives the Nica Prize for best community project at the Ars Electronica Festival in Austria. I know this is old news, but the video contains a very good explanation of what Overmundo is. Since I will use it as one of the case studies for my thesis, why not show it again?

Via: Paula Martini at iCommons


Wales vs. Keen

Following up on the documentary “The Truth According To Wikipedia”, I received this RSS entry from FORA.TV showing a debate between Jimmy Wales, co-founder of Wikipedia, and Andrew Keen, author of The Cult of the Amateur: Why the Internet is Killing Our Culture. If you’ve got around an hour to spare, I highly recommend it. The debate about amateurism vs. authoritativeness shows no sign of slowing down. Sorry, I couldn’t help using Wikipedia as the source for information about his book. :-)

web 2.0


One seed does not suffice

In the last few days I have found some time to read Six Degrees: The Science of a Connected Age by Duncan J. Watts, professor of sociology at Columbia University, and The Tipping Point: How Little Things Can Make a Big Difference by Malcolm Gladwell, a New Yorker columnist and writer. In an earlier post, based on an article found at Fast Company, I contrasted the two, saying they represented two different currents of thought on the spread of ideas in the so-called Web 2.0. I have to correct myself: they are actually talking about very similar topics, albeit with varying degrees of optimism (and realism), different empirical foundations and different approaches.

Dr. Watts, the academic, the realist, is one of the leading figures of the science of networks, a relatively new discipline that draws its theoretical frameworks from physics, mathematics, biology, sociology and other sciences. His main object of study is small-world networks, a project that dates back to his time in the Department of Theoretical and Applied Mechanics at Cornell University and was done in collaboration with his adviser, Prof. Steve H. Strogatz. Revisiting the “Small World Experiment”, the seminal work of social psychologist Dr. Stanley Milgram that gave legs to the myth of the “Six Degrees of Separation”, Strogatz and Watts developed, with the help of modern computing muscle, a mathematical model to study the phenomenon in which any given person can be connected to anyone else in the world through a short chain of just five connections.

In his book Watts gives a very thorough explanation of the small-world phenomenon and other types of network, making a very solid case for the science of networks and its possible applications. The book is insightful and requires next to no mathematical knowledge to follow. As a man of science, Watts keeps a very critical view of the phenomenon and does not fully believe that trends can be started by design: “… a series of small random events — events that would go unnoticed under normal conditions — can, at the critical point, push the system into a universally organized state, giving the appearance of having been directed there strategically.” In another passage, he maintains that “… a successful cascade [information cascade, in the language of economics] has far less to do with the actual characteristics of the innovation or even the innovator, than we tend to think”, suggesting that a seed alone is not enough, and that trees spread many seeds in the hope that some will land in the right place.

Gladwell is a trained science journalist and has been able to amass an interesting body of ad-hoc knowledge for his work. He is, however, a bit more optimistic than Watts when it comes to the phenomenon of social contagion, or word-of-mouth epidemics. His theory rests on three concepts: The Law of the Few, The Stickiness Factor and The Power of Context. The first deals with “gatekeepers”, people who are able to start a social epidemic. Gladwell names them Connectors, those with very high social capital; Mavens, people with extraordinary knowledge and a social motivation to spread it; and Salesmen, individuals with a high capacity for persuasion. The second part of the theory explains how certain ideas possess a higher capacity to stay in the collective psyche for longer, by having a higher stickiness factor. Lastly, Gladwell considers how having the proper context, or changing an existing one, is also a necessary condition for starting a social epidemic.

Watts deals with a larger topic, the science of networks, but one is able to find in his work some scientific rationale to support the ideas proposed by Gladwell. Gladwell argues for a top-down, pyramid-like network, which Watts describes in his book as a “scale-free network” (Barabási and Albert, 1999), governed by a “power law”. In these networks, highly connected individuals “can have an influence that is disproportionate to their number”, ergo the Connectors, who are highly connected nodes within the network. They both explain, with varying degrees of empirical evidence and examples, that there is a moment in which ideas, trends, innovations or diseases catch on and propagate exponentially within networks. Gladwell calls this phenomenon the tipping point; Watts calls it the critical point, and says “… these changes of state are not steady and gradual, but sudden. One second is raining, the next snowing.”
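
The “rich get richer” mechanism behind such scale-free networks can be illustrated with a short simulation. What follows is a minimal sketch of preferential attachment in the spirit of the Barabási and Albert model, not code from either book; the function names, the network size and the seed are assumptions chosen for the example.

```python
import random


def preferential_attachment(n_nodes, m, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in this list once per link it has, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Count how many links each node has."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


edges = preferential_attachment(n_nodes=2000, m=2)
deg = sorted(degrees(edges).values(), reverse=True)
print("largest degrees:", deg[:5])
print("median degree:", deg[len(deg) // 2])
```

In a run like this a handful of early nodes typically end up with far more links than the median node, which is the kind of disproportion the Connectors are meant to embody.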

The two books are by no means exhaustive, but they have served as a good introduction to the topic. I regret my reading order, however. I started with the more scientifically oriented work of Watts and moved on to Gladwell’s rather business-like journalistic approach to the subject. I would reverse the order if I had to read them again.

dissemination of knowledge
social networks
web 2.0


The Truth According To Wikipedia

This is a very interesting documentary by Dutch filmmaker IJsbrand van Veelen that further stirs the Web 2.0 controversy. It pits Tim O’Reilly (coiner of the term “Web 2.0”) and Larry Sanger and Jimmy Wales, the founders of Wikipedia, against Andrew Keen, author of The Cult of the Amateur, and Bob McHenry, former editor-in-chief of the Encyclopedia Britannica.

Via: Techcrunch


Does anybody really control the floodgates of knowledge?

In my research I stumbled upon a couple of different articles, and there seem to be two streams of thought regarding the dissemination of knowledge: on one side are those who believe that certain “gatekeepers”, or “e-fluentials” on the web, are responsible for getting the knowledge ball rolling for everyone; on the other are those who think that the process is a little more complex, and that no handful of luminaries can dictate what the mass of information consumers is supposed to read. The article “Is the Tipping Point Toast?” in Fast Company is very illuminating in this regard.

One of the key figures of the first group is Malcolm Gladwell, a New Yorker columnist and author of the best-seller The Tipping Point, in which he attempts to explain how trends work. Representing the critics on the opposite side is Duncan Watts, a principal researcher at Yahoo! and professor at Columbia University in New York. Gladwell believes that people with higher social capital are able to spark interest in those in their networks, in a trickle-down effect. Watts maintains that “… a rare bunch of cool people just don’t have that power”, and has some solid scientific modeling to back up his arguments.

Since October 2007, when I started doing some exploratory research for my Master’s thesis, I have serendipitously come upon bloggers who seem to stand out from the crowd within my field of interest, music on the so-called Web 2.0. Of all the RSS feeds that I have come to subscribe to, some seem to turn up more consistent and knowledgeable information, as is the case with Net, Blogs & Rock’n’Roll, the blog of David Jennings and also the title of his book. Both his blog and his book were originally found through a blog search. He has become one of my “gatekeepers”, not because he was already holding some special key to the gates of knowledge in the field, but through my readership of his work, which has turned out to be fruitful within a relatively short period of time.

I subscribe to an RSS feed of Google Blog Search with the keyword “music 2.0”. Within a couple of weeks of the beginning of the subscription, I was able to find interesting content that I would not otherwise have been able to find within a single search. What I realized, however, is that the rate of new discoveries seems to stagnate after a while, i.e. after subscribing to the “e-fluentials” in the field, you come by new entries less often. Jennings describes a sort of pyramid of influence in the blogosphere, divided according to his terminology into a handful of “originators” at the top, followed by a small group of “synthesizers” in the middle, and a base composed of a mass of “lurkers”, who basically only consume what’s being handed down to them. Jennings seems to agree with Malcolm Gladwell in that regard.

I haven’t taken any sides yet, and have yet to read in more depth the work of both Gladwell and Watts. I believe, however, that there is a sort of middle way between those currents. Sure, there are people who can write authoritatively about their subjects of interest and are able to amass crowds around themselves, but they acquire knowledge in the same way that everybody else does: by deliberation and by accident.


Research Methodology based on RSS Feeds and Social Bookmarking (Part 1)

The research methodology for my Master’s thesis is largely based on RSS feeds, with good leads leading to tagged bookmarks (with del.icio.us) for future reference. The best possible outcome is not only a usable concrete reference, but also other RSS feeds that may serve as an expanding source.

The methodology

In order to demonstrate and test the effectiveness of the methodology, I have been running a little experiment for 9 days now. My thesis runs under the title “Dissemination of Knowledge and Culture on Web 2.0: a Case for Brazilian Music”. Using the first part of the title I have created RSS feeds based on Google Blog and News searches. I have tried to make Technorati and Digg subscriptions, but they have returned unreliable results*, especially with Google Reader**, my RSS reader of choice.

The complexity of the queries, without prepositions and conjunctions, is as follows:

  • x.1 “dissemination knowledge culture web 2.0”
  • x.2 “dissemination knowledge web 2.0”
  • x.3 “dissemination culture web 2.0”
  • x.4 “knowledge culture web 2.0”
  • x.5 “knowledge web 2.0”
  • x.6 “culture web 2.0”

Every search query has essentially become an RSS feed subscription for a blog search (A) and a news search (B), ergo A.1, A.2, B.1, B.2, etc. With the subscriptions in place, I have used a few variables to track them, individually, on a daily basis:

  • Total Feeds: the daily number of feeds for a subscription
  • Overrides: how many repeated feeds were found within a subscription or across the whole set
  • Pre-selections: feeds that have been marked (or starred in Google Reader) for further reading
  • Primary leads: interesting*** leads that may serve as a reference derived from a pre-selection
  • Secondary leads: any interesting lead found within a primary lead (e.g. a hyperlink inside of a lead****)
  • Bookmarks: effective sources found in the chain of leads that are bookmarked and tagged
  • New RSS feeds: new rss subscriptions generated as a result of good content found within a blog or site that contains other articles of interest, and may serve as an expanding source
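
The “Overrides” variable lends itself to a small illustration. The sketch below counts repeated entries within and across subscriptions by remembering every link already seen. It is an assumption-laden example: the function name `count_overrides`, the idea of identifying an entry by its link, and the placeholder URLs are all hypothetical, since the counting in this experiment was done by hand in Google Reader.

```python
def count_overrides(subscriptions):
    """Return (total, overrides, unique) for a dict mapping a
    subscription label to the list of entry links it delivered."""
    seen = set()
    total = 0
    overrides = 0
    for links in subscriptions.values():
        for link in links:
            total += 1
            if link in seen:
                overrides += 1  # already seen in this or another feed
            else:
                seen.add(link)
    return total, overrides, len(seen)


# Hypothetical entries; the URLs are placeholders, not real results.
feeds = {
    "A.1": ["http://example.org/a", "http://example.org/b"],
    "A.2": ["http://example.org/b", "http://example.org/c"],
    "A.5": ["http://example.org/a", "http://example.org/c", "http://example.org/d"],
}
print(count_overrides(feeds))  # (7, 3, 4)
```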

Preliminary results

Within the 9-day period, between 19 March 2008 and 27 March 2008, the blog search (A) and the news search (B) returned, for all their subscriptions combined, a total of 182 and 198 entries respectively: a rate of 20.22 articles per day for the blog search and 22 for the news search. When we look a bit closer at the numbers, we see that the news search had a higher repeat rate, 78 articles, whereas the blog search had only 37 repeats. Therefore, the blog search had a higher rate of unique articles, 79.67% (145 entries), as opposed to 60.61% (120 entries) for the news search.

The blog search had 27 pre-selections out of 145 unique entries, a ratio of 18.62%. After reading them, 6 out of these 27 turned out to be primary leads (22.22%). In the news search, 18 out of 120 unique entries became pre-selections, a rate of 15%, and 4 of these pre-selections became primary leads, coincidentally also a rate of 22.22%.

To make it more readable:

(A) Blog search:
Total entries: 182
Repeated entries: 37 (20.33% of total entries)
Unique entries: 145 (79.67% of total entries)
Pre-selections: 27 (18.62% of unique entries)
Primary leads: 6 (22.22% of pre-selections)
Secondary leads: 4
Bookmarks: 7 (70% of all leads combined, 25.93% of pre-selections, 4.83% of unique entries)
New RSS Feeds: 2 (20% of all leads combined, 7.41% of pre-selections, 1.38% of all unique entries)

(B) News search:
Total entries: 198
Repeated entries: 78 (39.39% of total entries)
Unique entries: 120 (60.61% of total entries)
Pre-selections: 18 (15% of unique entries)
Primary leads: 4 (22.22% of pre-selections)
Secondary leads: 2
Bookmarks: 5 (83.33% of all leads combined, 27.78% of pre-selections, 4.17% of unique entries)
New RSS Feeds: 0
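
For readers who want to check the arithmetic, the ratios for the blog search (A) can be reproduced from the raw counts listed above. This is a small sketch; the helper name `pct` and the variable names are my own choices for the example.

```python
def pct(part, whole):
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts reported for the blog search (A) over the first 9 days.
total, repeated, preselections = 182, 37, 27
primary, secondary, bookmarks = 6, 4, 7
days = 9

unique = total - repeated
all_leads = primary + secondary

print(round(total / days, 2))         # 20.22 entries per day
print(pct(unique, total))             # 79.67
print(pct(preselections, unique))     # 18.62
print(pct(primary, preselections))    # 22.22
print(pct(bookmarks, all_leads))      # 70.0
print(pct(bookmarks, preselections))  # 25.93
print(pct(bookmarks, unique))         # 4.83
```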

Early observations

After 9 days of observation it’s safe to point out that news searches have a much higher repeat rate than blog searches. This may be due to the limited number of sources used by the Google News search and to differences in their search algorithms. Without getting into the subjective quality of the content (more on that later), both the blog and the news searches gave roughly similar results in relation to the number of bookmarks generated.

I will be trying to expand this observation a bit further and will be posting the results here.


jp
Notes:

* Technorati and Digg feeds didn’t produce feeds if there were no results in the first search.

** Maybe I am part of a vendor lock-in, but I have also used Google Reader as my RSS reader of choice. Although it lacks basic sorting functions and its statistics tools are limited to only the last 30 days, it has served me well for the purpose of this study and it’s online, accessible from anywhere.

*** Interesting is a very broad term here. It denotes what I would personally find fit to use as a reference for my thesis research. Naturally, other subjects would have preselected and bookmarked different entries. The purpose of this analysis is primarily to describe how RSS feeds can be used as an exploratory research method, not the quality of the leads found in the results.

**** Secondary leads also include any leads found within the navigational path starting at the primary lead. For example, a preselection has turned up a good primary lead, say, a blog entry from Prof. John Doe. In his article, Prof. Doe mentions the work of Dr. Lorem Ipsum (a secondary lead). In this secondary lead, the article of Dr. Ipsum, there’s a link to another piece of information, and so it goes.

rss
social bookmarking
web 2.0
