From mbates@... Sat Jan 12 17:26:11 2008
Date: Sun, 13 Jan 2008 01:26:09 GMT
To: lightheartedlibrarians@...
From: Mary Ellen Bates <mbates@...>
Reply-To: Mary Ellen Bates <mbates@...>
Subject: Bates InfoTip: Clustering On Demand
[You are receiving this because you have subscribed to this free newsletter as lightheartedlibrarians@.... If you would like to unsubscribe, just go to BatesInfo.com/subscribe.html or email me.]
* * * Clustering On Demand * * *
I was recently doing some research for a client on the topic of social capital (see, for example, Robert Putnam's book, Bowling Alone). It's a difficult topic to search and, of course, I retrieved kajillions of results from several search engines. I went through as many of them as I had the patience for, and I tried a number of refinements to further focus my search. But I found it difficult to find what I wanted in the major search engines.
Then I remembered hearing about Carrot2 (http://demo.carrot2.org), an open source search-results-clustering engine, just recently out in beta. In a nutshell, it takes search results, analyzes them and, on the fly, creates groups of the most common concepts or terms from those results. Since this is all done by algorithms rather than by humans, expect the odd result every once in a while, but I found the clusters to be consistently useful.
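To make the idea of on-the-fly clustering a little more concrete, here is a minimal sketch in Python. It is not Carrot2's actual algorithm, which I have not seen; it simply groups result snippets under the terms that show up in the most snippets. The function names, the stop-word list and the sample snippets are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"and", "the", "for", "with", "from"}

# Made-up snippets standing in for search results.
SNIPPETS = [
    "Social capital and Putnam: Bowling Alone",
    "Putnam on civic engagement and social capital",
    "Measuring social capital in online networks",
    "Trust, networks and social capital",
]


def tokenize(text):
    """Return the set of lower-cased words longer than two letters, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOP_WORDS}


def cluster_by_common_terms(snippets, query_terms=(), max_clusters=3):
    """Group snippet indexes under the terms shared by the most snippets.

    A naive stand-in for a real clustering engine: a term becomes a
    cluster label only if it appears in at least two snippets, and the
    query terms themselves are ignored because every result has them.
    """
    ignore = {t.lower() for t in query_terms}
    doc_terms = [tokenize(s) - ignore for s in snippets]

    # Count how many snippets each term appears in.
    counts = Counter(term for terms in doc_terms for term in terms)

    # Most frequent first; ties broken alphabetically so runs are repeatable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, n in ranked if n >= 2][:max_clusters]

    clusters = defaultdict(list)
    for index, terms in enumerate(doc_terms):
        for label in labels:
            if label in terms:
                clusters[label].append(index)
    return dict(clusters)


if __name__ == "__main__":
    print(cluster_by_common_terms(SNIPPETS, query_terms=("social", "capital")))
```

With the sample snippets above, the first two land in a "putnam" group and the last two in a "networks" group. A real engine does far more (phrases, stemming, better labels), but the basic move of letting the results themselves suggest the categories is the same.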
Carrot2's default is to search the web using eTools.ch, a Swiss meta-search engine that queries 10 search engines, including Google, Yahoo, Ask and MSN. However, since eTools only returns the top 20 results from each search engine, I prefer not to use the eTools search results. Instead, I click a tab to limit my search to Google, Yahoo, MSN, Wikipedia, PubMed or one of a few other finding tools. Because clustering is a computationally intensive process, Carrot2 by default limits the search to the top 100 results from whichever search engine you choose. However, you can click the Show Options link and set Carrot2 to search and sort up to 400 results. (Note that when you use the eTools meta-search engine, increasing the number of search results also increases the number of results retrieved from each search engine, from 20 to 40.)
Geek that I am, I find it even more intriguing that under the "Show Options" link there is a pull-down menu that lets you select which of six different sorting algorithms you want to use. The clustering results differ dramatically from one algorithm to the next (although keep in mind that the search results themselves stay the same -- only the clusters change). With my "social capital" search, I was able to see a variety of groupings of my search results, and identify some of the key writers and terms.
Carrot2 may not be your day-to-day search tool, but it is tremendously useful for those searches in which it is difficult to sift the wheat from the chaff.
****************************
"Can I publish or reproduce this InfoTip?" Be my guest. Just make sure you credit the source, Bates Information Services, and include the URL, www.BatesInfo.com/tip.html.
In addition to InfoTips, I've got a personal blog, Librarian of Fortune (www.LibrarianOfFortune.com).
A version of this InfoTip with live links is available at www.batesinfo.com/tip.html. An RSS feed for my InfoTip is at www.batesinfo.com/tip.rss.
If you want to see where I will be speaking next, check out www.BatesInfo.com/new.html.
Do you need value-added research or training services?
Contact me at:
Mary Ellen Bates
Bates Information Services Inc.
+1 303.772.7095
mailto:mbates@...
www.BatesInfo.com