NEWS: P2P SEARCH ENGINE

February 28th, 2009

Faroo is an initiative aimed at merging peer-to-peer mechanisms with the nature of a search engine. By downloading Faroo, a user becomes part of a decentralized network of crawlers that contribute to building the index, with one difference: only visited pages are indexed. It is nothing like a screen saver that searches for UFOs by sharing computer resources. In fact, it is the user who is the crawler: visited pages are added to the index, so the index is the result of the browsing history of all Faroo users, unlike traditional search engines, where a crawler “goes” from one server to another.
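To make the idea of "the user is the crawler" concrete, here is a minimal sketch of an index that grows only from visited pages. It is not Faroo's actual implementation; the function names, the example URLs and the simple word-splitting are all assumptions made for illustration.

```python
from collections import defaultdict


def index_visited_page(index, url, text):
    """Add one visited page to a shared inverted index (word -> set of URLs)."""
    for word in set(text.lower().split()):
        index[word].add(url)


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical browsing history: only pages somebody visited get indexed.
index = defaultdict(set)
index_visited_page(index, "http://example.org/a", "Peer to peer search")
index_visited_page(index, "http://example.org/b", "Search engines crawl")
print(search(index, "peer search"))
```

A page nobody has visited never reaches the index, which is the key difference from a crawler that follows links from server to server.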

Besides letting users become part of a search engine, Faroo also contains another interesting mechanism: its page ranking is built with a completely new approach. It does not look like the traditional PageRank mechanism; in Faroo it is not the page owners who decide the position in the ranking by linking pages together. Faroo offers a user-oriented technology: the user’s behaviour is tracked while browsing a specific page and is automatically counted in the ranking. It sounds a little enigmatic; I imagine that the final contribution to the page rank is the net effect of the length of the stay, the number of clicks and the nature of the clicks (ads or content), and maybe also adding the page to favourites.
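My guess about the ranking can be written down as a small sketch. This is only an illustration of the speculation above, not Faroo's real formula: the `Visit` fields, the weights and the five-minute cap on the length of the stay are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    seconds_on_page: float
    content_clicks: int
    ad_clicks: int
    bookmarked: bool


# Hypothetical cap: staying longer than five minutes adds nothing more.
MAX_DWELL_SECONDS = 300.0


def visit_score(visit):
    """Combine the signals of one visit into a single number (assumed weights)."""
    dwell = min(visit.seconds_on_page, MAX_DWELL_SECONDS) / MAX_DWELL_SECONDS
    score = 1.0 * dwell                  # length of the stay
    score += 0.5 * visit.content_clicks  # clicks on content count more...
    score += 0.1 * visit.ad_clicks       # ...than clicks on ads
    if visit.bookmarked:
        score += 2.0                     # adding to favourites
    return score


def page_score(visits):
    """A page's score is the sum over all visits by all users."""
    return sum(visit_score(v) for v in visits)


visits = [Visit(300, 2, 0, True), Visit(150, 0, 1, False)]
print(page_score(visits))
```

With such a scheme the page owner cannot raise the score by adding links; only the behaviour of visitors changes it.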

Faroo is an interesting initiative, and I am even more curious how it will be developed. At the very least, I am going to stay tuned to their blog.

-mw

Dandelon is a search engine designed to search through the resources of library catalogues. It processes a query by scanning bibliographic data provided by librarians, applying OCR techniques, and searching through electronic documents stored on libraries’ servers. Although it is a German project, run by AGI-Information Management Consultants, it is available in 27 languages (mainly European). More than 1.15 million books, ebooks and articles have already been indexed.

The techniques implemented in this search engine are well known in the world of librarians: intelligentCAPTURE is a tool used by the Library of Congress and the British Library. It is used to provide the most relevant results by matching query terms with thesaurus entries (in this case in 27 different languages), so its potential is impressive.
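Matching query terms against thesaurus entries can be sketched as simple query expansion. This is a toy illustration, not how intelligentCAPTURE or Dandelon actually work; the thesaurus entries and function names are made up for the example.

```python
# Toy multilingual thesaurus (hypothetical entries): term -> related terms.
THESAURUS = {
    "car": {"automobile", "auto", "wagen"},
    "library": {"bibliothek", "biblioteka"},
}


def expand_query(terms, thesaurus):
    """Return the query terms plus all thesaurus entries related to them."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= thesaurus.get(key, set())
    return expanded


def matches(document_words, query_terms, thesaurus):
    """A document matches if it shares any word with the expanded query."""
    words = {w.lower() for w in document_words}
    return bool(expand_query(query_terms, thesaurus) & words)


print(matches(["Das", "Auto"], ["car"], THESAURUS))
```

Thanks to the expansion, a query in one language can find a catalogue entry written in another, as long as the thesaurus links the two terms.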

-mw

The so-called semantic web industry is growing bigger every day. Although it still has not been defined what exactly the semantic web means, not a week goes by without another semantic tool appearing. It seems that the word “semantic” has become a mantra, yet it is still hard to identify the web services that are in fact semantic. Many of them just have names based on the term. Before we check a few fairly new semantic tools, let’s try to answer the question of why we need semantic text analysis.

In my opinion, the reason this whole semantic affair started was the need to improve how search engines understand queries and the quality of search results. These two needs are mutually dependent. That is why semantic search, the most exciting combination, has not been realized yet. Right now people are at the stage of developing automatic semantic analysis of text that would enable both the indexing of huge amounts of semantically analysed text and the semantic analysis of queries formulated as questions. Several services already exist that try to deal with such analysis.

SemantalyzR is a service that filters a page’s content and presents it in the form of tags, divided into categories such as names, industry terms, countries, organizations and facilities. Unfortunately, the engine is not precise enough. For example, for DelveInto.Info most of the tags were drawn from the tag cloud; in addition, a few tags were not even separated from each other, although they are in fact two separate tags.

SemantalyzR also makes it possible to delve into specific tags to get a set of information from different services such as Twitter, Flickr, Wikipedia and dooblet. The user gets a small compilation of previously analysed sites where the specific tag was detected. The most interesting thing about that list of links is that it contains questions and answers from Yahoo Answers regarding the searched term.

OpenCalais offers two kinds of services. The first is a converter from plain text to an RDF-encoded file. It can be useful for making our web page visible to semantic search engines. The other part of OpenCalais is a plug-in that, once installed in our browser (currently only IE and FF are supported), can underline nouns in the content of the currently browsed page with different colours, depending on which category the word was associated with. A similar solution was implemented in the Mashlogic plugin, which I wrote about earlier. OpenCalais also lets us search for the term with the most popular search engines, which are displayed in a context menu that becomes visible after placing the cursor on an underlined term.

An interesting tool, though unfortunately not available to the public, is the one developed by Cortex Intelligence, a company specializing in text mining for competitive intelligence. We can only watch a demo presentation. It is based on the analysis of simple grammatical functions and the relations between verbs and other parts of the sentence. Apart from a quick analysis of the actions described in a text without the need to read it, we can also get the text sorted into categories such as dates, companies or geographical names. This project is probably the most advanced thanks to its grammatical analysis. In fact, it seems the most semantic of all those mentioned above (at least it is possible to make such a statement based on its demo).

-mw

NEWS: CUIL’S UPGRADED INDEX

November 5th, 2008

According to the Cuil team’s announcement, they have made some changes to their search engine’s indexing system. As we can read:

“We’ve just rolled out a new, refreshed index that addresses those issues. We’ve dropped dead pages and spam and folded in new pages from our recent crawls.”

The main purpose was to reduce the number of expired links. Other changes concerned query processing. Read more on Cuil’s blog.

-mw

About a year ago it was hot news that Google was testing a mechanism for voting search results up. Now it seems to be in the user-testing phase. On several sites, such as SearchRoundTable, ZDNet and JustinHileman, we can read that it is available to some users. We can also see images and watch videos, including another one from TechCrunch.

With this Google feature we will be able to vote particular results up or down, analogously to the Digg system. We will be able to add comments explaining why a specific result is accurate or why it is not. We will also be able to read other users’ comments and see who they are.
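A Digg-like voting scheme on search results could be sketched as follows. This is not Google's mechanism, whose details are not public; the `rerank` function and the choice of "ups minus downs" as the score are assumptions for illustration only.

```python
def rerank(results, votes):
    """Reorder results by net votes (ups minus downs).

    results: list of URLs in their original ranking order
    votes:   dict mapping URL -> (ups, downs); missing URLs count as (0, 0)
    """
    def net_votes(url):
        ups, downs = votes.get(url, (0, 0))
        return ups - downs

    # sorted() is stable, so results with equal net votes keep
    # their original relative order.
    return sorted(results, key=net_votes, reverse=True)


results = ["a.example", "b.example", "c.example"]
votes = {"b.example": (5, 1), "a.example": (1, 3)}
print(rerank(results, votes))
```

Here the result with no votes stays between the promoted and the demoted one, so the original ranking still matters wherever users have not expressed an opinion.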

It looks like a major revolution in Google’s algorithm. By the way, in my opinion we will in future observe two approaches to personalised site indexing: first via voting, and second via tagging sites.

-mw