Long time, no read, huh? :) I’m back. After a month and a half of hard work, I have more free time to spend on writing at delveinto.info.

The explosive growth of information gathered on the Internet has created a need for credible information analysis, especially for effectively extracting the opinions contained in that information. So far, no app has been developed that can extract opinion from documents automatically. Some can process huge amounts of text, but they cannot interpret it - interpretation is always a human’s role.

I have an idea for a series of posts about some apps that are quite effective and can, to a large extent, simplify the way we acquire opinion from written texts. But before that, let’s identify the most important areas of opinion acquisition.

The most important reason for doing this seems to lie in marketing - the consumer’s opinion is the one that matters most nowadays. It is the consumer who decides what to buy, so in a sense he has power over the providers of goods and services, who want to know how to satisfy the consumer’s needs in order to get paid. They want to know what is in demand and what needs improvement. Providing credible consumer opinion to set companies’ directions and strategies could soon become a very profitable business. But it is not as easy as it looks - there are a number of questions and obstacles.

The first and most important one is methodology: how to extract opinion, what makes an opinion credible, which parameters we should take into consideration and, of course, which texts we should extract from. There is a lot of confusion about the last of these issues. Companies not only want to acquire opinion from the Web, they also want to spread opinions among other consumers to raise publicity and increase sales volume through manipulated word-of-mouth practices. Some companies offer to publish comments and blog entries for money, in order to improve search engine rankings for particular terms or to convince other consumers that the product they are sharing their opinion about is more valuable than other products. Such an artificial buzz effect is a real plague for those who want to extract opinion from the Web.

The other question is: should we take into account opinions published on platforms provided by producers? They might be moderated or written to the companies’ order. The doubt comes from the risk that the whole communication could turn out to have been falsified or manipulated. This is completely different from the cases mentioned above, where “proper” opinions are spread among the real ones.

Other areas that should be addressed relate to the technical side of the issue. Before starting opinion mining, appropriate methods should be chosen, and the choice is quite limited. I would recommend statistical or linguistic methods, or a combination of the two. By counting statistics for texts, we can learn which objects and topics are most important in the text, and we can also compare the positive and negative words used in the message. With linguistic methods the results are more complex: we learn what sort of narration is used, who has the dominant role, the pace of narration and the grammatical relations between objects in the text. We do not need all of this information, so combined methods, selected to measure the specific parameters we want to extract, would probably be more appropriate.
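To make the statistical side a bit more concrete, here is a minimal sketch in Python of what "counting stats for texts" could look like. It is only an illustration under my own assumptions: the word lists `POSITIVE`, `NEGATIVE` and `STOPWORDS`, the helper names `tokenize` and `text_stats`, and the sample sentence are all made up for this example, and a real study would need a proper sentiment lexicon and much more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only (assumptions, not a real lexicon).
POSITIVE = {"good", "great", "love", "excellent", "reliable"}
NEGATIVE = {"bad", "poor", "hate", "broken", "slow"}
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "i", "this", "to", "of"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def text_stats(text, top_n=3):
    """Count the most frequent content words and the positive/negative words."""
    tokens = tokenize(text)
    # Drop very common function words so that topics stand out.
    content = [t for t in tokens if t not in STOPWORDS]
    positive = sum(1 for t in tokens if t in POSITIVE)
    negative = sum(1 for t in tokens if t in NEGATIVE)
    return {
        "top_terms": Counter(content).most_common(top_n),
        "positive": positive,
        "negative": negative,
        "balance": positive - negative,  # above zero: more positive than negative words
    }


sample = "I love this phone. The camera is great, but the battery is poor. Great phone."
stats = text_stats(sample)
print(stats)
```

Such counts say nothing about who is speaking or how the sentence is built, which is exactly where the linguistic methods would have to come in.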

Soon I will publish the results of opinion mining conducted on Ggle, Yhoo and YTbe blogs. The research was presented at a communication studies conference in Lodz, PL.

-mw

OPINION: IN WEB WE TRUST

November 28th, 2008

When one faces the deluge of information, one of the most important issues becomes the credibility of the information we get from the Web. There are methods of establishing a track record for particular websites (based on monitoring the site’s credibility), but there is no guarantee that a site we find trustworthy will be among the results for a particular query. An answer and remedy for this problem may be Google Wiki Search, which lets us mark a specific site (for example, one we consider to be a good source of information). That site will then always be at the top of the search results page when we search for terms that boil down to the same category (for example, try financial crisis and credit crisis).

In this case we can observe a completely new approach - assigning credibility status with web tools is individualized. Previously there were start-ups based on social suggestions, Eurekster or Mahalo, just to give examples. But in those cases we were forced to trust the community, which in fact remained anonymous. These are not closed services but open social services that gather opinions from enthusiasts who, in many cases, lack elementary knowledge of specialized disciplines such as economics or medicine. On the one hand, this is the quiet beauty of Web 2.0 - every amateur can become an expert. On the other hand, when we search the Web for information essential to our company or research, it can just complicate things and waste our time.

Trust is of course the basis of every society (at least the highly developed ones), but hasn’t the Web been a meritocratic phenomenon since its beginning? I do not claim that the democratization of the Web is something wrong. It is good: it enhances the flow of opinions and opens up new areas for potential business. But… user-generated content affects the Web’s credibility. When it comes to retrieving credible information that can determine the success of our venture or our client’s venture, it seems better to trust competent people. Socially marked search results are useful only for preparing popularity rankings among particular web communities and, at a pinch, for getting the opinion of web users.

-mw