OPINION: OPINION MINING - IDEE FIXE?
April 26th, 2009
Long time, no read, huh? :) I’m back. After 1.5 months of hard work, I have more free time to spend on writing at delveinto.info.
The explosive growth of information gathered on the Internet has created a need for credible information analysis, and especially for effective extraction of the opinions that this information contains. So far, no app has been developed that can extract opinions from documents fully automatically. Some can process huge amounts of text, but they cannot interpret it; that is still a human’s role.
I have an idea for a series of posts in which I will write about some apps that are quite effective and can, to a large extent, simplify the way we acquire opinions from written texts. But before that, let’s identify the most important areas of opinion acquisition.
The most important reason for doing this seems to lie in marketing: the consumer’s opinion is the one that matters most nowadays. It is the consumer who decides what to buy, so in a way he has power over the providers of goods and services, who want to know how to satisfy consumers’ needs in order to get paid. They want to know what is in demand and what needs improvement. Providing credible consumer opinion to help companies set their directions and strategies may soon become a very profitable business. But it is not as easy as it looks; there are a number of questions and obstacles.
The first and most important one is methodology: how to extract opinion, what makes an opinion credible, which parameters we should take into consideration and, of course, which texts we should extract from. There is huge confusion about the last of these issues. Companies not only want to acquire opinions from the Web, they also want to spread opinions among other consumers to raise publicity and increase sales volume through manipulated word-of-mouth practices. Some companies offer to publish comments and blog entries for money in order to improve search engine rankings for particular terms, or to convince other consumers that the product they are writing about is more valuable than other products. Such an artificial buzz effect is a real plague for those who want to extract opinion from the Web.
The other question is: should we take into account opinions published on platforms provided by producers? They might be moderated or written to the companies’ order. The concern is the risk that the whole communication turns out to have been falsified or manipulated. This is completely different from the cases mentioned above, where “proper” opinions are spread among real ones.
Other areas that should be addressed relate to the technical side of the issue. Before opinion mining starts, appropriate methods should be chosen. The choice is quite limited. I would recommend statistical or linguistic methods; the two can also be combined. By computing statistics for texts, we can learn which objects and topics are most important in the text, and we can also compare the positive and negative words used in the message. With linguistic methods our results would be more complex: we would learn what sort of narration is used and who has the dominant role, as well as the pace of the narration and the grammatical relations between objects in the text. We do not need all of this information, so combined methods, selected to measure the specific parameters we want to extract, would probably be more appropriate.
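To make the statistical approach a bit more concrete, here is a minimal Python sketch of "counting stats for texts": it finds the most frequent terms and compares the number of positive and negative words. The word lists, the stop-word list and the function names are all made up for this example; a real study would need a proper sentiment lexicon, and this is only one simple way of doing it.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for this sketch only.
POSITIVE = {"good", "great", "love", "excellent", "useful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "useless"}
# Hypothetical stop-word list: frequent words that carry no topic.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "i", "this", "of", "to"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def text_stats(text, top_n=3):
    """Return the most frequent terms and the positive/negative word counts."""
    content = [t for t in tokenize(text) if t not in STOPWORDS]
    counts = Counter(content)
    positive = sum(counts[w] for w in POSITIVE)
    negative = sum(counts[w] for w in NEGATIVE)
    return {
        "top_terms": counts.most_common(top_n),
        "positive": positive,
        "negative": negative,
        "polarity": positive - negative,  # above zero: more positive words
    }


sample = "The camera is great and the screen is great, but the battery is terrible."
stats = text_stats(sample)
print(stats)
```

Plain counting like this knows nothing about negation or irony ("not great" still counts as positive), which is one reason to combine it with linguistic methods.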
Soon I will publish the results of opinion mining conducted on the Ggle, Yhoo and YTbe blogs. The research was presented at a communication studies conference in Lodz, PL.
-mw
NEWS: WOLFRAMALPHA UPCOMING BETA
March 13th, 2009
Wolfram Alpha is a new project by Stephen Wolfram, the British mathematician, physicist and creator of Mathematica, a technical computing application. This week Wolfram announced that his company is going to launch a new search engine in May. It is going to be named Wolfram Alpha. Known as an innovator who is good at writing computing algorithms, Wolfram may create something really big. From what we could read in the media buzz, Wolfram Alpha is going to be a combination of a semantic search engine and a question-answering one.
The main innovation is that WA is going to generate answers in real time, without gathering them beforehand. This is totally opposite to what we have seen in, for example, Ask.com. Answers are “computed”: unstructured data is processed by the engine’s algorithms and then the answers are generated. These algorithms are said to be based on natural language, which is fully understood by the engine. Answers are going to be given in plain language and contain extracts from the indexed data.
Interestingly, the index for this search engine is not like a traditional one, created by crawlers from web-origin data and statistics. Some parts of WA’s index are large databases from various fields, holding huge amounts of information about the physical world. WA is going to offer a more formal sort of knowledge than, for example, Google, which relies on different types of media information that are often informal.
Right now the project is in a private beta phase. You can ask for an invitation or subscribe to the newsletter.
The questions we can ask at this moment concern the nature of the results given by WA; under some assumptions this project might look like a step back. Is it going to be an improved Wikipedia, with formal and credible knowledge? And what about the network business model: if users are not going to be contributors at the same time, will they want to use it if the knowledge is not democratized?
-mw
NEWS: SOCIAL NETWORK SEARCH
March 13th, 2009
Aardvark is a search engine founded by ex-Googlers that uses the power of social networks. It is not based on information created by users and then indexed. Instead, it distributes the query among your affiliates and their affiliates, who consider themselves good in different fields; if you’re lucky, you might get an answer. The idea is not stupid; in fact it is rather decent. The user can ‘search’ by IM, e-mail, SMS or Twitter.
To increase the probability of being answered, you should take care to gather as many friends in the service as possible, which is a little hard, especially at the beginning, right after the release of the public beta version. But there is an option to grab a contact list via Facebook Connect.
After the query is typed in, the algorithm searches for the person most likely to know about what the user is asking. Then the user just has to wait for the answer. The basis for picking a specific user as the one with the appropriate knowledge is that he has described himself as knowledgeable in a particular field, as coming from a specific region, or as interested in something the user asked about.
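The routing idea described above can be sketched in a few lines of Python. This is not Aardvark’s actual algorithm, which is not public as far as I know; the Contact class, the scoring rule and the way more distant affiliates are penalised are all my own assumptions, made only to illustrate the principle.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """Hypothetical profile of a person in the asker's network."""
    name: str
    topics: set = field(default_factory=set)  # self-declared fields, lowercase
    region: str = ""                          # self-declared region
    degree: int = 1                           # 1 = direct affiliate, 2 = affiliate of an affiliate


def score(contact, query_words):
    """Assumed scoring: topic overlap plus a region bonus, divided by distance."""
    overlap = len(contact.topics & query_words)
    region_bonus = 1 if contact.region and contact.region.lower() in query_words else 0
    return (overlap + region_bonus) / contact.degree


def route(query, contacts):
    """Return the contact most likely to know the answer, or None if nobody matches."""
    words = set(query.lower().replace("?", " ").split())
    best = max(contacts, key=lambda c: score(c, words), default=None)
    if best is None or score(best, words) == 0:
        return None
    return best


contacts = [
    Contact("Ann", {"cooking", "wine"}, "Paris", degree=1),
    Contact("Bob", {"hiking", "cameras"}, "Lodz", degree=1),
    Contact("Eve", {"cameras"}, "Berlin", degree=2),
]

chosen = route("Which cameras are good for hiking?", contacts)
print(chosen.name if chosen else "nobody to ask")
```

Here the question goes to Bob, who declared both "hiking" and "cameras" and is a direct affiliate. A question that matches nobody’s profile returns None, which corresponds to the case where you are not lucky and get no answer.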
It looks like this sort of human-powered search takes a completely new approach. It will be interesting to see how many degrees of affiliation can potentially be used. Still, it can be a great source of help and really practical information in some cases. Apart from that, Aardvark looks like it is going to be not only a source of information but also a place to make friends and have fun.
-mw
NEWS: THE BRAIN AT WORK WHILE SEARCHING
March 13th, 2009
At Gord Hotchkiss’ site we can find an interesting interview with Dr. Teena Moody from UCLA’s Semel Institute, who conducted a study of how the human brain works while searching the web. The research is based on a method called functional Magnetic Resonance Imaging (fMRI). Its main goal was to find out how large the differences are between the brain’s activity during reading and during searching the web. Another dimension of the research was to check whether there are any differences between groups of people: heavy users and novices.
The scans revealed that the brains of novices, whom Dr. Moody calls Internet-naive persons, behave during web searching the same way as during reading. Heavy users’ brains, on the other hand, work more actively, with more brain centres engaged. The frontal and limbic lobes in particular are used by heavy searchers. These two lobes are responsible for decision making and conflict resolution.
[Figure: brain centres active during web searching, for heavy users and novices]
The explanation is quite simple: using the Internet changes the way our brains work. People who do not use the Internet every day behave like children in the fog: they just feel lost in a new environment and act as if they were reading it, gathering information to get the full picture. Heavy users, however, make decisions based on the information they have just found in a perfectly familiar environment.
This partly supports the thesis that using the Internet changes our brains.
-mw
NEWS: KUMO - LIVE SEARCH’S SUCCESSOR?
March 3rd, 2009
At the end of this week Microsoft will start internal tests of its new search engine, which will replace Live Search. In the long run Microsoft is going to abandon the Live Search brand completely and create a new engine and a new brand at the same time.
The news was published by Barney Pell, one of Microsoft’s evangelists and formerly one of the creators of the Powerset search engine. The rumour spread on Twitter, and then Microsoft confirmed that it is working on a new engine, although not even a ballpark date was given.
Microsoft bought the domain kumo.com and redirects its internal traffic to it. Kumo is a Japanese word meaning “cloud” or “spider”. Could it be that data gathered in web clouds is also going to be indexed?
-mw

