Social Media, Data Mining & Machine Learning: October 2008

Going to CIKM and IDEAL

Posted by JoSeK at 9:55 PM . Saturday, October 25, 2008

Tomorrow I'll be flying (with my colleage Francisco Carrero) to San Francisco to attend CIKM 2008. Then, at the end of the next week, I'll fly to South Corea to attend IDEAL 2008. 12 days off the office to attend some interesting sessiones and visit interesting places. A complete travel around the world :D

In the Development of a Spanish MetaMap

Posted by JoSeK at 6:19 PM . Tuesday, October 21, 2008

0 comments

Labels: ontologies, research

Frankie and me will be attending CIKM this year, in order to present a poster on one of our current research lines. This poster is entitled "In the Development of a Spanish MetaMap" and presents how we are trying to deal with the adaption of a such huge linguistic resource as MetaMap is.

Next is the poster, please, feel free to make any comments about the poster, as we prefer to correct or arrange anything before CIKM in order to make it clearer.

Y!OS

Posted by JoSeK at 3:06 PM . Monday, October 20, 2008

0 comments

Labels: internet

This morning, I found an interesting post about Yahoo! Open Strategy (Y!OS) in Enrique Puertas' blog (in Spanish). Y!OS is the way Yahoo! is trying tyo fight against its major competence, Google. Both companies knows about the relevance of Open Software and are adopting Open Strategies in order to create comunities around their products.

Y!OS is organized in 3 main platforms plus OAuth as a way to implement a model of authentication.

1.- Yahoo! Application Platform: Is the Yahoo! platform for developing web applications that are available throughout Yahoo!. It gives the developers a development environment, APIs to access important functionalities, distribution and discovery infrastructure and a runtime and rendering environment.

In the next video, Xavier Legros talks about Yahoo! Application Platform at Open Hack Day 2008.

2.- Yahoo! Social Platform: is a suite of REST APIs that enable the creation of social applications that makes easier to connect users.

3.- Yahoo! Query Language: is a similar to SQL language that allows the developers to query, filter anc combine data accross Yahoo!, as well as any othe sources like RSS feed or HTML webpages.

Is seems really interesting that, from several services like BOSS or PIPES, Yahoo! has been able to develop a complete and unified platform that gives the developer an added value. I like the last movements of Yahoo! and I think Yahoo! only depends on himself to continue/return being a great player of Internet's technologies.

The State of Business Intelligence 2008

Posted by JoSeK at 7:10 PM . Sunday, October 19, 2008

3 comments

Labels: business intelligence

InformationWeek publishes an interesting article about the state of Business Intelligence 2008. This last year has been a very moved one in this area, with great adquisitions (Cognos, BO, Hyperion, etc.) by big companies (Oracle, SAP, Microsoft, etc.) that changes the actual panorama of this sector.

Second Web People Search Evaluation Workshop

Posted by JoSeK at 6:59 PM . Tuesday, October 07, 2008

0 comments

Labels: cfps, evaluation

Second Web People Search Evaluation Workshop
Call for Participation
http://nlp.uned.es/weps

Finding information about people in the World Wide Web is one of the most common activities of Internet users. Person names, however, are highly ambiguous. In most cases, the results for a person name search are a mix of pages about different people sharing the same name. The user is then forced either to add terms to the query (probably losing recall and focusing on one single aspect of the person), or to browse every document in order to filter the information about the person he/she is actually looking for. In an ideal system the user would simply type a person name, and receive search results clustered according to the different people sharing that name.

In 2007 the Web People Search Task (Artiles et al. 2007) was the first competitive evaluation focused on this problem. The 16 participating systems received a set of web pages for a person name, and they had to cluster them into different entities. This second evaluation provides a new testbed corpus, improved evaluation metrics, and an additional attribute extraction subtask.

* Task definitions

** Clustering

In this task systems receive as input a set of web search results obtained when performing a query for an (ambiguous) person name. The expected output is a clustering of the web pages, where each cluster is assumed to contain all (and only those) pages that refer to the same individual.

** Attribute Extraction

This subtask consists of extracting 18 kinds of "attribute values" for target individuals whose names appear on each of the provided Web pages. The organizers will distribute the target Web pages in their original format (i.e., html), and the participant systems have to extract attribute values from each page.

** Complete guidelines and data

* Participation

The clustering and the attribute extraction task will be regarded as two separate subtasks, and therefore a team can choose to participate in only one or both of them. The organizers will provide annotated data for developing/training systems. On a second stage, an unannotated corpus will be distributed, systems output will be collected and evaluation results returned to the participants. Each team can submit up to five runs. Every team is expected to write a paper describing their system and discussing the evaluation results.

* How do I register ?

Please send an email expressing your interest to the task organizers (weps-organizers@lsi.uned.es).

* Important Dates

October 2008: Distribute the training data + CFP
December 1-8, 2008: Evaluation
December 17, 2008: Return the evaluation result
February 2009: Papers due.
April 2x, 2009: Workshop in Madrid.

* Workshop Organizers

Satoshi Sekine, Proteus Project (NYU).
Javier Artiles, NLP & IR Group (UNED).
Julio Gonzalo, NLP & IR Group (UNED).

* Program Committee

Eneko Agirre, UBC
Breck Balwin, Alias-i
Andrew Borthwick, Spock
Jeremy Ellman, Northumbria University
Donna Harman, National Institute of Standards and Technology (NIST)
Eduard Hovy, ISI
Dmitri Kalashnikov, University of California, Irvine
Paul Kalmar, Fair Issac
Bernardo Magnini, FBK-irst, Italy
Gideon Mann, Google
Yutaka Matsuo, Tokyo University
Manabu Okumura, Tokyo Inst. of Tech.
Ted Pedersen, University of Minnesota
Massimo Poesio, University of Essex
Maarten de Rijke, University of Amsterdam
Mark Sanderson, University of Sheffield
Arjen P. de Vries, Centrum Wiskunde & Informatica

Updated information about the task can be found at the WePS web site (http://nlp.uned.es/weps)

Social Media, Data Mining & Machine Learning

Going to CIKM and IDEAL

In the Development of a Spanish MetaMap

Y!OS

The State of Business Intelligence 2008

Second Web People Search Evaluation Workshop

Labels

Blog Archive

Related Blogs