[REMINDER] Special Issue of International Journal of Electronic Commerce on Mining Social Media

. Wednesday, December 23, 2009

The deadline for the special issue on Mining Social Media of the International Journal of Electronic Commerce is approaching (the deadline for abstracts is January 15). If you are interested in publishing in this special issue, please keep an eye on the dates.

List of Social Tagging Datasets

. Sunday, December 06, 2009

Markus Strohmaier is compiling a list of social tagging datasets available for research. The list currently contains 8 datasets, but it is being updated according to the comments made on Markus' blog. It seems a good place to find interesting datasets to work on, and also to share the datasets we're currently working on.

From Search to Recommender Systems

. Thursday, December 03, 2009

Tech companies rule the Web, as you can see by looking at some of the biggest Internet companies, like Google and Amazon. Google is the reference company in Intelligent Systems because of its search engine, but also because of the quantity and quality of the other technologies it develops, such as automated translation, user profiling, context management and even image processing.

Amazon is an e-commerce store, and it does not seem as closely tied to technology as Google, but the vision of Jeff Bezos and the company's commitment to technology have allowed it to grow like no one before in the e-commerce market. For Bezos, an online store should not limit its catalog to a few items: online stores should contain millions of products and should personalize the experience of their users. His vision was clear: "if you have 3 million customers in the Web, I should have 3 million online stores", and so the recommender system came to rule Amazon.

Both technologies (search and recommender systems) help with the Information Overload problem we suffer nowadays, but they are radically different in conception. Search engines need users to express their needs in textual form; they then process that query and retrieve the documents most relevant to it. Recommender systems analyze the behaviour of the user on a website (and other kinds of data) and are then able to choose the products or content most likely to interest that user. Both approaches are useful, but so far recommender systems are not as popular as search engines.
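To make the difference concrete, here is a minimal sketch in Python. Everything in it is a made-up toy for illustration (the catalog, the browsing histories, and the `search` and `recommend` functions); it is not how Google or Amazon actually work. Search ranks items by the words they share with an explicit query, while the recommender gets no query at all and only counts which items were viewed together.

```python
from collections import Counter

# Hypothetical toy catalog: item id -> short description.
CATALOG = {
    "b1": "python data mining book",
    "b2": "machine learning book",
    "c1": "digital camera",
    "c2": "camera tripod",
}

# Hypothetical browsing histories (one set of viewed items per user).
HISTORIES = [
    {"b1", "b2"},
    {"b1", "b2", "c1"},
    {"c1", "c2"},
]


def search(query):
    """Search: the user states a need as text; rank items by shared words."""
    terms = set(query.lower().split())
    scores = {item: len(terms & set(text.split())) for item, text in CATALOG.items()}
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0]


def recommend(viewed):
    """Recommendation: no query; infer interest from co-viewing behaviour."""
    counts = Counter()
    for history in HISTORIES:
        if viewed & history:  # this user overlaps with what we viewed
            counts.update(history - viewed)  # count their other items
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked]


print(search("learning book"))  # needs an explicit query
print(recommend({"b1"}))        # needs only observed behaviour
```

The point of the sketch is the input of each function: `search` cannot do anything without a query, while `recommend` works from behaviour the user never had to describe.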

But the Web is at a turning point, as the way information is generated and consumed has changed. Nowadays, due to the success of Social Media sites like Facebook or Twitter, and even of earlier technologies such as RSS, we receive a lot of information passively: we don't ask for that information directly, but we receive it anyway. Until now, the problem was finding important information; now the problem is becoming choosing which of the information I already receive is relevant to me. That's why I think recommender systems will overtake search engines in popularity in the future.

1st Spring School on Social Media Retrieval (S3MR)

. Sunday, November 08, 2009

DEADLINE: November 17, 2009.

Multimedia content has become ubiquitous on the web, creating new challenges for indexing, access, search and retrieval. At the same time, much of this content is made available on content sharing websites like YouTube or Flickr, or shared on social networks like Facebook. In such environments, the content is usually accompanied with metadata, tags, ratings, comments, information about the uploader and their social network, etc.

Analysis of these "social media" shows a great potential in improving the performance of traditional multimedia information analysis/retrieval approaches by bridging the semantic gap between the "objective" multimedia content analysis and "subjective" users' needs and impressions. The integration of these aspects however is non-trivial and has created a vibrant, interdisciplinary field of research.

The Spring School on Social Media Retrieval aims at bringing together young researchers from neighboring disciplines, offering

(1) lectures delivered by experts from academia and industry providing a clear and in-depth summary of state-of-the-art research in social media retrieval,

(2) collaborative projects in small groups providing hands-on experience on integrative work on selected problems from the field.


* Content distribution over social/peer-to-peer networks
* Multimedia content analysis
* Automatic multimedia annotation/tagging
* Multimedia indexing/search/retrieval
* Implicit media tagging
* Social data analysis
* Collaborative tagging

Confirmed lecturers:

-Susanne Boll, Carl von Ossietzky Universität, Oldenburg, Germany, http://medien.informatik.uni-oldenburg.de/personen/susanne_boll/
-Roelof van Zwol, Yahoo Research, Barcelona, Spain, http://research.yahoo.com/Roelof_van_Zwol
-Ciro Cattuto, ISI Foundation, Turin, Italy, http://isiosf.isi.it/~cattuto/

For more information and to register, please visit our webpage: http://www.petamedia.eu/s3mr/

CFP: Special Issue of International Journal of Electronic Commerce on Mining Social Media

. Friday, October 30, 2009

After the experience of organizing the 1st International Workshop on Social Media (papers now online), we've been organizing a special issue of the IJEC (International Journal of Electronic Commerce) on Mining Social Media. Now we release the CFP hoping to receive high quality papers on Mining Social Media:


Recently, Forrester published a report, “The Future of the Social Web”, in which they sketched a timeline of the development of the Social Web, dividing its evolution into 5 eras. According to that report, the first era of the Social Web began to exploit the social relationships among users. These relationships then led to the social functionality era, in which several websites started to add social functionalities to help users interact with their peers. We are now in the era of social colonization, where technologies like Facebook Connect or Google Friend Connect have standardized social functionalities among websites, and a vast majority of websites now include several social functionalities. Soon these federated identities will empower people to enter the era of social context, with personalized and social content, and the development of tools for personalizing social content will drive the era of social commerce.

The primary goal of the proposed special issue of the International Journal of Electronic Commerce is to foster research on the interplay between Social Media, Data Mining and Electronic Commerce, trying to reflect the current developments in technologies that fit the social context era.


The International Journal of Electronic Commerce is the #1-ranked journal on Electronic Commerce globally. This Special Issue will provide a significant opportunity for authors to publish important novel and original contributions in the area of Data Mining applied to Social Media. The guest editors seek papers and proposals that address various aspects of Mining Social Media, including recommender systems for social media, data mining algorithms designed to exploit Social Networks, information management for Social Networks, etc.


We invite scholars and professionals from a broad range of disciplines to submit to this Special Issue. Papers may encompass any or all of the following: foundational theoretical analyses, modelling, simulation, and empirical studies. Authors may examine different aspects of mining social media in any of a variety of possible contexts. Special topics of interest include, but are not limited to, the following:

A. Data Mining for Social Networks

• Novel Algorithms
• Association Rules
• Mining semi-structured data
• Classification and Ranking
• Clustering
• Text Mining
• Machine Learning
• Privacy-Preserving Data Mining
• Statistical Methods
• Temporal and spatial data mining
• Parallel and Distributed Data Mining
• Interactive and Online Mining
• Data and Knowledge Visualization
• Multimedia mining (audio/video)
• Ensemble Methods
• Web Mining
• Graph Mining
• Link Mining

B. Information Management for Social Networks

• Recommender Systems
• Information Retrieval
• Sentiment Analysis
• Natural Language Processing
• Question Answering
• Semantic Processing
• Graph Analysis and Complex Networks
• Social Network Analysis

C. Possible applications

• Electronic Commerce
• E-Mail Spam Detection
• Blog/Social Networks Spam Detection
• Community Detection
• Users/content recommenders
• Trends discovery
• Blogs/Social Networks Community Dynamics
• User Reviews Ranking
• Blogs/Social Networks Contributions Summarization
• Abuse/Fraud Detection
• User Profile Modelling
• Event Detection and Tracking in Social Media
• Online Advertising


Manuscripts submitted to the special issue should contain original material not published in nor submitted to other journals. Each manuscript has to have a cover page with the author information and another page with title and abstract but the author information omitted. The review process is double-blind and papers which do not meet publication quality standards will be rejected before the review process.

Interested authors are required to submit extended abstracts of no more than two pages for their planned submissions. This will give the editorial team an opportunity to determine if a given submission is appropriate for expedited handling and review.

Full papers should be sent via e-mail to Jose Carlos Cortizo <josecarlos.cortizo@wipley.com> in anonymized PDF Format, not including any author names or affiliations, and should not exceed 40 pages.


  • Abstracts Deadline: 15 January 2010
  • Abstracts Feedback: 30 January 2010
  • Full Paper Submission: 15 April 2010
  • Revision Notification: 1 June 2010
  • Revised Manuscripts: 1 August 2010
  • Final Decision: 1 October 2010

1st International Workshop on Mining Social Media Programme

. Monday, October 12, 2009

While we are still working on the final proceedings, to be published in Bubok, and on the post-workshop special issue in a journal to be announced soon, we have the final version of the programme of the Mining Social Media Workshop. If you are interested in Mining Social Media, this will be a very good place to meet other researchers and practitioners. Registration is open.

  • 9:30 - 11:00; Keynote speaker, William W. Cohen
  • 11:00 - 11:30; Coffee Break
  • 11:30 - 13:30; 6 paper presentations (20 minutes per paper)
    • "Using prediction Markets and Twitter to predict a Swine Flu Pandemic", Joshua Ritterman, Miles Osborne and Ewan Klein
    • "Comparison of Rule-based to Human Analysis of Chat Logs", April Kontostathis, Lynne Edwards, Jen Bayzick, India McGhee, Amanda Leatherman and Kristina Moore
    • "Detecting Blogs Independently from the Language and Content", Francisco Manuel Rangel and Anselmo Peñas
    • "Improve Web Search Ranking with Social Tagging", Shihn-Yuarn Chen and Yi Zhang
    • "Combining Tag Cloud Learning with SVM Classification to Achieve Intelligent Search for Relevant Blog Articles", Ahmad Ammari and Valentina Zharkova
    • "Folksonomy Analyzer: a FCA-based Tool for Conceptual Knowledge Discovery in Social Tagging Systems", Kyoung-Mo Yang, Suk-Hyung Hwang, Yu-Kyung Kang, Hae-Sool Yang
  • 13:30 - 15:30; Lunch Break
  • 15:30 - 17:00; 4 paper presentations (20 minutes per paper)
    • "Fundamental operations for organizing resource groups in Grouped folksonomy", Yu-Kyung Kang, Suk-Hyung Hwang and Hae-Sool Yang
    • "A Comparison of Approaches to Determine Topic Similarity of Weblogs for Privacy Protection", Dong Yi Wu and Amanda Stent
    • "Data-Driven Ontologies for Recommender Engines in Social Networks", Ingo Bax and János Moldvay
    • "Expert Stock Picker: The Wisdom of (the Experts in the) Crowds", Shawndra Hill, Noah Ready-Campbell
  • 17:00 - 17:30; Coffee Break
  • 17:30 - 19:30; Industry Panel with Tuenti, Strands and Optenet

Innovation in Search and Artificial Intelligence

. Friday, September 04, 2009

The first time I read the name Peter Norvig was when I bought the book "Artificial Intelligence: A Modern Approach" at the age of 18, and for me he is one of the most brilliant researchers in AI. In this talk, Peter Norvig (also Research Director at Google) summarizes some of the latest advances in AI and Internet search, which allow us to develop new models to manage huge quantities of data.

Funded PhD position in Dynamic Network Analysis (Ireland)

. Tuesday, August 25, 2009

The Unit for Information Mining and Retrieval (http://uimr.deri.ie) invites applications for a funded PhD Studentship as part of the Clique Research Cluster at DERI. At the Clique Research Cluster (http://www.cliquecluster.org), we are investigating and analysing how very large real-life social networks, on-line forums, biological networks and other networks of interest evolve. Some areas we are interested in include:
  • Analysing how communities in these networks form and change with time;
  • Analysing how information and innovation diffuse, and formulating models to describe the observed diffusion behaviour;
  • Analysing churn in online communities and mobile call networks.

The Candidate

We are seeking applications for a PhD candidature in dynamic graph analysis. The successful candidate will analyse how particular properties of the networks change, and use network changes to detect abnormal events or to predict how information diffusion is enhanced or hampered. The successful candidate should have a bachelor's degree in computer science, maths, science or engineering, and have the prerequisites for PhD studies at NUI Galway (http://www.nuigalway.ie).

The PhD studentship covers academic fees and includes a generous stipend for a four-year period. In addition, the following are desirable, though not required:

  • Familiarity with basic graph theory (e.g., finding connected components, shortest paths);
  • Familiarity with modelling and simulation;
  • Familiarity with social network analysis;
  • Familiarity with dynamic data analysis (e.g., data streaming algorithms, incremental algorithms);
  • Familiarity with text mining, feature extraction and machine learning;
  • Masters or equivalent degree in graph analysis, modelling or social network analysis.

The successful candidate will work with the PI Dr. Conor Hayes and Dr. Jeffrey Chan as part of the Clique Research Cluster at DERI, NUI Galway.


Interested applicants should send an application with the subject header CLIQUE_PhD_09 to conor.hayes@deri.org. The application should contain a CV, a one page statement explaining how the candidate's background is compatible with the aims of the Clique Research Cluster and a list of references.

Via the Social Media Research mailing list

MSM09, Deadline Extended until September 6

. Monday, August 17, 2009

The submission deadline for the 1st International Workshop on Mining Social Media has been extended until September 6. If you're working on any possible application of data mining techniques, recommender systems, information retrieval or any other Information Access technique to Social Media, this is a very good place to submit your work.

New Book: Modelling and Data Mining in Blogosphere

. Friday, July 31, 2009

A new Data Mining for Social Media book has been released. Authored by Nitin Agarwal (University of Arkansas at Little Rock) and Huan Liu (Arizona State University), "This book offers a comprehensive overview of the various concepts and research issues about blogs or weblogs. It introduces techniques and approaches, tools and applications, and evaluation methodologies with examples and case studies".

ISBN: 9781598299083 paperback
ISBN: 9781598299090 ebook

Online version available:

Table of Contents:

  • Chapter 1: Modeling Blogosphere
  • Chapter 2: Blog Clustering and Community Discovery
  • Chapter 3: Influence and Trust
  • Chapter 4: Spam Filtering in Blogosphere
  • Chapter 5: Data Collection and Evaluation
  • Appendix A: Tools in Blogosphere
  • Appendix B: API Examples

The Lemur Query Log Project

. Wednesday, July 29, 2009

Jose Maria Gomez has written on his blog about the Lemur Query Log Project, a very interesting initiative led by Dr. Bruce Croft. The Lemur Query Log Project features a toolbar that collects queries and related navigation from users and sends them to a database, building a massive query log that may benefit the IR research community.

Information Retrieval, like most of the subdisciplines related to intelligent information access, relies on the availability of data, and more specifically on test datasets. That's why projects like the Lemur Query Log are so important for future research and development.

Frauds in Science

. Saturday, June 13, 2009

A month ago I wrote a post on my Spanish Intelligent Systems blog about frauds in science. In this post I summarize what I wrote, because I think it can be a good starting point for a debate about the present and future of Science.

A discussion of frauds in science could be quite long, as there are many little things in the current scientific process that should be corrected (fake conferences, strange publishing processes, etc.), but I'll focus on the big frauds.

In August 2005, PLoS Medicine published "Why Most Published Research Findings are False", dealing with the bad experimental design that leads to wrong research findings. As stated in this paper, the scientific process is very much focused on novel research, and there is almost no support for research that tries to replicate previous results and corroborate previous findings.

In the paper "Repairing research integrity", published in Nature in June 2008, Sandra Titus and her team analyze the integrity of scientific studies. Based on a survey of more than 4,000 researchers from over 600 institutions, the results showed more than 200 cases of misconduct in scientific studies, a number much higher than that previously registered by the ORI (Office of Research Integrity). More than 60% of the cases involved data falsification, with plagiarism the next most common form of misconduct detected. Some of these frauds are detected in time, as in the Kristin Roovers case, which was discovered by the editors of The Journal of Clinical Investigation when a submitted paper turned out to contain images that had been manipulated with Photoshop.

There are some regions where fraud is an even bigger problem. This is the case in China, where more than 60% of PhD students admit they have plagiarized some work. This represents a really big problem for China's research, and even for the whole scientific community.

Recently, another big fraud in science was discovered, when The Scientist reported that Elsevier, one of the biggest scientific publishers, has several agreements with companies to publish scientific journals that the companies use to promote their products. The first case detected was the Australasian Journal of Bone and Joint Medicine, where a paper was published promoting a product from Merck, a company that paid Elsevier to create this journal. Summer Johnson writes about this big fraud in Bioethics, which is highly recommended reading.

The scientific community must react to all these things if we want to preserve the image of science, but what can we do? I think there are several options that could improve the scientific process:

1.- Open Access. The Elsevier case must make us think that letting companies like Elsevier control scientific publishing is not a good idea. Open Access seems a good way to protect science from the desires and interests of big publishing companies. It is also a good way to ensure egalitarian access to scientific results.

2.- We should try to help initiatives devoted to negative results (or less important ones). Journals like the Journal of Interesting Negative Results in Natural Language Processing and Machine Learning, the Journal of Negative Results on Biomedicine, or the Journal of Negative Results are doing a good job publishing that kind of result.

3.- It also seems very important to improve the working conditions of researchers. For instance, in Spain a lot of researchers earn less money than they would working in a supermarket or driving a taxi, occupations with fewer responsibilities and less impact on society. Who can care about doing high-quality research if they can't give their family a decent living?

4.- We also need to take up scientific ethics again. As researchers we must value what the word science means. Science is not about publishing papers; science is all about improving global knowledge. Science is something really great.

Wikipedia Page Traffic Statistics for DataMiners


Gregory Piatetsky pointed out on the KDnuggets Twitter account the release of a data package containing 7 months of hourly pageview statistics for all articles in Wikipedia. The dataset is over 320 GB compressed (over 1 TB uncompressed) and covers more than 2.5 million Wikipedia articles. All text content, statistics and link data in the dataset are licensed under the GFDL (GNU Free Documentation License).
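As a rough idea of what working with hourly pageview statistics might look like, here is a minimal Python sketch that adds up hourly counts per article. The line layout it parses (`project page_title hourly_views bytes`, separated by spaces) is an assumption on my part and is not taken from the dataset description above, so check the package's own documentation before relying on it. The `total_views` function and the sample lines are also made up for illustration.

```python
from collections import Counter


def total_views(lines, project="en"):
    """Sum hourly view counts per article title for one project.

    Assumes (hypothetically) that each line looks like:
        project page_title hourly_views bytes
    Lines that do not match that shape are skipped.
    """
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # malformed line
        proj, title, views, _size = parts
        if proj == project and views.isdigit():
            totals[title] += int(views)
    return totals


# Made-up sample covering two "hours" of traffic.
sample = [
    "en Data_mining 120 30500",
    "en Machine_learning 80 41000",
    "es Minería_de_datos 15 9000",
    "en Data_mining 95 30500",
    "this line is broken",
]

totals = total_views(sample)
print(totals.most_common(2))
```

With a dataset over 1 TB uncompressed you would stream the files line by line rather than load them into memory; since `total_views` only iterates over `lines`, passing it an open file object works the same way as passing the list above.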

1st International Workshop on Mining Social Media

. Wednesday, May 06, 2009

I'm very glad to announce the MSM09 workshop that we are organizing in Sevilla (Spain), November 9. I hope to see some of you there :D


1st International Workshop on Mining Social Media
Workshop at CAEPIA 2009 (http://www.lsi.us.es/caepia09)
November 9, 2009, Sevilla, Spain
Submission deadline: July 31, 2009


Social Media are technological tools that allow users to share and discuss information. Most Social Media are Internet-based applications that manage textual information, such as blogs (Blogger, Wordpress), microblogging (Twitter, Pownce), wikis (Wikipedia), forums, or Social Networks (Facebook, MySpace, LinkedIn). But there are also other Social Media Internet applications where users share more than text, such as photo sharing tools (Flickr, Picasa), video sharing (YouTube, Vimeo), livecasting (Ustream), or audio and music sharing (last.fm, ccMixter, FreeSound). More recent Social Media include virtual worlds (Second Life), online gaming (World of Warcraft, WarHammer Online), game sharing (Miniclip.com) and Mobile Social Media like Nomad Social Networks, where users share their current position in the Real World.

Social Media have been able to shift the way information is generated and consumed. At first, information was generated by one person and “consumed” by many people, but now information is generated by many people and consumed by many people, changing the needs in information access and management. It is also noticeable that Social Media applications manage huge quantities of users and data: Facebook and MySpace manage between 100 and 150 million users, it is estimated that 1 million blog posts are generated each day, microblogging services like Twitter generate 3 million messages each day, YouTube manages more than 150,000 million videos, etc. All these points make clear that Social Media is an excellent application field for data miners.


The Mining Social Media workshop aims to bring together practitioners and researchers with a specific focus on the application of existing or novel Data Mining techniques to the field of Social Media. We encourage the submission of experimental papers where Data Mining techniques are applied to existing Social Media, but also more theoretical papers that show clear application to real Social Media. Topics of interest include blog post analysis, blog comments analysis, blog spam, recommender systems for Social Media, clickstream analysis, relevance analysis, spam user detection, behavior analysis, contextual mining, Social Media user segmentation, route analysis for nomad Social Networks, multimedia mining, search in Social Media, etc.

All the submissions must be practical or theoretical applications of Data Mining or Information Management to Social Media (blogs, wikis, social networks, social services, e-mail, etc.).

A. Data Mining for Social Networks
* Novel Algorithms
* Association Rules
* Mining semi-structured data
* Classification and Ranking
* Clustering
* Text Mining
* Machine Learning
* Privacy-Preserving Data Mining
* Statistical Methods
* Temporal and spatial data mining
* Parallel and Distributed Data Mining
* Interactive and Online Mining
* Data and Knowledge Visualization
* Multimedia mining (audio/video)
* Ensemble Methods
* Web Mining
* Graph Mining
* Link Mining
B. Information Management for Social Networks
* Recommender Systems
* Information Retrieval
* Sentiment Analysis
* Natural Language Processing
* Question Answering
* Semantic Processing
* Graph Analysis and Complex Networks
* Social Network Analysis
C. Possible applications
* E-Mail Spam Detection
* Blog/Social Networks Spam Detection
* Community Detection
* Users/content recommenders
* Trends discovery
* Blogs/Social Networks Community Dynamics
* User Reviews Ranking
* Blogs/Social Networks Contributions Summarization
* Abuse/Fraud Detection
* User Profile Modeling
* Event Detection and Tracking in Social Media

We welcome contributions through research papers and industrial reports/case studies on applications. We also welcome work-in-progress contributions, as well as papers discussing potential research directions.


* Deadline for submission: 31 July 2009
* Notification of acceptance: 17 September 2009
* Workshop: 9 November 2009


* William W. Cohen (Carnegie Mellon University)


The Industry Panel features panelists from several Social Media companies, who will give their vision of the needs, opportunities and current solutions in applying Data Mining and related techniques to real Social Media. To date, Strands (http://www.strands.com) and Tuenti (http://www.tuenti.com) have confirmed their presence on this panel.


Submissions must be anonymous. Papers must be sent as a PDF file, written in English, must not exceed 12 pages including figures, references, etc., and should be formatted according to the Springer LNAI guidelines. Papers will be reviewed by at least two PC members, and accepted papers will be published in the workshop proceedings. We are currently studying the possibility of publishing Springer post-proceedings and a special issue in a Data Mining journal.

SPONSORS (confirmed as of May 3, 2009):

* MAVIR (http://www.mavir.net)
* UEM (http://www.uem.es)
* Fundación Madri+D para el Conocimiento (http://www.madrimasd.org)
* STRANDS (http://www.strands.com)


* Jose Carlos Cortizo (Social Gaming Platform, Universidad Europea de Madrid) - primary contact (josecarlos.cortizo@wipley.com)
* Francisco Manuel Carrero (Social Gaming Platform, Universidad Europea de Madrid)
* Jose Maria Gomez (Optenet)
* Enrique Puertas (Universidad Europea de Madrid)
* Borja Monsalve (Social Gaming Platform, Universidad Europea de Madrid)


* Nitin Agarwal (University of Arkansas at Little Rock, USA)
* Aris Anagnostopoulos (Sapienza Università di Roma, Italy)
* Raul Arrabales (Universidad Carlos III de Madrid, Spain)
* Dominik Benz (University of Kassel, Germany)
* Paolo Boldi (University of Milano, Italy)
* Iván Cantador (Universidad Autónoma de Madrid & University of Glasgow, United Kingdom)
* Francisco Manuel Carrero (Social Gaming Platform, Spain)
* Pablo Castells (Universidad Autónoma de Madrid, Spain)
* Meeyoung Cha (Max Planck Institute, Germany)
* Yun Chi (NEC Laboratories America, USA)
* Jose Carlos Cortizo (Social Gaming Platform, Spain)
* Debora Donato (Yahoo! Research, Spain)
* Cesar Estebanez (Universidad Carlos III de Madrid, Spain)
* Jose Maria Gomez (Optenet, Spain)
* Antonio Gulli (Ask.com, Italy)
* Viet Ha-Thuc (University of Iowa, USA)
* José Antonio Iglesias Martínez (Universidad Carlos III de Madrid, Spain)
* Akshay Java (MSN Microsoft, USA)
* Gueorgi Kossinets (Google, USA)
* Zornitsa Kozareva (Universidad de Alicante, Spain)
* Beate Krause (University of Kassel, Germany)
* Emmanuelle Lebhar (CNRS & Universidad de Chile, Chile)
* Agapito Ledezma Espino (Universidad Carlos III de Madrid, Spain)
* Kristina Lerman (USC Information Sciences Institute, California, USA)
* Ee-Peng Lim (Singapore Management University, Singapore)
* Stéphane Marchand-Maillet (University of Geneva, Switzerland)
* Luis Martin (Universidad Politecnica de Madrid, Spain)
* Borja Monsalve (Social Gaming Platform, Spain)
* Cesar de Pablo (Universidad Carlos III de Madrid, Spain)
* Manos Papagelis (University of Toronto, Canada)
* Victor Peinado (Universidad Nacional de Educación a Distancia, Spain)
* Enrique Puertas (Universidad Europea de Madrid, Spain)
* Josep Lluis de la Rosa (Universidad de Girona & Rensselaer Polytechnic Institute, USA)
* Sara Owsley Sood (Pomona College, USA)
* Markus Strohmaier (Graz University of Technology, Austria)
* Yaiza Temprado (Universidad Europea de Madrid, Spain)
* Marc Torrens (Strands, Spain)

The Socialization of the WWW

. Thursday, April 23, 2009

This year, WWW2009 is being held in Madrid, a great event and a great place to celebrate the 20th Anniversary of the WWW. The WWW Conference is a great place to study the current opportunities and problems of the Web, and that’s why I usually read the abstracts of the accepted papers each year. There has been a visible evolution towards the Social Web over the last 3 years.

In WWW 2007 at Banff (Canada), there was no Social track, only a session called “Mining in Social Networks” under the Data Mining track, where 3 papers were presented, addressing Social Networks as a general paradigm of communication among people. WWW 2008 at Beijing (China) had a Social Networks track containing 3 sessions, “Analysis of Social Networks and Online Interaction Spaces”, “Discovery and Evolution of Communities” and “Applications and Infrastructures for Web 2.0”, and some other papers addressing Social Media, such as “Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media”.

WWW 2009 clearly shows that the Web is Social. There is a “Social Networks and Web 2.0” track addressing Social Media and the Collaborative Web, with 4 sessions (Recommender Systems, Interactions in Social Communities, Diffusion and Search in Social Networks, and Photos and Web 2.0) and 12 papers, and there are also at least 5 papers in other tracks that address other relevant topics, like privacy, from a Social Media point of view.

More than 12% of the papers accepted in WWW2009 are related to Social Media, and that’s a very good sign of the Socialization of the Web.

Beware of Ranking Scams

. Wednesday, April 22, 2009

In the comments on a previous post, Jijung Tang links to some very interesting information about anonymous online ranking sites. He points out some considerations about cs.conference-ranking.net and cs-conference-ranking.org. A must-read post.

When I wrote the post about Conference Rankings, I didn't check who was behind cs-conference-ranking.org. I was told it was a good reference, and most of the conferences seem to be in a logical order. But Jijung is right: it's weird to find IAENG conferences among the most relevant conferences, and hiding the information about the creators of the ranking is not a good sign.

Social Media, Data Mining & Machine Learning


I've gone more than two months without updating the blog. I've had a lot of work with my new company, trying to find start-up funding and writing tons and tons of lines of code. Now I'll try to update the blog once a week, or at least once every two weeks. As my research interests have shifted a little towards Social Media applications, I've decided to change the name of the blog from "Business Intelligence, Data Mining & Machine Learning" to "Social Media, Data Mining & Machine Learning". Probably the name of the blog is not so important, but now that I'm working on Social Media I feel more comfortable with that keyword in the title.

SSMS 2009 - Summer School on Multimedia Semantics


** SSMS 2009 - SUMMER SCHOOL ON MULTIMEDIA SEMANTICS ** -- University of Koblenz, Germany --

Managing and Modeling of Multimedia and User Generated Content in Web 3.0

http://www.smart-society.net/ssms09
23-28 August 2009, Koblenz, Germany
Application deadline: 15 May 2009 (Friday)


We are pleased to announce the 4th edition of the Summer School on Multimedia Semantics. This summer school series successfully started in 2006, each year offering top-level education at great European locations (Kallithea, Greece; Glasgow, UK; Crete, Greece) for students from all over the world. The Summer School on Multimedia Semantics is intended for PhD and Master students who want to learn more about the use of semantics in various media. We offer lectures by leading researchers in the field. Organized student poster sessions will create unique networking opportunities. They are an excellent place to present your work, get comments and feedback from senior researchers, or exchange ideas with other participants.


  • Social Media Modeling
  • Audio Processing and Semantics
  • Video Analysis and Semantic Retrieval
  • Multimedia Personalization


  • Prof. Dr. Karlheinz Brandenburg (IDMT Fraunhofer, Germany)
  • Prof. Dr. Andreas Nuernberger (University of Magdeburg, Germany)
  • Prof. Dr. Lynda Hardman (Centrum Wiskunde & Informatica, Netherlands)
  • Prof. Dr. Steffen Staab (University of Koblenz-Landau, Germany)
  • Prof. Dr. Andreas Hotho (University of Kassel, Germany)
  • Dr. Marcel Worring (University of Amsterdam, Netherlands)
  • Prof. Dr. Fabio Ciravegna (University of Sheffield, England)
  • Dr. Yiannis Kompatsiaris (ITI, Greece)
  • Prof. Dr. Noel O'Connor (Dublin City University, Ireland)
  • Dr. Ansgar Scherp (University of Koblenz-Landau, Germany)


We welcome applicants from anywhere in the world. The summer school is mainly aimed at graduate students (Masters and PhD). Each participant is required to bring a poster presenting her/his work. Organized poster sessions will allow you to exchange ideas and get feedback on your research. Applications must be sent directly to Ruth Götten ( confsec@uni-koblenz.de ) - PDF, MS Word or plain text are preferred.

Application must include:

  • your name
  • organization
  • name of your supervisor(s)
  • abstract of your work (limit: 1 page)

More details available at: http://www.smart-society.net/ssms09/index.php?page=registration

CONTACT: confsec@uni-koblenz.de


Koblenz (which means: merging of the rivers) is situated in the picturesque valley of the Rhine and Moselle. The Middle Rhine Valley is a UNESCO World Heritage site ( http://www.welterbe-mittelrheintal.de ). The city is surrounded by four mountain ranges, with many vineyards in the valley. With its narrow alleyways and happy atmosphere, Koblenz is a welcoming town for guests from all over the world. The town has old traditions, reaching back over 2000 years to the times of the Roman Empire. More at http://en.wikipedia.org/wiki/Koblenz


The Summer School classes will be held at the Campus of the University of Koblenz-Landau ( http://www.uni-koblenz.de ). Suggested accommodation is provided by "Contel" Hotel ( http://www.contel-koblenz.de ) in Koblenz, which is conveniently located within walking distance from the Koblenz campus. Various other options for housing in different price ranges are also available - check the SSMS'09 website for more details. Study hard and enjoy summer! Each day of intensive study is followed by a different social event. Participants and lecturers will find many opportunities to talk, socialize and have more direct contact outside of the classroom. On Wednesday we plan a longer, half-day trip outside of Koblenz.


  • Application deadline: 15 May 2009
  • Acceptance notification: 1 June 2009
  • Registration deadline: 15 June 2009
  • Summer School: 23-28 August 2009


The tuition fee for each student is 350 EUR. It covers lectures, meals, excursion and other organized social events. Accommodation is NOT included in the tuition. If you have questions, please contact Ruth Götten at confsec@uni-koblenz.de


We offer a limited number of student scholarships that cover the tuition fee for the summer school. To apply for a scholarship, please attach a letter to your application (limit 1 page) explaining what you expect to learn during the summer school in Koblenz, why you are applying for this scholarship and why you cannot use other funds.


  • Dr. Yiannis Kompatsiaris, ITI, Greece
  • Prof. Dr. Ebroul Izquierdo, Queen Mary University London, England
  • Prof. Dr. Lynda Hardman, CWI, The Netherlands
  • Prof. Dr. Fabio Ciravegna, University of Sheffield, England
  • Prof. Dr. Alan Smeaton, Dublin City University, Ireland


  • Prof. Dr. Steffen Staab
  • Dr. Ansgar Scherp
  • Dr. Marcin Grzegorzek
  • Dr. Maciej Janik
  • Ruth Götten, Dipl.-Päd.

IEEE International Conference on Data Mining 2009

. Saturday, January 31, 2009

December 6-9, 2009 Miami, U.S.A.

The IEEE International Conference on Data Mining (ICDM) has established itself as the world's premier research conference in data mining. The 2009 edition of ICDM provides a leading forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences.

The conference covers all aspects of data mining, including algorithms, software and systems, and applications. In addition, ICDM draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high performance computing.

By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining.

Besides the technical program, the conference will feature workshops, tutorials, panels, and the ICDM data mining contest.

Topics of Interest

  • Data mining foundations
    • Novel data mining algorithms in traditional areas (such as classification, regression, clustering, probabilistic modeling, pattern discovery, and association analysis)
    • Models and algorithms for new, structured, data types, such as arising in chemistry, biology, environment, and other scientific domains
    • Developing a unifying theory of data mining
    • Mining sequences and sequential data
    • Mining spatial and temporal datasets
    • Mining textual and unstructured datasets
    • Distributed data mining
    • High performance implementations of data mining algorithms
    • Privacy and anonymity-preserving data analysis
  • Mining in emerging domains
    • Stream data mining
    • Mining moving object data, RFID data, and data from sensor networks
    • Ubiquitous knowledge discovery
    • Mining multi-agent data
    • Mining and link analysis in networked settings: web, social and computer networks, and online communities
    • Mining the semantic web
    • Data mining in electronic commerce, such as recommendation, sponsored web search, advertising, and marketing tasks
  • Methodological aspects and the KDD process
    • Data pre-processing, data reduction, feature selection, and feature transformation
    • Quality assessment, interestingness analysis, and post-processing
    • Statistical foundations for robust and scalable data mining
    • Handling imbalanced data
    • Automating the mining process and other process related issues
    • Dealing with cost sensitive data and loss models
    • Human-machine interaction and visual data mining
    • Integration of data warehousing, OLAP and data mining
    • Data mining query languages
    • Security and data integrity
  • Integrated KDD applications, systems, and experiences
    • Bioinformatics, computational chemistry, ecoinformatics
    • Computational finance, online trading, and analysis of markets
    • Intrusion detection, fraud prevention, and surveillance
    • Healthcare, epidemic modeling, and clinical research
    • Customer relationship management
    • Telecommunications, network and systems management
    • Sustainable mobility and intelligent transportation systems

Important Dates

  • April 13, 2009 - Deadline for workshop proposals
  • June 26, 2009 - Deadline for paper submission, tutorial submission, and panel proposals
  • September 4, 2009 - Notification to authors
  • September 28, 2009 - Deadline for camera-ready copies
  • December 6-9, 2009 - Conference

Information Access vs. Information Retrieval

. Tuesday, January 27, 2009

Jose Maria Gomez has published a very interesting post about the differences between Information Access and Information Retrieval, which are not so clear to a lot of people, including researchers in areas distant from IR or IA.

The Future of Social Networks

. Thursday, January 08, 2009

Before Christmas, I wrote a post in my Spanish blog Sistemas Inteligentes (Intelligent Systems) containing some reflections about the future of Social Networks. In this post I'll try to summarize the main idea and translate it into English, as I think Social Networks, and all Social Media, are a really interesting field for KDD.

As stated by Wikipedia, "Emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions. Emergence is central to the theories of integrative levels and of complex systems". Some strong ideas in AI are connected with emergence, like Swarm Intelligence that "is a type of artificial intelligence based on the collective behavior of decentralized, self-organized systems. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems".

But now, what does emergence have in common with Social Networks? For me there is a clear similarity: both emergent systems and social networks consist of a group of individuals interacting with each other to form something bigger. In emergent systems that group of, let's say, stupid or limited individuals is able to connect to create some kind of cooperative organism that is more intelligent than the sum of the intelligences of the individuals. In today's Social Networks (referring to Social Network applications like Facebook or MySpace) we have a much better initial state: a group of intelligent individuals cooperating with each other. But the result is not what we would expect, because the global information is just the union (or even less, as some information may be duplicated) of the information generated by each user. And it's even worse: the global intelligence of the system is almost null, as today's Social Network systems are all about information and are not trying to create a superior layer of the system by processing all that information and creating real knowledge.

For me, it's clear that the future of Social Networks is about developing systems that generate added value for the users by processing all the information and connections. In the coming years, Social Networks will need to use KDD techniques, and I'm sure Social Media will become the next big application field for KDD and Machine Learning.
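
To make the idea of "processing the connections" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the toy friendship graph, the user names and the `suggest_friends` function are all hypothetical, and counting common friends is just one simple link-analysis heuristic, not what any real Social Network does internally.

```python
from collections import Counter

# Hypothetical toy friendship graph; names and links are made up for illustration.
friends = {
    "ana": {"ben", "carla"},
    "ben": {"ana", "carla", "dani"},
    "carla": {"ana", "ben", "dani"},
    "dani": {"ben", "carla", "eva"},
    "eva": {"dani"},
}


def suggest_friends(graph, user, top_n=3):
    """Rank people who are not yet friends of `user` by number of shared friends."""
    counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone already connected to the user.
            if candidate != user and candidate not in graph[user]:
                counts[candidate] += 1
    # Sort by shared-friend count (descending), then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


print(suggest_friends(friends, "ana"))
```

The stored data is only the union of each user's friend list, but the suggestion ("ana probably knows dani, they share two friends") is something no single user entered: it only appears when the system processes the connections as a whole.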