Phone: +43(0) 662 834 602 - 0

© RSA FG - Research Studios Austria
Forschungsgesellschaft mbH
2019 All rights reserved

Interview with the Studio Director on Big Data Science

Mihai Lupu is an internationally acknowledged expert in information retrieval and machine learning as well as cognitive and big data analytics. He graduated from the National University of Singapore with a doctoral thesis called "Generic Peer-to-Peer Networks: Theory, Implementation and Applications in Image Retrieval Using the Wavelet Transform" in 2008. Currently, Dr. Lupu is heading the Research Studio Data Science located in Salzburg and Vienna. Please read a current interview with him.
Photo: The Data Science team in 2018, from left to right Bernd Ivanschitz, Mihai Lupu, Alexandros Bampoulidis & Aziz Abdel Taha.
How did you get into data science?
Even during my BSc studies I was interested in machine learning, and in particular in text processing. My BSc thesis was on information retrieval, which I then expanded in my MSc thesis to peer-to-peer information retrieval and then, for my PhD, to a more in-depth study of the nature of peer-to-peer networks. One should not think of these as networks in the sense of physical cables and hardware routers, but rather in terms of methods to locate data of interest in the absence of complete information about either the location of the data or the data itself. The step from here to a more generic application domain is not that large.
Which new prototype is the Research Studio Data Science currently building?
One prototype we’ve recently developed is a trademark classification system. Trademarks are an important component of intellectual property protection, and there is an international classification that the world has agreed upon for all trademarks. This is called the Nice Classification because it results from the Nice Agreement of 1957. The Nice Agreement establishes a classification of goods and services for the purposes of registering trademarks and service marks. The trademark offices of Contracting States must indicate, in official documents and publications in connection with each registration, the numbers of the classes of the Classification to which the goods or services for which the mark is registered belong.
In a short project with WIPO (the World Intellectual Property Organization) we have developed a classifier for new trademark applications submitted to the organization.
What can this prototype do?
The prototype outputs, for any text given as input, a probability distribution over the set of NICE classes. For the graphical output, the prototype only presents the top three predicted classes, together with the parts of the text that most contributed to that prediction.
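The output described above can be illustrated with a minimal sketch. It assumes the classifier produces one raw score per class and that a softmax turns those scores into probabilities; the interview does not say how the distribution is computed, so the softmax step, the function names, and the example scores are all hypothetical. The only external fact used is that the Nice Classification has 45 classes.

```python
import math

NUM_NICE_CLASSES = 45  # the Nice Classification defines classes 1 to 45


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_classes(probabilities, k=3):
    """Return the k most probable (class_number, probability) pairs."""
    numbered = enumerate(probabilities, start=1)  # class numbers start at 1
    ranked = sorted(numbered, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores for one input text (assumption, not real model output).
scores = [0.0] * NUM_NICE_CLASSES
scores[8] = 4.0   # class 9
scores[41] = 3.0  # class 42
scores[34] = 2.0  # class 35

probabilities = softmax(scores)
top_three = top_k_classes(probabilities)

for class_number, probability in top_three:
    print(f"Class {class_number}: {probability:.3f}")
```

The full distribution stays available in `probabilities`; only the display is limited to the three highest entries.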
Why is this prototype superior to existing prototypes under development or products already on the market?
The version that existed before at WIPO was essentially based on keyword matching. We are using an adapted state-of-the-art Convolutional Neural Network for the classification. Furthermore, we have also added a post-processing step, where, for each predicted class, the text provided as input is highlighted in such a way as to assist the user in determining why that class was predicted.
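One common way to produce this kind of highlighting is occlusion: remove each word in turn and measure how much the predicted probability of the class drops. The sketch below illustrates that idea only. The interview does not say which attribution method the studio used, and the tiny keyword-weight model here is a hypothetical stand-in for the trained network, as are all names and weights.

```python
import math

# Hypothetical stand-in for the trained network: per-class word weights.
TOY_WEIGHTS = {
    9: {"software": 2.0, "downloadable": 1.5},
    25: {"clothing": 2.0, "shoes": 1.5},
}


def class_probability(words, target_class):
    """Toy classifier: softmax over the summed word weights of each class."""
    scores = {
        cls: sum(weights.get(word, 0.0) for word in words)
        for cls, weights in TOY_WEIGHTS.items()
    }
    total = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[target_class]) / total


def word_contributions(text, target_class):
    """Occlusion: probability drop for the class when each word is removed."""
    words = text.lower().split()
    baseline = class_probability(words, target_class)
    contributions = []
    for i, word in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        drop = baseline - class_probability(reduced, target_class)
        contributions.append((word, drop))
    return contributions


def highlight(text, target_class, top_n=2):
    """Bracket the top_n words whose removal lowers the class probability most."""
    contributions = word_contributions(text, target_class)
    order = sorted(
        range(len(contributions)),
        key=lambda i: contributions[i][1],
        reverse=True,
    )
    marked = {i for i in order[:top_n] if contributions[i][1] > 0}
    return " ".join(
        f"[{word}]" if i in marked else word
        for i, word in enumerate(text.split())
    )


highlighted = highlight("Downloadable software for managing shoes", 9)
print(highlighted)
```

Words that push the prediction toward a different class (here "shoes") get a negative contribution and are never highlighted for the class being explained.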
Which problem does this prototype solve?
When applying for a new trademark, both the applicant and the officer receiving the application must make sure that the trademark is properly classified against the Nice Classification, so that appropriate protection is guaranteed.
What are the main ingredients of this prototype?
The main ingredient is an adapted Convolutional Neural Network trained on English and Spanish trademark texts.
How does this prototype add value for users?
In addition to the classification it provides, the prototype gives indications as to why that prediction was made. This is a major benefit of the prototype because it overcomes the typical opacity of modern artificial neural network models.
Five top publications:
Alexandros Bampoulidis, João R. M. Palotti, Mihai Lupu, Jon Brassey, Allan Hanbury: Does Online Evaluation Correspond to Offline Evaluation in Query Auto Completion? ECIR 2017: 713-719

Aldo Lipani, João R. M. Palotti, Mihai Lupu, Florina Piroi, Guido Zuccon, Allan Hanbury: Fixed-Cost Pooling Strategies Based on IR Evaluation Measures. ECIR 2017: 357-368

Navid Rekabsaz, Mihai Lupu, Allan Hanbury: Exploration of a Threshold for Similarity Based on Uncertainty in Word Embedding. ECIR 2017: 396-409

Aldo Lipani, Mihai Lupu, Allan Hanbury: Visual Pool: A Tool to Visualize and Interact with the Pooling Method. SIGIR 2017: 1321-1324

Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Hamed Zamani: Word Embedding Causes Topic Shifting; Exploit Global Context! SIGIR 2017: 1105-1108
