EMBEDDIA tools stand out in international challenges!
Our semantic enrichment tools for multilingual and social media information outperformed all other participants in the official rankings, in all languages, in:
We are organising a free workshop on modern NLP through large pretrained language models on Tuesday, September 29th, at the Faculty of Computer and Information Science in Ljubljana, Slovenia.
Zero-Shot Learning for Cross-Lingual News Sentiment Classification
"Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERT model."
Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian
"This article describes initial work into the automatic classification of user-generated content in news media to support human moderators. We work with real-world data — comments posted by readers under online news articles — in two less-resourced European languages, Croatian and Estonian."
Authors: Ravi Shekhar, Marko Pranjić, Senja Pollak, Andraž Pelicon, Matthew Purver
Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting
"We describe how systems for automated journalism could be biased in terms of both the information content and the lexical choices in the text, and what mechanisms allow human biases to affect automated journalism even if the data the system operates on is considered neutral."
Authors: Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming
"This paper describes an Integer Linear Programming method for MSC using a vertex-labeled graph to select different keywords, with the goal of generating more informative sentences while maintaining their grammaticality. Our system is of good quality and outperforms the state of the art for evaluations led on news datasets in three languages: French, Portuguese and Spanish."
Authors: Elvys Linhares Pontes, Stephane Huet, Juan-Manuel Torres-Moreno, Thiago G. da Silva, Andrea Carneiro Linhares