September 2020
EMBEDDIA team meeting across six European countries.

We are proud to announce EMBEDDIA has successfully passed the mid-term project review!

During our first 18 months we have:

  • achieved huge progress in cross-lingual approaches for less-resourced languages and the news media domain,

  • obtained excellent scientific results in basic NLP research and applied research in news media,

  • presented our work in 17 scientific articles, 33 conference/workshop proceedings and at 26 events.


!! EMBEDDIA software on GitHub !!

EMBEDDIA tools stand out in international challenges!

Our semantic enrichment tools for multilingual and social media information outperformed all other participants in the official rankings, in all languages, in:

Our multilingual fake news spreader model (in English and Spanish) placed 3rd out of 66 participants at this year's PAN.

We are organising a free workshop on modern NLP through large pretrained language models on Tuesday, September 29th, at the Faculty of Computer and Information Science in Ljubljana, Slovenia. 



  • Zero-Shot Learning for Cross-Lingual News Sentiment Classification
"Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERT model."

Authors: Andraž Pelicon, Marko Pranjić, Dragana Miljković, Blaž Škrlj, Senja Pollak
  • Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian
"This article describes initial work into the automatic classification of user-generated content in news media to support human moderators. We work with real-world data — comments posted by readers under online news articles — in two less-resourced European languages, Croatian and Estonian."

Authors: Ravi Shekhar, Marko Pranjić, Senja Pollak, Andraž Pelicon, Matthew Purver 
  • Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting
"We describe how systems for automated journalism could be biased in terms of both the information content and the lexical choices in the text, and what mechanisms allow human biases to affect automated journalism even if the data the system operates on is considered neutral."
Authors: Leo Leppänen, Hanna Tuulonen, Stefanie Sirén-Heikel
  • A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming
"This paper describes an Integer Linear Programming method for MSC using a vertex-labeled graph to select different keywords, with the goal of generating more informative sentences while maintaining their grammaticality. Our system is of good quality and outperforms the state of the art for evaluations led on news datasets in three languages: French, Portuguese and Spanish."

Authors: Elvys Linhares Pontes, Stephane Huet, Juan-Manuel Torres-Moreno, Thiago G. da Silva, Andrea Carneiro Linhares
Copyright © 2020 Department of Knowledge Technologies – EMBEDDIA project, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. All rights reserved.
