HumL

HumL@WWW2018

the first international workshop on Augmenting Intelligence with Humans-in-the-Loop
co-located with TheWebConf (WWW2018)
Workshop date: 24 April 2018
Proceedings: The Web Conference
(HumL workshop series)

Human-in-the-loop is a model of interaction in which a machine process and one or more humans interact iteratively. In this paradigm the user can heavily influence the outcome of the process by providing feedback to the system, and also has the opportunity to gain different perspectives on the underlying domain and to understand the step-by-step machine process leading to a certain outcome. Among the current major concerns in Artificial Intelligence research are being able to explain and understand the results, as well as avoiding bias in the underlying data that might lead to unfair or unethical conclusions. Typically, computers are fast and accurate in processing vast amounts of data. People, however, are creative and bring in their perspectives and interpretation power. Bringing humans and machines together creates a natural symbiosis for accurate interpretation of data at scale. The goal of this workshop is to bring together researchers and practitioners in various areas of AI (e.g., Machine Learning, NLP, Computational Advertising, etc.) to explore new pathways of the human-in-the-loop paradigm.

Keynote Speakers

Elena Simperl
University of Southampton

Loops of humans and bots in Wikidata

Wikidata is one of the most successful knowledge graphs ever created. It expresses knowledge in the form of subject-property-value statements accompanied by provenance information. A project of the Wikimedia Foundation, Wikidata is supported by a community of currently 19 thousand active users and 234 bots, who together are responsible for editing more than 45 million entities since the start of the project in 2012. This makes Wikidata a prime example of what human-in-the-loop technology can achieve. In this talk, we are going to present several studies that aim to understand the links between its socio-technical fabric and its success.
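To make the statement model described in the abstract concrete, here is a minimal sketch of a subject-property-value statement that carries provenance. The class and field names (Statement, Reference, is_sourced, unsourced) are assumptions made for illustration and are not Wikidata's actual data model or API; the reference URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """Provenance for a statement (hypothetical shape): where the claim comes from."""
    source: str


@dataclass
class Statement:
    """A subject-property-value statement with optional provenance."""
    subject: str   # entity identifier, e.g. "Q42"
    prop: str      # property identifier, e.g. "P31"
    value: str     # value, here another entity identifier
    references: List[Reference] = field(default_factory=list)

    def is_sourced(self) -> bool:
        # A statement counts as sourced if it has at least one reference.
        return len(self.references) > 0


def unsourced(statements: List[Statement]) -> List[Statement]:
    """Return statements lacking provenance, e.g. to queue for human review."""
    return [s for s in statements if not s.is_sourced()]


# Usage: one sourced and one unsourced statement (placeholder reference URL).
statements = [
    Statement("Q42", "P31", "Q5", [Reference("https://example.org/source")]),
    Statement("Q42", "P31", "Q5"),
]
needs_review = unsourced(statements)
```

In a human-in-the-loop setting, a list such as needs_review could be handed to editors while bots handle the statements that already carry references; that division of labour is an illustrative assumption, not a description of how Wikidata works.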

Praveen Paritosh
Google Research

The missing science of knowledge curation (Or, Improving incentives for large-scale knowledge curation)

Dictionaries, encyclopedias, knowledge graphs, annotated corpora, library classification systems and world maps are all examples of human-curated knowledge resources that have been highly valuable to science as well as amortized across multiple large-scale systems in practice. Many of these were started and built even before a crowdsourcing research community existed. While the last decade has seen unprecedented growth in research and practice in building crowdsourcing systems to do increasingly complex tasks at scale, many of these resources are still woefully incomplete, lacking coverage in languages and subject matter domains. Moreover, many knowledge resources needed to fill other semantic gaps for artificial intelligence systems simply don’t exist or aren’t being built. Why? I argue that we don’t have the right incentives, and that in order to improve the incentives, we have some fundamental scientific questions to answer. While building a large knowledge resource, we have little more than intuitions when it comes to estimating the reusability, maintainability, and long-term value of the effort. This makes it difficult to fund or manage such projects, often requiring herculean personalities or fortunate businesses. Building or expanding a resource is often not seen as “sexy”, which results in a lack of resources to answer those questions in any principled manner. These problems begin to outline a new science of curation, making progress on which could help improve the discussion around and funding for building sorely needed knowledge resources.

Program

09:00-10:00 Keynote
Praveen Paritosh, Google Research
The missing science of knowledge curation (Or, Improving incentives for large-scale knowledge curation)
[slides]
10:00-10:20
Amrapali Zaveri, Pedro Hernandez Serrano, Manisha Desai and Michel Dumontier
CrowdED: Guideline for optimal Crowdsourcing Experimental Design
[slides]
10:20-11:00 Coffee Break
11:00-11:20
Alexandros Chortaras, Anna Christaki, Nasos Drosopoulos, Eirini Kaldeli, Maria Ralli, Anastasia Sofou, Arne Stabenau, Giorgos Stamou and Vassilis Tzouvaras
WITH: Human-computer collaboration for data annotation and enrichment
[slides]
11:20-11:40
Giorgio Maria Di Nunzio, Maria Maistro and Federica Vezzani
A Gamified Approach to Naïve Bayes Classification: A Case Study for Newswires and Systematic Medical Reviews
[slides]
11:40-12:00
Wei Sun, Ying Li, Anshul Sheopuri and Thales Teixeira
Computational Creative Advertisements
[slides]
12:00-12:20
Ismini Lourentzou, Daniel Gruhl and Steve Welch
Exploring the efficiency of batch active learning for human-in-the-loop relation extraction
[slides]
12:20-01:40 Lunch Break
01:40-02:40 Keynote
Elena Simperl, University of Southampton
Loops of humans and bots in Wikidata
[slides]
02:40-03:00
Lora Aroyo and Chris Welty
The Quantum Collective
[slides]
03:00-03:40 Coffee Break
03:40-04:00
Roberto Enea, Maria Teresa Pazienza, Andrea Turbati and Alessandro Colantonio
How to support human operator in uncertainty managing during the ontology learning process
[slides]
04:00-04:10
Rafael Zequeira Jiménez, Laura Fernández Gallardo and Sebastian Möller
Outlier Detection vs. Control Questions to ensure reliable results in Crowdsourcing. A Speech Quality Assessment Case Study
[slides]
04:10-04:30 Closing Session

Call for Contributions

Topics

Human Factors:

Human-computer cooperative work
Mobile crowdsourcing applications
Human Factors in Crowdsourcing
Social computing
Ethics of Crowdsourcing
Gamification techniques

Data Collection:

Data annotation task design
Data collection for specific domains (e.g. with privacy constraints)
Data privacy
Multilinguality aspects

Machine Learning:

Dealing with sparse and noisy annotated data
Crowdsourcing for Active Learning
Statistics and learning theory

Applications:

Healthcare
NLP technologies
Translation
Data quality control
Sentiment analysis

The proceedings of the workshop will be published online (open access) and through the ACM Digital Library, as a companion volume of The Web Conference.
Papers must be submitted in PDF according to the new ACM format published in ACM guidelines, selecting the generic “sigconf” sample. Submissions should not exceed 8 pages including any diagrams or appendices and references. The PDF files must have all non-standard fonts embedded. Submissions must be self-contained and in English.
Please submit your contributions to EasyChair and select "Augmenting Intelligence with Humans-in-the-Loop Workshop".

Workshop Report

The old paradigm of computing, where machines do something for the humans, has changed: more and more humans and machines are working with and for each other, in a partnership. We can see the effectiveness of this paradigm in many areas, including human computation (where humans do some of the computation in place of the machines), computer-supported cooperative work, social computing, web search, recommender systems and computer-mediated communication, to name a few. The HumL workshop focused on research questions around the design of web information systems that keep humans in the loop. The three main categories of topics covered in this edition of the workshop were human factors, data collection and applications.
The paper presentations covered a wide range of topics related to the efficient and effective combination of the strengths of both machine and crowd computation. Overall the workshop presented a diverse set of use cases and domains where these topics were studied. Empirical results were provided and discussed with respect to (1) methods for data quality assurance and labeling task efficiency, (2) the role of gamification elements in improving crowd performance, and (3) the role of quantum mathematics in simulating human behavior.
The workshop was opened in the morning by a very interesting invited talk by Praveen Paritosh from Google Research on the missing science of knowledge curation. Praveen discussed the use of the right incentives to motivate human contribution to the creation of knowledge resources and introduced three types of data for training purposes: found data, bought data and vested data. He argued for the long-term value of producing high-quality vested data resources.
Amrapali Zaveri discussed findings about the optimal number of workers and tasks per worker in a parallelized and decentralized crowdsourcing setting. The authors introduce a two-stage statistical guideline, CrowdED, for optimal crowdsourcing experimental design, which estimates a priori the worker and task assignments needed to obtain maximum accuracy on all tasks.
Anastasia Sofou presented their work on making Cultural Heritage more accessible and reusable. They introduce WITH, an aggregation platform that provides enhanced services and enables human-computer collaboration for data annotations and enrichment. Specifically, the system combines machine intelligence from image and text analysis and human intelligence through gamified crowdsourcing.
Giorgio Maria Di Nunzio described experiments on the classification of newswires and systematic medical reviews, in which the authors use gamification techniques to transform labelling tasks into an interactive learning process.
Wei Sun presented results of field experiments on 169 television advertisements and 2334 participants, with a dynamic Bayesian network (DBN)-based system that helps human designers produce effective advertisements with predictable outcomes (e.g. by linking advertisement content, viewers' emotional responses and effectiveness metrics for ad avoidance, sharing and influence on purchase).
Steve Welch presented a study of the tradeoff between machine learning model performance, the batch size of the requested annotation labels and the training time, in the setting of real-time, domain-specific relation extraction.
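The batch-size tradeoff can be illustrated with a toy batch active learning loop. This is a minimal sketch under stated assumptions, not the authors' system: the model is a one-dimensional threshold classifier, the "human" is a stand-in oracle function, uncertainty is taken to be distance from the current threshold, and all names (train, batch_active_learning, oracle, budget) are hypothetical. The seed set must contain at least one example of each class.

```python
def train(labeled):
    """Fit a 1-D threshold classifier: the midpoint between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def batch_active_learning(pool, oracle, seed, batch_size, budget):
    """Spend `budget` labels in batches; return (threshold, retraining rounds).

    Assumption: `seed` contains at least one item of each class.
    """
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    threshold = train(labeled)
    rounds = 0
    while budget > 0 and unlabeled:
        k = min(batch_size, budget, len(unlabeled))
        # Uncertainty sampling: items closest to the threshold are least certain.
        unlabeled.sort(key=lambda x: abs(x - threshold))
        batch, unlabeled = unlabeled[:k], unlabeled[k:]
        labeled += [(x, oracle(x)) for x in batch]  # ask the human annotator
        budget -= k
        threshold = train(labeled)  # one retraining step per batch
        rounds += 1
    return threshold, rounds


# Usage: same label budget, different batch sizes.
pool = list(range(0, 101))
oracle = lambda x: 1 if x >= 60 else 0  # stand-in for a human annotator
_, rounds_small = batch_active_learning(pool, oracle, [0, 100], batch_size=1, budget=20)
_, rounds_large = batch_active_learning(pool, oracle, [0, 100], batch_size=10, budget=20)
```

With the same label budget, a smaller batch retrains the model more often (more training time, but each query uses the freshest model), while a larger batch retrains less often and selects queries with a staler model.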
The afternoon session was opened by Elena Simperl from the University of Southampton, who generated lots of interest with her research on how humans and bots contribute together to the development of the Wikidata knowledge graph.
Chris Welty outlined a new and more general approach to recognizing context, grounded in a fairly simple intuition: it is far more appropriate to model human behavior in human-in-the-loop systems using notions from quantum mechanics, such as superpositions of states.
Maria Teresa Pazienza presented a hybrid human and machine computation framework for ontology learning, which combines machine extraction of triples from heterogeneous sources with human validation of these triples.
Rafael Zequeira discussed the results from a speech quality assessment study in the context of data reliability in crowdsourcing. Specifically, the authors focus on how "trapping questions" or "outlier detection" assure reliable results, and show that only a combination of both techniques provides quality improvement.
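As an illustration of how the two checks can be combined, the sketch below first drops workers who fail control ("trapping") questions and then removes outlying ratings from the remaining workers. It is a sketch under stated assumptions, not the authors' procedure: the data layout, the function names and the outlier rule (ratings further than k median absolute deviations from the median) are all choices made here for illustration.

```python
from statistics import median


def passes_traps(answers, gold, max_errors=0):
    """True if the worker answered the control questions correctly."""
    errors = sum(1 for q, expected in gold.items() if answers.get(q) != expected)
    return errors <= max_errors


def drop_outliers(ratings, k=3.0):
    """Keep ratings within k median absolute deviations (MAD) of the median."""
    med = median(ratings)
    mad = median([abs(r - med) for r in ratings])
    if mad == 0:
        # No spread estimate available: keep everything rather than guess.
        return list(ratings)
    return [r for r in ratings if abs(r - med) <= k * mad]


def reliable_ratings(workers, gold, k=3.0):
    """Apply both checks: control questions first, then outlier removal.

    workers: dict of worker id -> {"traps": {question: answer}, "rating": float}
    """
    trusted = [w["rating"] for w in workers.values() if passes_traps(w["traps"], gold)]
    return drop_outliers(trusted, k) if trusted else []


# Usage with made-up ratings for one speech sample on a 1-5 scale.
gold = {"t1": 5}
workers = {
    "a": {"traps": {"t1": 5}, "rating": 4.0},
    "b": {"traps": {"t1": 5}, "rating": 4.5},
    "c": {"traps": {"t1": 5}, "rating": 3.5},
    "d": {"traps": {"t1": 1}, "rating": 1.0},  # fails the control question
    "e": {"traps": {"t1": 5}, "rating": 4.0},
    "f": {"traps": {"t1": 5}, "rating": 1.0},  # passes the trap, but is an outlier
}
kept = reliable_ratings(workers, gold)
```

In this made-up example, worker d is caught only by the control question and worker f only by the outlier rule, which shows why neither check alone catches every unreliable rating.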

Organization

Lora Aroyo, VU University Amsterdam

Gianluca Demartini, University of Queensland, Australia

Anna Lisa Gentile, IBM Research Almaden

Chris Welty, Google
