Ladies of Data edition 2022: a full chapel and inspiring talks
A full chapel and inspiring talks during Ladies of Data 2022. On 14 April, 100+ data scientists came together for Ladies of Data, an inclusive event around data science and diversity. This year, the theme was NLP (Natural Language Processing).
In the keynotes, visitors were inspired by two successful leaders in data science, the lessons they learned and their unique perspectives. In her talk, Margot Rozendaal discussed use cases for marketing, advertising and the newsroom at DPG Media. She explained how, through topic modelling, transformers and segmentation, DPG analyzes which articles its visitors prefer and is able to personalize their customer journey. Rina Joosten from Seedlink talked about creating human and societal impact through technology and how Seedlink is challenging a billion-dollar assessment industry. The presentations were interactive and personal, with lots of questions from the audience.
In the breakout sessions we hosted some interesting presentations and discussions about current hot topics in data science and AI.
What (biases) have language models learned?
by Heleen Rutjes and Emiel van Miltenburg
In this interactive session hosted by Heleen Rutjes and Emiel van Miltenburg we discussed bias in language models, with a focus on language generation models. These are NLP models that generate text based on user input. Generally, language generation models are trained on large amounts of (historic) text. Naturally, these models inherit many of the biases that have been prevailing in our society. We explored some of the consequences these kinds of biases can have and how we can uncover them. One cool example that came up is a tool created by PAIR. You can check it out here: https://pair.withgoogle.com/explorables/fill-in-the-blank/
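To make the idea of inherited bias concrete, here is a minimal sketch of a fill-in-the-blank probe. It is not the PAIR tool and not something shown in the session: the "model" is just a word counter, and the toy corpus, its sentences and the function name are invented for illustration. It only shows that a skew in the training text comes straight back out as a skew in the predictions.

```python
from collections import Counter

# Invented toy corpus (an assumption for illustration, not real data).
# The skew is deliberate: "nurse" is mostly followed by "she",
# "engineer" mostly by "he".
toy_corpus = [
    "the nurse said she was tired",
    "the nurse said she was late",
    "the nurse said he was tired",
    "the engineer said he was late",
    "the engineer said he was tired",
    "the engineer said she was late",
]


def next_word_distribution(corpus, context):
    """Return the relative frequency of each word that follows `context`."""
    context_tokens = context.split()
    n = len(context_tokens)
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        # Slide over the sentence and count the word right after each match.
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == context_tokens:
                counts[tokens[i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


# Fill in the blank: "the nurse said ___" versus "the engineer said ___".
nurse = next_word_distribution(toy_corpus, "the nurse said")
engineer = next_word_distribution(toy_corpus, "the engineer said")
print("nurse:", nurse)
print("engineer:", engineer)
```

Real language models are far more complex, but probing them works in a similar way: keep the sentence fixed, change one word, and compare what the model fills in.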
Automatic Speech Recognition
by Esther van den Berg from Amberscript
With the rise of large pretrained Automatic Speech Recognition (ASR) models, ASR solutions are becoming a commodity. Most automatically generated captions, however, do not reach the level of accuracy needed to make audio accessible to non-native speakers or to the Deaf and hard-of-hearing. In this breakout session, Esther discussed the principles of ASR, the importance of audio accessibility and the advantage of human-in-the-loop ASR solutions for achieving better accessibility for all.
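Caption accuracy is commonly expressed as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the automatic transcript into the reference, divided by the number of words in the reference. The sketch below is a plain implementation of that standard metric, not Amberscript's tooling; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of five reference words.
reference = "the meeting starts at nine"
hypothesis = "the meeting start at nine"
print(word_error_rate(reference, hypothesis))
```

A human-in-the-loop workflow lowers this number by having an editor correct the remaining errors in the automatic transcript.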
Turning NLP techniques into business insights
by Milou Ehrismann and Elena Weber from Underlined
In the breakout session with Underlined, Milou explained how they use NLP techniques to help companies optimize their customer journeys. Underlined integrates text from multiple data sources and enriches this data using advanced NLP algorithms and process mining. Issues such as modeling the complex Dutch language and creating customized stopword lists were covered during the session. Through a topic tree, Underlined gives companies the business insights they need to take action and improve customer happiness. On top of this, Elena discussed one of the research projects at Underlined in which she is involved. She is investigating how to evaluate NLP techniques used in banks to automatically classify customer feedback into topics. The imbalance in topic size makes a good evaluation especially challenging.
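Why topic imbalance makes evaluation hard can be shown with a small sketch. This is not Elena's evaluation method: the topic names, the label counts and the helper functions are assumptions chosen for illustration. It compares plain accuracy with macro-averaged F1, a common choice when topics differ a lot in size, for a classifier that always predicts the largest topic.

```python
def accuracy(y_true, y_pred):
    """Share of feedback items that received the correct topic."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def per_topic_f1(y_true, y_pred):
    """F1 score for each topic, treating that topic as the positive class."""
    scores = {}
    for topic in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == topic and p == topic)
        fp = sum(1 for t, p in pairs if t != topic and p == topic)
        fn = sum(1 for t, p in pairs if t == topic and p != topic)
        denominator = 2 * tp + fp + fn
        scores[topic] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-topic F1 scores: every topic counts equally."""
    scores = per_topic_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, heavily imbalanced feedback set: 9 items about one topic, 1 about another.
y_true = ["payments"] * 9 + ["fraud"]
# A lazy classifier that always predicts the majority topic.
y_pred = ["payments"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this toy data, accuracy is 0.9 while macro F1 is about 0.47, because the classifier never finds the small topic. A metric that weighs every topic equally makes that failure visible.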
We would like to thank all speakers and participants for this great event and hope to see you all next year!