Interactive Topic Analysis with Multi-Lingual Embeddings in Communalytic @ #ICWSM2024

The workshop is part of the 2024 AAAI ICWSM.

Where: Buffalo, NY

When: June 3, 2024

The event is organized by the Social Media Lab at Toronto Metropolitan University and hosted as part of the 18th International AAAI Conference on Web and Social Media (ICWSM).

Conference registration is required to participate in the event.

Contact Info

[email protected]
X @SMLabTO


Agenda

  1. Introduction to Communalytic and Data Collection from Social Media (20 min)
  2. Representing Posts as Embeddings (20 min)
  3. Projecting and Visualizing Embeddings (20 min)
  4. Hands-on Part (60 min)
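Agenda item 3 covers projecting high-dimensional embeddings down to two dimensions so they can be drawn as a map. The following is a minimal sketch only. It assumes principal component analysis (PCA) as a stand-in projection method, and the tutorial does not state that Communalytic or Nomic Atlas use PCA. The toy four-dimensional vectors and the function names (`mean_center`, `covariance`, `top_eigenvector`, `project_2d`) are hypothetical and written in plain Python for illustration.

```python
import math
import random

def mean_center(vectors):
    """Subtract the per-dimension mean so the projected map is centred on the origin."""
    n = len(vectors)
    means = [sum(column) / n for column in zip(*vectors)]
    return [[x - m for x, m in zip(v, means)] for v in vectors]

def covariance(centered):
    """Sample covariance matrix (dims x dims) of mean-centred vectors."""
    n, dims = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvector of a symmetric PSD matrix and its eigenvalue."""
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        product = [sum(m * x for m, x in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product)) or 1.0
        vec = [x / norm for x in product]
    # Rayleigh quotient (vec has unit length) gives the matching eigenvalue.
    eigenvalue = sum(v_i * sum(m * x for m, x in zip(row, vec)) for v_i, row in zip(vec, matrix))
    return vec, eigenvalue

def project_2d(vectors):
    """Project each vector onto the first two principal components."""
    centered = mean_center(vectors)
    cov = covariance(centered)
    pc1, lam1 = top_eigenvector(cov)
    # Deflation: remove the first component so power iteration finds the second one.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(len(cov))]
                for i in range(len(cov))]
    pc2, _ = top_eigenvector(deflated)
    return [(sum(a * b for a, b in zip(row, pc1)), sum(a * b for a, b in zip(row, pc2)))
            for row in centered]

# Hypothetical toy embeddings for four posts (real models produce hundreds of dimensions).
toy_embeddings = [
    [0.9, 0.1, 0.0, 0.2],  # post A (politics)
    [0.8, 0.2, 0.1, 0.1],  # post B (politics)
    [0.1, 0.9, 0.8, 0.0],  # post C (food)
    [0.0, 0.8, 0.9, 0.1],  # post D (food)
]
points = project_2d(toy_embeddings)
```

Posts A and B land close together on the 2D map and far from C and D, which is the property a visual map of embeddings relies on. Production tools typically use non-linear projections that preserve local neighbourhoods better than PCA.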

Participants need a laptop with internet access and a modern web browser. The primary tool used during the tutorial is Communalytic, which runs in the web browser and does not require installing any additional software.

Upon completion of the tutorial, participants should be able to: 1) collect publicly available social media data from platforms such as Reddit, Telegram and Mastodon using Communalytic, and 2) conduct a topic analysis of the collected data.

Objectives

This hands-on tutorial at #ICWSM2024 will introduce users to Communalytic, a research tool developed by the Social Media Lab for studying online communities and discourse. The session will include an overview of Communalytic’s features and a step-by-step guide on using Communalytic’s built-in topic analysis module.

By the end of the tutorial, participants will know how to use a large language model (LLM) to transform social media data into vectors of numbers known as embeddings. The tutorial will also show attendees how to visualize the resulting vectors via Nomic Atlas, a third-party tool that enables users to represent and explore embeddings in an interactive map with labels assigned automatically based on the semantic similarity of the posts’ content.
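Once posts are embeddings, "semantic similarity" becomes a measurable quantity. The sketch below assumes cosine similarity, a common choice for comparing embeddings, although the tutorial does not specify which measure Nomic Atlas uses internally. The posts and their four-dimensional vectors are invented for illustration; in practice the vectors would come from an embedding model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; real embedding models output hundreds of dimensions.
embeddings = {
    "The election results are in": [0.9, 0.1, 0.0, 0.2],
    "Voters head to the polls today": [0.8, 0.2, 0.1, 0.1],
    "Best pizza toppings?": [0.1, 0.9, 0.8, 0.0],
}

def most_similar(query, embeddings):
    """Return (text, score) for the post whose embedding is closest to the query post."""
    scores = [(text, cosine_similarity(embeddings[query], vec))
              for text, vec in embeddings.items() if text != query]
    return max(scores, key=lambda pair: pair[1])

print(most_similar("The election results are in", embeddings)[0])
# → Voters head to the polls today
```

The two election posts share no words, yet their vectors point in nearly the same direction, which is what lets an embedding map group posts by meaning rather than by shared tokens.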

Considering the interdisciplinary nature of this area, we welcome participants from a wide range of disciplines, including (but not limited to) Information Science, Communication, Education, Journalism, Management, Political Science, Psychology and Sociology.

Background

Current topic modelling techniques such as Latent Dirichlet Allocation (LDA) and BERTopic have a key limitation: they often identify abstract topics that human analysts find difficult to interpret because the topics are not descriptive. This is partly because topics in LDA and BERTopic are typically defined by a set of tokens and their probabilities (Fig 1). To overcome these limitations, this tutorial introduces an alternative approach based on embeddings and clustering.
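To make the "tokens and their probabilities" point concrete, the sketch below represents one hypothetical LDA topic in the form such models typically report it. The tokens and weights are invented for illustration and are not output from a real model; `top_tokens` is a hypothetical helper.

```python
# Hypothetical LDA topic: each token paired with its probability within the topic.
# Only the head of the distribution is shown; the remaining probability mass is
# spread over the rest of the vocabulary.
topic = {
    "people": 0.041,
    "vote": 0.087,
    "like": 0.033,
    "party": 0.064,
    "time": 0.029,
    "new": 0.052,
}

def top_tokens(topic, n=5):
    """Return the n highest-probability tokens, which is what topic labels usually show."""
    ranked = sorted(topic.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:n]]

print(" ".join(top_tokens(topic)))
# → vote party new people like
```

The analyst is left to infer what "vote party new people like" is about, and generic tokens such as "new" and "like" make that harder.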

This method has a distinct advantage: it allows researchers to view a high-level map of posts clustered by semantic similarity and then zoom in on specific clusters to examine the underlying posts (Fig 2).

Fig 1: Example of Topic Modelling Visualization based on LDA.

Fig 2: Example of Visualization of Social Media Posts based on Embeddings.
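The embed-then-cluster approach described above can be sketched in a few lines. This is a minimal illustration assuming k-means as the clustering algorithm; the tutorial does not say which algorithm Communalytic or Nomic Atlas use. The posts, their toy vectors and the `kmeans` helper are hypothetical, and grouping post text by cluster label stands in for zooming in on a cluster.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Cluster vectors into k groups with Lloyd's k-means (Euclidean distance)."""
    # Farthest-first initialisation: start from the first vector, then repeatedly add the
    # vector farthest from every centroid chosen so far. Deterministic, so runs repeat exactly.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical posts with toy 4-dimensional embeddings (real models use hundreds of dimensions).
posts = [
    ("Polls close at 8pm tonight", [0.9, 0.1, 0.0, 0.2]),
    ("Who are you voting for?", [0.8, 0.2, 0.1, 0.1]),
    ("Turnout looks high this year", [0.85, 0.15, 0.05, 0.15]),
    ("Best pizza toppings?", [0.1, 0.9, 0.8, 0.0]),
    ("Pineapple on pizza: yes or no", [0.0, 0.8, 0.9, 0.1]),
    ("Favourite local bakery?", [0.2, 0.7, 0.7, 0.1]),
]

labels = kmeans([vec for _, vec in posts], k=2)

# "Zooming in": group the original post text by cluster so each cluster can be read.
clusters = {}
for (text, _), label in zip(posts, labels):
    clusters.setdefault(label, []).append(text)
```

Unlike an LDA topic, each cluster here is backed by the actual posts it contains, so a researcher can read them directly when deciding what the cluster is about.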

Organizers

  • Anatoliy Gruzd, PhD
    Canada Research Chair | Co-Director, Social Media Lab | Professor, Information Technology Management, Toronto Metropolitan University, Canada
  • Philip Mai, MA, JD
    Co-Director, Social Media Lab, Ted Rogers School of Management, Toronto Metropolitan University, Canada
  • Amira Ghenai, PhD
    Assistant Professor, Information Technology Management, Toronto Metropolitan University, Canada