{"id":23002,"date":"2024-11-19T18:40:57","date_gmt":"2024-11-19T18:40:57","guid":{"rendered":"https:\/\/socialmedialab.ca\/web\/?p=23002"},"modified":"2025-06-17T17:01:48","modified_gmt":"2025-06-17T17:01:48","slug":"introducing-the-new-3d-topic-analyzer-module-in-communalytic","status":"publish","type":"post","link":"https:\/\/socialmedialab.ca\/web\/2024\/11\/19\/introducing-the-new-3d-topic-analyzer-module-in-communalytic\/","title":{"rendered":"Introducing the New Topic Analyzer\u00a0in Communalytic"},"content":{"rendered":"\n<div class=\"wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top\" style=\"grid-template-columns:30% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-1024x1024.jpg\" alt=\"\" class=\"wp-image-23022 size-full\" srcset=\"https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-1024x1024.jpg 1024w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-300x300.jpg 300w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-150x150.jpg 150w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-768x768.jpg 768w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-1536x1536.jpg 1536w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-2048x2048.jpg 2048w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-696x696.jpg 696w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-1068x1068.jpg 1068w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/20240918_141029-420x420.jpg 420w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<h3 class=\"wp-block-heading\"><strong>Discovering Latent Topics 
<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We are thrilled to announce the release of the new Topic Analyzer\u00a0in <a href=\"https:\/\/communalytic.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Communalytic<\/a>, a computational social science research tool designed to study online communities and public discourse on social media. Communalytic\u00a0is entirely web-based and requires no special programming or coding experience.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The Topic Analyzer can <strong>automatically group social media posts that are&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Semantic_similarity\" target=\"_blank\" rel=\"noreferrer noopener\">semantically similar<\/a>&nbsp;<\/strong>(i.e., similar in their meaning). The tool is designed to expedite the analysis of a large corpus of text data without the need to read and review every post.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Researchers can use the Analyzer to discover latent topics in a dataset, i.e., abstract topics that may not be directly observable from reading the posts alone. It can also be used to discover communities of users who share a similar interest in a topic but do not necessarily communicate with each other.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Creating Embeddings with Social Media Posts<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The\u00a0Topic Analyzer\u00a0<span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\">utilizes sentence-transformer models to convert social media posts into computer-readable<a href=\"https:\/\/huggingface.co\/blog\/getting-started-with-embeddings\" target=\"_blank\">\u00a0vector embeddings<\/a>, enabling the capture of the semantic meaning of these<\/span> posts. 
For more information on embeddings, see\u00a0<a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/embeddings\/video-lecture\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>\u00a0and\u00a0<a href=\"https:\/\/huggingface.co\/blog\/getting-started-with-embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Communalytic generates embeddings from social media posts using a multilingual text embedding model from <a href=\"https:\/\/docs.voyageai.com\/docs\/embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">VoyageAI<\/a>: <strong>Voyage-3-lite<\/strong> (512 dimensions) in <strong>Communalytic EDU<\/strong> and <strong>Voyage-3<\/strong> (1024 dimensions) in <strong>Communalytic PRO<\/strong>. These general-purpose models are optimized for multilingual retrieval, making them well suited to identifying semantic similarities between sentences in a dataset. These models were selected because they <a href=\"https:\/\/blog.voyageai.com\/2024\/09\/18\/voyage-3\/\" target=\"_blank\" rel=\"noreferrer noopener\">outperform<\/a>&nbsp;similar models for creating embeddings from texts in <strong>27 different languages<\/strong>:&nbsp;<em>Arabic, Bengali, Czech, Danish, Dutch, English, French, Georgian, German, Greek, Hungarian, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Urdu, Vietnamese<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Auto-clustering of Semantically Similar Posts<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">After transforming posts into embeddings, Communalytic uses a dimensionality reduction technique called <strong><strong><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\"><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\"><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\"><a 
href=\"https:\/\/medium.com\/@aeonaten\/understanding-umap-uniform-manifold-approximation-and-projection-cede51c477d9\" target=\"_blank\" rel=\"noreferrer noopener\">UMAP<\/a><\/span><\/span><\/span><\/strong><\/strong> to compress the embeddings from 512 or 1024 dimensions down to three dimensions. These reduced embeddings are then visualized with Communalytic&#8217;s interactive <strong>3D Semantic Similarity Map<\/strong>, where each dot represents a post. Semantically similar posts are automatically grouped into clusters using one of three clustering algorithms: <strong>HDBSCAN<\/strong>, <strong>KMeans<\/strong>, or <strong>Gaussian Mixture<\/strong>. Each cluster represents a distinct latent topic. Dots within the same cluster are displayed in the same colour, making it easy to distinguish between topics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For instance, consider a dataset containing three posts:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>&#8220;I like apples.&#8221;<\/li>\n\n\n\n<li>&#8220;I hate oranges.&#8221;<\/li>\n\n\n\n<li>&#8220;I need a car.&#8221;<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>Topic Analyzer<\/strong> would cluster the first two posts into a single group because they are semantically similar: both discuss preferences related to fruit (despite expressing opposite sentiments). The third post, however, would form its own cluster, as it focuses on an unrelated topic (a need for a car). <em>This demonstrates how the Topic Analyzer attempts to group content based on meaning, rather than just surface-level keywords.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Auto-Labelling of Semantically Similar Posts<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">After the posts are clustered and visualized with the <strong>3D Semantic Similarity Map<\/strong>, researchers can review each cluster manually and assign a descriptive label. 
Alternatively, they can use one of the available LLMs (e.g., llama-3.1 or mistral-7b) to suggest a topic label for each cluster automatically.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>A preview of the 3D Semantic Similarity&nbsp;Map<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Below is a video demo of the <strong>Topic Analyzer<\/strong> and the companion <strong>3D Semantic Similarity Map<\/strong>.<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1920 \/ 1080;\" width=\"1920\" controls src=\"https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/09\/communalytic-topic-analysis_w-audio.mp4\"><\/video><\/figure>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/communalytic.org\/\"><img decoding=\"async\" width=\"1024\" height=\"231\" src=\"https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-1024x231.png\" alt=\"\" class=\"wp-image-22800\" style=\"width:474px;height:auto\" srcset=\"https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-1024x231.png 1024w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-300x68.png 300w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-768x173.png 768w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-696x157.png 696w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2-1068x241.png 1068w, https:\/\/socialmedialab.ca\/web\/wp-content\/uploads\/2024\/05\/Communalytic-logo-2.png 1476w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading\"><strong>About Communalytic<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/communalytic.org\/\">Communalytic <\/a>is a no-code computational social science research tool for studying online communities and public discourse on social media. It is designed to provide researchers, journalists, and students with essential resources and infrastructure for conducting independent, public-interest research. It has a full suite of easy-to-use social media data collectors and analyzers \u2013 no coding is required. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Users can\u00a0<a href=\"https:\/\/communalytic.org\/docs\/tutorial-importing-data-into-communalytic-from-csv\/\">bring their own data<\/a>\u00a0or use one of Communalytic\u2019s various\u00a0<a href=\"https:\/\/communalytic.org\/docs\/learn-more-data-collection\/\">social media data collectors<\/a> to collect data from platforms such as Bluesky, Mastodon, Reddit, Telegram, X (formerly Twitter), and YouTube.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are two versions of Communalytic. 
Each is designed for a different purpose and a different set of users:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Communalytic EDU&nbsp;<\/strong>is designed to help students learn about social media data analytics.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Communalytic PRO<\/strong>&nbsp;is designed for academic researchers and journalists and is ideal for large-scale research projects.&nbsp;<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>About Communalytic&#8217;s Data Analyzer Modules<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Communalytic also comes with a set of built-in data analytics modules, including a:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Civility Analyzer<\/strong>\u00a0that can identify toxic and prosocial interactions in a dataset using machine-learning models (Perspective API and Detoxify),<\/li>\n\n\n\n<li><strong>Sentiment Analyzer<\/strong> that can\u00a0calculate sentiment polarity scores to determine whether the text in a dataset expresses a positive, negative or neutral sentiment,<\/li>\n\n\n\n<li><strong>Topic Analyzer\u00a0<\/strong>that can automatically group social media posts that are\u00a0semantically similar\u00a0to identify latent topics in a dataset (i.e., abstract topics that may not be directly observable from just reading the posts),<\/li>\n\n\n\n<li><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\"><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\"><strong>Network Analyzer<\/strong>\u00a0that can generate and visualize various types of networks in a dataset, including signed and unsigned communication network<\/span>s, as well as link-sharing networks.<\/span> A signed network is one where the nodes and edges carry additional information such as weights (i.e., toxicity and prosocial scores or sentiment scores).<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">These Analyzers can automatically:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Detect antisocial (Toxicity, Insults, Threats &#8230;) and prosocial interactions (Compassion, Curiosity and Respect &#8230;) in any text-based dataset,<\/li>\n\n\n\n<li>Assess sentiments in online discourse (i.e., opinion mining),<\/li>\n\n\n\n<li>Group together social media posts that are semantically similar and identify latent topics, uncovering hidden communities of users who share an interest in a topic but may not know each other or may never have communicated with one another,<\/li>\n\n\n\n<li>Find out who talks to whom, who shares whose content, who shares the same links or resources, etc.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Used together, these analytical modules can help researchers study online communities and influencers, map shared interests among community members, study the spread of misinformation and disinformation, and detect signs of possible coordination among seemingly disparate actors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discovering Latent Topics We are thrilled to announce the release of the new Topic Analyzer\u00a0in Communalytic, a computational social science research tool designed to study online communities and public discourse on social media. Communalytic\u00a0is entirely web-based and requires no special programming or coding experience. 
The Topic Analyzer can automatically group social media posts that are&nbsp;semantically [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":23022,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[495,41,265,554,264],"tags":[615,614,515],"class_list":["post-23002","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","category-announcements","category-research","category-research-tools","category-web-apps","tag-embeddings","tag-topic-analysis","tag-topic-modelling"],"_links":{"self":[{"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/posts\/23002","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/comments?post=23002"}],"version-history":[{"count":41,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/posts\/23002\/revisions"}],"predecessor-version":[{"id":23622,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/posts\/23002\/revisions\/23622"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/media\/23022"}],"wp:attachment":[{"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/media?parent=23002"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/categories?post=23002"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/socialmedialab.ca\/web\/wp-json\/wp\/v2\/tags?post=23002"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}