We are thrilled to announce the release of the Topic Analysis Module in Communalytic, a computational social science research tool for studying online communities and discourse.
The Topic Analysis Module can automatically identify and group together social media posts that are semantically similar and can be used to spot latent topics in a dataset (i.e., abstract topics that may not be directly observable from just reading the posts).
The module is entirely web-based and requires no programming skills to use. It is designed to help researchers make sense of their social media dataset without having to scroll through endless Excel files, read every post or even have prior knowledge about the content of the dataset.

The module uses sentence-transformer models from Hugging Face (a platform and community repository for sharing machine learning models) to transform human-readable textual data such as social media posts into computer-readable vectors of numbers known as embeddings.
To visualize the resulting embeddings, the module uses Nomic Atlas, a third-party tool that projects the multi-dimensional embeddings onto a 2D space and turns the vectors into an interactive topic map, with automatic topic labels added to each grouping of posts. Posts that are located close to each other in the embedding space are considered semantically similar (i.e., similar in their meaning).
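To make the idea of "closeness" between embeddings concrete, here is a minimal sketch in Python. It is not how Communalytic or Nomic Atlas is implemented: the three-dimensional vectors are invented for illustration (real sentence-transformer embeddings typically have hundreds of dimensions), the helper names `cosine_similarity` and `most_similar` are hypothetical, and cosine similarity is used only because it is one common way to compare embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", made up for this example.
# A real embedding model would produce these vectors from the post text.
toy_embeddings = {
    "The vaccine rollout starts next week": [0.9, 0.1, 0.0],
    "Booster shots are available from Monday": [0.8, 0.2, 0.1],
    "Our team won the championship last night": [0.0, 0.1, 0.9],
}


def most_similar(post, embeddings):
    """Return the other post whose vector is closest to `post` by cosine similarity."""
    others = [p for p in embeddings if p != post]
    return max(
        others,
        key=lambda p: cosine_similarity(embeddings[post], embeddings[p]),
    )


if __name__ == "__main__":
    query = "The vaccine rollout starts next week"
    print(most_similar(query, toy_embeddings))
```

In this toy data the two health-related posts point in nearly the same direction, so they would land near each other on a topic map, while the sports post sits far away.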
Researchers have the option of using the suggested labels for each grouping, or cluster, of semantically similar posts in a dataset. However, researchers can also override the suggested labels, manually explore posts, and relabel the clusters as desired. Once semantically similar posts are grouped and visualized, researchers can use the resulting topic map to examine their dataset and uncover latent topics.
If you are interested in learning more, here are a few helpful links to get you started with this new module in Communalytic:

About Communalytic

Communalytic is a computational social science research tool for studying online communities and discourse. It can collect, analyze, and visualize publicly available data from various social media platforms including Reddit, Telegram, YouTube, Facebook/Instagram (via CrowdTangle) and Twitter, or from a user-uploaded CSV file.
Communalytic contains a suite of advanced data analytics modules including: a Toxicity Analyzer, a Sentiment Analyzer, a Topic Analyzer and a built-in Network Analyzer. These modules can be used to automatically:
- detect anti-social interactions (e.g., harassment, hate speech, extremist content),
- assess sentiments in online discourse,
- identify and group together social media posts that are semantically similar and identify latent topics within your dataset,
- generate and visualize various types of networks, including communication and link-sharing networks, which in turn can be used to identify influencers, map shared interests among online actors, study the spread of mis/dis-information and detect signs of possible coordination among seemingly disparate actors.
For more details, see Communalytic’s Tutorials page.