If you are using Netlytic to collect Twitter data for your research, here’s how to check how many tweets in your dataset have since been deleted (by a user or by the platform).
Step 1: Install a free text editor that supports regular expressions, such as Notepad++
Step 2: Download your dataset from Netlytic as an Excel or CSV file and open it in Excel
Step 3: Copy the column called “guid” from the Excel file into a new text file in Notepad++
Step 4: In Notepad++:
- delete the first row, which contains the column header “guid”
- using the Search & Replace function, find “https://twitter.com/” (no quotes) and replace it with nothing (leave the replacement field empty)
- using the Search & Replace function, find “/statuses” (no quotes) and replace it with nothing (leave the replacement field empty)
- using the Search & Replace function in regular expression mode (select the regular expression search mode), find the regular expression ^[^\n\t\r\/]+\/ and replace it with nothing; this removes the username and the slash that follows it, leaving only the numeric tweet ID on each line
- save the result as a new text file (let’s call it “ids.txt”); this file will contain a list of unique tweet IDs, one per line
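If you prefer to skip the manual Search & Replace steps, the same clean-up can be sketched in Python. This is only a minimal sketch: it assumes each “guid” value has the shape https://twitter.com/<username>/statuses/<tweet id>, which is what the replacements above imply, and the function names and the “guid” column default are illustrative choices, not part of Netlytic.

```python
import csv
import re

# Assumed guid shape (inferred from the manual steps above):
#   https://twitter.com/<username>/statuses/<tweet id>
GUID_PATTERN = re.compile(r"^https://twitter\.com/[^\n\t\r/]+/statuses/(\d+)$")


def extract_tweet_id(guid):
    """Return the numeric tweet ID from a guid URL, or None if it does not match."""
    match = GUID_PATTERN.match(guid.strip())
    return match.group(1) if match else None


def write_ids(csv_path, ids_path, column="guid"):
    """Read the guid column of a CSV export and write one unique tweet ID per line.

    Returns the number of IDs written.
    """
    seen = set()
    # "utf-8-sig" reads files with or without a byte order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            tweet_id = extract_tweet_id(row.get(column) or "")
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                dst.write(tweet_id + "\n")
    return len(seen)
```

For example, calling write_ids("netlytic_export.csv", "ids.txt") (the CSV file name here is hypothetical) would produce the same “ids.txt” file as the Notepad++ steps. Rows whose guid does not match the assumed shape are skipped rather than guessed at.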
Step 5: Install a free Hydrator app
Step 6: in Hydrator:
- Link your Twitter account to Hydrator (under Settings)
- Open the “ids.txt” file in Hydrator by adding it as a new dataset.
- Click the “Add Dataset” button
Step 7: Continue in Hydrator:
- Run the collector by clicking the “Start” button.
- Once the run has finished, save the dataset by clicking on the “CSV” button. The resulting file will include all of the original tweets and retweets, minus those that have been deleted by their authors or by the platform. A side benefit of this process is that the dataset is now enriched with additional metadata that Netlytic did not originally collect (such as the number of times a tweet has been retweeted).
Step 8: Once the re-collection process is complete, click on the green bar to see how many tweets have been deleted from your original dataset. For example, in the sample dataset, 4% of the tweets had been deleted.
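If you want to double-check the figure outside of Hydrator, you can compare the IDs in “ids.txt” with the IDs in the CSV file you saved. The sketch below assumes the Hydrator CSV has a column named “id” holding the tweet ID; that column name and the function names are assumptions for illustration, so adjust them to match your file.

```python
import csv


def read_original_ids(ids_path):
    """Load the tweet IDs from a text file with one ID per line."""
    with open(ids_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def read_hydrated_ids(csv_path, id_column="id"):
    """Load the tweet IDs from the CSV saved after hydration.

    The "id" column name is an assumption; change it if your file differs.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return {row[id_column].strip() for row in csv.DictReader(f) if row.get(id_column)}


def deletion_summary(original_ids, hydrated_ids):
    """Return (number of missing tweets, percentage of the original set)."""
    original = set(original_ids)
    missing = original - set(hydrated_ids)
    share = 100.0 * len(missing) / len(original) if original else 0.0
    return len(missing), share
```

For example, deletion_summary(read_original_ids("ids.txt"), read_hydrated_ids("hydrated.csv")) returns the count and percentage of tweets that could not be re-collected (“hydrated.csv” is a hypothetical file name).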
PS. To check whether a tweet was deleted by its creator or by the platform, try the following Python script: https://github.com/DocNow/twarc/blob/master/utils/deletes.py
This post was originally posted on the Netlytic.org website at https://netlytic.org/home/?p=11627