THIS Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets
DOI:
https://doi.org/10.51685/
Keywords:
Twitter, X, social media, images, open data
Abstract
This paper introduces the Twitter History and Image Sharing (THIS) datasets. These four related datasets enable the study of Twitter without the release of tweets or user information. All are derived from a corpus of 14.596 billion geolocated tweets streamed from September 1, 2013 through March 14, 2023. Two Twitter History datasets provide data on the number of tweets, tweets by language, and user data by country from September 1, 2013 through March 14, 2023. A third Twitter History dataset provides data on the number of new user registrations by country from March 21, 2006, the start of Twitter, through March 14, 2023. The fourth dataset, Image Sharing, is based on the 1.676 billion images shared during the streaming period, of which 956.049 million were still available for download in early 2024. It provides data on the number of images shared and still available from September 1, 2013 through March 14, 2023. The THIS datasets enable the study of Twitter itself and its differential use across countries, including in response to specific events. The paper demonstrates applications to correlates of image sharing and removal, behavior around national executive elections, event detection, and digital repression. While this paper is not the first to study Twitter, it is, as far as we are aware, the first to provide datasets enabling other researchers to do the same.
License
Copyright (c) 2025 Zachary Steinert-Threlkeld, Hanfeng Chen

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.