THIS Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets

Authors

  • Hanfeng Chen
  • Zachary Steinert-Threlkeld University of California - Los Angeles

DOI:

https://doi.org/10.51685/

Keywords:

Twitter, X, social media, images, open data

Abstract

This paper introduces the Twitter History and Image Sharing (THIS) datasets. These four related datasets enable the study of Twitter without the release of tweets or user information. All four are derived from a corpus of 14.596 billion geolocated tweets streamed from September 1, 2013 through March 14, 2023. Two Twitter History datasets provide the number of tweets, tweets by language, and user data by country from September 1, 2013 through March 14, 2023. A third Twitter History dataset provides the number of new user registrations by country from March 21, 2006, the start of Twitter, through March 14, 2023. Image Sharing is based on the 1.676 billion images shared during this period and the 956.049 million still available for download in early 2024. It provides the number of images shared and still available from September 1, 2013 through March 14, 2023. The THIS datasets enable the study of Twitter itself and its differential use across countries, including in response to specific events, and the paper demonstrates applications to correlates of image sharing and removal, behavior around national executive elections, event detection, and digital repression. While this paper is not the first to study Twitter, it is, as far as we are aware, the first to provide datasets enabling other researchers to do the same.

Published

2025-05-04

Section

Articles

How to Cite

Chen, H., & Steinert-Threlkeld, Z. (2025). THIS Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets. Journal of Quantitative Description: Digital Media, 5. https://doi.org/10.51685/