Major Release: v1.0 of Wayback Tweets
A toolkit for retrieving archived tweets
by claromes in releases, en_us, OSINT, SOCMINT, open source

Last year I announced on this blog that the project would move to the command line, and with that shift, new features would be added. In addition to the CLI, we now have an API that can be used independently as a module, allowing for more flexible usage of the tool.
The web app prototype will not receive every update from the package, but it still supports downloading archived tweets and benefits from 35 GiB of resources kindly provided by the Streamlit team. The legacy version of the web app is no longer maintained, but it remains available.
The tool will continue to evolve and gradually become more robust.
Features
Here are the main features of the Python package in the stable version 1.0.
Archived Tweet Retrieval
Fetches data from the Wayback Machine CDX API based on a Twitter/X username and supports query continuation using a resumption_key.
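As a rough illustration of what such a query can look like, the sketch below builds a request URL for the public Wayback Machine CDX server. The `url`, `output`, `showResumeKey`, and `resumeKey` parameters belong to the CDX server itself; the function name, the `/status/*` URL pattern, and the placeholder key are assumptions made for this example and may not match how the package builds its requests internally.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, resume_key=None):
    """Build a CDX query URL for a user's archived tweet captures (hypothetical helper)."""
    params = {
        # Assumed pattern: every capture under the user's /status/ path.
        "url": f"https://twitter.com/{username}/status/*",
        "output": "json",
        # Ask the server to append a resume key when more results exist.
        "showResumeKey": "true",
    }
    if resume_key:
        # Continue a previous query from where it stopped.
        params["resumeKey"] = resume_key
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


first_page = build_cdx_url("jack")
next_page = build_cdx_url("jack", resume_key="example-key")  # placeholder key
print(first_page)
print(next_page)
```

The second URL differs from the first only by the extra `resumeKey` parameter, which is what lets a long query be picked up again instead of restarted.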
Date Filtering and Result Grouping
The tool allows filtering results by date using the --from and --to parameters in the YYYYmmdd format. You can also limit the number of results with --limit. To avoid duplicate entries (e.g., multiple captures of the same tweet), you can group results using the --collapse option, with fields such as urlkey, digest, or timestamp.
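To make the date format and the effect of collapsing more concrete, here is a small local sketch. It is not the package's code: `validate_date` and `collapse` are hypothetical helpers, and the capture rows are invented sample data. The collapse step mimics the CDX server's behavior of merging adjacent rows that share the same value in the chosen field.

```python
from datetime import datetime
from itertools import groupby


def validate_date(value):
    """Return value unchanged if it is a valid YYYYmmdd date, else raise ValueError."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYmmdd, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # rejects impossible dates such as 20231332
    return value


def collapse(rows, field):
    """Keep the first row of each run of adjacent rows sharing the same `field` value."""
    return [next(group) for _, group in groupby(rows, key=lambda row: row[field])]


# Invented sample captures: the first two share a digest (same content).
captures = [
    {"timestamp": "20200101120000", "digest": "AAA"},
    {"timestamp": "20200102120000", "digest": "AAA"},
    {"timestamp": "20200103120000", "digest": "BBB"},
]

unique = collapse(captures, "digest")
print([row["timestamp"] for row in unique])
```

Collapsing on digest keeps one capture per distinct piece of content, while collapsing on timestamp or urlkey groups by capture time or by URL instead.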
Comprehensive Parsing of Archived Data
Beyond fetching tweets, Wayback Tweets performs detailed parsing of archived captures, extracting fields such as:
- Timestamp of the capture (in a human-readable format).
- Original and archived URLs (including parsing of legacy URL structures).
- Tweet text still available from the source.
- Retweet identification.
- Archived content type.
- HTTP status (when available), content size, and hash digest of the original data.
Read more about the Field Options.
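As a minimal sketch of two of the fields listed above, the snippet below turns a 14-digit Wayback Machine timestamp into a readable date and builds the archived URL from the original one. The field names come from the package's field options; the `parse_capture` helper, the output date format, and the sample values are assumptions for illustration only.

```python
from datetime import datetime


def parse_capture(timestamp, original_url):
    """Derive readable fields from one capture (hypothetical helper)."""
    # Wayback Machine timestamps use the form YYYYmmddHHMMSS.
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return {
        "archived_timestamp": timestamp,
        # The exact display format used by the package may differ.
        "parsed_archived_timestamp": captured_at.strftime("%Y-%m-%d %H:%M:%S"),
        "archived_tweet_url": f"https://web.archive.org/web/{timestamp}/{original_url}",
        "original_tweet_url": original_url,
    }


# Sample values invented for the example.
capture = parse_capture("20150101123045", "https://twitter.com/jack/status/20")
for field, value in capture.items():
    print(f"{field}: {value}")
```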
Export in Multiple Formats
- HTML: Ideal for viewing in a browser using <iframe>.
- CSV: For spreadsheet analysis.
- JSON: For programmatic use.
- JSON: For programmatic use.
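For readers curious about what the CSV and JSON outputs amount to, the sketch below serializes a list of parsed rows with the Python standard library only. It is an illustration under stated assumptions, not the package's exporter: `to_csv`, `to_json`, and the sample row are hypothetical, and the package itself lists Pandas as a dependency.

```python
import csv
import io
import json


def to_csv(rows, fieldnames):
    """Serialize rows (a list of dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows to indented JSON text."""
    return json.dumps(rows, indent=2)


# Sample row invented for the example.
fields = ["archived_timestamp", "original_tweet_url"]
rows = [
    {
        "archived_timestamp": "20150101123045",
        "original_tweet_url": "https://twitter.com/jack/status/20",
    }
]

csv_text = to_csv(rows, fields)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

Passing the field list explicitly fixes the column order, which mirrors how a chosen set of field options determines what ends up in the exported file.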
Practical Examples
waybacktweets [OPTIONS] USERNAME

waybacktweets --from 20200101 --to 20231231 --limit 500 --collapse digest jack

from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"

# Fields to include in the parsed and exported output.
FIELD_OPTIONS = [
    "archived_timestamp",
    "parsed_archived_timestamp",
    "archived_tweet_url",
    "parsed_archived_tweet_url",
    "original_tweet_url",
    "parsed_tweet_url",
]

# Fetch the archived captures for the user.
api = WaybackTweets(USERNAME)
archived_tweets = api.get()

# Parse and export only if the request returned data.
if archived_tweets:
    parser = TweetsParser(archived_tweets, USERNAME, FIELD_OPTIONS)
    parsed_tweets = parser.parse()

    exporter = TweetsExporter(parsed_tweets, USERNAME, FIELD_OPTIONS)
    exporter.save_to_csv()
The CLI is the simpler and more practical option, offering direct commands to retrieve archived tweets. The API provides more flexibility: it lets you integrate data access into other projects and customize queries and tweet processing.
Highlights
Following the initial alpha and release candidate versions of 1.0, the project received some recognition.
It was featured in Henk van Ess' Deleted Tweet Finder project, where Wayback Tweets is one of the search options ("User Search"). There is also a demonstration in Spanish on Jey Zeta's YouTube channel and a great tutorial in English written by CyberRaya. It was mentioned in the 5th edition of a guide published by the Institute of the Security Service of Ukraine at Yaroslav Mudryi National Law University (p. 41), and also in the 4th edition of the book Manual de Investigação Digital by Guilherme Caselli (pp. 425–426), published in Brazil by Editora Juspodivm.
Visit
Below are the links to each part of the toolkit, along with everything that has been implemented or modified in the project since the alpha versions.
- Wayback Tweets documentation.
- GitHub Repository.
- PyPI Page.
- Streamlit Web App.
- Legacy Streamlit Web App (no longer maintained).
What's Changed
v1.0
- Added resumption_key option
- Updated documentation
- Fixed JSON generation
- Updated HTML visualization
- Updated Streamlit Web App
v1.0rc1
- Streamlit App/Docs: Improved description
- Streamlit App: Fixed IndexError
v1.0rc0
- Added Pandas as a package dependency
- Checked field_options in the Viz module
- Fixed accordions not opening in Firefox
- Adjusted the Streamlit app to allow search without date filter
v1.0a7
- Streamlit App: Added tabs to show results (HTML, CSV, JSON)
- Streamlit Legacy App: Updated descriptions
- Module: Updated CLI help text
- Added Donate button
- Added Hands-On Examples to documentation
- Updated Installation documentation
v1.0a6
- Streamlit Web App: Fixed width title
- Streamlit Web App: Set anchor headers to False
- Streamlit Web App: Added username query param
- Added Citation file
v1.0a5
- Fixed visualize module
- Updated Streamlit Web App
v1.0a4 (released alongside v1.0a5)
- Updated Streamlit Web App
- Updated documentation
- Updated CLI print messages
- Added pagination to the generated HTML
- Added "Outputs" to the documentation
v1.0a3 (released alongside v1.0a5)
- Updated Streamlit version to 1.36
- Updated Streamlit Web App UI
- Added legacy Streamlit Web App (v0.4.3)
- Updated visualize and export modules
- Fixed request module
v1.0a2 (released alongside v1.0a5)
- Added streamlit only for dev group in Poetry
- Added Python 3.10 as a dependency
- Added accordion on generated HTML
- Added parsed_archived_timestamp as a Field Option
- Reviewed tweet URL parser
v1.0a1 (released alongside v1.0a5)
- Updated the base code
- Downloaded the archived tweets CDX data
- Parsed available tweets
- Parsed JSON archived tweets (not implemented in the API or CLI, only in the Web App)
- Added HTML generator
- Added docstrings
- Added Poetry for package management
- Added Black, Flake8, isort, and pre-commit for development
- Added documentation with Sphinx (initially tested with MkDocs, but decided to use Sphinx with the Pallets/Flask theme)
- Added CLI with the click package
- Updated the Streamlit Web App:
  - To use the waybacktweets package (not yet implemented on Streamlit Cloud)
  - Updated Streamlit version (1.35.0)
  - Added a calendar interface
- Updated README and LICENSE
- Added automatic documentation deployment with Actions
- Added verbose flag in the CLI and global configuration for verbose mode
- Published version 1.0 alpha on PyPI
- Added basic OpenGraph tags / General template for all documentation pages