
clopso/tweetext

July 2023: Recent changes to the Twitter platform now require a login to view tweets, which causes this tool to malfunction.

Tweetext: Find old tweets with the Wayback Machine

Want to find old tweets and don't know how? You found the solution!

Features:

  • Saves all tweets from an account to a text file.

  • Can be used with deleted accounts.

  • Includes the link to view each tweet alongside its text in the text file.

  • Supports a custom date range to narrow the tweet search to between two dates.

  • Can switch between a list of proxy servers to avoid HTTP 429 (Too Many Requests) errors.
    You will need this for datasets larger than about 800 tweets.

Usage

python3 tweetext.py -u USERNAME [OPTIONS]

    -u, --username                                  Specifies the Twitter handle of the target user

    --batch-size                                    Specifies how many URLs to examine at a time.
                                                    Expects an integer between 1 and 100.
                                                    A higher number is faster, but increases
                                                    the risk of errors. Default = 100

    --semaphore-size                                Specifies how many URLs from each --batch-size
                                                    batch to query asynchronously at once.
                                                    Expects an integer between 1 and 50.
                                                    A higher number is faster, but increases
                                                    the risk of errors. Default = 50

    -from, --fromdate                               Restrict search of *archived* deleted Tweets
                                                    to on and after this date
                                                    (can be combined with -to)
                                                    (YYYY-MM-DD, YYYY/MM/DD or YYYYMMDD
                                                    format, doesn't matter)


    -to, --todate                                   Restrict search of *archived* deleted Tweets
                                                    to on and before this date
                                                    (can be combined with -from)
                                                    (YYYY-MM-DD, YYYY/MM/DD or YYYYMMDD
                                                    format, doesn't matter)


    --proxy-file                                    Path to a file listing the proxies to use.
                                                    You will need this to check large numbers of tweets.
                                                    Each line must contain one url:port entry.
                                                    The script picks a new proxy at random
                                                    from the list after each --batch-size batch.


    Logs                                            After checking a user's tweets, but before you
                                                    make a download selection, a folder named after
                                                    the username is created. It contains a log of
                                                    <deleted-twitter-url>:<deleted-wayback-url>
                                                    pairs, in case you need them.

    Examples:
    python3 tweetext.py -u taylorswift13            Downloads all tweets
                                                    (including deleted ones)
                                                    from @taylorswift13

    python3 tweetext.py -u drake -to 2022/02/09     Downloads all of @drake's
                                                    Tweets (including deleted ones)
                                                    archived from the beginning
                                                    until February 9, 2022
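The following minimal asyncio sketch shows one way --batch-size, --semaphore-size and --proxy-file could interact. It assumes three things:

- URLs are processed in consecutive batches.
- An `asyncio.Semaphore` caps how many requests are in flight at once.
- One proxy is drawn at random per batch.

`fetch`, `load_proxies` and `check_urls` are hypothetical names, not tweetext's actual code. `fetch` only simulates a request and never contacts the Wayback Machine.

```python
import asyncio
import random
from typing import List, Optional


async def fetch(url: str, proxy: Optional[str]) -> str:
    """Hypothetical stand-in for an HTTP request; performs no network I/O."""
    await asyncio.sleep(0)  # yield to the event loop like a real request would
    return f"{url} via {proxy}"


def load_proxies(path: str) -> List[str]:
    """Read a --proxy-file style list: one url:port entry per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


async def check_urls(
    urls: List[str],
    batch_size: int = 100,
    semaphore_size: int = 50,
    proxies: Optional[List[str]] = None,
) -> List[str]:
    # Same ranges the README documents for the two flags.
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    if not 1 <= semaphore_size <= 50:
        raise ValueError("semaphore_size must be between 1 and 50")

    semaphore = asyncio.Semaphore(semaphore_size)
    results: List[str] = []

    async def limited(url: str, proxy: Optional[str]) -> str:
        async with semaphore:  # at most semaphore_size requests run at once
            return await fetch(url, proxy)

    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # A new randomly chosen proxy for every batch (None if no proxy list).
        proxy = random.choice(proxies) if proxies else None
        results.extend(await asyncio.gather(*(limited(u, proxy) for u in batch)))
    return results


if __name__ == "__main__":
    urls = [f"https://twitter.com/drake/status/{i}" for i in range(250)]
    out = asyncio.run(check_urls(urls, batch_size=100, semaphore_size=10))
    print(len(out))  # one result per URL, in input order
```

Lowering either value, as the Troubleshooting section suggests, trades speed for fewer simultaneous requests against the archive.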
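Both halves of a log entry are URLs that themselves contain colons, so a plain `split(":")` will not recover them. The minimal sketch below rests on two assumptions:

- Each log line holds exactly one `<deleted-twitter-url>:<deleted-wayback-url>` pair.
- The Wayback URL starts with `http`.

`parse_log_line` is a hypothetical helper, not part of tweetext.

```python
from typing import Tuple


def parse_log_line(line: str) -> Tuple[str, str]:
    """Split '<twitter-url>:<wayback-url>' into its two URLs.

    Assumes the Wayback URL begins with 'http' (e.g. https://web.archive.org/...).
    The scheme colon in 'https://' is followed by '//', so the first ':http'
    in the line is the separator between the two URLs.
    """
    line = line.strip()
    sep = line.find(":http")
    if sep == -1:
        raise ValueError(f"unrecognised log line: {line!r}")
    return line[:sep], line[sep + 1:]
```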

Installation

git clone https://github.com/clopso/tweetext
cd tweetext
pip3 install -r requirements.txt

Run the command:

python3 tweetext.py -u USERNAME

(Replace USERNAME with your target handle).

For more information, check out the Usage section above.

Troubleshooting

The default values of --semaphore-size and --batch-size are set for the fastest possible execution. Reduce these numbers to slow down execution and lower the chance of errors. To check large numbers of tweets (more than about 800), you'll need to use web proxies with the --proxy-file flag.

Things to keep in mind

  • The quality of the HTML files depends on how the Wayback Machine saved them. Some are better than others.
  • This tool is best for text. You might have some luck with photos. You cannot download videos.
  • Custom date range is not about when Tweets were made, but rather when they were archived. For example, a Tweet from 2011 may have been archived today.
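Because the date range filters on capture time, one way to picture it is as the `from`/`to` parameters of the Wayback Machine CDX API. These take timestamps in the same 1 to 14 digit `yyyyMMddhhmmss` form used for captures.

The sketch below rests on two assumptions:

- tweetext builds its query this way. Its actual requests may differ.
- Tweet pages are archived under `twitter.com/USERNAME/status/...`.

`to_wayback_timestamp` and `cdx_query_url` are hypothetical names. The sketch accepts the three date formats listed for -from and -to.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# The three input formats the README accepts for -from / -to.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def to_wayback_timestamp(text: str) -> str:
    """Normalise a date string to the 8-digit YYYYMMDD timestamp prefix."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unsupported date format: {text!r}")


def cdx_query_url(username: str, fromdate: Optional[str] = None,
                  todate: Optional[str] = None) -> str:
    """Build a CDX search URL restricted by *capture* date, not tweet date."""
    # Hypothetical URL pattern for a user's archived tweet pages.
    params = {"url": f"twitter.com/{username}/status/*", "output": "json"}
    if fromdate:
        params["from"] = to_wayback_timestamp(fromdate)
    if todate:
        params["to"] = to_wayback_timestamp(todate)
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


if __name__ == "__main__":
    print(cdx_query_url("drake", todate="2022/02/09"))
```

A Tweet posted in 2011 but captured last week falls inside any range that covers last week, because only the capture timestamp is compared.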

About

Find ALL old tweets with the Wayback Machine (Including from disabled accounts)
