Free and robust Tweets extraction

As anticipated by many, Twitter has stopped offering free access to its (already limited!) API [1].

Now, what options do you have to programmatically access the public content for free?
In this context, it is worth mentioning the library snscrape, a tool (well-maintained as of now) for extracting content from social media services such as Facebook, Instagram or Twitter [2]. I have just given it a go as part of the research project I am working on, and would love to share some thoughts and code.

The basic usage is pretty simple, but I added multithreading to improve speed by executing queries in parallel (an established way of handling I/O-bound operations). I also prefer a functional/pipeline style of composing Python commands, using generators together with filter and map. The code snippet below (see also the Colab notebook) shows how to extract the tweets of top futurists. Enjoy!

# install the social media scraper (in a notebook): !pip3 install snscrape
import snscrape.modules.twitter as sntwitter
import itertools
import multiprocessing.dummy as mp # for multithreading 
import datetime
import pandas as pd

start_date = datetime.datetime(2018,1,1,tzinfo=datetime.timezone.utc) # from when
attributes = ('date','url','rawContent') # what attributes to keep

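def get_tweets(username,n_tweets=5000,attributes=attributes):
    # invoke the scraper lazily and look at no more than n_tweets items
    tweets = itertools.islice(sntwitter.TwitterSearchScraper(f'from:{username}').get_items(),n_tweets)
    tweets = filter(lambda t:t.date>=start_date, tweets) # drop tweets older than start_date
    tweets = map(lambda t: (username,)+tuple(getattr(t,a) for a in attributes),tweets) # keep only attributes needed
    # materialise the lazy pipeline here, so that the scraping runs inside the worker thread
    # (a plain list also stays picklable if you switch to process-based multiprocessing)
    tweets = list(tweets)
    return tweets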

# a list of accounts to scrape
user_names = ['kevin2kelly','briansolis','PeterDiamandis','michiokaku']

# parallelise queries for speed ! 
with mp.Pool(4) as p:
    results = p.map(get_tweets, user_names)
    # combine
    results = list(itertools.chain(*results))
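
The snippet above needs network access and a working snscrape installation. As a minimal sketch, the same pipeline shape (islice, filter, map, a thread pool, then flattening) can be tried offline with a stand-in for the scraper. Everything named `FakeTweet`, `fake_items` and `get_fake_tweets`, as well as the user names and URLs, is made up for illustration and is not part of snscrape. The last step loads the tuples into a pandas DataFrame, which is one plausible use of the `pandas` import above.

```python
import datetime
import itertools
import multiprocessing.dummy as mp  # thread-based Pool with the multiprocessing API
import time
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for a scraped tweet, carrying the three attributes kept above.
FakeTweet = namedtuple('FakeTweet', ['date', 'url', 'rawContent'])

start_date = datetime.datetime(2018, 1, 1, tzinfo=datetime.timezone.utc)
attributes = ('date', 'url', 'rawContent')

def fake_items(username):
    """Lazily yield made-up tweets, newest first: one per year from 2023 down to 2014."""
    for year in range(2023, 2013, -1):
        time.sleep(0.01)  # simulated network latency (assumption, for illustration only)
        yield FakeTweet(
            date=datetime.datetime(year, 6, 1, tzinfo=datetime.timezone.utc),
            url=f'https://example.org/{username}/{year}',
            rawContent=f'{username} in {year}',
        )

def get_fake_tweets(username, n_tweets=8):
    tweets = itertools.islice(fake_items(username), n_tweets)  # cap the items inspected
    tweets = filter(lambda t: t.date >= start_date, tweets)    # keep recent ones only
    tweets = map(lambda t: (username,) + tuple(getattr(t, a) for a in attributes), tweets)
    return list(tweets)  # consume the lazy pipeline inside the worker thread

user_names = ['alice', 'bob', 'carol']  # placeholder account names

# one thread per account; each thread spends most of its time waiting on (simulated) I/O
with mp.Pool(len(user_names)) as p:
    results = p.map(get_fake_tweets, user_names)

# flatten the per-account lists into a single list of tuples
results = list(itertools.chain.from_iterable(results))

# one row per tweet, with the account name as the first column
df = pd.DataFrame(results, columns=['username', *attributes])
```

Calling `list(...)` inside the worker matters here: returning the lazy `map` object instead would defer the actual fetching to whoever iterates over it later, that is, to the main thread, and the queries would no longer run in parallel.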
1. @TwitterDev. Twitter announces stopping free access to its API. Twitter Dev Team. Published February 3, 2023. Accessed February 15, 2023. https://twitter.com/TwitterDev/status/1621026986784337922?s=20
2. snscrape. GitHub repository. Accessed February 15, 2023. https://github.com/JustAnotherArchivist/snscrape

Published by mskorski

Scientist, Consultant, Learning Enthusiast
