Free and robust Tweets extraction

As anticipated by many, Twitter stopped offering its (limited!) API for free ​1​.

Now, what options do you have to programmatically access the public content for free?
In this context, it is worth mentioning the library snscrape, a tool (well-maintained as of now) for extracting the content from social media services such as Facebook, Instagram or Twitter ​2​. I have just given a go, in the scope of the research project I am working on, and would love to share some thoughts and code.

The basic usage is pretty simple, but I added multithreading to improve speed by executing queries in parallel (an established way of handling I/O bound operations). I also prefer a functional/pipeline style of composing Python commands, using generators, filter and map features. The code snippet below (see also the Colab notebook) shows how to extract tweets of top futurists. Enjoy!

# install social media scrapper: !pip3 install snscrape
import snscrape.modules.twitter as sntwitter
import itertools
import multiprocessing.dummy as mp # for multithreading 
import datetime
import pandas as pd

start_date = datetime.datetime(2018,1,1,tzinfo=datetime.timezone.utc) # from when
attributes = ('date','url','rawContent') # what attributes to keep

def get_tweets(username,n_tweets=5000,attributes=attributes):
    tweets = itertools.islice(sntwitter.TwitterSearchScraper(f'from:{username}').get_items(),n_tweets) # invoke the scrapper
    tweets = filter(lambda>=start_date, tweets)
    tweets = map(lambda t: (username,)+tuple(getattr(t,a) for a in attributes),tweets) # keep only attributes needed
    tweets = list(tweets) # the result has to be pickle'able
    return tweets

# a list of accounts to scrape
user_names = ['kevin2kelly','briansolis','PeterDiamandis','michiokaku']

# parallelise queries for speed ! 
with mp.Pool(4) as p:
    results =, user_names)
    # combine
    results = list(itertools.chain(*results))
  1. 1.
    @TwitterDev. Twitter announces stopping free access to its API. Twitter Dev Team. Published February 3, 2023. Accessed February 15, 2023.
  2. 2.
    snscrape. snscrape. Github Repository. Accessed February 15, 2023.

Fourier integrals vanishing on large circles

When evaluating contour integrals, it is often of interest to prove that Fourier-type integrals vanish on large enough semicircles (see the figure). This holds under the following condition:

Theorem. Suppose that $$f(z)=O(|z|^{-a}), \quad a>0$$ for \(z\) in the upper half-plane. Then for any \(\lambda > 0\) we have $$\int_{\gamma_R} f(z)\mathrm{e}^{i\lambda z} \rightarrow 0, \quad R\to+\infty,$$ where \(\gamma_R\) is the upper half-circle of radius \(R\).

This result is stronger than other ways of developing vanishing integration contours in the upper half-plane, compare for instance with the MIT lecture notes by Jeremy Orloff​1​. The version above can be found in advanced books on Fourier transforms, for example​2​.

To prove that, parametrize the upper half-circle \(\gamma_R\) by \(z=R\mathrm{e}^{i\theta} = R(\cos\theta + i\sin\theta)\) where \(0<\theta<\pi\). Under this parametrization, the Fourier multiplier becomes \(\mathrm{e}^{i\lambda z} = \mathrm{e}^{-\lambda R \sin \theta}\mathrm{e}^{i R \lambda \cos\theta}\). Thus, the integral can be bounded by $$ \left|\int_{\gamma_R} f(z)\mathrm{e}^{i\lambda z}\right|\leqslant \int_{0}^{\pi} |f(R\mathrm{e}^{i\theta})| R \mathrm{e}^{-R\lambda \sin\theta} \mbox{d}\theta \\
\leqslant C\int_{0}^{\pi} R^{1-a} \mathrm{e}^{-R\lambda \sin\theta} \mbox{d}\theta\\ = 2C\int_{0}^{\frac{\pi}{2}} R^{1-a} \mathrm{e}^{-R\lambda \sin\theta} \mbox{d}\theta \\
\leqslant 2C\int_{0}^{\frac{\pi}{2}} R^{1-a} \mathrm{e}^{-2 R\lambda \theta / \pi} \mbox{d}\theta \\
= C\cdot \frac{\pi R^{- a} \left(1 – e^{- R \lambda}\right)}{\lambda},$$
which tends to zero as long as \(a>0\) and \(R\to \infty\).

  1. 1.
    Orloff J. Definite integrals using the residue theorem. Lecture Notes. Accessed 2023.
  2. 2.
    Spiegel MR. Laplace Transforms. McGraw Hill; 1965.

Expanding Inverse Functions

The problem of inverting the implicit function $$y=f(x)$$ in the form of power-series $$x = a + \sum_{k=1}^{\infty} b_k (y-f(a))^k$$

around a point of interest $x=a$, has a long history. Lagrange obtained a theoretical inversion formula [1], yet efficient implementations are relatively recent [2]Brent, Richard P., and Hsiang T. Kung. “Fast algorithms for manipulating formal power series.” Journal of the ACM (JACM) 25, no. 4 (1978): 581-595. [3]Johansson, F., 2015. A fast algorithm for reversion of power series. Mathematics of Computation84(291), pp.475-484..

I this note I am sharing a bit simpler algorithm, performing Newton-like updates:
$$x_{k} = x_k-f(x_{k-1})[(y-f(a))^{k}]\cdot \frac{(y-f(a))^{k}}{f'(a)},\quad x_1 = a+\frac{y-f(a)}{f'(a)}.$$
We can see that it gradually produces more and more accurate terms. More precisely, assuming w.l.o.g. \(0=a=f(a)\), suppose \(f(x) =y + O(y^{k})\), then \(f(x + h ) = f(x)+f'(0) h + O(h^2)\) by Taylor’s expansion, and so the update \(h=-\frac{f(x) [y^k] }{f'(0)}\) (where \([z^k]\) is the operation of extracting the coefficient with \(z^k\)) gives \(f(x+h)=y+O(y^{k+1})\) as desired.
I implemented this a proposal for Sympy.


2 Brent, Richard P., and Hsiang T. Kung. “Fast algorithms for manipulating formal power series.” Journal of the ACM (JACM) 25, no. 4 (1978): 581-595.
3 Johansson, F., 2015. A fast algorithm for reversion of power series. Mathematics of Computation84(291), pp.475-484.

Monitoring Azure Experiments

Azure Cloud is a popular work environment for many data scientists, yet many features remain poorly documented. This note shows how to monitor Azure experiments in a more handy and detailed way than through web or cl interface.

The trick is to create a dashborad of experiments and their respective runs, up to a desired level of detail, from Python. The workhorse is the following handy utility function:

from collections import namedtuple

def get_runs_summary(ws):
    """Summarise all runs under a given workspace, with experiment name, run id and run status
        ws (azureml.core.Workspace): Azure workspace to look into
    # NOTE: extend the scope of run details if needed
    record = namedtuple('Run_Description',['job_name','run_id','run_status'])
    for exp_name,exp_obj in ws.experiments.items():
        for run_obj in exp_obj.get_runs():

Now it’s time to see it in action 😎

# get the default workspace
from azureml.core import Workspace
import pandas as pd

ws = Workspace.from_config()

# generate the job dashboard and inspect
runs = get_runs_summary(ws)
summary_df = pd.DataFrame(runs)
# count jobs by status

Use the dashboard for to automatically manage experiments. For example, to kill running jobs:

from azureml.core import Experiment, Run

for exp_name,run_id in summary_df.loc[summary_df.run_status=='Running',['job_name','run_id']].values:
    exp = Experiment(ws,exp_name)
    run = Run(exp,run_id)

Check the jupyter notebook in my repository for a one-click demo.

Repo Passwords in Poetry

Poetry, a popular Python package manager, prefers to use keyring to manage passwords for private code repositories. Storing passwords in plain text is a secondary option, but may be needed in case of either issues in poetry itself or with keyring configuration (may not be properly installed, be locked etc). To disable the use of system keyring by poetry, set the null backend in the environmental variable PYTHON_KEYRING_BACKEND:

(.venv) azureuser@sense-mskorski:~/projects/test_project$ export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
(.venv) azureuser@sense-mskorski:~/projects/test_project$ poetry config http-basic.private_repo *** '' -vvv
Loading configuration file /home/azureuser/.config/pypoetry/config.toml
Loading configuration file /home/azureuser/.config/pypoetry/auth.toml
Adding repository test_repo (https://***) and setting it as secondary
No suitable keyring backend found
No suitable keyring backends were found
Keyring is not available, credentials will be stored and retrieved from configuration files as plaintext.
Using a plaintext file to store credentials

Analogously, one of working backends can be enabled (make sure it works correctly!) . This is how to list available backends:

(.venv) azureuser@sense-mskorski:~/projects/test_project$ keyring --list-backends
keyring.backends.chainer.ChainerBackend (priority: -1) (priority: 0)
keyring.backends.SecretService.Keyring (priority: 5)

Marking Python Tests as Optional

Often code tests are to be run on special demand, rather than in a CI/CD manner: for instance, they may be slow or work only in a local mode with protected data. This note shows how to declare code tests optional in pytest, the leading testing framework for Python. The article is inspired by the pytest documentation on test markers.

The trick is to mark extra tests with a decorator and couple it with a runtime flag.
The first step is to mark the relevant code blocks and is illustrated below:

import pytest

def test_addition():
    assert 1+1==2

The second step is to instruct the config flag on how to interpret the decorator; it is also worth describing this action in the helper. The code below shows how to set this up:

import pytest

def pytest_addoption(parser):
    """Introduce a flag for optional tests"""
        "--optional", action="store_true", default=False, help="run extra tests"

def pytest_collection_modifyitems(config, items):
    """Instruct the testing framework to obey the decorator for optional tests"""
    if config.getoption("--optional"):
        # --extras given in cli: do not skip extra tests
    # otherwise, execute
    skip_extras = pytest.mark.skip(reason="need --optional flag to run")
    for item in items:
        if "optional" in item.keywords:

def pytest_configure(config):
    """Make the decorator for optional tests visible in help: pytest --markers"""
    config.addinivalue_line("markers", "optional: mark test as optional to run")

To see it in action, run pytest -k battery --optional 😎 PASSED

Debug CI/CD with SSH

CircleCI is a popular platform Continuous integration (CI) and continuous delivery (CD). While its job status reports are already useful, one can do much more insights by debugging it in real time. Here I am sharing a real use-case of debugging a failing job deploing an app 🙂

Failing job are reported in red and the (most of) errors caught during the execution appear in the terminal. In this case, the environment is unable to locate Python:

The failed job and caught errors.

The error is not very informative. 😮 For more insights, let’s re-run the job in SSH mode:

Re-running the failed CircleCI with SSH.

You will be welcomed with instructions on how to connect via SSH:

CircleCI showing SSH connection string.

Use this instruction to connect to inspect the environment at its failure stage:

mskorski@SHPLC-L0JH Documents % ssh -p 64535
The authenticity of host '[]:64535 ([]:64535)' can't be established.
ED25519 key fingerprint is SHA256:LsMhHb5fUPLHI9dFdyig4VKw44GTqrA2dkEWT0sZx4k.
Are you sure you want to continue connecting (yes/no)? yes

circleci@bc95bb40fff3:~$ ls project/venv/bin -l
total 300
lrwxrwxrwx 1 circleci circleci    7 Aug  1 12:37 python -> python3
lrwxrwxrwx 1 circleci circleci   49 Aug  1 12:37 python3 -> /home/circleci/.pyenv/versions/3.8.13/bin/python3
lrwxrwxrwx 1 circleci circleci    7 Aug  1 12:38 python3.9 -> python3

Bingo! The terminal warns about broken symbolic links:

Terminal highlights broken symbolic links.

The solution in this case was to update cache. The issues may be far more complex than that, but being able to debug them live comes to the rescue. 😎

Prototype in Jupyter on Multiple Kernels

For data scientists, it is a must to prototype in multiple virtual environments which isolate different (and often very divergent) sets of Python packages. This can be achieved by linking one Jupyter installation with multiple Python environments.

Use the command <code>which jupyter</code> to show the Jupyter location and <jupyter kernelspec list> to show available kernels, as shown below:

ubuntu@ip-172-31-36-77:~/projects/eye-processing$ which jupyter
ubuntu@ip-172-31-36-77:~/projects/eye-processing$ jupyter kernelspec list
Available kernels:
  .eyelinkparser    /home/ubuntu/.local/share/jupyter/kernels/.eyelinkparser
  eye-processing    /home/ubuntu/.local/share/jupyter/kernels/eye-processing
  pypupilenv        /home/ubuntu/.local/share/jupyter/kernels/pypupilenv
  python3           /usr/local/share/jupyter/kernels/python3

To make an environment a Jupyter kernel, first activate it and install ipykernel inside.

ubuntu@ip-172-31-36-77:~/projects/eye-processing$ source .venv/bin/activate
(.venv) ubuntu@ip-172-31-36-77:~/projects/eye-processing$ pip install ipykernel
Collecting ipykernel
Successfully installed...

Then use ipykernel to register the active environment as a jupyter kernel (choose a name and the destination, e.g. user space; consult python -m ipykernel install –help for more options).

(.venv) ubuntu@ip-172-31-36-77:~/projects/eye-processing$ python -m ipykernel install --name eye-processing --user
Installed kernelspec eye-processing in /home/ubuntu/.local/share/jupyter/kernels/eye-processing

The kernel should appear in the jupyter kernel lists. Notebooks may need restarting to notice it.

The environment appears in Jupyter kernels (VS Code).

Open kernel.json file under kernel’s path to inspect config details:

 "argv": [
 "display_name": "eye-processing",
 "language": "python",
 "metadata": {
  "debugger": true
"~/.local/share/jupyter/kernels/eye-processing/kernel.json" [noeol] 14L, 234B

Modern Bibliography Management

A solid bibliography database is vital for every research project, yet building it is considered an ugly manual task by many – particularly by old-school researchers. But this does not have to be painful if we use modern toolkit. The following features appear particularly important:

  • collaborative work (sharing etc)
  • extraction magic (online search, automated record population)
  • tag-annotation support (organize records with keywords or comments)
  • multi-format support (export and import using BibTeX or other formats)
  • plugins to popular editors (Office, GoogleDocs)

I have had a particularly nice experience with Zotero (many thanks to my work colleagues from SensyneHealth for recommending this!). Let the pictures below demonstrate it in action!

tag and search…

organize notes…

automatically extract from databases…

format and export…

National Bank of Poland leads on helping Ukraine’s currency

Ukrainian refugees see their currency hardly convertible at fair rates or even accepted. The lack of liquidity will also hit Ukraine’s government urgent expenses (medical supplies or weapons).

The National Bank of Poland (NBP) offered a comprehensive financial package to address these problems. Firstly, it enabled the refugees to exchange the currency at a nearly official rate, through the agreement with the National Bank of Ukraine (NBU) signed on March 18.

Secondly, in a follow-up agreement the NBP offered a FX Swap for USD 1 billion which will provide the Ukraine central bank with more liquidity.

As Reuters announced on March 24, EU countries are close to follow and agree on a scheme enabling exchanging Ukraine’s cash. The rules disclosed in the draft are close to the NBP scheme (exchange capped at around 10,000 UAH per individual)