For a home automation project I am trying to pull train delay data. An API wrapper exists, with cURL examples. These work fine, but both Python's `requests.get` and `httpx.get` are slow to pull data (up to a minute for `requests` and about 4 seconds for `httpx`), while `curl`, or pasting the link into a browser, returns almost immediately. Why?
The internet suggested that some sites have anti-scraping protections and may throttle or block `requests`, since it only speaks HTTP/1.1 and cannot use HTTP/2. On this API, `httpx` with HTTP/2 enabled does seem to be much faster, but nowhere near as fast as `curl` or the browser.
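To check which protocol each client actually ends up using against this API, rather than relying on what the internet says, something like the following sketch could be used. It assumes `requests` and `httpx` (with the `h2` extra) are installed; `http_version_label` and `report_versions` are hypothetical helper names. `httpx` exposes the negotiated version as the string `Response.http_version`, while `requests` exposes the underlying urllib3 response as `.raw`, whose `version` attribute is the status-line version as an integer (e.g. `11`).

```python
def http_version_label(raw_version: int) -> str:
    """Map the integer version used by http.client/urllib3 (10, 11) to a label."""
    labels = {10: "HTTP/1.0", 11: "HTTP/1.1"}
    return labels.get(raw_version, f"unknown ({raw_version})")


def report_versions(url: str, timeout: float = 60.0) -> dict:
    """Fetch url with requests and httpx and report the HTTP version each used.

    Third-party imports are kept local so the helper above works without them.
    """
    import httpx      # assumption: httpx with the h2 extra is installed
    import requests   # assumption: requests is installed

    versions = {}
    r = requests.get(url, timeout=timeout)
    # r.raw is the urllib3 response; .version is an int such as 11
    versions["requests"] = http_version_label(r.raw.version)
    with httpx.Client(http2=True, timeout=timeout) as client:
        # http_version is a string such as "HTTP/1.1" or "HTTP/2"
        versions["httpx"] = client.get(url).http_version
    return versions
```

Calling `report_versions('https://v6.db.transport.rest/stations?query=berlin')` prints nothing by itself; printing its return value shows the version each library negotiated for the same URL.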
Some examples. This Python snippet takes about 4 seconds:

```python
import httpx

# http2=True needs the optional h2 dependency: pip install 'httpx[http2]'
client = httpx.Client(http2=True)
response = client.get('https://v6.db.transport.rest/stations?query=berlin')
print(response.text)
```
This takes up to a minute:

```python
import requests

response = requests.get('https://v6.db.transport.rest/stations?query=berlin')
print(response.text)
```
This returns almost immediately:

```python
import subprocess

# Arguments passed as a list, so no shell (and no quote escaping) is needed
result = subprocess.run(
    ['curl', '-s', 'https://v6.db.transport.rest/stations?query=berlin'],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```
What's the magic here?
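It may also help to check whether the delay happens before any HTTP is exchanged at all. The stdlib-only sketch below (`time_connects` is a hypothetical helper, not part of any library) times DNS resolution and then a plain TCP connect to each address the resolver returns for the host. If one of those connects is slow or times out while another is fast, that would suggest the time is lost while connecting, before the choice of HTTP library or protocol version comes into play. It is a diagnostic under those assumptions, not a diagnosis.

```python
import socket
import time


def time_connects(host: str, port: int = 443, timeout: float = 10.0):
    """Time DNS resolution, then a plain TCP connect to every resolved address.

    Returns (dns_seconds, connects), where connects holds one dict per address.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    dns_seconds = time.perf_counter() - start

    connects = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        error = None
        t0 = time.perf_counter()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)  # TCP handshake only, no TLS or HTTP
        except OSError as exc:  # covers timeouts and refused connections
            error = repr(exc)
        connects.append({
            "family": family.name,  # e.g. "AF_INET" or "AF_INET6"
            "address": sockaddr[0],
            "seconds": time.perf_counter() - t0,
            "error": error,
        })
    return dns_seconds, connects
```

For example, `time_connects('v6.db.transport.rest')` returns the resolver time plus one entry per resolved address, which can be printed and compared against the `requests`, `httpx` and `curl` timings above.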