For a home automation project I am trying to pull train delay data. An API wrapper exists, with cURL examples. These work fine, but both Python's requests.get and httpx.get are slow to pull data (up to a minute for requests and about 4 seconds for httpx), whereas curl, or pasting the link into a browser, returns almost immediately. Why?
From what I found online, some sites have anti-scraping protections and may throttle or block clients such as requests because it speaks HTTP/1.1 rather than HTTP/2. On this API, httpx with HTTP/2 enabled does seem to be much faster than requests, but it is still nowhere near as fast as curl or the browser.
Some examples - this Python snippet takes about 4 seconds:

    import httpx

    client = httpx.Client(http2=True)
    response = client.get('https://v6.db.transport.rest/stations?query=berlin')
    print(response.text)

This takes up to a minute:

    import requests

    response = requests.get('https://v6.db.transport.rest/stations?query=berlin')
    print(response.text)

This returns almost immediately:

    import subprocess

    # Pass the arguments as a list so no shell quoting or escaping is needed.
    command = ['curl', '-s', 'https://v6.db.transport.rest/stations?query=berlin']
    result = subprocess.run(command, capture_output=True, text=True)
    print(result.stdout)
    print(result.stderr)

What's the magic here?
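
For anyone who wants to reproduce the comparison, the timings quoted above can be measured with a small helper along these lines. This is only a sketch: `timed` and `report` are hypothetical names made up for illustration, not part of requests, httpx, or the API wrapper, and the helper measures wall-clock time for the whole call, including DNS lookup and connection setup.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def report(label: str, fn: Callable[[], object]) -> float:
    """Time fn() once, print a one-line summary, and return the elapsed seconds."""
    _, elapsed = timed(fn)
    print(f"{label}: {elapsed:.2f} s")
    return elapsed
```

Each variant can then be wrapped in a lambda, for example `report("requests", lambda: requests.get(url))` or `report("curl", lambda: subprocess.run(["curl", "-s", url], capture_output=True))`, where `url` is the endpoint used in the snippets above.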