
Multiple REST API calls on 1m data entries using Databricks + scala?


I am trying to make API calls to retrieve all the buildings in LA County. The website for the dataset is here.

The county has about 3 million buildings; I've filtered that down to roughly 1 million. You can see the filter in the QUERY_PARAMS in my code below.

I've tried using Python, but unsurprisingly, retrieving 1 million records still takes a long time.

From the ESRI developer website, I understand that a single API call is limited to 10,000 results. Because of that, I have to paginate through the results to retrieve all 1 million buildings.

Here is my code so far; even after switching to async requests it still takes about 10 minutes:

import aiohttp
import asyncio
import nest_asyncio

nest_asyncio.apply()  # Required if running in Jupyter Notebook

# Base URL for the API query
BASE_URL = "https://services.arcgis.com/RmCCgQtiZLDCtblq/arcgis/rest/services/Countywide_Building_Outlines/FeatureServer/1/query"

# Parameters for the query
QUERY_PARAMS = {
    "where": "(HEIGHT < 33) AND UseType = 'RESIDENTIAL' AND SitusCity IN('LOS ANGELES CA','BEVERLY HILLS CA', 'PALMDALE')",
    "outFields": "*",
    "outSR": "4326",
    "f": "json",
    "resultRecordCount": 1000,  # Fetch 1000 records per request
}

async def fetch_total_count():
    """Fetch total number of matching records."""
    params = QUERY_PARAMS.copy()
    params["returnCountOnly"] = "true"
    async with aiohttp.ClientSession() as session:
        async with session.get(BASE_URL, params=params) as response:
            data = await response.json()
            return data.get("count", 0)  # Extract total count

async def fetch(session, offset):
    """Fetch a batch of records using pagination."""
    params = QUERY_PARAMS.copy()
    params["resultOffset"] = offset
    async with session.get(BASE_URL, params=params) as response:
        return await response.json()

async def main():
    """Fetch all records asynchronously with pagination."""
    all_data = []
    total_count = await fetch_total_count()
    print(f"Total Records to Retrieve: {total_count}")

    semaphore = asyncio.Semaphore(10)  # Limit concurrency to prevent API overload

    async with aiohttp.ClientSession() as session:
        async def bound_fetch(offset):
            async with semaphore:
                data = await fetch(session, offset)
                return data

        # Generate tasks for pagination
        tasks = [bound_fetch(offset) for offset in range(0, total_count, 1000)]
        results = await asyncio.gather(*tasks)

        for data in results:
            if "features" in data:
                all_data.extend(data["features"])

    print(f"Total Records Retrieved: {len(all_data)}")
    return all_data

# Run the async function
all_data = asyncio.run(main())

I've turned to Databricks + Scala to speed up the retrieval, but I'm brand new to big data computing. I'm vaguely aware that you need to "parallelize" your API calls across the cluster and combine the results into one big DataFrame, something like the sketch below?
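Here is a rough, untested sketch of what I imagine the Scala version looks like in a Databricks notebook (where `spark` is predefined). The helper name `fetchPage`, the slice count of 64, and the hard-coded `totalCount` are placeholders of my own; the real count would come from a returnCountOnly=true request, as in the Python version:

import java.net.URLEncoder

val baseUrl = "https://services.arcgis.com/RmCCgQtiZLDCtblq/arcgis/rest/services/Countywide_Building_Outlines/FeatureServer/1/query"
val where = "(HEIGHT < 33) AND UseType = 'RESIDENTIAL' AND SitusCity IN('LOS ANGELES CA','BEVERLY HILLS CA', 'PALMDALE')"
val pageSize = 1000
val totalCount = 1000000  // placeholder; would really come from a returnCountOnly=true request

// One small GET per page; each executor core handles a subset of the offsets.
def fetchPage(offset: Int): String = {
  val params = Map(
    "where" -> where,
    "outFields" -> "*",
    "outSR" -> "4326",
    "f" -> "json",
    "resultRecordCount" -> pageSize.toString,
    "resultOffset" -> offset.toString
  ).map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }.mkString("&")
  scala.io.Source.fromURL(s"$baseUrl?$params").mkString
}

// Distribute the page offsets across the cluster and fetch in parallel.
val offsets = spark.sparkContext.parallelize(0 until totalCount by pageSize, numSlices = 64)
val pages = offsets.map(fetchPage)

// Each element of `pages` is one raw JSON response; spark.read.json parses
// them into a single DataFrame, which I'd then explode on `features`.
import spark.implicits._
val df = spark.read.json(pages.toDS)
df.printSchema()

The idea, as far as I understand it, is that each Spark partition fetches its share of the offsets, so the HTTP calls run in parallel across executor cores instead of all on a single driver.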

Can someone provide me with some suggestions?

