I would like to build an analytics solution. The main data source would be a publicly available REST API. I have written a bunch of Python scripts using requests that query multiple endpoints of the same API. The problem is that there are quite a lot of endpoints, and they are all very similar. Some accept filtering parameters like date_from and date_to, so I can loop through date ranges to "download" all of the historical data. I could also build a custom solution that stores information about the date ranges I have already queried.
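To illustrate, here is a simplified version of what my scripts currently do (the base URL, endpoint name, and the assumption that the endpoint returns a JSON list are all placeholders, not the real API):

```python
import datetime as dt

import requests

BASE_URL = "https://api.example.com/v1"  # placeholder for the real API


def fetch_range(endpoint, start, end, window_days=7):
    """Walk through [start, end] in fixed-size date windows and collect records."""
    records = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + dt.timedelta(days=window_days - 1), end)
        resp = requests.get(
            f"{BASE_URL}/{endpoint}",
            params={
                "date_from": cursor.isoformat(),
                "date_to": window_end.isoformat(),
            },
            timeout=30,
        )
        resp.raise_for_status()
        records.extend(resp.json())  # assumes the endpoint returns a JSON list
        cursor = window_end + dt.timedelta(days=1)
    return records


data = fetch_range("trades", dt.date(2023, 1, 1), dt.date(2023, 12, 31))
```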
I was wondering whether there is a tool/framework/technique designed to efficiently query such data sources. I want to make sure that I fetch all the data for a given period of time, with no missing data points. I also don't want to send repeated queries for data I have already fetched. And obviously, I don't want to get banned for sending too many requests in a short period of time.
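For the bookkeeping part, the custom solution I have in mind would look roughly like this: an SQLite table recording which windows have been fetched, plus a crude sleep-based throttle. This is just a sketch of the idea; all of the names are made up, and the real rate limit would depend on the API's terms:

```python
import sqlite3
import time

conn = sqlite3.connect("ingest_state.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS fetched_windows (
           endpoint  TEXT,
           date_from TEXT,
           date_to   TEXT,
           PRIMARY KEY (endpoint, date_from, date_to)
       )"""
)


def already_fetched(endpoint, date_from, date_to):
    """Return True if this (endpoint, window) pair was already downloaded."""
    row = conn.execute(
        "SELECT 1 FROM fetched_windows WHERE endpoint=? AND date_from=? AND date_to=?",
        (endpoint, date_from, date_to),
    ).fetchone()
    return row is not None


def mark_fetched(endpoint, date_from, date_to):
    """Record a completed window so it is never requested again."""
    conn.execute(
        "INSERT OR IGNORE INTO fetched_windows VALUES (?, ?, ?)",
        (endpoint, date_from, date_to),
    )
    conn.commit()


def throttle(min_interval=1.0, _last=[0.0]):
    # Crude client-side rate limit: at most one request per min_interval
    # seconds. The mutable default argument keeps the last-call timestamp.
    wait = _last[0] + min_interval - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last[0] = time.monotonic()
```

Calling already_fetched() before each request and mark_fetched() after a successful one would make the downloader resumable, but it feels like I am reinventing something that must already exist.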
Bonus question: what database would you use for storing the output? Since REST APIs return JSON, I was thinking of Elasticsearch.
Overall my question is pretty open-ended. I just want to learn about best practices for querying REST APIs.