I am running an Apache Spark application that collects data from a source, applies some processing based on defined business rules, and then needs to convert the final Spark DataFrame to JSON and send it to an external REST API that takes a single JSON record as input.
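For context, the direct approach I have in mind looks roughly like this. The endpoint URL, record shape, and helper names are placeholders I made up for illustration; the idea is that `foreachPartition` runs the posting code on the executors rather than pulling everything back to the driver:

```python
import json
import urllib.request

# Placeholder endpoint for the external REST API (made up for illustration).
API_URL = "https://api.example.com/records"

def to_json_record(record: dict) -> bytes:
    """Serialize one record (a row as a plain dict) to a JSON payload."""
    return json.dumps(record).encode("utf-8")

def post_record(payload: bytes) -> int:
    """POST a single JSON record to the external API, return the HTTP status."""
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

def post_partition(rows):
    """Runs once per partition on the executors, so the REST calls are
    spread across the cluster instead of funneled through one process."""
    for row in rows:
        post_record(to_json_record(row.asDict()))

# On the Spark side, with df as the final DataFrame:
# df.foreachPartition(post_partition)
```

With this shape, the degree of parallelism against the API is bounded by the number of partitions, which is part of what I am unsure about.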
My question is: should I call that external REST API from the Spark application, and is that good practice? Also, I have a 4-node cluster, since the data size is not huge (maybe a million records at most), so I am worried about performance: how would the REST calls scale (parallelism)?

Or should I decouple the API calls from the Spark app by first sending the JSON (or Avro, to reduce the message size) to a Kafka topic, and then have a Kafka consumer that reads the data from the topic (converting Avro back to JSON where needed) and posts each record to the external API?
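The decoupled variant I am considering would have the Spark app act as a Kafka producer, roughly like this. The topic name and broker address are placeholders, and it assumes the kafka-python package:

```python
import json

# Hypothetical topic and broker address, just for illustration.
TOPIC = "records-to-post"
BROKERS = "broker1:9092"

def serialize(record: dict) -> bytes:
    """JSON-encode one record for the topic; with Avro this would be an
    Avro encoder instead, trading readability for smaller messages."""
    return json.dumps(record).encode("utf-8")

def produce_partition(rows):
    """Meant to run per partition via df.foreachPartition, so each
    executor publishes its slice of the data with its own producer."""
    from kafka import KafkaProducer  # assumes the kafka-python package
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    for row in rows:
        producer.send(TOPIC, serialize(row.asDict()))
    producer.flush()
    producer.close()

# On the Spark side, with df as the final DataFrame:
# df.foreachPartition(produce_partition)
```

A separate consumer service would then read from the topic and post records to the API at whatever rate the API can tolerate, independently of the Spark job.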
Any suggestion is highly appreciated.