We are developing a SaaS product: an easy-to-use invoicing application. The frontend is built with React, and the backend is a RESTful API service in Node.js (using the Fastify web framework and Sequelize as the ORM). The database is PostgreSQL. We follow a multi-tenant database architecture in which every tenant has a separate database. We want the API server to handle around 500 requests per second, and each API request issues around 2.1 queries to Postgres. To make sure we have enough connections to handle requests from multiple users, we increased max_connections in Postgres to around 2000. We use nginx as a reverse proxy in front of the RESTful APIs.
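To make the connection math concrete, here is a back-of-the-envelope budget for the max_connections approach (the per-tenant pool size of 5 is an assumed example for illustration, not our exact Sequelize setting):

```javascript
// Connection budget with one Sequelize pool per tenant database.
// max_connections comes from our Postgres config; pool size is hypothetical.
const maxConnections = 2000;   // Postgres max_connections we set
const poolMaxPerTenant = 5;    // assumed Sequelize pool.max per tenant DB

// Worst case, every tenant's pool is fully in use, so the total
// connection count grows linearly with the number of tenants:
const maxTenantsSupported = Math.floor(maxConnections / poolMaxPerTenant);
console.log(maxTenantsSupported); // 400 tenants before hitting max_connections
```

This is part of why I am asking about the limits of raising max_connections below: with a separate pool per tenant, the number of tenants we can serve is bounded by max_connections divided by the pool size.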
To test whether the server can handle the load we want, I developed a script that simulates requests to the server from multiple users, using the following logic (I selected the 9 most-used endpoints for this):
- Start with 375 unique users
- Send one request per user to each of the 9 endpoints, spending 5 seconds on each endpoint. So for 375 users that is 375 × 9 = 3375 requests in 45 seconds, i.e. 3375 / 45 = 75 RPS
- Increase the user count by 125 each round. So after 45 seconds, the script sends 500 × 9 = 4500 requests in the next 45 seconds = 100 RPS
- Repeat this until we get an error from the server.
(NOTE: We are running everything [RESTful APIs, PG, script for sending requests] on a single VM. The VM has 7 cores and 14 GB RAM)
This is the relevant part of the script
```javascript
for (
  let numUsers = usersStart;
  numUsers <= usersEnd && failed === false;
  numUsers += usersGap
) {
  const totalRequests = userList.length * requestsFromEachUser;
  const waitTime = (userRequestInterval * 1e3) / numUsers;
  console.log(
    `>>> Sending ${totalRequests} requests from ${userList.length} users where each user will send ${requestsFromEachUser} requests`
  );
  const queueStartTime = Date.now();
  let cumulativeWaitTime = waitTime;

  // Each user sends one request per endpoint (requestsFromEachUser endpoints,
  // 5 seconds allotted per endpoint)
  for (let requestNum = 0; requestNum < requestsFromEachUser; ++requestNum) {
    for (let i = 0; i < userList.length && failed === false; ++i) {
      requests.push(sendRequest(userList[i], urls[requestNum], b, numUsers));
      if (cumulativeWaitTime > MIN_WAIT_TIME) {
        // Sleep only once the accumulated delay is large enough to be
        // worth a timer; otherwise keep accumulating
        await new Promise((resolve) =>
          setTimeout(resolve, cumulativeWaitTime)
        );
        cumulativeWaitTime = waitTime;
      } else {
        cumulativeWaitTime += waitTime;
      }
    }
  }

  const queueEndTime = Date.now();
  const queueExecutionTime = (queueEndTime - queueStartTime) / 1e3;
  const rps = totalRequests / queueExecutionTime;
  if (failed === false) {
    console.log(
      `>>> Sent ${totalRequests} requests from ${userList.length} users in ${queueExecutionTime} seconds with RPS = ${rps} (expected RPS = ${userList.length / userRequestInterval})`
    );
  }
  selectRandomUsers(userList, DBs, usersGap);
}
```
When sending requests directly to the backend, the error I get is ECONNRESET, and the throughput tops out at around 280 requests/second. This drops to ~225 requests/second when I go through nginx. When testing with a single database instead of a separate database per tenant, throughput improves somewhat, but only to around 360 requests/second.
This is the nginx configuration I am using
```nginx
user www-data;
worker_processes auto;
worker_rlimit_nofile 65535;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 30000;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '"$host" sn="$server_name" '
                        'rt=$request_time '
                        'ua="$upstream_addr" us="$upstream_status" '
                        'ut="$upstream_response_time" ul="$upstream_response_length" '
                        'cs=$upstream_cache_status';

    access_log /var/log/nginx/access.log main_ext;
    error_log /var/log/nginx/error.log debug;

    ##
    # Gzip Settings
    ##

    gzip on;
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```
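For completeness, the actual proxying happens in a file under sites-enabled which I have not reproduced exactly here; simplified, it is the usual proxy_pass setup, roughly like the sketch below (the port, upstream name, and server_name are placeholders, not our real values):

```nginx
upstream api_backend {
    server 127.0.0.1:3000;   # placeholder port for the fastify process
    keepalive 64;            # reuse upstream connections between requests
}

server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # clear the Connection header
    }
}
```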
I have the following questions:
- Is the approach we are following to ensure we can handle many connections to the DB (i.e. increasing max_connections) correct? If yes, what are its limitations, and is there a better approach? If no, what are the possible solutions?
- Why am I getting the ECONNRESET error?
- Why does the throughput drop when using nginx?
- How can I determine what is causing the server to fail at higher request rates?
I tried to check the logs of the RESTful API, but it seems the requests never even reach it. I also looked at the nginx logs, but they do not give any information about the error.