I am building a high-performance REST API. My goal is at least 10k req/s with less than 400ms latency (while testing locally), ideally something closer to 15-20k. The machine I use for testing is an AWS instance with 64 vCPUs, 128 GB RAM and an SSD, which should be overkill. The nature of the request I am testing is the following: I receive an identifier, perform mathematical operations on it (modify it in a special way), insert it into a predefined JS script that is stored locally, and return the result to the user.
My problem is the maximum request load that my server can handle. You can see the details below.
Right now I am using Express.js. Here is the mock code:
```js
const express = require('express');
const rateLimit = require('express-rate-limit');
const compression = require('compression');
const fs = require('fs');
const path = require('path');
const { method1 } = require('...');
const { method2 } = require('...');
const { method3 } = require('...');

const app = express();
app.use(express.json());
app.use(compression());

const limiter = rateLimit({
  windowMs: 60 * 1000,
  max: 6000000000000000, // effectively unlimited for the benchmark
  message: 'Too many requests, please try again later'
});
app.set("trust proxy", true);
app.use(limiter);

const port = 4528;
const scriptPath = path.join(__dirname, './script.js');
const scriptContent = fs.readFileSync(scriptPath, 'utf8');

app.get('/mock-endpoint', async (req, res) => {
  const id = req.query.id;
  const result = await processAsync(id, scriptContent);
  return res.send(result);
});

function processAsync(id, scriptContent) {
  return new Promise((resolve, reject) => {
    try {
      // const worker = new Worker("./worker.js", {
      //   workerData: {
      //     id,
      //     scriptContent,
      //   },
      // });
      // worker.on("message", (data) => {
      //   resolve(data);
      // });
      // worker.on("error", (msg) => {
      //   reject(`An error occurred: ${msg}`);
      // });
      const value1 = method1(id);
      const value2 = method2(value1);
      const value3 = method3(value2);
      const scriptResult = scriptContent.replace("-insert-here-", value3);
      resolve(scriptResult);
    } catch (error) {
      reject(error);
    }
  });
}

app.listen(port, () => {
  console.log(`Backend is running at http://localhost:${port}`);
});
```
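The commented-out worker experiment referred to a worker.js along these lines (a minimal sketch of what I tried; the require paths are placeholders, same as above):

```js
// worker.js — receives the id and the script text via workerData,
// runs the addon methods, and posts the finished script back
const { parentPort, workerData } = require('worker_threads');
const { method1 } = require('...');
const { method2 } = require('...');
const { method3 } = require('...');

const { id, scriptContent } = workerData;
const value = method3(method2(method1(id)));
parentPort.postMessage(scriptContent.replace('-insert-here-', value));
```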
When load testing this script I:
- Use clustering ("pm2" with the "-i max" parameter, which equals "-i 64" in my case). I have also tried 16 and 32 instances, which made things neither better nor worse (see the sketch after this list).
- Set UV_THREADPOOL_SIZE to 64
- All of the methods (method1, method2 and method3) are Node addons originally written in C++. I figured that since it's pure math, it would be better to offload these tasks to C++ rather than Node.js (which boosted performance by 8x compared to doing the same operations in Node.js). The addons are context-aware because, as you can see, I have tried to offload the tasks to a worker, which only made things worse.
- Wrap the calls to these methods in a Promise (as shown in processAsync above).
- Node environment is set to production.
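For reference, the setup from these bullets can be captured in a pm2 ecosystem file roughly like this (a sketch; the app and file names are illustrative):

```js
// ecosystem.config.js — sketch of the pm2 setup described above
module.exports = {
  apps: [{
    name: 'mock-api',        // illustrative name
    script: './server.js',   // placeholder for my actual entry file
    instances: 'max',        // same as "-i max": one worker per vCPU
    exec_mode: 'cluster',
    env: {
      NODE_ENV: 'production',
      UV_THREADPOOL_SIZE: 64,
    },
  }],
};
```

It is started with `pm2 start ecosystem.config.js`.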
I also use autocannon as follows:
```sh
autocannon -c 10 -d 30 -p 10 http://localhost:4528/mock-endpoint?id=123456789qwerty
```
This gives me around 4.5k req/s with around 200ms latency. While running this test I observe only shy of 11% CPU usage and less than 1% RAM usage, so I am definitely not lacking resources.
I have also tested the same script over the web on the same server using loader.io: still the same CPU and RAM usage, but some of the requests time out (take more than 10s), so the average latency under a 10k req/s load is around 1.4s. I should also mention that nginx was in place for these load tests.
It is worth mentioning that caching of both the response and scriptContent is not the way to go: the response is always different, and scriptContent is already saved as a string at server startup.
So the final question is: why is my app struggling to process these requests fast enough, both locally and over the web, considering that both CPU and RAM usage are so low? Is this some kind of Node.js-side limitation? If so, how do I raise the limits, if that is possible? There is also the idea of trying several less powerful instances together in a cluster (since I only use 15% of this one at the moment), but first I want to understand whether the current server can be pushed to its limits.
PS: After struggling with the situation described above for some time, I decided to try moving everything to Go. The choice fell on the Fiber framework, because it is quite similar to Express and seems to be one of the fastest options Go can offer. I basically recreated what I have in Express there and build it with `go build -ldflags="-s -w" -o server server.go`. Running the same localhost benchmark on it resulted in 200 req/s and 500ms latency, which is awful. This is a completely separate side question unrelated to the main one, but if you know some ways to maximize Go server performance, I would be very pleased to hear them. Btw, a "Hello world" test in Go did almost 5 times better than Node.js at around 47k req/s; I don't know why this one did so poorly.
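For context, the Fiber version is essentially the following (a rough sketch; method1/method2/method3 are placeholders for the ported math operations, whose real implementations I have omitted):

```go
package main

import (
	"log"
	"os"
	"strings"

	"github.com/gofiber/fiber/v2"
)

// Placeholders standing in for the real ported math operations.
func method1(s string) string { return s }
func method2(s string) string { return s }
func method3(s string) string { return s }

func main() {
	// Read the script once at startup, same as in the Express version.
	script, err := os.ReadFile("./script.js")
	if err != nil {
		log.Fatal(err)
	}
	scriptContent := string(script)

	app := fiber.New()
	app.Get("/mock-endpoint", func(c *fiber.Ctx) error {
		id := c.Query("id")
		value := method3(method2(method1(id)))
		// Replace only the first occurrence, mirroring String.replace in JS.
		return c.SendString(strings.Replace(scriptContent, "-insert-here-", value, 1))
	})

	log.Fatal(app.Listen(":4528"))
}
```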