I have put together a questionnaire of sorts on performance. I am publishing it here to keep a record of it and to get some critique from the experts.
The eventual plan is to use a Prometheus, Graphite, and Grafana stack for metrics and ELK (Elasticsearch, Logstash, Kibana) for log aggregation. We are also experimenting with some APMs such as New Relic in the meantime.
- What GC (garbage collection) parameters do we have ?
- How much heap size/RAM is available for the NodeJS process ?
- Are we using all the CPUs available on a given server or instance ?
- What are the sysctl and process security limits for the NodeJS processes in a given server/instance ?
- Have we optimized the network connections to the servers to support maximum connections ?
- For a given CPU and RAM, say, 1 GHz and 1 GB RAM, roughly how many concurrent connections can we support ?
- Are we running I/O bound NodeJS processes ? (This is where DevOps gets to show off its BPF superpowers :slightly_smiling_face: )
- OpenTracing – Can we use Grafana – ELK to map the CPU spikes ?
- New Relic / Dynatrace or an open source solution for APM ?
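
As a starting point for the first few questions (GC flags, heap size, CPU count), here is a minimal sketch that asks a running process what it actually has, using only Node's built-in `os` and `v8` modules. The `runtimeSnapshot` function and the `RuntimeSnapshot` shape are hypothetical names made up for illustration, and the output is only a rough first look, not a substitute for proper metrics.

```typescript
import * as os from "node:os";
import * as v8 from "node:v8";

// Hypothetical shape for illustration; not part of any library.
interface RuntimeSnapshot {
  heapLimitMiB: number; // V8 heap ceiling (affected by --max-old-space-size)
  heapUsedMiB: number;  // JS heap currently in use
  rssMiB: number;       // resident set size of the whole process
  totalRamMiB: number;  // RAM reported by the OS for the machine
  cpuCount: number;     // logical CPUs visible to the OS
  v8Flags: string[];    // Node/V8 flags passed on the command line
  nodeOptions: string;  // flags passed via the NODE_OPTIONS env variable
}

const MIB = 1024 * 1024;

function runtimeSnapshot(): RuntimeSnapshot {
  const heap = v8.getHeapStatistics();
  const mem = process.memoryUsage();
  return {
    heapLimitMiB: Math.round(heap.heap_size_limit / MIB),
    heapUsedMiB: Math.round(mem.heapUsed / MIB),
    rssMiB: Math.round(mem.rss / MIB),
    totalRamMiB: Math.round(os.totalmem() / MIB),
    cpuCount: os.cpus().length,
    v8Flags: process.execArgv,
    nodeOptions: process.env.NODE_OPTIONS ?? "",
  };
}

// Print once at startup; the values could also be logged periodically.
console.log(runtimeSnapshot());
```

Comparing `cpuCount` with the number of NodeJS processes or workers actually running on the box gives a first hint for the CPU question, and a `heapLimitMiB` far below `totalRamMiB` suggests the heap limit has not been tuned for the instance. The sysctl and ulimit questions still need to be checked at the OS level.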