Background
Nacos’s current health-check mechanism often fails to detect instance failures in time, especially under high QPS and node volatility.
We’ve encountered cases where:
- Instances remain marked as healthy even after the backend port crashes
- Heartbeat intervals are too generous for fast-fail microservices
- Health status updates are delayed by the lazy sync strategy
Solution: Custom FAST_TCP HealthCheckProcessor
We wrote a minimal plugin, FastHealthCheckProcessor, that implements two enhancements:
- Shortened heartbeat timeout (default 15s → now 10s)
- Actual TCP connectivity test (verifies that the target instance is connectable via socket)
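As a rough illustration of how these two signals combine, here is a minimal standalone sketch of the decision rule. The class name `FastCheckRule` and the method `shouldMarkDown` are hypothetical names made up for this example and are not part of Nacos. The 10-second threshold and the "both checks must fail" rule are taken from the processor code in this post.

```java
// Hypothetical helper for illustration only; not part of Nacos.
class FastCheckRule {
    // Heartbeat timeout used in this post: 10 seconds.
    static final long HEARTBEAT_TIMEOUT_MS = 10_000L;

    /**
     * Returns true only when the heartbeat is stale AND the TCP probe failed.
     * Either signal on its own is not enough to mark the instance down.
     */
    static boolean shouldMarkDown(long lastBeatMs, long nowMs, boolean tcpAlive) {
        boolean heartbeatTimeout = (nowMs - lastBeatMs) > HEARTBEAT_TIMEOUT_MS;
        return heartbeatTimeout && !tcpAlive;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        // Stale heartbeat (12s old), port closed: prints true
        System.out.println(shouldMarkDown(now - 12_000L, now, false));
        // Stale heartbeat (12s old), port still open: prints false
        System.out.println(shouldMarkDown(now - 12_000L, now, true));
        // Fresh heartbeat (3s old), port closed: prints false
        System.out.println(shouldMarkDown(now - 3_000L, now, false));
    }
}
```

One consequence of requiring both signals: an instance whose port has just closed but whose last heartbeat is still fresh stays healthy until the heartbeat ages past the threshold. That trades a few seconds of detection latency for fewer false alarms.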
Highlights:
- Non-invasive, uses HealthCheckType.CUSTOM
- Triggers immediate updateInstance sync on failure
- Can be plugged via standard configuration
- Avoids false positives while improving failure detection accuracy
Code
```java
// Assumption: this targets the Nacos 1.x server module, where the server-side
// Instance class exposes getLastBeat(). The original listing had its Instance
// import commented out.
import com.alibaba.nacos.naming.core.Instance;
import com.alibaba.nacos.naming.healthcheck.AbstractHealthCheckProcessor;
import com.alibaba.nacos.naming.healthcheck.HealthCheckTask;
import com.alibaba.nacos.naming.healthcheck.HealthCheckType;

import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Optional;

/**
 * Pushes status changes immediately to avoid delayed sync.
 */
public class FastHealthCheckProcessor extends AbstractHealthCheckProcessor {

    @Override
    public String getType() {
        return "FAST_TCP";
    }

    @Override
    public void process(HealthCheckTask task) {
        // Find the instance that this task refers to.
        Optional<Instance> optional = task.getCluster().getService().allIPs().stream()
                .filter(inst -> inst.getIp().equals(task.getIp()) && inst.getPort() == task.getPort())
                .findFirst();
        if (!optional.isPresent()) {
            return;
        }

        Instance instance = optional.get();
        long lastBeat = instance.getLastBeat();
        long now = System.currentTimeMillis();
        // No heartbeat received for more than 10 seconds.
        boolean heartbeatTimeout = (now - lastBeat) > 10000;
        boolean tcpAlive = isPortAlive(instance.getIp(), instance.getPort());

        // Mark the instance down only when both signals agree:
        // the heartbeat is stale AND the port cannot be connected to.
        if (heartbeatTimeout && !tcpAlive) {
            instance.setHealthy(false);
            task.getCluster().getService().updateInstance(instance);
            System.out.println("[FAST_TCP] Down: " + instance.getIp() + ":" + instance.getPort());
        } else {
            instance.setHealthy(true);
        }
    }

    // Tries to open a TCP connection with a 2-second timeout.
    private boolean isPortAlive(String ip, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(ip, port), 2000);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    @Override
    public HealthCheckType getHealthCheckType() {
        return HealthCheckType.CUSTOM;
    }
}
```
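FastHealthCheckProcessor plugin (a replacement for the default Nacos health-check mechanism)
Features:
- More responsive heartbeat detection (10-second threshold)
- Adds a real TCP port probe to prevent instances from appearing alive when they are not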
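To see the TCP probe in isolation, the following standalone sketch runs the same connect-with-timeout idea against a throwaway local listener, with no Nacos dependency. The class name `TcpProbeDemo` and the `demo` helper are hypothetical and exist only for this example. The 2-second timeout mirrors the value used in the processor above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class for illustration only; not part of Nacos.
class TcpProbeDemo {

    /** Same idea as isPortAlive: connect with a timeout and report success. */
    static boolean isPortAlive(String ip, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(ip, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Probes a local port while a listener is open, then again after it closes. */
    static boolean[] demo() {
        try {
            int port;
            boolean whileListening;
            try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
                port = server.getLocalPort();
                whileListening = isPortAlive("127.0.0.1", port, 2000);
            }
            // The listener is closed now, so the connect attempt should be refused.
            boolean afterClose = isPortAlive("127.0.0.1", port, 2000);
            return new boolean[] {whileListening, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("while listening: " + result[0]);
        System.out.println("after close:     " + result[1]);
    }
}
```

A successful connect only shows that something is accepting connections on the port. It says nothing about whether the application behind it is responding correctly, which is one reason to combine the probe with the heartbeat check rather than rely on it alone.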
