Server monitoring and the mesh network

How CreateYourVPN keeps an eye on server health: self-checks, a mesh network of mutual checks between servers, and automatic hiding of unreachable servers from users.

A server can let you down at the worst possible moment: a service crashes, the host reboots the machine — or, worse still, the server gets blocked in a user's country and stops being reachable even though it's technically up. Your users should never notice any of this. This lesson covers how CreateYourVPN watches over your servers and what happens when one of them is in trouble.

Two layers of checks

The system looks at every server from two angles.

1. The server checks itself

Every few minutes, each server reports in: is the traffic listener on port 443 alive, and is the VPN service running? The metrics from lesson 3 arrive with the same report. If the report says "I'm not well," the server gets the "Server unavailable" status. If reports stop coming altogether, the status becomes "No data from server" — the machine may be powered off or have lost its network.
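The rules above can be read as a small decision function. The following is only a sketch, not CreateYourVPN's actual code: the `SelfReport` fields, the `STALE_AFTER` threshold, and the "OK" label are assumptions made for illustration (the lesson only says reports arrive "every few minutes").

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed value: how long without a report before the data counts as missing.
STALE_AFTER = timedelta(minutes=10)


@dataclass
class SelfReport:
    """Hypothetical shape of one self-check report."""
    received_at: datetime
    listener_443_alive: bool   # is the traffic listener on port 443 alive?
    vpn_service_running: bool  # is the VPN service running?


def self_check_status(report: Optional[SelfReport], now: datetime) -> str:
    # No report at all, or the last one is too old: we simply have no data.
    if report is None or now - report.received_at > STALE_AFTER:
        return "No data from server"
    # A fresh report where both checks pass means the server is healthy.
    if report.listener_443_alive and report.vpn_service_running:
        return "OK"  # hypothetical label for the healthy state
    # A fresh report that says "I'm not well".
    return "Server unavailable"
```

Because the status is computed from the latest report every time, nothing has to be reset by hand when a server recovers.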

2. Servers "knock on" each other

Self-checks can't catch the sneakiest case: a server believes it's healthy, but from the outside it can't be reached — for example, its IP has been blocked. That's why all your servers are joined into a mesh network of mutual checks: each server regularly tries to reach every neighbor — the same way a user's app would.

From there, a "vote" takes over:

A single failed knock means nothing — the network may have just blinked. Only sustained streaks of failures count.
A server is marked "Unreachable from your servers" only when several independent servers consistently fail to reach it — one server's opinion is not a verdict.
There's also protection against the "unreliable witness": if some server suddenly claims it can't see half its neighbors, the problem is most likely its own — and its votes are discarded.

This is exactly the scheme that catches blocks. If a server gets blocked in a country where some of your other servers are located, the neighbors in that country quickly stop reaching it, a quorum builds up, and the system marks the server unreachable. Meanwhile, servers in countries without the block keep seeing it, and that difference of opinion makes the picture obvious.
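The three voting rules can be sketched in a few lines. This is an illustration under stated assumptions, not the real implementation: the thresholds `MIN_STREAK`, `QUORUM`, and `WITNESS_MAX_SHARE`, as well as the shape of the `streaks` input, are invented here, since the lesson gives no concrete numbers.

```python
from typing import Dict

# All three thresholds are assumptions for illustration only.
MIN_STREAK = 3           # consecutive failed knocks before a failure counts
QUORUM = 2               # independent servers needed to mark a target
WITNESS_MAX_SHARE = 0.5  # a voter failing this share of neighbors is ignored


def unreachable_targets(streaks: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """streaks[voter][target] is the current run of consecutive failed knocks.

    Returns {target: number of servers that cannot see it} for marked targets.
    """
    votes: Dict[str, int] = {}
    for voter, targets in streaks.items():
        # Rule 1: only sustained streaks count, a single failure is ignored.
        failing = [t for t, n in targets.items() if n >= MIN_STREAK]
        # Rule 3: an "unreliable witness" that cannot see half its neighbors
        # most likely has its own problem, so its votes are discarded.
        if targets and len(failing) / len(targets) >= WITNESS_MAX_SHARE:
            continue
        for target in failing:
            votes[target] = votes.get(target, 0) + 1
    # Rule 2: one server's opinion is not a verdict, a quorum is required.
    return {t: n for t, n in votes.items() if n >= QUORUM}
```

For example, if servers `a` and `b` both have long failure streaks toward `e`, server `c` has a single failed knock, and server `d` suddenly cannot see three of its four neighbors, only `e` is marked, with a count of 2. The single blip from `c` and every vote from `d` are ignored.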

What happens to a problematic server

The statuses feed directly into balancing: servers with the "Server unavailable" and "Unreachable from your servers" statuses are pulled out of rotation — no new connections are sent their way, and in users' subscriptions their place is taken by healthy servers from the same routes.

At the same time, the system is careful — it's built on the "do no harm" principle:

A route is never left empty. If every server on a route turns out to be "bad," the system will serve users the best option available rather than an empty list — otherwise apps would decide the servers are gone and wipe them.
Silence is not a verdict. The "No data from server" status doesn't remove a server from rotation on its own: without fresh data, the system makes no sudden moves.
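Both safeguards fit into one small selection function. This is a sketch only: the function name and data shapes are hypothetical, and because the lesson does not say how the "best option available" is chosen, the fallback here simply returns the whole route unchanged.

```python
from typing import Dict, List

# Only these two statuses pull a server out of rotation.
# "No data from server" is deliberately absent: silence is not a verdict.
HIDDEN = {"Server unavailable", "Unreachable from your servers"}


def servers_to_serve(route: List[str], status: Dict[str, str]) -> List[str]:
    """Pick which servers of one route go into a user's subscription."""
    healthy = [s for s in route if status.get(s) not in HIDDEN]
    if healthy:
        return healthy
    # Every server on the route is "bad": never serve an empty list.
    # Assumption: fall back to the full route as a stand-in for "best available".
    return list(route)
```

With three servers where one is "Server unavailable" and one has "No data from server", the function serves the silent server and the healthy one. If all three are marked bad, it still returns all three rather than an empty list.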

Everything heals itself

None of the statuses "sticks" forever — the state is re-derived from fresh data:

the server sends a healthy report again → "Server unavailable" is lifted;
the neighbors start reaching it again → the failure streak resets, and the "Unreachable from your servers" mark goes away;
reports resume → "No data from server" disappears.

Fix the server (or wait for it to be unblocked) and it puts itself back into service. There is no manual "turn it back on" step.

Where to see this in the panel

Open any user's card and go to their server list. Next to each server there's a status dot and, when something is wrong, a caption with the reason: "Server unavailable", "Unreachable from your servers" (including how many of your servers can't see it), or "No data from server". Indirect signs show up on server cards too: a "stale" badge on the metrics and the message "No data — agent not responding".

Mesh checks are extremely frugal: each probe is tiny, and their total volume is held under a strict daily cap, so they don't affect your servers' traffic or performance.
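One simple way to enforce a daily volume cap is a budget that resets when the date changes. This is a sketch under assumptions: the class name, the byte-based accounting, and the cap value are all hypothetical, as the lesson does not say how the cap is implemented.

```python
from datetime import date
from typing import Optional


class DailyProbeBudget:
    """Hypothetical per-server budget for mesh probe traffic."""

    def __init__(self, cap_bytes: int) -> None:
        self.cap_bytes = cap_bytes
        self.day: Optional[date] = None
        self.used = 0

    def try_spend(self, size: int, today: date) -> bool:
        # A new day starts with an empty counter.
        if today != self.day:
            self.day, self.used = today, 0
        # Refuse any probe that would push the total over the cap.
        if self.used + size > self.cap_bytes:
            return False
        self.used += size
        return True
```

A probe is sent only when `try_spend` returns `True`, so the total can never exceed the cap on any given day.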

Key takeaways

Two layers of control: a server checks itself, and its mesh neighbors check it from the outside.
Blocks are caught by a "vote" of several servers — no panic over one-off hiccups.
Problematic servers drop out of rotation; users silently move to healthy ones.
Everything self-heals: the moment a server comes back to life, it rejoins automatically.

Next up

The infrastructure knows how to fix itself, but there's one thing worth protecting separately — your user base.

Lesson 8. Backups →

Saving your users to your own storage and learning to restore them.
