
Update Meilisearch's parts about Replication and Sharding support #454

Open

Kerollmops wants to merge 7 commits into typesense:master from Kerollmops:master

Conversation

@Kerollmops
Contributor

Yellow, the Typesense team 👋

Change Summary

I have updated your comparison table to reflect our recent releases and, most importantly, the release of geo-replicated sharding. Users can now configure a cluster and define how to replicate and shard their documents. We use virtual naming to let users scale horizontally, run a replication-only setup, or combine both.

I also updated parts discussing the impact of available RAM on performance.

Thank you for your time, and have a great weekend 🌵

PR Checklist


- That said, Meilisearch is a relatively new project, and while it's not suited for serious production use-cases today, the project has a good team & momentum behind it. We're eager to see how the project evolves.
+ That said, Meilisearch is a relatively new project. The project has a good team & momentum behind it. We're eager to see how the project evolves.
Contributor Author


We could probably remove the whole sentence.

<td>❌</td>
<td>❌</td>
<td></td>
<td>✅<br /><br />Can optionally use a GPU to generate embeddings</td>
Contributor Author


This is already what's written on the v3 page

@jasonbosco
Member

Do you have an article in your docs that describes how this specifically solves for High Availability?

Because sharding and high availability are distinct things, and having sharding doesn't imply being highly available. For example, what happens when a shard goes down?

Anyway, I'll reserve my questions for after I read your docs. I checked your public docs and couldn't find it described there, so if you have a direct link, I can read through it.

@jasonbosco jasonbosco marked this pull request as draft March 20, 2026 02:24
@Kerollmops
Contributor Author

Kerollmops commented Mar 20, 2026

Hello @jasonbosco 👋

Thank you very much for taking the time to review my PR.

> Do you have an article in your docs that describes how this specifically solves for High Availability?
>
> Because sharding and high availability are distinct things, and having sharding doesn't imply being highly available. For example, what happens when a shard goes down?
>
> Anyway, I'll reserve my questions for after I read your docs. I checked your public docs and couldn't find it described there, so if you have a direct link, I can read through it.

We haven't updated our public documentation yet, but the changelog for v1.37, the version that introduced replicated sharding, documents how to set up full or unbalanced replication.

Note that Meilisearch supports sharding and replication, both handled via a virtual-naming approach that stays flexible for any topology.

Have a nice day 🌵

@Kerollmops
Contributor Author

Hey @jasonbosco 👋

We finally merged the documentation about the new sharding feature of Meilisearch. As you can see, Meilisearch supports full sharding without replication, full replication, or a mix of sharding and replication. The engine handles those different configurations by using a virtual-naming approach.

I'll let you read our documentation.
Have a great day 🌵

@Kerollmops Kerollmops marked this pull request as ready for review April 7, 2026 11:50
@jasonbosco
Member

@Kerollmops Thank you for sharing the link to the docs.

After reading through it, I still don't think Meilisearch's replication architecture qualifies as high availability or fault tolerance in the standard distributed systems sense.

Just to align on terminology, high availability / fault tolerance typically means that there is no single point of failure and a system continues to operate correctly (serving both reads and writes) despite a node failure, with automatic failover and no operator intervention required.

Based on your docs, Meilisearch's current implementation doesn't seem to meet that bar.

--

On the write side, your docs say:

> If the leader goes down:
>
> - Search may be affected: if search requests are routed to the downed leader, they will fail
> - Writes are blocked: no documents can be added or updated until a leader is available. Note that alive remote instances continue to process tasks
> - Manual promotion: you must designate a new leader by updating the network topology with PATCH /network and setting "leader" to another instance
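To make the "manual promotion" step concrete, here is a hedged sketch of what the operator has to do by hand. The PATCH /network endpoint and the "leader" field come from the Meilisearch docs quoted above; the instance URL, instance name, and API key are hypothetical, and the request is only built, not sent:

```python
# Hedged sketch, not an official client: the manual leader-promotion call
# described in the quoted docs. Host, instance name, and key are made up.
import json
import urllib.request

def build_promotion_request(instance_url: str, api_key: str, new_leader: str):
    """Build (without sending) a PATCH /network request that designates
    `new_leader` as the cluster's leader instance."""
    body = json.dumps({"leader": new_leader}).encode("utf-8")
    return urllib.request.Request(
        f"{instance_url}/network",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_promotion_request("http://ms-1:7700", "MASTER_KEY", "ms-1")
print(req.method, req.full_url)  # PATCH http://ms-1:7700/network
```

The point is that a human (or external tooling) must issue this call; nothing in the cluster issues it automatically.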

Source: https://www.meilisearch.com/docs/resources/self_hosting/sharding/configure_replication#the-leader-instance

So there’s a single leader for writes, and if that node goes down - say at 3am - writes are down until someone manually promotes a new leader. That’s a single point of failure for the write path and requires operator intervention to recover, regardless of the number of shards or replicas, even when fully replicated.

--

Even on the read side, your docs say:

> The downside is that there is no automatic fallback. If the remote assigned to a shard is unreachable, that shard’s results are missing from the response — Meilisearch does not yet retry using another replica that holds the same shard.


Source: https://www.meilisearch.com/docs/resources/self_hosting/sharding/configure_replication#remote-availability

Based on that, my understanding is that during a distributed search, each shard is assigned to a specific remote. If that remote is down, the query does not retry against another replica that holds the same shard.

The implication is this: suppose a Meilisearch cluster has a shard fully replicated across multiple nodes. Again it's 3am, one node goes down, and the query planner happens to pick that node for that shard. The request may return partial or failed results, even though another replica is healthy. From the application’s point of view, this breaks the expectation that replicas provide seamless failover.
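The routing behavior described above can be illustrated with a small simulation. This is not Meilisearch code, just a sketch of the policy the docs describe: each shard is pinned to one remote, and the query never retries another replica. The shard and node names are hypothetical:

```python
# Illustrative sketch (not Meilisearch code): each shard is pinned to a
# single remote, and a query does not retry against another replica.
def distributed_search(shard_assignment, alive):
    """Collect per-shard results; a shard whose pinned remote is down is
    simply missing from the response."""
    results = {}
    for shard, remote in shard_assignment.items():
        if remote in alive:
            results[shard] = f"hits-from-{remote}"
        # no fallback: other replicas of `shard` are never consulted
    return results

# "s1" is fully replicated on node-a and node-b, but pinned to node-a.
assignment = {"s0": "node-b", "s1": "node-a"}

# node-a goes down: s1's hits vanish even though node-b holds a replica.
print(distributed_search(assignment, alive={"node-b"}))
# → {'s0': 'hits-from-node-b'}
```

Under this policy the health of the *pinned* node, not of the shard's data, decides whether a shard contributes results.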

--

Putting both together, your current replication mechanism does not provide fault tolerance when a node is down; it only gives you data redundancy for existing data. Without automatic replica fallback on reads and automatic leader election for writes, the system goes into a degraded state if even one node (especially the leader) fails: some searches may return partial results and writes become unavailable. So this design is still not fault tolerant and does not provide high availability for use in a production environment.
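For contrast, seamless read failover would mean the opposite routing policy: try each shard against any healthy replica instead of failing on the pinned remote. A minimal sketch, assuming the query planner knows every replica of each shard (names hypothetical, not Meilisearch code):

```python
# Sketch of the replica fallback described as missing above: consult the
# ordered replica list for each shard and take the first healthy node.
def search_with_fallback(replicas, alive):
    """replicas maps shard -> ordered list of nodes holding that shard."""
    results = {}
    for shard, nodes in replicas.items():
        for node in nodes:
            if node in alive:          # first healthy replica wins
                results[shard] = f"hits-from-{node}"
                break
    return results

replicas = {"s0": ["node-b"], "s1": ["node-a", "node-b"]}

# Same 3am failure of node-a: every shard still answers, from node-b.
print(search_with_fallback(replicas, alive={"node-b"}))
# → {'s0': 'hits-from-node-b', 's1': 'hits-from-node-b'}
```

With this policy a fully replicated shard only goes dark when *all* of its replicas are down, which is the behavior applications usually expect from replication.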

Let me know if I've misunderstood any behavior you've mentioned in the docs.
