
Add pallet-rate-limiting #2150

Open
ales-otf wants to merge 154 commits into devnet-ready from feat/rate-limit-pallet

Conversation


@ales-otf ales-otf commented Oct 21, 2025

Description

This PR introduces pallet-rate-limiting as a uniform rate-limiting solution and migrates a part of legacy rate limiting to this pallet.

Rate limits are set with pallet_rate_limiting::set_rate_limit(target, scope, limit). target is either one call or a group of calls. scope is optional context used to select the configured span (for example, netuid). If a call/group does not need context, it can be configured directly at runtime with target + limit and no resolver data.

There are standalone calls and groups of calls. Groups are defined at runtime, and each call can be either standalone or within a group, but not both. Groups allow multiple calls to share rate-limiting behavior. Depending on mode (ConfigOnly, UsageOnly, ConfigAndUsage), calls share config, usage tracking, or both.

For calls that need additional context, resolvers provide it. Limits can be configured either globally per target, or scoped per target+scope. The role of ScopeResolver is to provide that scope context (for example, netuid) so the extension can select the correct scoped limit entry. ScopeResolver can also adjust span (for example, tempo scaling) and define bypass behavior. UsageResolver resolves usage key(s) so LastSeen is tracked with additional context (for example, per account/per subnet/per mechanism), not only by target.

Enforcement happens via UnwrappedRateLimitTransactionExtension. It first unwraps nested calls (sudo, proxy, utility, multisig), then delegates each inner call to RateLimitTransactionExtension. The extension checks the resolved target/scope and compares current block vs LastSeen. If within span, validation fails with InvalidTransaction::Custom(1). On successful dispatch, the extension writes LastSeen for resolved usage key(s). This is what enforces rate limiting for subsequent calls.

Other pallets should use rate-limiting-interface (RateLimitingInterface) to read limits and last-seen state without depending on pallet internals. Writes through this interface should be avoided as much as possible because they introduce side-effects. The expected write path is the transaction extension itself. Manual set_last_seen writes are only for cases where usage must be updated outside the normal rate-limited call path.

pallet-rate-limiting is instanceable and is intended to be used as one instance per pallet, with local scope/usage types and resolvers. This runtime currently uses one centralized instance for pallet-subtensor + pallet-admin-utils as a transitional setup. Migration and resolvers already exist for grouped and standalone legacy limits, but this PR migrates only grouped legacy limits (GroupedRateLimitingMigration). Standalone migration/cleanup is deferred to a follow-up PR.

How to review:

I tried to organize this PR so that each major change is represented by a single commit, while avoiding meaningless commits as much as possible. You can refer to the commit history and review changes related to specific limits.

  1. To review how data is migrated: runtime/src/migrations/rate_limiting::commits_grouped. This is where you'll find everything considered "legacy" rate-limiting. You can then review each migration from the list separately.
  2. To review whether behavior is covered correctly: the migration above + runtime/src/rate_limiting. There you'll find the resolver implementations and can review how different calls are bypassed, adjusted to tempo, and scoped (for limits: only by NetUid; for last-seen timestamp: various cases).
  3. For proof of correctness and behavior: runtime/tests/rate_limiting. These are integration tests at the extrinsic level, where transaction extensions are involved. You can verify whether rate-limiting behavior is consistent with legacy behavior and whether all cases are covered.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Other (please describe):

⚠️ Breaking Changes

Deprecated extrinsics with their equivalents in pallet-rate-limiting

extrinsic (pallet-admin-utils) -> pallet-rate-limiting::set_rate_limit params

  • sudo_set_tx_rate_limit(tx_rate_limit) -> target = Group(GROUP_SWAP_KEYS), scope = None, limit = Exact(tx_rate_limit)
  • sudo_set_serving_rate_limit(netuid, serving_rate_limit) -> target = Group(GROUP_SERVE), scope = Some(netuid), limit = Exact(serving_rate_limit)
  • sudo_set_weights_set_rate_limit(netuid, weights_set_rate_limit) -> target = Group(GROUP_WEIGHTS_SET), scope = Some(netuid), limit = Exact(weights_set_rate_limit)
  • sudo_set_network_rate_limit(limit) -> target = Group(GROUP_REGISTER_NETWORK), scope = None, limit = Exact(limit)
  • sudo_set_tx_delegate_take_rate_limit(tx_rate_limit) -> target = Group(GROUP_DELEGATE_TAKE), scope = None, limit = Exact(tx_rate_limit)
  • sudo_set_owner_hparam_rate_limit(epochs) -> target = Group(GROUP_OWNER_HPARAMS), scope = Some(netuid), limit = Exact(epochs)

You can find the values of GROUP_* constants in common/src/rate_limiting.rs.

From the client's perspective, you can query pallet-rate-limiting::Groups storage to list all groups with their IDs and configuration, or pallet-rate-limiting::GroupNameIndex to get the ID of a particular group by name.

Removed storages from pallet-subtensor

On the client side, use pallet-rate-limiting::Limits storage to fetch limits, or pallet-rate-limiting::LastSeen to fetch last-seen timestamps.

  • NetworkRateLimit -> Limits({ Group: GROUP_REGISTER_NETWORK })
  • OwnerHyperparamRateLimit -> Limits({ Group: GROUP_OWNER_HPARAMS })
  • ServingRateLimit -> Limits({ Group: GROUP_SERVE }) then scoped value for netuid
  • StakingOperationRateLimiter -> LastSeen({ Group: GROUP_STAKING_OPS }, { ColdkeyHotkeySubnet: { coldkey, hotkey, netuid } })
  • TxDelegateTakeRateLimit -> Limits({ Group: GROUP_DELEGATE_TAKE })
  • TxRateLimit -> Limits({ Group: GROUP_SWAP_KEYS })
  • WeightsSetRateLimit -> Limits({ Group: GROUP_WEIGHTS_SET }) then scoped value for netuid

pallet-subtensor::Config changes

  • added type RateLimiting: RateLimitingInterface
  • removed:
    • InitialServingRateLimit
    • InitialTxRateLimit
    • InitialTxDelegateTakeRateLimit
    • InitialNetworkRateLimit

Removed events

They have moved to the pallet-rate-limiting::RateLimitSet event as follows:

  • NetworkRateLimitSet -> { target: Group(GROUP_REGISTER_NETWORK), scope: None, limit: Exact(span) }
  • OwnerHyperparamRateLimitSet -> { target: Group(GROUP_OWNER_HPARAMS), scope: None, limit: Exact(epochs) }
  • ServingRateLimitSet -> { target: Group(GROUP_SERVE), scope: Some(netuid), limit: Exact(span) }
  • TxDelegateTakeRateLimitSet -> { target: Group(GROUP_DELEGATE_TAKE), scope: None, limit: Exact(span) }
  • TxRateLimitSet -> { target: Group(GROUP_SWAP_KEYS), scope: None, limit: Exact(span) }
  • WeightsSetRateLimitSet -> { target: Group(GROUP_WEIGHTS_SET), scope: Some(netuid), limit: Exact(span) }

Additional changes

get_network_lock_cost() now uses T::RateLimiting::last_seen(GROUP_REGISTER_NETWORK, None) instead of legacy RateLimitKey::NetworkLastRegistered.

The value migrated from pallet_admin_utils::sudo_set_tx_rate_limit to pallet_rate_limiting::set_rate_limit(... GROUP_SWAP_KEYS) is now limit + 1:

  • before (pallet_admin_utils::sudo_set_tx_rate_limit(N)) - delta <= limit: a swap was still blocked when exactly N blocks had passed, and it became allowed only after N + 1 blocks (delta > limit);
  • now (pallet_rate_limiting::set_rate_limit(... GROUP_SWAP_KEYS, Exact(S))) - delta < span: a swap is blocked while fewer than S blocks have passed, and it is allowed at exactly S blocks (delta >= span).

So to keep the same real wait time as old N, you must set S = N + 1 (N = 0 stays 0).

The reason for this change is that pallet-rate-limiting uses one unified comparison rule (delta < span) for all limits. We keep that consistent and compensate for this specific legacy boundary in migration (+1).

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have run cargo fmt and cargo clippy to ensure my code is formatted and linted correctly
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@ales-otf ales-otf self-assigned this Oct 27, 2025
@ales-otf ales-otf added the skip-cargo-audit This PR fails cargo audit but needs to be merged anyway label Oct 27, 2025

ales-otf commented Feb 13, 2026

A couple of comments:

  • IMHO, we should break large PRs into meaningful pieces in the future.
  • It makes sense to test it on a mainnet clone and verify that all limit classes are migrated successfully.
  • I don't see the latest "add_stake_burn" (subnet_buyback) limit as part of the migration here.
  • Probably we should use if-let-else instead of match-on-Some/None.
  • There are several broken comment numberings left after the refactoring.

Thank you for the review! Replying to this one first; I'll go through the other comments one by one afterwards.

  • Yes, agreed. The size of this one comes from the intention to reduce the number of migrations, but generally, yes: no more than 5000 LOC from now on (I'll aim for less), barring some crazy Cargo.lock rebuild.
  • I've covered it with the integration tests, but I'll double-check, thanks.
  • It's a standalone call, and this PR covers only grouped calls. You introduced it after I had already finished writing the migrations and decided to split the work into standalone/grouped parts. Since standalone calls go into the following PR, I haven't added the migration for that call yet.
  • I checked the places you mentioned and I disagree, but it's mostly a style preference: I prefer match in those cases because, in my opinion, it's easier to read and follow. We probably need a style guide to prevent such collisions and to reduce git conflicts, but since we don't have one yet, I'll stick with match in this PR.
  • I thought that was OK for comments, but I'll fix them.

@ales-otf ales-otf dismissed stale reviews from sam0x17 and gztensor via 0563aca February 13, 2026 11:10
Comment on lines +395 to +398
(
    frame_metadata_hash_extension::CheckMetadataHash::<runtime::Runtime>::new(false),
    node_subtensor_runtime::rate_limiting::UnwrappedRateLimitTransactionExtension::new(),
),

@l0r1s l0r1s Mar 2, 2026


This requires a node upgrade, but it should be resolved when the MEV shield is upgraded, because we won't have the transaction extension on the node side anymore.

Comment on lines +1 to +82
//! RPC interface for the rate limiting pallet.

use jsonrpsee::{
    core::RpcResult,
    proc_macros::rpc,
    types::{ErrorObjectOwned, error::ErrorObject},
};
use sp_api::ProvideRuntimeApi;
use sp_blockchain::HeaderBackend;
use sp_runtime::traits::Block as BlockT;
use std::sync::Arc;

pub use pallet_rate_limiting_runtime_api::{RateLimitRpcResponse, RateLimitingRuntimeApi};

#[rpc(client, server)]
pub trait RateLimitingRpcApi<BlockHash> {
    #[method(name = "rateLimiting_getRateLimit")]
    fn get_rate_limit(
        &self,
        pallet: Vec<u8>,
        extrinsic: Vec<u8>,
        at: Option<BlockHash>,
    ) -> RpcResult<Option<RateLimitRpcResponse>>;
}

/// Error type of this RPC api.
pub enum Error {
    /// The call to runtime failed.
    RuntimeError(String),
}

impl From<Error> for ErrorObjectOwned {
    fn from(e: Error) -> Self {
        match e {
            Error::RuntimeError(e) => ErrorObject::owned(1, e, None::<()>),
        }
    }
}

impl From<Error> for i32 {
    fn from(e: Error) -> i32 {
        match e {
            Error::RuntimeError(_) => 1,
        }
    }
}

/// RPC implementation for the rate limiting pallet.
pub struct RateLimiting<C, Block> {
    client: Arc<C>,
    _marker: std::marker::PhantomData<Block>,
}

impl<C, Block> RateLimiting<C, Block> {
    /// Creates a new instance of the rate limiting RPC helper.
    pub fn new(client: Arc<C>) -> Self {
        Self {
            client,
            _marker: Default::default(),
        }
    }
}

impl<C, Block> RateLimitingRpcApiServer<<Block as BlockT>::Hash> for RateLimiting<C, Block>
where
    Block: BlockT,
    C: ProvideRuntimeApi<Block> + HeaderBackend<Block> + Send + Sync + 'static,
    C::Api: RateLimitingRuntimeApi<Block>,
{
    fn get_rate_limit(
        &self,
        pallet: Vec<u8>,
        extrinsic: Vec<u8>,
        at: Option<<Block as BlockT>::Hash>,
    ) -> RpcResult<Option<RateLimitRpcResponse>> {
        let api = self.client.runtime_api();
        let at = at.unwrap_or_else(|| self.client.info().best_hash);

        api.get_rate_limit(at, pallet, extrinsic)
            .map_err(|e| Error::RuntimeError(format!("Unable to fetch rate limit: {e:?}")).into())
    }
}

@l0r1s l0r1s Mar 2, 2026


The documentation seems to indicate that implementing custom RPCs is somewhat deprecated; the recommendation is to combine a runtime API with state_call instead, which is much easier for dApps and doesn't require a node upgrade when updates or changes are made.

That could be something to look into: https://paritytech.github.io/polkadot-sdk/master/polkadot_sdk_docs/reference_docs/custom_runtime_api_rpc/index.html

Wdyt?

Contributor Author

I'll take a look. Thanks!

Contributor Author

BTW, it looks like I forgot to add the RPC to the node!

Contributor Author

But as I understand it, the point is to stop adding RPCs to the node, so I can just remove it and keep only the runtime API? Then clients will use state_call and pass the runtime API method instead, right?


Labels

breaking-change This PR introduces a noteworthy breaking change skip-cargo-audit This PR fails cargo audit but needs to be merged anyway


5 participants