Latency Testing Solana RPC API Providers
The goal of this testing is to further the community’s understanding of Solana JSON RPC API endpoints and their response times. Latency is a single metric used in evaluating an API provider, and therefore this post is not intended to be a complete evaluation of Solana RPC APIs. Furthermore, metrics like throughput, reliability, and developer tooling may be even more important for you in some cases. There are high-quality providers that aren’t at the top of these tests, and that’s ok; response times are just a single data point that may or may not matter to a Solana project.
This test is an extension of my RPC testing project that lives at http://milliseconds-matter.me/rpc-speed-comparison/
Solana's unique infrastructure needs
Solana's 400ms blocktime means the Solana blockchain is reliably updated every 400ms. While not as fast as traditional databases, this is a huge accomplishment for a blockchain. There is much to be web3 optimistic about when such highly performant “global state machines” and “decentralized databases” are affordable and accessible to creators and builders.
However, Solana's speed creates unique infrastructure requirements that can be difficult to fulfill. Due to the sticky nature of investment, both financial and technical, as well as the importance of Ethereum in the web3 ecosystem, the majority of existing web3 infrastructure was built to service Ethereum. Unfortunately, Ethereum infrastructure does not reliably ctrl+c and ctrl+v to Solana; what works for Ethereum's ~12 second blocktime does not always work for Solana's 400ms blocktime.
Why RPC response times matter on Solana
1. An engaging user experience requires fast response times
The modern attention economy demands a̶d̶d̶i̶c̶t̶i̶n̶g engaging apps; a successful app must capture users, and capturing users requires a frictionless experience.
A responsive experience keeps users engaged with the app. Friction from long response times breaks that engagement. Therefore, apps need to respond to user input as quickly as possible. Studies show that response times over 400ms break a user’s flow.
2. Information accuracy requires speed
With ~400ms blocktimes, a single second of latency means your version of reality is out of date by two to three blocks. Working with data that lives in the past means working with potentially inaccurate data. The slower the RPC, the further behind the application is from the global state. If the application acts on or interacts with data from the Solana blockchain, that lag may cause issues.
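The “two to three blocks” arithmetic above can be made concrete. A minimal sketch (not from the actual test tooling; the constant and function name are my own for illustration):

```javascript
// Roughly how many ~400ms Solana slots pass while a response is in flight.
const SLOT_TIME_MS = 400;

function blocksBehind(latencyMs) {
  // A 1000ms round trip spans two full slots and part of a third,
  // so the chain may have advanced up to three blocks past your data.
  return Math.ceil(latencyMs / SLOT_TIME_MS);
}

console.log(blocksBehind(1000)); // 3
console.log(blocksBehind(400));  // 1
```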
Some implications around the importance of block-height are more obvious: for example, for an arbitrage trader, having access to more up-to-date information has an obvious advantage over the competition. But even for a non-trading application, access to more recent block-height data has an advantage. For an application broadcasting information for customers to act on, stale information, even by a few seconds, opens up that application to vulnerability. Malicious actors can manipulate the users of that application, leading to a poor experience. Thus having up-to-date blockchain information is not just an advantage, but a baseline necessity for any application powered by blockchain. — QuickNode’s latency whitepaper
3. Transaction reliability can be hurt by slow response times
Slow response times of RPC APIs can cause failed transactions.
All transactions need to be received by an RPC and then re-transmitted to a validator for confirmation. This creates potential bottlenecks from user to RPC, and then from RPC to the validator and finally to the blockchain. If the bottlenecks slow down the transaction transit enough, a block may pass before the transaction is received by a validator. In this case the state the transaction is dependent on may no longer exist; the world may have moved on.
Latency sensitive transactions, those that are dependent on time or state, need to arrive to a validator in time to satisfy those dependencies. With 400ms blocktimes the window is very narrow. If a state dependent transaction is too slow in arriving, it may be dropped or fail. This is a poor experience for the user as they’ll need to wait while the transaction is resent until it’s successful, or they’ll need to manually retry the failed action on their end.
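A common mitigation for dropped transactions is a resend loop that rebroadcasts until confirmation or a retry budget is exhausted. Here is a hedged sketch of that pattern; `sendFn`, the attempt count, and the delay are all placeholders I’ve invented for illustration, not code from any provider:

```javascript
// Generic resend loop for a latency-sensitive transaction.
// `sendFn` stands in for whatever broadcasts the signed transaction
// and resolves true once it is confirmed.
async function sendWithRetries(sendFn, { maxAttempts = 5, delayMs = 500 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await sendFn(attempt)) return attempt; // confirmed on this attempt
    // Wait roughly one blocktime-plus before rebroadcasting.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("transaction not confirmed within retry budget");
}
```

In practice, a slower RPC pushes each attempt closer to the edge of the slot window, which is exactly why response times feed into transaction reliability.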
Test results in depth
Declaration of potential bias or conflict of interest: The author was employed at StackPath at the time of publishing. StackPath is a member of the Solana Foundation’s server program, and is a provider of servers used for RPC nodes. StackPath does not provide a Solana RPC API as a product or RPC nodes as a product. Additionally, the Sentries team are technical consultants on the StackPath RPC API.
These charts cover three regions: North America with San Francisco, Dallas, and New York City, Europe with Paris and Tel Aviv, and APAC with Tokyo.
Spreading these tests across multiple cities within a region provides geographic coverage of user populations. The intent is to best match what an average user would experience in these regions.
I would like to reiterate that latency is not the only metric that matters when choosing an RPC API. Yes, milliseconds matter, but the point of diminishing returns for latency is completely dependent on your application. Applications that find 500ms response times perfectly acceptable would benefit from prioritizing other metrics like pricing, feature set, and support.
A note differentiating RPC nodes and RPC APIs
A Solana RPC node is a server or VM running the solana-validator program. RPC nodes are discrete components — they are individual servers with unique IP addresses in an internet exchange (IXs are where Solana infrastructure ought to live, as data centers can be too slow). RPC nodes have an inherent latency advantage because communication from a user or app is made directly with the server.
A Solana RPC API is an API endpoint that accepts Solana RPC requests. A single IP address or domain is used to access the API. There may be multiple load balanced or geo-distributed servers behind that single address, and the API determines which RPC node a user will communicate with. Proxying necessarily introduces latency, but that extra latency is not a deal breaker. In fact, APIs can provide a better user experience to a global user base if the nodes are geo-distributed. Having a gatekeeper at the front-door also allows APIs to be more secure, resilient, and scalable. Generally, RPC APIs are what most projects will land on.
Most Solana RPC providers function as an API, but some will spin up private RPC nodes for customers. It’s necessary to differentiate the two when performing global latency testing: an RPC node raw dogging the internet with no proxy has a large latency advantage for users near it. If a provider has RPC nodes available in every region (RPCPool or StackPath), they would have an unfair advantage against APIs in this test. Therefore, to make things fair, RPC nodes are not included in the main tests and have their own category.
RPCPool (Triton) has many RPC nodes around the globe, as does GenesysGo. Unfortunately, GenesysGo took their public nodes offline while I was in the middle of this testing. GenesysGo are busy at Breakpoint, but asked that I follow up with them for future testing. The StackPath RPC nodes are the individual nodes that make up the StackPath API.
Minimizing RPC response times
How can Solana projects reduce their RPC response times?
Geographic nearness to the end user is the single biggest factor in RPC response times. Understanding whether an RPC has acceptable latency for a project’s user base requires testing from the users’ perspective. Often the most performant RPC for your project will be located geographically central to your user base. For example, if the majority of your users are in Europe, an RPC node in Amsterdam or Frankfurt would provide good coverage for the majority of your users.
Projects have two options. The first is to use one of the many competent RPC APIs out there. I’ve made a good effort to find and test all that are available as of October 2022. I would suggest creating demo accounts and checking out the different portals, pricing, feature sets, and automation options (APIs) of the different providers to find your fit. Of course performing response time testing should also be part of the selection process.
The second option is to run your own RPC nodes and roll your own API.
How we built a low-latency Solana RPC API
I mentioned earlier that the StackPath API is made up of individual RPC nodes. While designing and building the proof-of-concept was not a huge technical challenge for me, building out the production-ready product required high-level skills in infrastructure orchestration and Linux systems administration. I did not have these skills, so we were lucky to have the Sentries team as consultants on this project! They specialize in Solana infrastructure and run some of the top performing validators and RPC nodes.
In a previous post I went into great detail on how we designed and built the StackPath RPC API, so I’ll only give the Twitter Spaces pitch here.
- Geo-distributed RPC nodes to provide coverage of the web3 user base with deployments in APAC, NA, and the EU. A single Anycast IP address routes user requests to the nearest geographic RPC node.
- High network speeds only possible with a strong peering strategy inside of an internet exchange (traditional commodity data-centers are too slow for Solana RPC nodes).
- Fail over redundancy with readiness probes to pull bad nodes from Anycast routing.
- Protected from DDoS, bots, and other attacks.
- An API that allows for full automation to make scaling and maintenance easy.
I’m passionate about web3 infrastructure, but I recognize others might not be. All that I need to say is that it’s easier than you may think to roll your own Solana RPC API. However, I understand most teams won’t have the resources to invest in engineering their own infrastructure. Luckily, there are many competent providers of RPC APIs to choose from!
Testing methodology
Tooling
Testing was performed locally in Chrome with my milliseconds-matter JavaScript web app. It’s available on GitHub or live.
I used Mozilla’s VPN (an easy way to support Mozilla) for testing all markets besides my market of Dallas. My choice of test markets was at least partially limited by Mozilla’s VPN coverage.
The data was prepared in Excel, and the charts were made in Canva.
Process
Testing took place over two weeks in October 2022. I ran the tests nine total times in each market.
The milliseconds-matter app sends a “getTransactionCount” call to each RPC address on my list of ~50 addresses. It sends this request in six batches for each test. Any RPC that times out or fails has its address removed from the test results. It then returns a list of RPCs ordered by average response time. The first of the six batches is not used in this average, as the first request’s response time tends to be unpredictably higher than average (possibly due to DNS resolution). I ran a special version of the milliseconds-matter app with longer timeout periods and larger delays between each of the six batches. The JSON output log from each test was then saved from the console.
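The drop-the-first-batch averaging and ranking can be sketched as follows. This is a simplified reconstruction, not the actual milliseconds-matter source; the function names and data shapes are my own assumptions:

```javascript
// Each endpoint produces six batch timings (ms). The first batch is
// discarded (DNS warm-up and other first-request noise), the rest averaged.
function averageResponseTime(batchTimingsMs) {
  const usable = batchTimingsMs.slice(1); // drop the unpredictable first batch
  return usable.reduce((sum, t) => sum + t, 0) / usable.length;
}

// results: { "provider-endpoint": [t1..t6], ... } -> [name, avgMs] sorted fastest first
function rankEndpoints(results) {
  return Object.entries(results)
    .map(([name, timings]) => [name, averageResponseTime(timings)])
    .sort((a, b) => a[1] - b[1]);
}

console.log(averageResponseTime([500, 100, 100, 100, 100, 100])); // 100
```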
To test different markets I used a VPN. This is an imperfect method, but workable for a non-scientific test. Before testing I performed a ping test for each market’s VPN endpoint. I then used the latency from the ping tests to offset the RPC latency in each market’s test results, so the results approximate what they would have been had I run the test from the VPN endpoint itself. Mozilla’s VPN lacked market coverage in the APAC region; otherwise I would have run tests from India and Singapore as well.
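The ping-offset adjustment is simple subtraction. A sketch, with a hypothetical function name (floored at zero so a noisy ping measurement can’t produce a negative latency):

```javascript
// Subtract the measured client->VPN ping from a raw RPC timing so the
// result approximates latency as seen from the VPN endpoint's location.
function offsetForVpn(rawLatencyMs, vpnPingMs) {
  return Math.max(0, rawLatencyMs - vpnPingMs);
}

console.log(offsetForVpn(250, 120)); // 130
```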
Data
Data from the tests was then imported into Excel using Power Query.
The “.” character in Excel is distinct from the “.” character output by JavaScript/JSON/Chrome! - painful fun fact
The data was prepared so that, for each test in each market, only the best result from a single provider was used. So if an RPC API had an EU and an NA endpoint, for each market and each test only the best-performing endpoint counted toward that market’s average. This was done so that each provider’s results were a best-case scenario.
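The best-endpoint-per-provider selection described above amounts to a group-by-minimum. A sketch (field names are illustrative, not the spreadsheet’s actual columns):

```javascript
// For one market+test run, keep only the fastest endpoint per provider,
// so multi-region providers are scored on their best-case result.
function bestPerProvider(testResults) {
  // testResults: [{ provider, endpoint, avgMs }, ...]
  const best = new Map();
  for (const r of testResults) {
    const current = best.get(r.provider);
    if (!current || r.avgMs < current.avgMs) best.set(r.provider, r);
  }
  return [...best.values()];
}
```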
Providers that have RPC nodes (StackPath and RPCPool) had their test results split so the RPC node results were separated from the RPC API results.
RPC APIs:
WAF-SECURED-STACKPATH
RPCPOOL-API-MAINNET
RPC Nodes:
STACKPATH-DETROIT-MILLISECONDS-MATTER
STACKPATH-AMSTERDAM-MILLISECONDS-MATTER
STACKPATH-SEATTLE-MILLISECONDS-MATTER
STACKPATH-SEOUL-MILLISECONDS-MATTER
STACKPATH-MIAMI-MILLISECONDS-MATTER
RPCPOOL-AUDIUS
RPCPOOL-HXRO
RPCPOOL-SLOPE
RPCPOOL-HEDGEHOG
RPCPOOL-REN
etc...
The data was then imported into Canva to generate the graphs.
The data is available in Excel form here https://docs.google.com/spreadsheets/d/1m0Oj69Viv5btMxIwfSyXDbYXB-TFYtyD/edit?usp=sharing&ouid=108746504184539235850&rtpof=true&sd=true
Limitations
- This testing should be OK for evaluating latency, but it is by no means scientifically or academically rigorous.
- Latency is important, but it is not the only metric one should use in selecting a Solana RPC API.
- Free accounts were used for this testing. In Chainstack’s case free accounts only allow one endpoint and thus do not have the geographic coverage available to paying customers. In theory additional endpoints could be deployed in more locations to improve their test results.
- Using a VPN for testing is not ideal. One issue would be if the VPN endpoint is hosted in the same facility as the RPC. In this case the test results will not reflect the experience of actual users.
- Testing more markets in Europe and Asia would give a more accurate picture of performance in those regions. Additionally, testing in South Asia (India) would be ideal due to the importance of the market.
- All of these providers are hardworking and constantly improving their platforms. My bet is that this data will be out of date in less than a month.
Thanks, and I’m happy to field any questions. I’d especially like to hear if you have an issue with the methodology of these tests. Reach out via Twitter at https://twitter.com/J_Shelby_J