Archive Nodes Explained: When Do You Actually Need One - and When You Are Overpaying?
GETBLOCK
April 22, 2026
8 min read
Many Ethereum projects are running archive nodes they don't need. Here's how to tell which one your application actually requires – and what it costs either way.
TLDR:
Archive nodes store the full Ethereum state history from genesis, enabling instant historical queries at any block. Full nodes only retain recent state and must recompute older data on demand.
Use an archive node if your app queries historical balances, runs backtests, or powers analytics.
Use a full node if you only need current state – wallets, NFT ownership checks, live event monitoring.
Most teams overpay by defaulting to archive access without checking their RPC logs first.
Check what your app actually requests before committing to the more expensive setup.
What's the Difference Between Archive and Full Nodes?
Every Ethereum node processes the same chain, but they differ significantly in how much data they keep. A standard full node tracks the current network state – account balances, contract code, and the state of the most recent blocks. It keeps the full chain of blocks and receipts, enough to validate new blocks and answer standard queries, but it prunes older state snapshots. Reconstructing the state at an older block means re-executing transactions, which is real compute work.
Archive nodes take a fundamentally different approach. They store the entire chain state from the genesis block onward, meaning they can return the exact state of any account at any point in history instantly – no recomputation needed. This isn't just a larger database. It's a different data model built around speed and precision for historical lookups.
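The distinction shows up in a single RPC call. Here's a minimal sketch of what a historical eth_getBalance request looks like over JSON-RPC – the address and block number are purely illustrative:

```python
import json

def balance_request(address: str, block: int) -> str:
    """Build a JSON-RPC payload for eth_getBalance at a specific block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, hex(block)],  # block number as a hex quantity
        "id": 1,
    })

# An archive node answers this instantly for any block. A full node can
# only serve blocks still inside its retained state window and otherwise
# errors (on Geth, typically a "missing trie node" style message).
payload = balance_request("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", 15_000_000)
print(payload)
```

The request shape is identical either way – what differs is whether the node still has the state needed to answer it.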
That architectural difference ends up dictating infrastructure costs more than most developers anticipate during early planning.
Full Node vs. Archive Node: At a Glance
| Spec | Full Node (Geth/Snap) | Archive Node (Geth/Archive) |
| --- | --- | --- |
| Disk Usage | ~1.2 TB (SSD) | ~14–18 TB (NVMe) |
| RAM Requirement | 16 GB minimum | 32–64 GB recommended |
| Initial Sync Time | 12–24 hours | 2–6 weeks |
| Historical State Access | Last 128 blocks only | Full history from block 0 |
| eth_getBalance (past block) | Requires recomputation | Instant |
| Estimated Monthly Cost (self-hosted) | ~$150–300/mo | ~$800–2,000+/mo |
| Managed API (shared) | Free–$50/mo | $200–800/mo |
| Dedicated Managed Node | $1,000/mo+ | $1,500–4,000/mo+ |
| Ideal For | Wallets, dApps, live monitoring | Analytics, backtesting, forensics |
Costs are estimates based on 2025-2026 bare-metal and cloud provider pricing. NVMe storage and bandwidth are the dominant cost drivers on the archive side.
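That 128-block window translates into a simple rule of thumb for any given query. A sketch – 128 is Geth's default retained-state depth; other clients and flag settings vary:

```python
FULL_NODE_STATE_WINDOW = 128  # Geth's default retained-state depth

def needs_archive(query_block: int, head_block: int,
                  window: int = FULL_NODE_STATE_WINDOW) -> bool:
    """True if a state query at query_block falls outside the recent-state
    window a full node retains, i.e. only an archive node can answer it."""
    return query_block < head_block - window

print(needs_archive(18_000_000, head_block=19_000_000))  # a year-old block -> True
print(needs_archive(18_999_900, head_block=19_000_000))  # ~100 blocks back -> False
```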
When You Actually Need an Archive Node
High-Volume Historical Queries
The clearest signal that you need archive access is this: your application regularly pulls account balances or contract execution results from specific past blocks. On a full node, that means repeated reconstruction – latency spikes, wasted compute, and a setup that doesn't hold up under production load.
Use Cases That Require Archive Access
The following categories almost always require a dedicated archive endpoint. If your product falls into one of these, there's no practical workaround on a full node:
DeFi analytics dashboards – tracking historical position sizes, liquidation events, or LP returns across arbitrary time ranges
On-chain forensics and AML tooling – tracing fund flows through wallets and contracts across hundreds of blocks
Tax reporting and portfolio valuation – calculating exact asset values at end-of-year or transaction-date snapshots using eth_getBalance or eth_call at a specific block
MEV research and backtest engines – replaying mempool conditions or simulating strategy execution at millisecond-level historical precision
Protocol auditing tools – verifying contract state at the exact block a vulnerability was exploited
Indexers building from scratch – services like The Graph's hosted nodes or custom subgraphs that need to replay full event history on redeployment
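Taking the tax-reporting case as a concrete example, a year-end snapshot boils down to batched eth_getBalance calls pinned to specific historical blocks – exactly the query shape only an archive node can serve. The block heights below are placeholders, not real year-end blocks:

```python
import json

# Placeholder snapshot heights -- for real tax reporting, look up the
# actual block closest to each year-end timestamp.
YEAR_END_BLOCKS = {2022: 16_300_000, 2023: 18_900_000}

def snapshot_batch(address: str, blocks: dict) -> str:
    """Build a JSON-RPC batch requesting one address's balance at each
    snapshot block. Pinning params[1] to a past block is what makes
    these archive-only queries."""
    return json.dumps([
        {"jsonrpc": "2.0", "method": "eth_getBalance",
         "params": [address, hex(block)], "id": year}
        for year, block in sorted(blocks.items())
    ])

batch = snapshot_batch("0x0000000000000000000000000000000000000001", YEAR_END_BLOCKS)
print(batch)
```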
When a Full Node Is Completely Fine
A lot of development teams default to archive access because it feels like the "complete" version of the chain. In practice, most production apps never touch that historical depth at all.
Full nodes are built specifically for real-time workloads and handle them well. Performance stays high as long as the app stays in recent block territory. Wallet backends, NFT marketplaces verifying current ownership, apps that monitor live events or send transactions – none of these need data from months ago.

How to Tell the Difference
Pull up the RPC logs. They'll show exactly what the application is requesting. If those calls are consistently for recent blocks, archive access adds zero value and the extra cost buys nothing. The practical rule: don't upgrade infrastructure until a specific feature actually requires historical data.
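One quick way to run that check, assuming you can extract the block parameter from each logged state query (the log format here is hypothetical):

```python
from collections import Counter

def classify_requests(block_params, head_block, window=128):
    """Tally logged state queries: ones a full node could serve ("recent")
    vs. ones that would need archive state ("historical")."""
    counts = Counter()
    for block in block_params:
        if block == "latest" or int(block, 16) >= head_block - window:
            counts["recent"] += 1
        else:
            counts["historical"] += 1
    return counts

# Block tags pulled from a day of (hypothetical) RPC logs:
sample = ["latest", "latest", "0x112a880", "latest"]  # 0x112a880 = block 18,000,000
print(classify_requests(sample, head_block=19_000_000))
# -> Counter({'recent': 3, 'historical': 1})
```

If "historical" comes back at or near zero over a representative window, archive access is dead weight.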
The Real Cost of Running Archive Infrastructure
Storage, Hardware, and Sync Time
Choosing an archive node is an operational commitment, not just a pricing tier. Storage requirements are substantial – the exact size depends on the client, but a Geth archive runs into the tens of terabytes (Erigon is considerably leaner) and grows every day as the chain extends. Beyond raw capacity, the hardware matters too. Archive nodes need fast NVMe drives and high-bandwidth systems to return queries quickly. Syncing from genesis can take weeks depending on local hardware and network conditions.
Recommended Self-Hosted Archive Node Specs
If you've decided self-hosting is the right path, here's a realistic hardware baseline to work from, in roughly the order you should configure them:
CPU – 8-core modern processor (AMD EPYC or Intel Xeon); Geth and Erigon are I/O-bound more than compute-bound, but weak CPUs bottleneck state trie operations during sync
RAM – 32 GB minimum for Erigon; 64 GB recommended for Geth archive to keep the state cache warm and avoid thrashing
Primary storage – 4 TB NVMe SSD (PCIe Gen 4 preferred); sustained write speeds below 2,000 MB/s will noticeably slow both sync and live query response times
Secondary storage – plan for 20+ TB total capacity as the chain grows; some teams use tiered setups with NVMe for hot state and high-speed HDD arrays for older history
Network – 1 Gbps symmetric connection minimum; archive sync pulls heavily on upstream bandwidth and peer discovery slows on congested connections
Client software – Erigon is the current recommendation for new archive deployments; its staged sync architecture typically cuts final disk usage to around 2.5–3 TB versus Geth's 14+ TB for equivalent history
Monitoring stack – Prometheus + Grafana dashboards for disk I/O, peer count, sync stage, and block lag; without this, you won't catch a stalling sync until it's already a production incident
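For the storage line items above, a trivial headroom calculation helps with capacity planning. The growth rate is an input you should measure on your own node – the figure in the example is only illustrative:

```python
def months_of_headroom(capacity_tb: float, used_tb: float,
                       growth_tb_per_month: float) -> float:
    """How long before an archive volume fills at the current growth rate."""
    return (capacity_tb - used_tb) / growth_tb_per_month

# A 20 TB array holding a 15 TB archive, growing an assumed 0.25 TB/month:
print(months_of_headroom(20, 15, 0.25))  # -> 20.0 months
```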
Ongoing Maintenance
Once a node is live, the work doesn't stop. Client updates, health monitoring, disk management – it's a continuous responsibility that teams consistently underestimate until someone has to actually do it.
Managed Providers vs. Self-Hosting
Most projects don't have to choose between a bare full node and fully self-hosted archive infrastructure. Managed API providers offer archive endpoints without the hardware burden, which works well at lower query volumes. The trade-off is that shared environments can get unreliable under heavy traffic – throttling and unpredictable latency are common complaints once request frequency climbs past a few hundred calls per minute.

For teams that need consistent performance without taking on server management, dedicated nodes hit a reasonable middle ground: isolated hardware, stable response times, and someone else handling the operational side. The right choice really comes down to query volume and latency tolerance. Low frequency – shared API. High frequency or latency-sensitive – dedicated nodes or self-hosted nodes.
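That rule of thumb can be codified. The 300-calls-per-minute cutoff below is an assumption standing in for the "few hundred calls per minute" figure above, not a hard limit:

```python
def recommended_tier(calls_per_minute: int, latency_sensitive: bool) -> str:
    """Codifies the rule of thumb above; 300/min is an assumed cutoff."""
    if calls_per_minute < 300 and not latency_sensitive:
        return "shared API"
    return "dedicated or self-hosted node"

print(recommended_tier(50, latency_sensitive=False))     # -> shared API
print(recommended_tier(1_000, latency_sensitive=False))  # -> dedicated or self-hosted node
```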
How Teams End Up Overpaying
The Root Cause: Guessing Instead of Measuring
Most infrastructure overspending starts during development. A team signs up for archive access as a precaution, ships the app, and that setup stays in place indefinitely – even when the application never actually queries historical data. Costs grow. Nobody revisits the decision.
The fix is simple: look at the logs before committing to anything. Infrastructure decisions should reflect actual usage patterns, not assumptions made during a planning meeting. Starting with the most expensive option by default is rarely the right financial call.
The Hardware Trap
There's also a secondary market angle worth flagging. Used mining machines are widely available and affordable, which makes them tempting when budgets are tight. The problem is that mining rigs were built for GPU compute and raw throughput – not the write endurance and consistent disk performance that archive nodes depend on. Running archival storage on the wrong hardware leads to disk failures and unstable query times under load, which defeats the entire purpose.
Making the Right Call
The decision is straightforward once data requirements are clear. If the application needs fast, repeated access to historical state, archive infrastructure is worth the cost. If it only needs to know what's happening right now, a full node is the better choice – cheaper, simpler, and equally performant for that workload.
The harder part is catching this before the bills start accumulating. Requirements also shift over time: an application might start lean and eventually grow into an archive use case, or it might stay lightweight indefinitely. Revisiting infrastructure choices every few months, with production logs in hand, is the only reliable way to stay on the right side of that line.