Written by Geng Kai, DFG
Data is the key to blockchain technology and the basis for developing decentralized applications (dApps). While much of the current discussion revolves around data availability (DA), which ensures that every network participant has access to recent transaction data for verification, there is an equally important aspect that is often overlooked: data accessibility.
In the era of modular blockchain, DA solutions have become indispensable. These solutions ensure that transaction data is available to all participants, enabling real-time verification and maintaining the integrity of the network. However, the DA layer functions more like a billboard than a database. This means that the data is not stored indefinitely; it is deleted over time, just like a poster on a billboard is eventually replaced by a new one.
Data accessibility, on the other hand, focuses on the ability to retrieve historical data, which is essential for developing dApps and conducting blockchain analysis, since many tasks need past data to represent and execute state accurately. Although data accessibility is discussed less often, it is just as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support powerful and efficient blockchain applications.
Since its inception, blockchain has revolutionized infrastructure and promoted decentralization in fields such as gaming, finance, and social networks through the creation of dApps. However, building these dApps requires access to large amounts of blockchain data, which is difficult and expensive to obtain.
One option for dApp developers is to host and run their own archive RPC nodes. These nodes store all historical blockchain data from genesis, allowing full access to the data. However, archive nodes are expensive to maintain and have limited query capabilities, so developers cannot query data in the format they need. Running cheaper nodes is an option, but these nodes have limited data retrieval capabilities, which may hinder the operation of dApps.
Another approach is to use a commercial RPC (remote procedure call) node provider. These providers are responsible for the cost and management of nodes and provide data via RPC endpoints. Public RPC endpoints are free but have rate limits that may negatively impact the dApp's user experience. Private RPC endpoints provide better performance by reducing congestion, but even simple data retrieval requires a lot of back-and-forth communication. This makes them request-heavy and inefficient for complex data queries. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
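To make the back-and-forth concrete, here is a minimal Python sketch that only builds JSON-RPC request payloads and does not send them. `eth_getBlockByNumber` is a standard Ethereum JSON-RPC method; the helper name `block_requests` and the block range are assumptions chosen for illustration.

```python
import json


def block_requests(start_block: int, end_block: int) -> list[dict]:
    """Build one eth_getBlockByNumber request per block in [start_block, end_block]."""
    requests = []
    for request_id, number in enumerate(range(start_block, end_block + 1), start=1):
        requests.append({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_getBlockByNumber",
            # Block number as a hex string; True asks for full transaction objects.
            "params": [hex(number), True],
        })
    return requests


# Hypothetical range: reading 100 consecutive blocks takes 100 separate requests.
payloads = block_requests(19_000_000, 19_000_099)
print(len(payloads))
print(json.dumps(payloads[0]))
```

Each block needs its own request, and any filtering or joining of the results still has to happen on the client side.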
Blockchain indexers play a vital role in organizing on-chain data and loading it into a database for easy querying, which is why they are often called the “Google of blockchain.” They work by indexing blockchain data and making it readily available through a query API such as GraphQL, much as a SQL database would. By providing a unified interface for querying data, indexers allow developers to quickly and accurately retrieve the required information using a standardized query language, greatly simplifying the process.
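As a rough illustration of what such a query might look like, the Python sketch below builds the JSON body a client would POST to a GraphQL endpoint. The entity name `swaps`, its fields, and the filter arguments are hypothetical; every indexer defines its own schema, and no request is actually sent here.

```python
import json

# Hypothetical entity and field names; a real indexer exposes its own schema.
QUERY = """
query RecentSwaps($pool: String!, $first: Int!) {
  swaps(where: {pool: $pool}, first: $first, orderBy: timestamp, orderDirection: desc) {
    id
    amount0
    amount1
    timestamp
  }
}
"""


def build_graphql_request(pool: str, first: int = 10) -> str:
    """Return the JSON body for a single GraphQL request."""
    return json.dumps({"query": QUERY, "variables": {"pool": pool, "first": first}})


# Placeholder pool address, for illustration only.
body = build_graphql_request("0xPOOL_ADDRESS_PLACEHOLDER", first=5)
print(body)
```

One request describes the filter, ordering, and fields wanted, instead of many low-level calls that the client must stitch together.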
Different types of indexers optimize data retrieval in various ways:
Full node indexers: These indexers run full blockchain nodes and extract data directly from them, ensuring that the data is complete and accurate, but requiring a large amount of storage and processing power.
Lightweight indexers: These indexers rely on full nodes to fetch specific data on demand, reducing storage requirements but potentially increasing query times.
Specialized indexers: These indexers specialize in certain types of data or specific blockchains, optimizing retrieval for specific use cases, such as NFT data or DeFi transactions.
Aggregated indexers: These indexers pull data from multiple blockchains and sources, including off-chain information, to provide a unified query interface, which is especially useful for multi-chain dApps.
Ethereum alone requires 3TB of storage for an Erigon archive node, and that figure will keep increasing as the blockchain grows. Indexer protocols deploy multiple indexers to index and query large amounts of data efficiently, at speeds that are not possible with RPC.
Indexers also allow complex queries, easy filtering of data based on different criteria, and post-extraction analysis. Some indexers also aggregate data from multiple sources, avoiding the need to deploy multiple APIs in multi-chain dApps. Because they are distributed across multiple nodes, indexers provide enhanced security and performance, whereas RPC providers may experience outages and downtime due to their centralized nature.
Overall, indexers improve the efficiency and reliability of data retrieval compared to RPC node providers, while also reducing the cost of deploying a single node. This makes blockchain indexer protocols a top choice for dApp developers.
As mentioned before, a dApp must retrieve and read blockchain data in order to run its service. This applies to every type of dApp, including DeFi, NFT platforms, games, and even social networks, as these platforms need to read data before they can execute transactions.
DeFi protocols require a range of information to quote users specific prices, rates, fees, and so on. Automated market makers (AMMs) require price and liquidity information about certain pools to calculate swap rates, while lending protocols require utilization data to determine lending rates and liquidation debt ratios. This information must be loaded into the dApp before it can calculate the rate for a transaction a user wants to execute.
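The sketch below shows, under stated assumptions, how indexed pool and market data could feed such calculations. It assumes a constant-product AMM (x * y = k), which is one common design rather than the only one, and a simple linear borrow-rate model. The fee, base rate, and slope values are hypothetical.

```python
def swap_output(reserve_in: float, reserve_out: float, amount_in: float,
                fee: float = 0.003) -> float:
    """Output amount for a constant-product pool (x * y = k) after deducting the fee."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization(total_borrowed: float, total_supplied: float) -> float:
    """Share of supplied funds that is currently borrowed."""
    if total_supplied == 0:
        return 0.0
    return total_borrowed / total_supplied


def borrow_rate(util: float, base_rate: float = 0.02, slope: float = 0.10) -> float:
    """Hypothetical linear model: the rate rises with utilization."""
    return base_rate + slope * util


# Reserves and totals would come from indexed on-chain data; these are made up.
print(swap_output(reserve_in=1_000.0, reserve_out=2_000_000.0, amount_in=1.0))
print(borrow_rate(utilization(total_borrowed=600_000.0, total_supplied=1_000_000.0)))
```

Both functions depend entirely on fresh reserve and borrow figures, which is why stale or slow data retrieval translates directly into wrong quotes.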
GameFi needs to index and access data quickly to ensure smooth gameplay for users. Only through lightning-fast data retrieval and execution can Web3 games rival Web2 games in performance, thereby attracting more users. These games require data such as land ownership, in-game token balances, in-game actions, and more. Using indexers, they can better maintain stable data flow and consistent uptime for a flawless gaming experience.
NFT marketplaces and lending platforms require indexed data to access a variety of information such as NFT metadata, ownership and transfer data, royalty information, and more. Quickly indexing this data eliminates the need to browse each NFT individually to find ownership or NFT attribute data.
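A minimal sketch of the idea, assuming transfer events have already been extracted and ordered oldest first: replaying them once yields an ownership index, so a lookup no longer requires inspecting each NFT individually. The record layout (`token_id`, `from`, `to`) and the sample addresses are hypothetical.

```python
from collections import defaultdict


def build_ownership_index(transfers):
    """Replay transfers (assumed oldest first) to get the current owner of each token."""
    owners = {}
    for transfer in transfers:
        owners[transfer["token_id"]] = transfer["to"]
    return owners


def tokens_by_owner(owners):
    """Invert the index so one lookup answers which tokens an address holds."""
    holdings = defaultdict(list)
    for token_id, owner in owners.items():
        holdings[owner].append(token_id)
    return dict(holdings)


# Made-up transfer history for two tokens.
transfers = [
    {"token_id": 1, "from": "0x0", "to": "alice"},
    {"token_id": 2, "from": "0x0", "to": "bob"},
    {"token_id": 1, "from": "alice", "to": "carol"},
]
owners = build_ownership_index(transfers)
print(owners)
print(tokens_by_owner(owners))
```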
Whether it’s a DeFi Automated Market Maker (AMM) that needs price and liquidity information, or a SocialFi application that needs to update new user posts, being able to quickly retrieve data is critical for a dApp to function properly. With the help of indexers, they can retrieve data efficiently and correctly, providing a smooth user experience.
Indexers provide a way to extract specific data from raw blockchain data, including smart contract events in each block. This provides the opportunity for more specific data analysis to provide comprehensive insights.
For example, a perpetual trading protocol can find out which tokens have high trading volume and generate fees, and use that to decide whether to list those tokens as perpetual contracts on its platform. DEX developers can create dashboards for their products to gain insights into which pools have the highest returns or are the most liquid. Public dashboards can also be created, giving developers the freedom and flexibility to query any type of data to be displayed on charts.
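As an illustrative sketch of this kind of analysis, the Python below aggregates volume and fees per token from already-indexed trade records and ranks tokens by volume. The record fields (`token`, `amount_usd`, `fee_usd`) and the sample figures are hypothetical.

```python
from collections import defaultdict


def volume_and_fees_by_token(trades):
    """Sum traded volume and fees per token from indexed trade records."""
    totals = defaultdict(lambda: {"volume": 0.0, "fees": 0.0})
    for trade in trades:
        totals[trade["token"]]["volume"] += trade["amount_usd"]
        totals[trade["token"]]["fees"] += trade["fee_usd"]
    return dict(totals)


def top_tokens(totals, n=3):
    """Return the n tokens with the highest total volume."""
    return sorted(totals, key=lambda token: totals[token]["volume"], reverse=True)[:n]


# Made-up trade records, as they might come back from an indexer query.
trades = [
    {"token": "AAA", "amount_usd": 1000.0, "fee_usd": 3.0},
    {"token": "BBB", "amount_usd": 250.0, "fee_usd": 1.0},
    {"token": "AAA", "amount_usd": 500.0, "fee_usd": 2.0},
]
totals = volume_and_fees_by_token(trades)
print(top_tokens(totals, n=2))
```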
With multiple blockchain indexers available, identifying the differences between indexing protocols is critical to ensuring developers choose the indexer that best suits their needs.
The Graph is the first indexer protocol launched on Ethereum, making it easy to query transaction data that was not easily accessible before. It uses subgraphs to define and filter subsets of data collected from the blockchain, such as all transactions related to the Uniswap v3 USDC/ETH pool.
Using Proof of Indexing, indexers stake the native token GRT to provide indexing and query services, and delegators can choose to stake their tokens with an indexer. Curators signal on high-quality subgraphs to help indexers determine which subgraphs to index in order to earn the best query fees. In its transition to greater decentralization, The Graph will eventually discontinue its hosted service and require subgraphs to upgrade to its network, along with upgraded indexers.
Its infrastructure enables an average cost per million queries of $40, which is significantly lower than the cost of self-hosted nodes. Using file data sources, it also supports parallel indexing of both on-chain and off-chain data for efficient data retrieval.
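For a sense of scale, the following Python sketch turns the quoted average of $40 per million queries into a monthly estimate. The daily query volume is a hypothetical input, and real pricing varies with the indexer and the query.

```python
COST_PER_MILLION_QUERIES_USD = 40.0  # average figure quoted in the text


def monthly_query_cost(queries_per_day: int, days: int = 30) -> float:
    """Estimated cost in USD for a month of queries at the quoted average rate."""
    return queries_per_day * days / 1_000_000 * COST_PER_MILLION_QUERIES_USD


# Hypothetical dApp serving 500,000 queries per day.
print(monthly_query_cost(500_000))
```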
The Graph’s indexer rewards have been growing steadily over the past few quarters. This is partly due to an increase in query volume, but also to an increase in the token price as the project plans to integrate AI-assisted queries in the future.
Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data and protects it through zero-knowledge proofs. It runs as a decentralized network of worker nodes, each responsible for storing data from a specific subset of blocks, which speeds up data retrieval because the nodes holding the required data can be identified quickly.
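The general idea of locating the node that holds a given block can be sketched as a lookup over sorted range boundaries. This is an illustration only and not Subsquid's actual routing logic; the worker names and range boundaries are hypothetical.

```python
from bisect import bisect_right

# Hypothetical assignment: each worker stores blocks from its start block
# up to (but not including) the next worker's start block.
RANGE_STARTS = [0, 5_000_000, 10_000_000, 15_000_000]
WORKERS = ["worker-a", "worker-b", "worker-c", "worker-d"]


def worker_for_block(block_number: int) -> str:
    """Find the worker whose block range contains block_number via binary search."""
    if block_number < RANGE_STARTS[0]:
        raise ValueError("block number is below the first stored range")
    index = bisect_right(RANGE_STARTS, block_number) - 1
    return WORKERS[index]


print(worker_for_block(12_345_678))
```

Because the boundaries are sorted, the lookup takes logarithmic time in the number of workers, so a query can be routed without asking every node.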
Subsquid also supports real-time indexing, allowing blocks to be indexed before they are finalized. It also supports storing data in the format of the developer's choice, such as Parquet or CSV, allowing for easier analysis with tools like BigQuery. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling codeless deployment.
During its testnet phase, Subsquid achieved impressive statistics, with over 80,000 testnet users, over 60,000 Squid indexers deployed, and over 20,000 verified developers on the network. Most recently, on June 3, Subsquid launched the mainnet of its data lake.
In addition to indexing, the Subsquid Network data lake can replace RPC in use cases such as analytics, ZK/TEE coprocessors, AI agents, and oracles.
SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It initially supported the Polkadot and Substrate networks and has now expanded to include over 200 chains. It works similarly to The Graph, using Proof of Index, with indexers indexing data and serving query requests, and delegators staking their tokens to indexers. However, in place of curators it introduces consumers, who submit purchase orders that signal guaranteed income for the indexer.
It will introduce SubQuery data nodes that support sharding to prevent constant synchronization of new data between each node, thereby optimizing query efficiency while moving towards greater decentralization. Users can choose to pay a computational fee of approximately 1 SQT token per 1000 requests, or set a custom fee for the indexer through the protocol.
Although SubQuery only launched its token earlier this year, issuance rewards for nodes and delegators have grown month-over-month in USD value, reflecting the continued increase in the number of query services available on its platform. Since the TGE, the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.
Covalent is a decentralized indexer network in which Block Specimen Producer (BSP) nodes create copies of blockchain data through bulk export and store them, with proofs published on the Covalent L1 blockchain. This data is then refined by Block Results Producer (BRP) nodes according to set rules, filtering out the data that meets the requirements.
Through the unified API, developers can easily extract relevant blockchain data in consistent request and response formats, without having to write complex custom queries. These pre-configured datasets are served by network operators and paid for in CQT tokens, with payments settled on Moonbeam.
Covalent’s rewards appear to be generally increasing from Q1’23 to Q1’24, partly due to the increase in the price of the Covalent token CQT.
Some indexers (such as Covalent) are general-purpose indexers that only provide standard preconfigured data through a set of APIs. While they may be fast, they do not provide flexibility for developers who need custom datasets. An indexer framework, by contrast, allows more custom data processing to meet application-specific needs.
Indexed data must be secure, otherwise dApps built on top of these indexers will also be vulnerable. For example, if transactions and wallet balances can be manipulated, a dApp risks losing liquidity, impacting its users. While all indexers employ some form of security by requiring indexers to stake tokens, some indexer solutions also use proofs to further increase security.
Subsquid offers the option to use optimistic and zero-knowledge proofs, while Covalent also publishes proofs that include block hashes. The Graph provides a dispute period for indexer queries in the form of an optimistic challenge window, while SubQuery generates Merkle Mountain Range proofs for each block, hashing all of the data stored in its database for that block.
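To show why hash-based proofs make tampering detectable, here is a minimal Python sketch of a plain binary Merkle root over a few data records. It is a simplified stand-in and not a Merkle Mountain Range or any protocol's actual proof format; the record contents are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a simple binary Merkle root over the given records."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Made-up indexed records; changing any one of them changes the root.
records = [b"block-1:balance=10", b"block-2:balance=12", b"block-3:balance=7"]
root = merkle_root(records)
tampered = merkle_root([records[0], b"block-2:balance=999", records[2]])
print(root.hex())
print(root != tampered)
```

Anyone holding the published root can recompute it from the served data and detect a mismatch, which is the property staking and challenge mechanisms rely on.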
As blockchains continue to grow, so do transaction volumes, making it more cumbersome to index large amounts of data because more processing power and storage space are required. Maintaining efficiency becomes harder as networks grow, but indexer protocols are introducing solutions to meet these growing needs.
For example, Subsquid scales horizontally by adding more nodes to store data, and it can also scale as hardware improves. The Graph provides parallel streaming of data for faster synchronization, while SubQuery introduces node sharding to speed up the synchronization process.
Although most blockchain activity still takes place on Ethereum, other blockchains have grown in popularity over time. For example, Layer 2s, Solana, Move-based blockchains, and Bitcoin ecosystem chains each have their own growing set of developers and activity, which also require indexing services.
Supporting chains that other indexer protocols do not can capture more market share and fees. Indexing data-intensive networks like Solana is no easy task, and so far only Subsquid has successfully provided indexing support for it.
Despite their widespread adoption in dApp development, the potential of indexers is still huge, especially when integrated with AI. As AI continues to become more prevalent in Web2 and Web3, its ability to improve depends on access to relevant data to train models and develop AI agents. Ensuring data integrity is critical for AI applications as it prevents models from being fed biased or inaccurate information.
In the world of indexer solutions, Subsquid has made significant progress in performance and user metrics. Users have already begun experimenting with building AI agents using Subsquid, demonstrating the platform’s versatility and potential in the growing world of data indexing. Additionally, tools like AutoAgora help indexers use AI to provide dynamic pricing for query services on The Graph, while SubQuery supports multiple AI networks such as OriginTrail and Oraichain for transparent data indexing.
The integration of artificial intelligence with indexers is expected to enhance data accessibility and usability in the blockchain ecosystem. By leveraging artificial intelligence technology, indexers can provide more efficient and accurate data retrieval, enabling developers to build more complex dApps and analytics tools. As AI and indexers continue to evolve together, we remain optimistic about the future of data indexing and its role in shaping the decentralized digital landscape.