- Common Crawl and Constellation Network will leverage Constellation’s Hypergraph network to provide immutability, provenance, and auditability to Common Crawl’s extensive dataset, which has been instrumental in training 80% of LLMs.
- They will initially implement a customizable “metagraph” to integrate a portion of Common Crawl’s data into Constellation’s network.
The Common Crawl Foundation, known for its expansive internet archive, and Constellation Network, a Web3 blockchain ecosystem, have entered a strategic partnership to enhance the accessibility and transparency of web-crawled data for artificial intelligence (AI) applications.
The collaboration aims to leverage blockchain technology to improve the utility of Common Crawl’s extensive dataset, which has been integral to the training of large language models (LLMs).
The Common Crawl Foundation has amassed an archive of nearly 9 petabytes covering more than 250 billion web pages, and 80% of LLMs rely on its data.
Constellation Network’s decentralized Hypergraph network will be used to add layers of immutability, provenance, and auditability to this data, a move intended to foster responsible AI development.
By providing transparent access to these large open datasets, the partnership seeks to address growing concerns about the security and authenticity of training data, especially as AI grows into an industry projected to reach $3 trillion by 2030.
“This partnership represents a significant step forward in securing trusted distribution of Common Crawl,” said Rich Skrenta, Executive Director of the Common Crawl Foundation.
He emphasized that this collaboration allows developers and researchers to verify the authenticity of open datasets, which is crucial for AI training.
Ben Jorgensen, CEO of Constellation Network, said of the partnership: "It showcases mainstream adoption of Web3 solutions beyond crypto, emphasizing our commitment to a data-focused future with a zero-trust network."
The initiative will roll out in phases, starting with a customizable subnet, or “metagraph,” that integrates a subset of Common Crawl’s data.
According to Common Crawl's announcement, the metagraph is currently being tested and will soon transition to Constellation's public Hypergraph network, giving developers new opportunities to work with a blockchain-backed data archive.
Further details on the deployment and participation options for organizations will be announced in the coming weeks.