Research: protocol / Arweave - for discussion
Core goal
Create an open data layer where publishers post updates once, and everyone else (apps, indexers, AI agents) subscribes to updates instead of scraping/polling a million sites.
Today, demand-side services repeatedly hit the same endpoints (“did it change?”). An open update stream turns that into publish-once, consume-many, eliminating redundant traffic and fragmentation.
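For discussion, a minimal shape for one such update might look like the sketch below; every field name is an assumption, not a spec. Note the separate publisher/merchant fields, which is exactly where the concern that follows comes from.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class UpdateEvent:
    """One publish-once update record; all field names are illustrative only."""
    publisher: str    # account that broadcast the update (may be a bot)
    merchant: str     # identity the update is claimed for
    resource_id: str  # e.g. a product SKU or URL
    kind: str         # "delta" or "snapshot"
    payload: dict     # the changed fields themselves
    ts: float         # publisher-side timestamp

event = UpdateEvent(
    publisher="pricebot-01",
    merchant="acme-store",
    resource_id="sku-12345",
    kind="delta",
    payload={"price": 19.99, "in_stock": True},
    ts=time.time(),
)
print(json.dumps(asdict(event)))  # what subscribers would consume
```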
Concern: if a bot publishes on behalf of a merchant, the chain’s “author” is the bot, not the merchant.
Hybrid publishing architecture (fast + scalable)
Lane A — Real-time small updates on Hive
Best for frequent, tiny changes (price, inventory, sale flag, metadata tweaks).
Uses Hive’s fast block cadence for near-real-time propagation.
Constraint: publishing throughput is bounded by Resource Credits (RC) / stake.
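A minimal sketch of a Lane A publish, assuming a Hive client with a custom_json broadcast call (beem's Hive class exposes something of roughly this shape; the client here is a stand-in, and the op id is invented):

```python
import json

# Hive custom_json data is capped at 8192 bytes (approximate; check consensus rules).
MAX_CUSTOM_JSON = 8192

def publish_delta(hive_client, account: str, delta: dict) -> None:
    """Broadcast a small update as a Hive custom_json op.

    `hive_client` is a stand-in for any Hive client exposing a
    custom_json(id, json_body, required_posting_auths) broadcast method.
    """
    body = json.dumps(delta)
    if len(body.encode()) > MAX_CUSTOM_JSON:
        raise ValueError("payload too large for Lane A; route to Lane B (Arweave)")
    hive_client.custom_json(
        id="open_data_layer.delta.v1",  # hypothetical app identifier
        json_body=delta,
        required_posting_auths=[account],
    )
```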
Lane B — Bulk snapshots / rollups on Arweave, anchored on Hive
Best for big chunks (catalog snapshots, backlog rollups).
Store the large file on Arweave.
Publish a signed on-chain reference on Hive (tx includes content hash + Arweave tx id + metadata).
Indexers treat that as “effectively on-chain” by resolving the reference and ingesting the file.
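A sketch of the Lane B flow, with both clients as assumed stand-ins (`arweave_client.upload` and `hive_client.custom_json` are not real APIs here, just the interfaces the design implies):

```python
import hashlib
import json

def anchor_snapshot(arweave_client, hive_client, account: str, path: str) -> dict:
    """Store a bulk snapshot on Arweave and anchor a signed pointer on Hive."""
    data = open(path, "rb").read()
    content_hash = hashlib.sha256(data).hexdigest()
    arweave_tx = arweave_client.upload(data)  # assumed to return the Arweave tx id
    anchor = {
        "arweave_tx": arweave_tx,
        "sha256": content_hash,       # lets indexers verify the fetched file
        "bytes": len(data),
        "kind": "catalog_snapshot",
    }
    hive_client.custom_json(
        id="open_data_layer.anchor.v1",  # hypothetical app identifier
        json_body=anchor,
        required_posting_auths=[account],
    )
    return anchor
```

An indexer then resolves `arweave_tx`, downloads the file, and checks that its sha256 matches the anchored hash before ingesting; that check is what makes the file "effectively on-chain."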
Publishing service
If the payload is small/urgent → publish directly to Hive.
If the payload is large, or the bot is RC-constrained / a backlog is building → push the chunk to Arweave and anchor the pointer on Hive.
Result: speed when you can, scale when you must, always verifiable.
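A sketch of that routing decision; every threshold below is invented for illustration, not tuned:

```python
def route(payload_bytes: int, urgent: bool, rc_headroom: float, backlog: int) -> str:
    """Pick a lane for one update.

    rc_headroom: fraction of the account's Resource Credits still available.
    backlog: number of updates waiting to be published.
    """
    SMALL = 8 * 1024  # Lane A ceiling, matching custom_json size limits
    if payload_bytes <= SMALL and urgent and rc_headroom > 0.2 and backlog < 1000:
        return "hive"           # Lane A: direct custom_json
    return "arweave+anchor"     # Lane B: bulk file + pointer on Hive

assert route(300, urgent=True, rc_headroom=0.9, backlog=5) == "hive"
assert route(5_000_000, urgent=False, rc_headroom=0.9, backlog=5) == "arweave+anchor"
```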
Possible Economics
Publishing capacity rental
Users with low volume publish directly from their own accounts (small HP is enough).
High-volume publishers subscribe to a service that effectively rents publishing throughput (RC delegation / bot publishing).
Pricing could be quota-based (daily/monthly bytes, operations, or “updates per day”).
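To make "quota-based" concrete, tiers might be expressed like this; every number is a placeholder for discussion, not a proposed price list:

```python
# Illustrative quota tiers for rented publishing capacity.
TIERS = {
    "self": {"updates_per_day": 1_000,     "bytes_per_day": 2_000_000,      "usd_month": 0},
    "pro":  {"updates_per_day": 100_000,   "bytes_per_day": 500_000_000,    "usd_month": 50},
    "bulk": {"updates_per_day": 5_000_000, "bytes_per_day": 50_000_000_000, "usd_month": 400},
}

def within_quota(tier: str, used_updates: int, used_bytes: int) -> bool:
    q = TIERS[tier]
    return used_updates < q["updates_per_day"] and used_bytes < q["bytes_per_day"]
```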
Indexing / API rental
Consumers can self-host a Hive node + indexer + DB + read replicas (costly + operationally heavy).
Or pay a SaaS fee for an API token (cheaper than running 2–3 servers and scaling read replicas).
Indexers can apply policy: “trust these publishers/bots, ignore others.”
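That policy is just an allowlist check at ingest time; a minimal sketch, with registry contents as placeholders:

```python
# Local trust registry; contents are placeholders.
TRUSTED_PUBLISHERS = {"pricebot-01", "pricebot-02", "acme-store"}

def should_ingest(event: dict) -> bool:
    """Drop events from publishers outside the local trust registry."""
    return event.get("publisher") in TRUSTED_PUBLISHERS

events = [
    {"publisher": "pricebot-01", "resource_id": "sku-12345"},
    {"publisher": "unknown-bot", "resource_id": "sku-99999"},
]
ingested = [e for e in events if should_ingest(e)]  # keeps only the first event
```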
Data scale sanity checks
- If a product record is ~8 KB, full catalog snapshots reach gigabytes quickly, while daily deltas can be much smaller (hundreds of bytes per update), so gigabytes/day is plausible at large scale but not absurd; rough numbers in the sketch below.
- Arweave is better for large files; small frequent updates belong on Hive.
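Back-of-envelope check behind those claims; the update volume and catalog size are assumed round numbers (the 100M-record figure echoes the indexing section below):

```python
FULL_RECORD = 8 * 1024        # ~8 KB per product record
DELTA = 300                   # hundreds of bytes per update
UPDATES_PER_DAY = 10_000_000  # assumed global update volume
CATALOG_RECORDS = 100_000_000 # assumed global catalog size

delta_gb_per_day = UPDATES_PER_DAY * DELTA / 1e9      # ~3 GB/day of deltas -> Hive lane
snapshot_gb = CATALOG_RECORDS * FULL_RECORD / 1e9     # ~819 GB full snapshot -> Arweave lane
print(f"{delta_gb_per_day:.1f} GB/day of deltas, {snapshot_gb:.0f} GB per full snapshot")
```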
Trust and “source of truth” problem (bots publishing for others)
Maintain an explicit trust registry (list of trusted publishing bots).
Require payloads to carry merchant identity claims inside the data.
Indexers choose policy: trust bot X/Y, reject unknown bots.
(Next step in the new chat: tighten this with signatures/attestations so the merchant can cryptographically authorize updates even when a bot broadcasts them.)
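One way that tightening could look, sketched with PyNaCl ed25519 keys; the envelope format is invented here and meant as a discussion aid, not the design:

```python
import json
from nacl.signing import SigningKey  # pip install pynacl

# Merchant generates a keypair once and publishes the verify key in the trust registry.
merchant_key = SigningKey.generate()
merchant_verify_key = merchant_key.verify_key

# Merchant signs the payload; any bot can then broadcast it on their behalf.
payload = json.dumps(
    {"merchant": "acme-store", "sku": "sku-12345", "price": 19.99},
    sort_keys=True,
).encode()
signature = merchant_key.sign(payload).signature

# Indexer side: verify the merchant authorized this update, regardless of which
# bot account broadcast it. verify() raises BadSignatureError if tampered with.
merchant_verify_key.verify(payload, signature)
print("merchant signature valid")
```

With this, the chain "author" can stay the bot while the payload itself proves merchant authorization.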
Indexing/storage layer
You’re thinking in the hundreds of millions of documents / ~terabyte-ish territory for global catalogs.
MongoDB can handle it with correct architecture (sharding, indexes, replicas), but the real design work is: schema, indexing strategy, partitioning, hot paths, and trust filtering.
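For orientation only, a pymongo index setup of the kind that design work would produce; collection and field names are assumed, and the connection string is a placeholder:

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
products = client["open_data"]["products"]

# Hot paths: "latest state of one merchant's product" and "what changed since T".
products.create_index([("merchant", ASCENDING), ("resource_id", ASCENDING)], unique=True)
products.create_index([("updated_at", DESCENDING)])
products.create_index([("publisher", ASCENDING)])  # supports trust filtering

# Sharding itself is an admin-side decision; a hashed shard key on merchant
# (sh.shardCollection in the mongo shell) would spread writes across shards.
```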
That’s the current “map”: open publish/subscribe data layer + hybrid Hive (fast deltas) and Arweave (bulk) + bot orchestration + SaaS economics + trust registry + scalable indexer.