Hey @TFA-Labs, thanks for your questions in this thread! This forum went down on the day of the AMA, so unfortunately I wasn't able to see and answer those in the session itself. Happy to answer here though!
> Would Streamr consider switching to Protocol Buffers or gRPC?
Sure! I agree that the protocol could be much more compact. The answer comes in two parts, which need to be considered separately: there's the protocol itself, which wraps the message payloads, and there are the payloads themselves. At the moment both are JSON for various reasons: human-readability is convenient for building and debugging; bandwidth is not usually the most constrained resource and improves over time; and there are also some historical reasons which are no longer valid.
The protocol messages (which wrap payloads) are serialized as JSON arrays, which are certainly less efficient than protobufs, but probably not by an order of magnitude or anything. However, the protocol is versioned, so switching to another serialization format would be quite straightforward. We've actually discussed switching to protobufs, and it seems very likely to happen at some point, although we don't think it's particularly urgent at the moment. Our current priority is to first build the things that unlock decentralization, such as end-to-end encryption. Once those are in place, I think it's time to consider optimizations like this one.
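To illustrate the versioning point (a minimal sketch, not Streamr's actual wire format; the field layout and version numbers are invented), a reader dispatches on the version field before interpreting the rest, so a future version could declare a completely different body encoding:

```typescript
// Hypothetical versioned wrapper: the first array element tells the reader
// how to parse the rest, which is what makes the format switchable later.
type WrappedMessage = { version: number; type: number; payload: string };

function decodeWrapper(raw: string): WrappedMessage {
  const [version, type, payload] = JSON.parse(raw); // e.g. [2, 27, "{\"temp\":21.5}"]
  switch (version) {
    case 2: // hypothetical current JSON-array version
      return { version, type, payload };
    // case 3: a future version could signal a protobuf-encoded body instead
    default:
      throw new Error(`Unsupported protocol version: ${version}`);
  }
}
```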
As for payload content, those tend to be JSON objects, which (as you pointed out) have the downside of repeating the field names in every message. That actually doesn't need to be the case even now, but in practice it's what people do, because the standard apps built on top of the network (Core, Marketplace) assume JSON content in the places where message payloads are accessed. IMO, at the lowest level the protocol should allow arbitrary bytes, with a content type field defining how those bytes should be interpreted. Admittedly, achieving that over a JSON wrapper serialization is a bit tricky, because arbitrary binary content needs to be encoded to text.
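To make that concrete, here's a hedged sketch (the content type codes below are invented; nothing like this is official): with a content type field, a JSON payload passes through as text, while binary content gets Base64-encoded so it survives the JSON wrapper, at roughly 33% size overhead:

```typescript
// Hypothetical content type codes; the protocol spec doesn't define these yet.
const ContentType = { JSON: 0, BINARY: 1 } as const;

// Binary bytes must be text-encoded (here Base64) to travel inside a JSON
// wrapper; JSON payloads are already text and pass through as UTF-8.
function encodePayload(contentType: number, data: Uint8Array): string {
  return contentType === ContentType.BINARY
    ? Buffer.from(data).toString("base64")
    : Buffer.from(data).toString("utf-8");
}

function decodePayload(contentType: number, payload: string): Uint8Array {
  return contentType === ContentType.BINARY
    ? Uint8Array.from(Buffer.from(payload, "base64"))
    : Uint8Array.from(Buffer.from(payload, "utf-8"));
}
```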
So to summarize, the efficiency of the protocol could be improved in the following ways, and we'll certainly consider them once we're done with the "bigger fish to fry":
- Switch to protobuf format for protocol messages
- Add more content type codes to the protocol spec: some users might still choose UTF-8 encoded JSON as they do now, but other options would become "official"
- Make sure the Core and Marketplace apps support those new content types as well as they now do JSON
Please let me know if you'd be interested in contributing, for example by creating .proto files for the different messages defined in the protocol, as an experiment to drive this forward.
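If you (or anyone reading) want a low-friction way to experiment, protobufjs can parse .proto source inline without a build step. The message fields below are my invention for illustration; a real experiment would mirror the fields actually defined in the protocol spec:

```typescript
import protobuf from "protobufjs";

// Invented example schema; real field names/numbers would follow the spec.
const source = `
  syntax = "proto3";
  message StreamMessage {
    uint32 version = 1;
    string stream_id = 2;
    int64 timestamp = 3;
    uint32 content_type = 4;
    bytes payload = 5;
    bytes signature = 6;
  }
`;

const { root } = protobuf.parse(source);
const StreamMessage = root.lookupType("StreamMessage");

const msg = StreamMessage.create({
  version: 1,
  streamId: "my-stream", // protobufjs maps stream_id -> streamId
  timestamp: Date.now(),
  contentType: 0,
  payload: Buffer.from('{"temp": 21.5}'),
});

// Compact binary encoding, for size comparison against the JSON equivalent:
const bytes = StreamMessage.encode(msg).finish();
console.log(`protobuf: ${bytes.length} bytes`);
```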
> Without strict data formatting, verifying signatures at a later date can be problematic.
Message serialization is decoupled from computing signatures; the contents-to-be-signed are actually quite well-defined. The signature scheme is also versioned, so we can change it later if necessary. But it's true that verifying old signatures at a later date can be problematic: the risk materializes if old signature schemes are no longer supported by client libraries. In any case, though, it will never become impossible.
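To illustrate the decoupling (a sketch only: the field list, separator, and Ed25519 algorithm here are illustrative, not Streamr's actual scheme), the bytes-to-sign are assembled from well-defined fields in a fixed order, regardless of how the message happened to be serialized in transit:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// The (versioned) scheme pins down which fields are signed and in what
// order, so deterministic bytes can be reconstructed at any later date.
function payloadToSign(m: { streamId: string; timestamp: number; content: string }): Buffer {
  return Buffer.from(`${m.streamId}|${m.timestamp}|${m.content}`, "utf-8");
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const msg = { streamId: "my-stream", timestamp: 1600000000000, content: '{"temp": 21.5}' };
const signature = sign(null, payloadToSign(msg), privateKey);

// Years later, a verifier only needs the field-ordering rule for this
// signature scheme version to check that nothing was altered:
console.log(verify(null, payloadToSign(msg), publicKey, signature)); // true
```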
> Will I be able to prove, trustlessly, that my IoT device did or didn't post message X to Streamr at a particular time?
That depends: who do you need to prove it to, and at what confidence level? Say a stream is stored by multiple independent storage nodes, all of which will serve the original message upon request. Someone seeking to verify the above can query the storage nodes and verify the signatures to ensure that the content (including the timestamp) of the message has not been altered. Then it boils down to two things:
- Does the verifier trust that the storage nodes are not colluding? (Most "trustless" systems do require assumptions about nodes not colluding; see the cross-checking sketch after this list.)
- Does the network actually prevent producing old messages at a later time?
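For the former point, here's a rough sketch of what that cross-checking could look like (the query endpoint is made up; Streamr's actual storage node API differs):

```typescript
// Fetch the same message from several independent storage nodes and require
// that they all return identical content. Signature verification (see the
// earlier sketch) would be applied to each response as well.
async function crossCheck(nodeUrls: string[], streamId: string, timestamp: number): Promise<boolean> {
  const responses = await Promise.all(
    nodeUrls.map((url) =>
      fetch(`${url}/streams/${encodeURIComponent(streamId)}/messages/${timestamp}`) // hypothetical endpoint
        .then((res) => res.text())
    )
  );
  return responses.every((r) => r === responses[0]);
}
```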
For the latter point: at the moment, clients can retroactively "backfill" historical data to streams without restrictions. Some use cases want that feature, while others (like yours) don't. One solution would be to make it a stream-specific setting, which network nodes would respect: they would refuse to store/propagate messages timestamped too far in the past (e.g. further back than the message propagation time in the network multiplied by some safety margin). It's impossible to prove the true, exact wall-clock time when a message was created (that's fundamentally impossible, not just a limitation of the Streamr Network), but this way it can be proven that a message was propagated to the network within some safety margin of its timestamp.
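A sketch of what such a check could look like on a node (the setting name and the numbers are invented for illustration):

```typescript
const MAX_PROPAGATION_MS = 5_000; // assumed upper bound on network propagation time
const SAFETY_MARGIN = 3;          // multiplier to absorb clock skew etc.

// Streams that want backfill keep it; streams that opt out get a bounded
// window inside which a message's timestamp must fall to be propagated.
function acceptForPropagation(msgTimestampMs: number, disallowBackfill: boolean): boolean {
  if (!disallowBackfill) return true;
  const ageMs = Date.now() - msgTimestampMs;
  return ageMs <= MAX_PROPAGATION_MS * SAFETY_MARGIN;
}
```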
Since you mentioned Merkle proofs, I'm wondering if you might want to prove it to a smart contract. That's a bit harder than the procedure described above, as smart contracts can't access the outside world to query the storage nodes. It could be accomplished with oracles, or alternatively with some kind of cryptographic proof setup. Merkle proofs can be used if there's some process that periodically commits the Merkle roots to the smart contract, so that the path to an individual message in that state can be verified. It's a pretty specific need, however, so I think it's out of scope for Streamr itself, but it could quite easily be implemented as a use case-specific layer on top. I'm not sure the Merkle tree approach is the best candidate for a solution, though.
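For completeness, here's roughly what the off-chain side of that could look like (everything here is hypothetical, sketching the general technique rather than any planned feature): a process batches message hashes into a Merkle tree and commits the root to a contract, after which a single message's inclusion can be verified with a log-sized proof of sibling hashes.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

// Pairwise-hash leaves up to a single root; an odd node is paired with itself.
function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) throw new Error("no leaves");
  let level = leaves;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(sha256(Buffer.concat([level[i], level[i + 1] ?? level[i]])));
    }
    level = next;
  }
  return level[0];
}

// The root of a batch of message hashes is what would be committed on-chain;
// a proof for one message is its sibling hashes from leaf to root.
const leaves = ["msg1", "msg2", "msg3", "msg4"].map((m) => sha256(Buffer.from(m)));
console.log(merkleRoot(leaves).toString("hex"));
```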
I'm delighted by such good, technical questions! I hope this helps, and please feel free to ask more! 🙂