@cm-steem: The only downside

The only downside I can think of is that it's binary so it's more difficult to read off the air.

Do you think that's possible via FlatBuffers or gRPC?

To further compress the binary serialization, we could use the 16-byte binary representation of the CPIDs instead of their hexadecimal form. I suspect that's where a lot of the storage goes.

Do you have more details on how this can be done in Python? Do you mean compressing the string or just converting the CPID from a string to binary?
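If it's just the conversion, here is a minimal sketch of how I'd guess it works, assuming each CPID is a 32-character hexadecimal string (which is what the 16-byte figure implies). The function names and the example CPID are made up for illustration.

```python
def cpid_to_bytes(cpid_hex: str) -> bytes:
    """Pack a hex CPID string into its raw binary form."""
    raw = bytes.fromhex(cpid_hex)
    # Assumption: a valid CPID is exactly 16 bytes (32 hex characters).
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw


def bytes_to_cpid(raw: bytes) -> str:
    """Recover the lowercase hex string from the packed form."""
    return raw.hex()


# Made-up CPID, for illustration only.
example = "0123456789abcdef0123456789abcdef"
packed = cpid_to_bytes(example)
restored = bytes_to_cpid(packed)
```

That would halve the space each CPID takes (16 bytes instead of 32 characters) before any general-purpose compression is applied.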

The files would be far smaller if the CPID were omitted, relying on userId instead and perhaps constructing a separate userId:CPID index for quick lookup.
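A rough sketch of that split, assuming each record is a dict with `userId` and `cpid` keys. Those field names, the `value` field and the sample data are hypothetical, not the actual file layout.

```python
def split_cpid_index(records):
    """Return (records without the CPID, userId -> CPID lookup table)."""
    index = {}
    slim = []
    for rec in records:
        # Store each CPID once, keyed by userId.
        index[rec["userId"]] = rec["cpid"]
        # Keep every other field in the per-record data.
        slim.append({k: v for k, v in rec.items() if k != "cpid"})
    return slim, index


# Made-up sample data, for illustration only.
records = [
    {"userId": 1, "cpid": "0123456789abcdef0123456789abcdef", "value": 10},
    {"userId": 2, "cpid": "fedcba9876543210fedcba9876543210", "value": 20},
]
slim_records, cpid_index = split_cpid_index(records)
```

The index only needs to be written once, while the slimmed records can be stored per file, and `cpid_index[userId]` gives the CPID back when needed.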