Steem-Python Streaming Blocks
In this post, I will compare the speed when streaming data from the Steem Blockchain using the steem-python library.

The idea for this post came after a discussion with the author of asyncsteem (GitHub Link), who is currently building a new library that uses asynchronous processing in an attempt to speed up interactions with the Blockchain.
These tests cover three different functions available in the steem-python library. The same tests are also executed against different RPC nodes to compare the performance of the various nodes.
In upcoming tests with asyncsteem I want to use these results as a baseline against which execution speed can be compared.
Different options in steem-python:
- stream_comments() - Steemd
  - Wrapper for stream_from()
- stream_from() - Blockchain
  - API call - get_ops_in_block
- get_blocks_range() - Steemd
  - API call - get_block
stream_comments()
Many people starting out with the steem-python library might try this function when they want to stream posts from the Blockchain. I’ve seen it in several guides, and it is extremely simple to use since it returns a Post instance for every post. Its drawback is the extra overhead: most people are not interested in each and every post, so performing an extra API call per post is a big waste of time and resources. If the RPC node is not fast enough, this can also easily lead to the script being unable to keep up with new blocks.
stream_from()
stream_comments(), described above, is a wrapper for this function, so leaving the difference in functionality aside, stream_comments() will always be slower.
This function executes the API call get_ops_in_block, which returns all operations that are part of a specific block. Compared with stream_comments(), it can therefore be used for any kind of operation, not only posts/comments. The returned data is unprocessed, however, so if you want a Post instance your code needs to make that call itself. Doing this on demand, only when needed, will certainly speed up your code.
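As a sketch of the on-demand approach: filter the raw operations first and only pay the extra processing cost for the entries you care about. The sample operations and helper below are made up for illustration; real entries from stream_from() carry more fields, but each one includes the operation type and its payload.

```python
# Made-up sample data in the rough shape of get_ops_in_block results:
# each entry names its block and carries the operation as [type, payload].
sample_ops = [
    {"block": 19399400, "op": ["vote", {"voter": "alice", "weight": 10000}]},
    {"block": 19399400, "op": ["comment", {"author": "bob", "permlink": "my-post", "parent_author": ""}]},
    {"block": 19399401, "op": ["transfer", {"from": "carol", "to": "dave", "amount": "1.000 STEEM"}]},
    {"block": 19399401, "op": ["comment", {"author": "eve", "permlink": "a-reply", "parent_author": "bob"}]},
]

def top_level_posts(entries):
    """Yield only root posts; skip votes, transfers and replies."""
    for entry in entries:
        op_type, op_data = entry["op"]
        # Only here would we pay for the expensive step (in steem-python,
        # building a Post instance via an additional API call).
        if op_type == "comment" and op_data["parent_author"] == "":
            yield op_data

posts = list(top_level_posts(sample_ops))
print([p["author"] for p in posts])  # ['bob']
```

The key point is that the filter runs on data you already have, so the expensive per-post work happens only for the small fraction of operations that survive it.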
get_blocks_range()
While the two functions above end up using the same API call, this one is different: the get_block API call is used instead. The execution of the RPC call also differs. The two functions above use the call() function, as defined in http_client.py, while get_blocks_range() uses call_multi_with_futures(), a wrapper around call() that allows for threaded execution with 10 workers by default.
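The threaded pattern can be illustrated with a short, self-contained sketch. fetch_block() here is a stand-in for the real get_block RPC call, so the example runs without a node; the 10-worker default mirrors the one mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_block(block_num):
    # Stand-in for the real get_block RPC call; returns a minimal dict
    # so the example is runnable without a Steem node.
    return {"block_num": block_num}

def get_blocks_range_sketch(start, end, max_workers=10):
    """Fetch blocks [start, end) with a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in submission order even though the
        # underlying calls run concurrently across the workers.
        return list(pool.map(fetch_block, range(start, end)))

blocks = get_blocks_range_sketch(19399400, 19399500)
print(len(blocks), blocks[0]["block_num"])  # 100 19399400
```

With a real RPC call in place of fetch_block(), up to 10 requests are in flight at once, which is where the speedup over sequential call() comes from.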
Results
Below is a summary of execution times against different RPC nodes. api.steemit.com executes get_blocks_range() much faster than the other two functions, while for the other two RPC nodes the execution times are more even. This holds even for very old blocks, so I don't think the Jussi implementation on api.steemit.com plays a role there. It seems the threaded calls made by get_blocks_range() are handled differently by api.steemit.com, which would explain the performance difference.
get_blocks_range()
rot@tor:~$ python3 testblocks_range.py
* 100 blocks processed in 2.206186532974243 seconds api.steemit.com
* 100 blocks processed in 11.113192319869995 seconds rpc.buildteam.io
* 100 blocks processed in 5.037718057632446 seconds rpc.steemviz.com
rot@tor:~$ python3 testblocks_range.py
* 100 blocks processed in 2.258392810821533 seconds api.steemit.com
* 100 blocks processed in 10.185550689697266 seconds rpc.buildteam.io
* 100 blocks processed in 5.035927772521973 seconds rpc.steemviz.com
Executed multiple times against api.steemit.com
When the same query is executed multiple times, the first iteration is always the slowest. This shows the positive effect of Jussi on api.steemit.com; it was not visible for the other RPC nodes in the test.
* 100 blocks processed in 2.6615469455718994 seconds api.steemit.com
* 100 blocks processed in 1.9518346786499023 seconds api.steemit.com
* 100 blocks processed in 1.8236401081085205 seconds api.steemit.com
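The repeated-run measurement above can be reproduced with a small timing helper like the one below. The workload is a placeholder lambda; in practice you would pass in any of the steem-python calls being benchmarked.

```python
import time

def time_repeats(func, repeats=3):
    """Run func several times and return the duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

# Placeholder workload; against a cached RPC node the first run is
# typically the slowest, with later runs served faster.
durations = time_repeats(lambda: sum(range(100000)))
print(len(durations))  # 3
```

Using perf_counter() rather than time() gives a monotonic, high-resolution clock, which matters when individual runs are short.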
stream_from()
rot@tor:~$ python3 teststream_from.py
* 100 blocks processed in 15.777471780776978 seconds api.steemit.com
* 100 blocks processed in 12.378310680389404 seconds rpc.buildteam.io
* 100 blocks processed in 5.519677400588989 seconds rpc.steemviz.com
rot@tor:~$ python3 teststream_from.py
* 100 blocks processed in 15.657495975494385 seconds api.steemit.com
* 100 blocks processed in 15.46108865737915 seconds rpc.buildteam.io
* 100 blocks processed in 5.46795654296875 seconds rpc.steemviz.com
Executed multiple times against api.steemit.com
Interesting to see that there is no gain from executing the same query multiple times. This improvement was only seen with the get_blocks_range() function.
* 100 blocks processed in 17.176149129867554 seconds api.steemit.com
* 100 blocks processed in 17.165062189102173 seconds api.steemit.com
* 100 blocks processed in 16.03311800956726 seconds api.steemit.com
stream_comments()
Please note that this script counts 100 posts, not 100 blocks. In the range I tested, fewer than 100 blocks had to be processed to reach 100 posts. As this function is just a wrapper for stream_from() anyway, I didn't feel the need to align it perfectly with the other tests.
rot@tor:~$ python3 teststream_comment.py
* 100 posts processed in 17.911587715148926 seconds api.steemit.com
* 100 posts processed in 22.999014854431152 seconds rpc.buildteam.io
* 100 posts processed in 5.980460166931152 seconds rpc.steemviz.com
rot@tor:~$ python3 teststream_comment.py
* 100 posts processed in 17.890295267105103 seconds api.steemit.com
* 100 posts processed in 13.94810175895691 seconds rpc.buildteam.io
* 100 posts processed in 6.749229669570923 seconds rpc.steemviz.com
Example Code
These sample scripts are modified from the benchmark script that is part of asyncsteem (GitHub Link).
get_blocks_range()
rot@tor:~$ more testblocks_range.py
#!/usr/bin/python3
import steem
import time

nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
for node in nodes:
    last_block = 19399400
    current_block = steemd.last_irreversible_block_num
    ltime = time.time()
    blocks = steemd.get_blocks_range(last_block, last_block + 100)
    for entry in blocks:
        block_no = entry["block_num"]
        if block_no % 100 == 0:
            now = time.time()
            duration = now - ltime
            ltime = now
            print("* 100 blocks processed in", duration, "seconds", steemd.hostname)
    steemd.next_node()
stream_from()
rot@tor:~$ more teststream_from.py
#!/usr/bin/python3
import steem
import time

nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
blockchain = steem.blockchain.Blockchain(steemd)
for node in nodes:
    last_block = 19399400
    ltime = time.time()
    for entry in blockchain.stream_from(last_block):
        block_no = entry["block"]
        if block_no != last_block:
            last_block = block_no
            if last_block % 100 == 0:
                now = time.time()
                duration = now - ltime
                ltime = now
                print("* 100 blocks processed in", duration, "seconds", steemd.hostname)
                break
    steemd.next_node()
stream_comments()
rot@tor:~$ more teststream_comment.py
#!/usr/bin/python3
import steem
import time

nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
for node in nodes:
    last_block = 19399400
    ltime = time.time()
    for index, entry in enumerate(steemd.stream_comments(last_block)):
        if index >= 99:
            now = time.time()
            duration = now - ltime
            ltime = now
            print("* 100 posts processed in", duration, "seconds", steemd.hostname)
            break
    steemd.next_node()
Thank you for your time!!