Moby Dick off the port side, cap'n!
Like many people who get more involved in understanding the steem blockchain than they really should, taking at least 2d10 SAN damage in the process, I have certain questions that I would like to know the answer to.
and the rest of the #BI community have been doing a great job of running the numbers and presenting them on a regular basis, and they should be commended at the highest levels for doing so. Judging by the votes their posts get, that's exactly what's happening. But they tend to get into the minutiae of time series, and I'm more interested in understanding exactly what the numbers mean at a functional level.
So I've dragged my old "I'm a Programmer!" hat out of the back of the dusty closet it'd been hanging in, knocked it against the wall a few times, put it on, and studiously ignored the cascade of rust flakes that fell off of it.
Tools of the Trade
There is no chance that I'm going to use the high-powered tools that the #BI guys use. It's not that I don't have the hardware to do it, but I just don't have access to the software – and I don't really have the experience to make use of them well. Instead, I'm going to use the most public domain and open source stuff that I can find, in part because that means that anybody that reads this can do the same thing, replicate my findings, dispute my findings – just generally do "that science thing."
That's how we get better.
Choice of language: Python 3.6.4.
Is it my favorite language? Maybe not. By and large I would prefer to use Erlang as my toolset, but it just so happens that there are some very useful libraries available for Python and I've written a ton of code for it in the past, albeit in the last major version.
(There have been some significant changes in the language since I last hacked at it. Good times!)
Choice of editor: Atom.
Because sometimes what you need is a ridiculously capable, highly modular editor which has support for turning into a full IDE with linter. That just means that the editor itself will tell me if I'm making strange, gross syntax errors and do its best to correct me along the way. If you have to write code, you want to do it in an editor that helps you rather than hinders you.
Yes, I know. I could've gone with Emacs, and there were times in my life where that would've been my primary choice. But not for the last decade. I love Emacs! I love eLisp! At a certain point you really want to spend more time writing content than you do tweaking the internals of how the editor you use works.
I have long passed that point.
(Interestingly, Atom is also the editor I use for writing my replies. The Steemit reply box is just too small to work with and the Markdown interpretation is entirely in the wrong place, so I just pull up a blank Markdown file to work in. I get a nice interpreted view of what my Markdown looks like and a very nice editing environment that just requires that I cut-and-paste the results back over into the tiny white box. It's vastly superior.)
Choice of database back-end: SteemData.
And now the moment I know you've all been waiting for.
After all, the SQL database service has gone paid, and all reasonable people use SQL as a database query language. How could I possibly get access to the contents of the steem blockchain without going through that service?
Well, it's easy. I don't like SQL.
Heresy, I know! Frankly, it gives me COBOL flashbacks. All that text, all that annoying formatting, all that weird "trying to be English" construction – I just don't need that in my life.
MongoDB has a very different approach to dealing with database interfaces. The queries are much more programmatic and, frankly, it just feels better for me to use.
Also, it's still free to access and has a solid Python library to interface with it in code. Cheers to on that one; it's solid tech.
Choice of DB spelunker: Robo 3T.
It's been a long, long time since I was anything like a database programmer. I might have been good – once. One of the things that I need to make me even barely competent is the right tool for exploring the database itself. I need a GUI that lets me poke at the fields and the collections, figure out what the format looks like and how to access any given piece. What the default values are. Lucky for me, the tool that used to be known as RoboMongo is still available from the guys who bought it out from the developer, and it's still a great tool.
Choice of Graphics Generator: Plotly For Python
Odds are good that at some point I'm going to want to draw some pretty pictures to keep the attention of part of the audience. You know who you are.
The fact that I've never used Plotly before is almost immaterial to this endeavor. Of course, it's full of bizarre syntax, confusing methodologies, and poorly written documentation – but when have we ever let that hinder us?
What's The Question?
I started this whole process just to answer one question.
If we take a list of "active accounts" and sort it in descending order by the amount of SP/vests that they have, at what point do the cumulative vests above the line exceed the cumulative vests below the line?
Or to put it in more procedural terms: what is the cut-off point below which it simply doesn't matter what the rest of the active population of the steem blockchain wants, because they don't have enough cumulative voting power to stand against the extant whales above them?
From a sociological and political perspective, this is an important question. Below this notional line, the members of the blockchain can do whatever they want and it simply won't matter because they can be almost trivially overruled, individually and en masse, by those at the top.
My hypothesis is that this line is really quite high and that it only takes a minimal number of the high-end population to override the will of anybody underneath. My initial estimation is somewhere short of 400, and perhaps radically short of 400.
We have a question. We have a hypothesis. Now – in order to do science – we must do an experiment.
Limiting the Terms
The basic idea is set.
I have proven that I can pull data from the database into the system. I've got a roadmap of all of the attributes for accounts, and I have considered what things I can actually work with.
A raw pull of every account accessible from the database nets me about 750,000 accounts. This looks reasonable, from a quick poke around the results. Unfortunately, right up front I can see that we are going to have some problems with the usability of this content.
That is the Robo 3T view of the first object returned from the naked query for all accounts. There are some obvious problems with this data.
It's not a problem with the data, per se. This is a real account, and it was created way back in the first days of the blockchain, I'm quite sure – and it really hasn't been touched or active since then.
That presents us with something of a problem. Sure, I could play around with ancient accounts that don't really do much, or I could filter based on some qualification to try to get accounts which are actually somewhat more valid and interesting.
Something obvious here is that the dates stored by default in the steem blockchain have kind of a strange epoch: January 1, 1970 at midnight – the Unix epoch. This particular account only varies from those defaults in the last vote time, for some reason, which was in April 2016.
This is some crap.
Poking around at the accounts which are returned and moving deeper into the stack, I believe that I have determined a field which I can use to somewhat cut back on a lot of the garbage accounts.
"last_account_update"
A by-hand survey suggests that this field is only updated on accounts which have been relatively active at some point. Because I'm lazy, and I really don't want to take too small a slice, I want my breakpoint time to be January 1, 2000. I know that the system itself wasn't in use back then, so any dates earlier than that point are clearly untouched defaults.
Applying this as a filter to my query cuts the number of results down to a mere 250,000 or so. A little more than that, actually.
This feels like a number that I can deal with. (Whether it's a number that tools can deal with remains to be seen.)
Building the Books
So what does that MongoDB request in Python end up looking like?
from steemdata import SteemData
from datetime import datetime
db = SteemData()
breakTime = datetime(2000, 1, 1)
query = db.Accounts.find({'last_account_update':
                          {'$gte': breakTime}},
                         projection={'name': 1,
                                     'vesting_shares.amount': 1,
                                     '_id': 0})
We import the modules necessary to generate our query. We need the datetime module in order to generate the proper offset object for the comparison.
The query itself is relatively straightforward. We just want to match on everything whose last account update time is more recent than 1 January 2000. Really quite straightforward.
The projection tells the system what fields we actually want to get back. We don't care about all the fields in every account. That's just too much data. We only want the name of the account, the floating-point number of vesting shares, and we in particular don't want the big hash ID which does us no good, anyway.
We end up getting back a lazy cursor object. That is, the actual interaction with the database hasn't occurred yet; this literally represents only a query which will be sent once something consumes it.
So let's turn it into a list. We can work with lists. They're iterable and dynamic, so if we need to walk the list to do something to the content, we can.
queryList = list(query)
I know. My naming scheme for variables is almost incomprehensible.
What does this data look like?
>>> from pprint import pprint
>>> pprint(queryList[:10])
[{'name': 'a-00', 'vesting_shares': {'amount': 12422.16369}},
{'name': 'a-11', 'vesting_shares': {'amount': 12107.547996}},
{'name': 'a-2', 'vesting_shares': {'amount': 8097.823354}},
{'name': 'a-3', 'vesting_shares': {'amount': 3554.801833}},
{'name': 'a-4', 'vesting_shares': {'amount': 19749.370041}},
{'name': 'a-5', 'vesting_shares': {'amount': 2490.81233}},
{'name': 'a-6', 'vesting_shares': {'amount': 13497.429734}},
{'name': 'a-7', 'vesting_shares': {'amount': 321268.833422}},
{'name': 'a-8', 'vesting_shares': {'amount': 11928.556824}},
{'name': 'a-a-0', 'vesting_shares': {'amount': 16582.118011}}]
That's pretty interesting!
These are the first 10 results from the search to the database. I find it more than a little curious that very obviously testing accounts are still showing up after our filter. Some content dropped out, because the filtered list is about 1/3 as long as the original. But these accounts pointedly did not.
Also note that inside the list, each account came back as a Python dictionary, the second of which is a dictionary nested inside a dictionary. This is kind of a pain in the ass, so inevitably I'm going to write a function which when given one of these elements, pulls out and returns the vesting shares value as a value. Anything else would be insane.
This is still not quite as useful as it needs to be, because what we really want is a sorted list of these accounts in descending order of vesting shares. Luckily, we have more than enough horsepower to generate a new list in short order.
def vestingAmount(queryEntry):
    return queryEntry['vesting_shares']['amount']

def sortQuery(queryList):
    return sorted(queryList, key=vestingAmount, reverse=True)
>>> sQuery = sortQuery(queryList)
>>> pprint(sQuery[:10])
[{'name': 'steemit', 'vesting_shares': {'amount': 101294832035.79428}},
{'name': 'misterdelegation', 'vesting_shares': {'amount': 33854469950.665653}},
{'name': 'steem', 'vesting_shares': {'amount': 21249773925.079193}},
{'name': 'freedom', 'vesting_shares': {'amount': 15507987396.01915}},
{'name': 'blocktrades', 'vesting_shares': {'amount': 9494007774.078524}},
{'name': 'ned', 'vesting_shares': {'amount': 7344140982.676874}},
{'name': 'databass', 'vesting_shares': {'amount': 3500010180.297931}},
{'name': 'hendrikdegrote', 'vesting_shares': {'amount': 3298001762.871842}},
{'name': 'jamesc', 'vesting_shares': {'amount': 3199868835.022211}},
{'name': 'val-a', 'vesting_shares': {'amount': 3132003554.29581}}]
We have to be careful not to forget the reverse Boolean on the sort (like I did the first two times) or else we get the lowest-value accounts at the top. There are a surprising number of accounts with zero SP which made it through my initial filter.
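Since those zero-SP stragglers will come up again, here's a minimal sketch of how they could be screened out. `dropZeroVests` is a hypothetical helper of my own naming, not something I actually ran against the full list:

```python
# Hypothetical helper: drop accounts whose vesting share amount is zero,
# since they slipped past the last_account_update filter.
def dropZeroVests(queryList):
    return [e for e in queryList
            if e['vesting_shares']['amount'] > 0]
```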
I'm sure we'll talk about that a little bit later.
Instead, let's look at what we did get.
Unsurprisingly, the first three accounts can be directly traced back to Steemit corporate, holding a vast amount of resources in reserve. As other people have reported, there have been some recent movements from the Steemit account to Mr. Delegation, theoretically to help fund some further delegation to applications which are being developed on the blockchain.
That's actually all well and good, but the sheer vastness of the numbers is really going to screw with any kind of analysis that we can do here. I am tempted to remove the first three accounts from consideration when it comes to asking the question I've already proposed. After all, it really doesn't matter what the rest of the platform wants if corporate doesn't want something to happen.
Maybe we'll just set them aside.
Going further, is an interesting case.
Let's take a look over at another view of the blockchain for what freedom is up to.
Freedom is just a big old transfer bin. It appears to simply exist to transfer money in and out. There is no kind of activity of curation or posting, it only exists as an arbitrage point.
(Yes, I know – most of the people interested in reading this sort of thing already understand what kind of insanity is going on at the top of the most valuable accounts on the blockchain. But someone might not. They might be reading this. Maybe. What I find interesting is the sheer number of beg-messages hitting freedom as memo transfers. I could probably make a tidy sum just off of the accumulated bits and pieces being spent to send freedom messages.)
Being an exchange is clearly a profitable occupation, because follows freedom in the list. As well they probably should, since both of them need a lot of liquid resources in order to function as an exchange in the first place. That's not completely surprising.
After that comes @Ned, whom as an implementer of the blockchain itself I would expect to have quite a comfortable nest egg tucked away, just in case – and, in fact, that's exactly what we see.
The numbers are starting to fall off really fast, now. Even this high on the food chain, there are pretty big gaps between major players.
Almost no activity. In fact, this account was created by Steemit originally with a massive vest-dump – and it has done nothing since.
Literally, nothing.
Though it does stand as a pretty good example of remembering that nothing is private on the blockchain. Nothing is concealed. I almost feel sorry for the Earth Nation bot over there…
But then everything changed when the Fire Nation attacked.
The other thing that is brought to mind by this account, and other accounts like it, is that it is going to be impossible to truly filter out all of the accounts which are, or have been, architecturally intended to be repositories for Steemit vests. This is a particularly large one, with a particularly funny name. How many more accounts that passed even my very lax filter are really just resource dumps?
It would require a lot more research and effort than I am both willing to do and capable of doing to even start making a dent in that problem. I can imagine some sort of immense directed acyclic graph which would depict the relationships as far as we can tell between accounts, but that's no small accomplishment.
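As a toy illustration of what that graph project might look like, here's a sketch that models transfers as directed edges and asks which accounts ultimately feed a given sink account. The function name and the data are entirely made up; the real project would need to pull actual transfer history from the blockchain.

```python
from collections import defaultdict

def feeders(edges, sink):
    """Given directed transfer edges (src, dst), return the set of
    accounts with a directed path into `sink`."""
    incoming = defaultdict(set)
    for src, dst in edges:
        incoming[dst].add(src)
    # Walk backwards from the sink, collecting everything upstream.
    seen, stack = set(), [sink]
    while stack:
        node = stack.pop()
        for src in incoming[node]:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen
```

With real transfer data, clusters of accounts that all drain into one repository would fall out of a walk like this almost for free.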
Maybe someone else will take up that project one day.
Just for grins, let's look a little further down the database.
>>> pprint(sQuery[10000:10010])
[{'name': 'webdesign29', 'vesting_shares': {'amount': 426642.409568}},
{'name': 'misha', 'vesting_shares': {'amount': 426576.330304}},
{'name': 'spencec6', 'vesting_shares': {'amount': 426533.168183}},
{'name': 'joao-cacador', 'vesting_shares': {'amount': 426521.652178}},
{'name': 'arthur.grafo', 'vesting_shares': {'amount': 426507.687568}},
{'name': 'ainiaziz', 'vesting_shares': {'amount': 426476.41644}},
{'name': 'kode', 'vesting_shares': {'amount': 426282.95957}},
{'name': 'dannyleenders', 'vesting_shares': {'amount': 426239.559514}},
{'name': 'creditceo', 'vesting_shares': {'amount': 426189.040678}},
{'name': 'thecryptodavid', 'vesting_shares': {'amount': 426188.271218}}]
The vests here have really fallen off, which is quite interesting. These all appear to be relatively sensible accounts (though the lack of creativity from should be cause for a little bit of snickering).
Let's pick one at random and take a look at what their activity profile looks like.
This all looks pretty reasonable, though I'm more than a little jealous of 's ability to earn some fat cash for writing poetry.
I went into the wrong line of work, I think.
A mere 10,000 steps down the list, however, and we have dropped into the realms of only 200 SP accounts. That's barely enough vests for the system to allow you to decide for yourself how large your votes should be. This is firmly down into the territory of "regular people can be here."
Remember, this list is over 250,000 lines long. We are less than 1/25 of the way into it, and we are already looking at SP levels which are not even really into the "minnow" range. These are tasty plankton.
That bodes well for proving my hypothesis, but it probably bodes poorly for the ecology of the steem blockchain.
Answers Forged
Well – that was depressing.
Let's get back to answering the question that we posed originally. This should be pretty straightforward, because we have all the data in lists and it's easy to build some accumulations, figure out the numbers, and find out what's going on.
I've flattened the list a bit to just make it easier to deal with.
def query2tup(idx, queryEntry):
    return (idx, queryEntry['name'],
            queryEntry['vesting_shares']['amount'])

def queryList2queryTupList(queryList):
    outList = []
    for idx, e in enumerate(queryList):
        outList.append(query2tup(idx, e))
    return outList
>>> fQuery = queryList2queryTupList(sQuery)
>>> len(fQuery)
287464
>>> pprint(fQuery[:30])
[(0, 'steemit', 101294832035.79428),
(1, 'misterdelegation', 33854469950.665653),
(2, 'steem', 21249773925.079193),
(3, 'freedom', 15507987396.01915),
(4, 'blocktrades', 9494098772.554981),
(5, 'ned', 7344140982.676874),
(6, 'databass', 3500010180.297931),
(7, 'hendrikdegrote', 3298001762.871842),
(8, 'jamesc', 3199868835.022211),
(9, 'val-a', 3132003554.29581),
(10, 'michael-b', 3084198458.874888),
(11, 'val-b', 3058661749.06894),
(12, 'proskynneo', 2991480332.628863),
(13, 'thejohalfiles', 2633775823.387832),
(14, 'minority-report', 2230797417.585554),
(15, 'xeldal', 2044948082.411315),
(16, 'roadscape', 1959600356.285733),
(17, 'jamesc1', 1644781210.539062),
(18, 'arhag', 1605237829.023531),
(19, 'fyrstikken', 1546686313.784641),
(20, 'adm', 1524486923.142378),
(21, 'safari', 1500015009.426551),
(22, 'riverhead', 1493568252.839648),
(23, 'adsactly', 1334221109.867592),
(24, 'trafalgar', 1326882955.849771),
(25, 'created', 1247305568.029396),
(26, 'tombstone', 1210362781.831355),
(27, 'wackou', 1113334187.722864),
(28, 'glitterfart', 1062137422.056735),
(29, 'steemed', 1051204436.079952)]
A list of tuples is pretty easy to deal with.
So -- what's the ridiculously high sum of total vests?
>>> totVests = 0
>>> for e in fQuery:
...     totVests += e[2]

>>> totVests
348533694697.9651
At the moment, Steemd shows steem_per_mvests at 489.056, making this roughly 170,452,494.594 STEEM/SP in aggregate value.
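If you want to sanity-check that conversion yourself, the arithmetic is just vests divided by a million, times steem_per_mvests. A tiny helper (the ratio is the snapshot value from steemd at the time of writing, so it drifts over time):

```python
STEEM_PER_MVESTS = 489.056  # snapshot from steemd; this value drifts

def vests2sp(vests, steem_per_mvests=STEEM_PER_MVESTS):
    # vests -> Mvests -> SP
    return vests / 1e6 * steem_per_mvests
```

Feeding it the total above returns roughly 170.45 million SP, matching the aggregate figure.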
So we just need to count down the list of accounts adding up vests until we exceed 50% (or 174,266,847,348.98254 vests), and wherever that falls, that's the break point.
Easy!
>>> halfVests = totVests / 2
>>> accumVests = 0
>>> idx = 0
>>> for e in fQuery:
...     accumVests += e[2]
...     if accumVests > halfVests:
...         print(idx, accumVests, halfVests)
...         break
...     idx += 1

4 181401162080.11328 174266847348.98254
>>> fQuery[4]
(4, 'blocktrades', 9494098772.554981)
Okay, that's a problem. It only takes the top five accounts – dominated by the corporate holdings – to control more than 50% of the vests in the entire system of reasonably active accounts.
Let's just skip over the first five accounts altogether, shall we?
Anyone figured out what the problem is?
Ayup. Once we cut out the top 5, we really needed to recalculate what half the vests are.
Let's just write a function to do this. It's getting messy to do it in the REPL.
def computeMidbreak(tupQueryList):
    totVests = sum([e[2] for e in tupQueryList])
    halfVests = totVests / 2
    print('Total vests: {}, half vests: {}\n'.format(totVests,
                                                     halfVests))
    accumVests = 0
    for e in tupQueryList:
        accumVests += e[2]
        if accumVests > halfVests:
            print('Rank {} - {} reaches {} of {}!\n'.format(e[0],
                                                            e[1],
                                                            accumVests,
                                                            halfVests))
            break
        else:
            if (e[0] % 500) == 0:
                print('Accum rank {} - {}, {} of {}\n'.format(e[0],
                                                              e[1],
                                                              accumVests,
                                                              halfVests))
>>> computeMidbreak(fQuery[5:])
Total vests: 167132532617.84103, half vests: 83566266308.92052
Rank 73 - gtg reaches 83675498293.07193 of 83566266308.92052!
This is some blunt force code. It's not even brute force – it goes well beyond that.
Effectively we take whatever tuple query list we're handed, calculate the total vests with a simple sum, calculate the half vests off of that, and then step through the list, accumulating vests as we go until we either reach the halfway point or run out of list.
You'll notice that I didn't even code for the possibility that we run out of list. I'm both lazy and know that at some point we have to have more accumulated vests than half the value in the pile.
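For the more cautious, here's a hypothetical tidier variant that returns the break point instead of printing, and admits the possibility of running out of list. `midbreakRank` is my own naming for this sketch:

```python
def midbreakRank(tupQueryList):
    """Return the (rank, name) tuple at which cumulative vests first
    exceed half of the list's total, or None if the list is empty
    (or all zeros)."""
    halfVests = sum(e[2] for e in tupQueryList) / 2
    accumVests = 0
    for rank, name, vests in tupQueryList:
        accumVests += vests
        if accumVests > halfVests:
            return (rank, name)
    return None
```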
I didn't actually expect that we would reach the line at position 73 of over 250,000. I knew the distribution was bad, but I didn't realize how bad.
The top 73 holders of vests on the steem blockchain – even after decapitating the top five, which are corporate content, including the one account which holds more SP in one place than half the rest of the blockchain – represent more voting and influence power than all of the remaining accounts on the blockchain combined.
Or to put it a different way, no matter who you are, no matter what you do, if the top 73 (actually 78) people involved with this blockchain decide something should happen – that's what happens. No amount of campaigning, no amount of persuasion, no amount of subterfuge can change that fact.
Out of curiosity, let's see what happens if we decapitate the top 10,000 accounts on this list. We will go ahead and move the top of the bar down 10,000 spots, which only brings the number of accounts covered to 240,000, and let's see how far you have to go before 50% of the vests are owned.
>>> computeMidbreak(fQuery[10000:])
Total vests: 4094385384.9629292, half vests: 2047192692.4814646
Accum rank 10000 - readingdanvers, 427126.578891 of 2047192692.4814646
Accum rank 10500 - christianytony, 204839141.37938106 of 2047192692.4814646
Accum rank 11000 - timoshey, 389534075.1581148 of 2047192692.4814646
Accum rank 11500 - catchup, 557556731.7472327 of 2047192692.4814646
Accum rank 12000 - hammockhouse, 711657143.056614 of 2047192692.4814646
Accum rank 12500 - ask-not-please, 853782777.8299729 of 2047192692.4814646
Accum rank 13000 - thesimplelife, 985274542.4688984 of 2047192692.4814646
Accum rank 13500 - roundoar03, 1107113991.4568622 of 2047192692.4814646
Accum rank 14000 - brainisthekey, 1220770239.1390197 of 2047192692.4814646
Accum rank 14500 - uncle-blade, 1327720546.3466616 of 2047192692.4814646
Accum rank 15000 - samiksa1982, 1428844721.4849968 of 2047192692.4814646
Accum rank 15500 - chivacoa, 1522977099.2646227 of 2047192692.4814646
Accum rank 16000 - elibemusic.com, 1610075085.9102407 of 2047192692.4814646
Accum rank 16500 - lokkie, 1690947907.6110935 of 2047192692.4814646
Accum rank 17000 - catonwheels, 1766333288.3911705 of 2047192692.4814646
Accum rank 17500 - thegame68, 1836934473.2640998 of 2047192692.4814646
Accum rank 18000 - mikefromak, 1903144829.8388672 of 2047192692.4814646
Accum rank 18500 - augistune, 1965502901.859729 of 2047192692.4814646
Accum rank 19000 - elisambre, 2025283613.9336421 of 2047192692.4814646
Rank 19189 - andyblack reaches 2047242786.7394085 of 2047192692.4814646!
Allowing for the fact that we started at rank 10,000, it was only another 9,000 accounts or so until the owned vests exceeded 50% of the total vests.
That is a ridiculously sharp dropping curve right there.
Let's take this one step further. Let's go that one step beyond.
Let's decapitate the top 100,000 accounts on this list. Now, I know that because the differentiation between account values at that point on the list is really small, it's going to require a lot of accounts to accumulate before the 50% mark gets hit. I'm going to change the code so that we only get an accumulated rank update every 5,000.
def computeMidbreak(tupQueryList):
    totVests = sum([e[2] for e in tupQueryList])
    halfVests = totVests / 2
    print('Total vests: {}, half vests: {}\n'.format(totVests,
                                                     halfVests))
    accumVests = 0
    for e in tupQueryList:
        accumVests += e[2]
        if accumVests > halfVests:
            print('Rank {} - {} reaches {} of {}!\n'.format(e[0],
                                                            e[1],
                                                            accumVests,
                                                            halfVests))
            break
        else:
            if (e[0] % 5000) == 0:
                print('Accum rank {} - {}, {} of {}\n'.format(e[0],
                                                              e[1],
                                                              accumVests,
                                                              halfVests))
>>> computeMidbreak(fQuery[100000:])
Total vests: 185774094.90749836, half vests: 92887047.45374918
Accum rank 100000 - avicena41, 1611.526929 of 92887047.45374918
Accum rank 105000 - stewiegriffin, 7597302.133254998 of 92887047.45374918
Accum rank 110000 - bioherby, 14380629.430285048 of 92887047.45374918
Accum rank 115000 - jameseaton, 20525595.943487044 of 92887047.45374918
Accum rank 120000 - whitelotus, 26220640.330002043 of 92887047.45374918
Accum rank 125000 - legrandgm, 31624152.514669873 of 92887047.45374918
Accum rank 130000 - aaqibsohail, 36872727.66174664 of 92887047.45374918
Accum rank 135000 - unknownplayer, 42054003.03950547 of 92887047.45374918
Accum rank 140000 - lena-mikado, 47229579.20794142 of 92887047.45374918
Accum rank 145000 - charleneishere, 52402406.15960627 of 92887047.45374918
Accum rank 150000 - mocle, 57571997.65572927 of 92887047.45374918
Accum rank 155000 - bangbang, 62738220.82528121 of 92887047.45374918
Accum rank 160000 - alansmithee, 67900191.79367426 of 92887047.45374918
Accum rank 165000 - darmidayitrizi, 73058103.28484616 of 92887047.45374918
Accum rank 170000 - potcurator, 78212514.4902361 of 92887047.45374918
Accum rank 175000 - farimani, 83362800.59005027 of 92887047.45374918
Accum rank 180000 - johntheviper, 88508593.28006594 of 92887047.45374918
Rank 184258 - hecqubus reaches 92887116.07530342 of 92887047.45374918!
It only took 84,000 accounts once you offset by 100,000 to control half of the remaining SP pool.
Out of curiosity, let's check that guy out.
Well, he seems all right. He's got a couple of posts, he's got a couple of uploads and…
He has 0.503 SP plus his initial account creation investment of about 15.
That's how far down the account list we've come by jumping to the 100,000 point.
Accounts of this level, carrying this much SP, make up the vast bulk of accounts on the blockchain.
Let's look at that.
What's That Look Like?
Lines in the Sky
This is what just a naked calculation with linear scaling of rank versus vests looks like.
Yes, the curve really is that steep. Steemit is holding so much value compared to the rest of the blockchain that this is the distribution curve without any scaling. Now, it would look a little bit different if we strip off the top five accounts again, but just contemplate this.
Think about it.
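For anyone replicating these charts, the plumbing is simple: split the tuples into x/y series and hand them to Plotly. This is a sketch – the Plotly lines (commented out) show the general shape of the call, not my exact styling:

```python
def plotSeries(tupQueryList):
    # Split (rank, name, vests) tuples into x/y series for plotting.
    ranks = [e[0] for e in tupQueryList]
    vests = [e[2] for e in tupQueryList]
    return ranks, vests

# ranks, vests = plotSeries(fQuery)
# import plotly.offline as py
# import plotly.graph_objs as go
# fig = go.Figure(data=[go.Scatter(x=ranks, y=vests)],
#                 layout=go.Layout(yaxis=dict(type='log')))  # or 'linear'
# py.plot(fig)
```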
Maybe this isn't the right way to look at the data. I have a little bit of experience with data mining. Whenever I see this sort of distribution, my first thing to reach for is a logarithmic curve. If we plot the vests as they stand on a log curve, surely it can't possibly be so bad.
It can really be that bad.
Notice our distribution here. Horizontally, each interval is 100 vests, or roughly .25 SP at the current rate. There is a huge population on this platform which is almost indistinguishable in the amount of vests that they have hanging out in their pockets.
And there is a very tiny number, comparatively, who have quite a lot of vests in their pocket. So much so that when plotted on a logarithmic curve they still evidence an exponential J hook.
Let's see if we can cut off the top five accounts and replot this.
Oh look! When plotted on a linear scale, you can just about see a very tiny inflection after we trim off the top five accounts. Of course, that also just changes the top from somewhere over 10 billion to a mere 3.5 billion.
Let's go back to log.
No real difference, except the slope may be very slightly less at the under-20K ranks. Very slightly. We know that the breakpoint for owning more than 50% of the vests is at position 73 on this graph.
This is kind of brutal.
Even changing the y-axis floor to zero doesn't really help matters. It just helps hide how harsh that drop really is.
On the positive side, if you ever wondered where you stood as regards the whole population of the steem blockchain in terms of how much SP you retain – here you go. Odds are good that you are somewhere between the rank 100,000 and 260,000, in that vast, highly populated plateau.
Epilogue
So what does that actually mean?
The distribution of voting power on the steem blockchain is worse than the distribution of wealth on most of the planet. The major difference is that on the blockchain, we can actually see it directly rather than merely observe what that wealth can bring.
Frankly, I prefer the real world. I can at least aspire to be rich, work to be rich, and labor under (perhaps) the delusion that my efforts will make a difference. Here… Well, there are a lot of people who like to talk about the amazing technology of social networking and the blockchain to give people a say in the governments of their communities.
That's just silly.
Perhaps the saving grace of the blockchain, just as it is in the real world, is the inability of people to get along – particularly of people with equal opportunity and power. While conspiracies are the theories that keep giving, in reality we know that competition between equals is some of the most vicious and cooperation is opportunistic. If anything, we may be saved by the fact that if you laid every whale end-to-end, like economists they would all point in different directions.
This data could definitely use more mining. Some sort of directed acyclic graph as I mentioned before would be awesome for building a level of comprehension of what accounts are related to others. There's an entire field of visualization just waiting to be turned.
I don't promise to be any kind of business intelligence guy. Don't consider this investment advice. In no way take anything that I've said as an endorsement or denigration of anything except human nature, because humans suck. I'll go on record as saying that.
Do take this as food for consideration and as an invitation to go and explore the data available yourself. I suspect you will learn things that you never expected.
- Music to hack database code and wrestle with graphing code to: