Before I begin the analysis, I have to give endless credit to the creator of the http://steemsql.com/ database. Had this amazing project not been made free, I could not have done this analysis. All the data used here is extracted from the SteemSQL database by Tableau. Please go and support him if you can, even in the smallest way, so that people can continue to produce transparent and interesting analyses about the world of Steem.
I would like to try something: instead of creating an elaborate data analysis as I usually do, I would like to try a simple question-answer format.
For a first attempt at messing around with the data, I'm curious what we can learn about social patterns from looking at comment data in different #categories. Can we answer this question:
I thought perhaps a good initial indicator would be the level of commenting on one another's posts.
Due to the extremely large number of categories that have been posted and commented under, the first thing to do is to filter the results to a more manageable size by keeping only categories where "total sum of comment length in category" > 50,000,000 characters.
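For anyone who wants to reproduce this filter outside Tableau, here is a minimal sketch in plain Python. It is not how I built the charts. The `comments` rows, the function name `categories_over_threshold` and the tiny threshold are all made up for illustration; the real filter uses 50,000,000 characters on the SteemSQL data.

```python
from collections import defaultdict

# Hypothetical sample rows: (category, comment_body).
# Real rows would come from the SteemSQL database.
comments = [
    ("spam", "x" * 40),
    ("ripple", "y" * 30),
    ("ripple", "z" * 35),
    ("photography", "nice"),
]


def categories_over_threshold(rows, min_total_chars):
    """Return the categories whose summed comment length exceeds min_total_chars."""
    totals = defaultdict(int)
    for category, body in rows:
        totals[category] += len(body)
    return {c for c, total in totals.items() if total > min_total_chars}


# Toy threshold so the sample data shows the effect; the post uses 50_000_000.
TOY_THRESHOLD = 50
kept = categories_over_threshold(comments, TOY_THRESHOLD)
print(kept)
```

With the toy data, only the category whose comments add up to more than 50 characters survives the filter.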
As a proxy for measuring sociability or interaction, I chose the average comment text length and the average depth of comments (i.e. how many levels of comments there are on average in the comment hierarchy for a post).
So we can start off with a look at the average comment text length.
Curious results: #spam has by far the longest average comment length. I'm sure someone has a very reasonable theory for this :)
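The two proxies can be sketched in a few lines of Python. This is only an illustration under assumptions: each row is a hypothetical `(category, body, depth)` tuple, and "average depth" is taken as the plain mean of each comment's depth value, which may differ from how Tableau aggregated it for the charts.

```python
from collections import defaultdict


def category_averages(rows):
    """Map each category to (average comment length, average comment depth).

    rows: iterable of (category, body, depth) tuples (hypothetical layout).
    """
    sums = defaultdict(lambda: [0, 0, 0])  # total length, total depth, count
    for category, body, depth in rows:
        entry = sums[category]
        entry[0] += len(body)
        entry[1] += depth
        entry[2] += 1
    return {
        category: (total_len / count, total_depth / count)
        for category, (total_len, total_depth, count) in sums.items()
    }


# Made-up sample rows, not real SteemSQL data.
sample = [
    ("money", "abcd", 1),
    ("money", "ab", 3),
    ("ripple", "abcdefgh", 1),
]
averages = category_averages(sample)
print(averages)
```

In the toy sample, the category with the shorter comments has the deeper threads, which is the same kind of contrast the charts are meant to surface.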
Let's take #spam out for a moment and sort by average comment text length to get a better comparison.
I've also highlighted anything more or less crypto-related, creative-related, or regional/language-related, with separate colors for each group.
We do get a pattern emerging here somewhat.
- People tend to discuss more about crypto-stuff than other things.
- Creative-related categories are surprisingly low down on the interactivity level. Perhaps it's more about looking at stuff than discussing stuff? That would make sense, since writing and fiction are high-up exceptions.
- Some region/language-specific categories are more sociable than others :). Korean uses 1-4 letters per character, with 2-3 being the norm, so you could multiply the kr average by about 2.5, which would put kr very close to the others.
Now, let's shade it in with the average comment depth, or the number of hierarchical levels of comments per category:
The comment depth scale isn't very broad: no category reaches an average comment depth above 2.2.
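The Korean adjustment above is simple arithmetic, sketched here in Python. The factor 2.5 is the midpoint of the 2-3 letters per character mentioned above; the input value of 100 characters is made up for illustration and is not the real kr average.

```python
# Rough letters-per-character factor for Korean (midpoint of the 2-3 norm).
KR_LETTERS_PER_CHARACTER = 2.5


def adjusted_length(avg_chars, factor=KR_LETTERS_PER_CHARACTER):
    """Scale an average comment length in characters to a rough letter-equivalent."""
    return avg_chars * factor


# Hypothetical kr average of 100 characters, not the measured value.
print(adjusted_length(100.0))
```

This is a crude correction: it treats every character in a #kr comment as Korean, so comments that mix in Latin text, links or numbers would be over-scaled.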
Funnily enough, the top interactive category by comment length is ICE COLD by comment depth - that's #ripple.
#money, on the other hand, has an average comment depth of 2.2. Am I right in interpreting this to mean that people on #ripple just throw their opinions out there en masse but don't engage in discussion, whereas people on #money like to discuss ideas on how best to make money in much more depth?
Thanks for bearing with me, I know this was a long post! Please drop your feedback, theories and opinions below - I am so excited to hear them all.
Also, please let me know what kind of things you have wondered about Steemit, and what questions you have that I might answer in a future post! I would love to do this on a regular basis; it was excellent fun :).
Thank you for reading and happy steeming! :)