Good morning to everyone , I hope you are having a great day.
Today's post will be about how many posts were made using #Leofinance or 'hive-167922' tag.
How many users have posted over 100 comments , over 10 posts and much more .
For those who doesn't care about the code , just right to the last heading - DATA and CHARTS .
For nerds like me :) Please cross check the code and see if I can improve the code.
Basic codes
import json
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
'Server=vip.hivesql.io;'
'Database=DBHive;'
'uid=Hive-amr008;'
'pwd=--hidden--;'
'Trusted_Connection=no;')
cursor = conn.cursor()
The above code is to just establish connection with HiveSQL .
Test_Query = pd.read_sql_query('''select * from TxComments where timestamp > GETDATE()-30 ORDER BY ID DESC ''',conn)
I am getting all the posts + comments for past 30 days and storing it in DataFrame Test_Query .
Output of the Test_Query looks like this -
Posts/ Comments count with LeoFinance tag
posts_count=0
comments_count=0
complete_list=[]
for i in range(0,len(Test_Query)):
json_tags=json.loads(Test_Query['json_metadata'][i])
if 'tags' in json_tags:
if('hive-167922' in json_tags['tags'] or 'leofinance' in json_tags['tags']):
if(Test_Query['parent_author'][i]==''):
posts_count+=1
if 'app' in json_tags:
complete_list.append([Test_Query['author'][i],pd.to_datetime(Test_Query['timestamp'][i]).date(),'post',json_tags['app']])
else:
complete_list.append([Test_Query['author'][i],pd.to_datetime(Test_Query['timestamp'][i]).date(),'post','None'])
else:
comments_count+=1
if 'app' in json_tags:
complete_list.append([Test_Query['author'][i],pd.to_datetime(Test_Query['timestamp'][i]).date(),'comment',json_tags['app']])
else:
complete_list.append([Test_Query['author'][i],pd.to_datetime(Test_Query['timestamp'][i]).date(),'comment','None'])
print(posts_count,comments_count,posts_count+comments_count)
Steps -
- Go through each row ( post/comment)
- check if 'hive-167922' or 'leofinance' tag is used in that post/comment
- if yes , check if it is a post or comment
- if post increase the post count otherwise increase the comment count.
- Store all in 'complete_list' list.
df_data=pd.DataFrame(complete_list)
df_data.columns=['Author','Date','Type','Front-end']
df_posts= df_check[df_check['Type']=='post']
df_posts.columns=['Author','Date','posts','Front-end']
df_comments = df_check[df_check['Type']=='comment']
df_comments.columns=['Author','Date','comments','Front-end']
Steps-
- Take the whole list and convert to DataFrame
- Store only posts in df_posts
- Store only comments in df_comments
df_posts_count=df_posts.groupby('Author').count()
df_comments_count=df_comments.groupby('Author').count()
I have just grouped it by Author to get the count . This will give me how much a particular user has posted and commented .
Output-
count_1to10_posts=0
count_10to30_posts=0
count_30_posts=0
count_1to100_c=0
count_101to250_c=0
count_251_c=0
for i in range(0,len(df_posts_count)):
if df_posts_count['posts'][i] >0 and df_posts_count['posts'][i] <=10:
count_1to10_posts +=1
elif df_posts_count['posts'][i] >10 and df_posts_count['posts'][i] <=30:
count_10to30_posts +=1
else:
count_30_posts +=1
for i in range(0,len(df_comments_count)):
if df_comments_count['comments'][i] > 0 and df_comments_count['comments'][i] <=100:
count_1to100_c +=1
elif df_comments_count['comments'][i] > 100 and df_comments_count['comments'][i] <=250:
count_101to250_c +=1
else:
count_251_c +=1
print('Number of users who have -')
print('Posted 1 - 10 posts:'+str(count_1to10_posts))
print('Posted 11 - 30 posts:'+str(count_10to30_posts))
print('Posted over 30 posts:'+str(count_30_posts))
print('\n')
print('Posted 1- 100 comments:'+str(count_1to100_c))
print('Posted 101 - 250 comments:'+str(count_101to250_c))
print('posted 251 or above comments:'+str(count_251_c))
Steps -
- I have just gone through each row and have segregated based on posts/ comments count .
- I am just increasing the count for posts/ comments if the condition holds true.
DATA and CHARTS
To all those who skipped the above code part , this is what you need -
Output -
Number of users who have -
Posted 1 - 10 posts:1103
Posted 11 - 30 posts:261
Posted over 30 posts:105
Posted 1- 100 comments:1745
Posted 101 - 250 comments:69
posted 251 or above comments:33
To get better picture , let me show the same data in charts -
Posts
So the users who have made just 1-10 posts are huge ( 1103 ) followed by 11-30 ( 261 ) .
I have taken 30 as the dividing point because 30 days = 30 posts = 1 post per day and its good to see 105 users here .
Comments
Those who have commented over 250 times with leofinance tag = just 33 :(
Engage more people , engage moree.
Date wise posts+comments with LeoFinance tag.
Average is 1960 posts + comments per day with leofinance tag . That is quite amazing :)
Top 20 Authors ( Posts count )
| Account | Post_count |
|---|---|
| 332 | |
| 332 | |
| 161 | |
| 125 | |
| 124 | |
| 117 | |
| 107 | |
| 92 | |
| 84 | |
| 80 | |
| 76 | |
| 75 | |
| 71 | |
| 68 | |
| 64 | |
| 63 | |
| 61 | |
| 61 | |
| 60 | |
| 59 |
Top 20 Authors ( Comments count )
| Account | Comment_count |
|---|---|
| 1724 | |
| 913 | |
| 864 | |
| 825 | |
| 726 | |
| 635 | |
| 537 | |
| 498 | |
| 464 | |
| 432 | |
| 407 | |
| 404 | |
| 385 | |
| 373 | |
| 361 | |
| 358 | |
| 351 | |
| 342 | |
| 341 | |
| 322 |
Let me know if you want your data in the comments, I will get it for you .
df_posts_count[df_posts_count['Author']=='amr008']
df_comments_count[df_comments_count['Author']=='amr008']
Output =
20 posts :)
537 comments .
Follow me on noise.cash where I promote Leo/Hive posts - https://noise.cash/u/AMR
Follow me on twitter - https://twitter.com/NaveshSapad
Regards,
MR.