Looks like a cool project. Looking forward to reading more about it!
May I share a few thoughts that came to mind?
Have you removed common words (stop words) like he, she, the, a ...? They still seem to show up in the topics you found, along with HTML residue like "href". Removing them could already improve your features.
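Just to illustrate what I mean, here is a minimal sketch of such a cleaning step. I don't know how your pipeline tokenizes text, so the `clean_tokens` helper and the tiny `STOP_WORDS` set are my own placeholders, not anything from your code; a real stop-word list would be much longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "he", "she", "it", "and", "or", "of", "to", "in", "is"}

TAG_RE = re.compile(r"<[^>]+>")    # whole HTML tags, including attributes such as href
TOKEN_RE = re.compile(r"[a-z']+")  # lowercase word tokens

def clean_tokens(text):
    """Strip HTML tags, lowercase the text, tokenize it, and drop stop words."""
    without_tags = TAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(without_tags.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_tokens('She found <a href="https://example.com">the truffle</a> in a forest'))
# prints ['found', 'truffle', 'forest']
```

Stripping the tags before tokenizing matters here: otherwise "href" and pieces of the URL end up as tokens and can leak into the topics.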
Have you run, e.g., a 5-fold cross-validation instead of a single split into training and test set? This often gives a more realistic view of the prediction quality.
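In case it helps, this is roughly what I have in mind, written in plain Python so it does not depend on any particular library. The `fit`/`predict` callables and the mean-predicting placeholder model are assumptions of mine; you would plug in your actual model and error metric instead.

```python
import random
from statistics import mean

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(X, y, fit, predict, k=5):
    """Return the mean absolute error measured on each held-out fold."""
    errors = []
    for train, test in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        errors.append(mean(abs(p - y[i]) for p, i in zip(preds, test)))
    return errors

# Placeholder model: always predicts the mean target of the training fold.
def fit_mean(X_train, y_train):
    return mean(y_train)

def predict_mean(model, x):
    return model

# Made-up toy data, only there to make the sketch runnable.
X = [[i] for i in range(20)]
y = [2.0 * i for i in range(20)]
scores = cross_validate(X, y, fit_mean, predict_mean, k=5)
print(scores, mean(scores))
```

The spread of the five scores is as interesting as their average: if they differ a lot, a single train/test split could easily look better or worse than the model really is.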
Have you looked at the far outliers in the predictions and reviewed them manually? I think this could give interesting insights for improving your features.
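A simple way to pick the posts worth reading by hand is to sort them by absolute prediction error. Again only a sketch: `largest_errors` is a hypothetical helper, and the post ids and numbers below are made up.

```python
def largest_errors(post_ids, y_true, y_pred, top_n=3):
    """Return the top_n (post_id, true, predicted, abs_error) rows, worst first."""
    rows = [(pid, t, p, abs(t - p)) for pid, t, p in zip(post_ids, y_true, y_pred)]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top_n]

# Made-up example values, not real results.
post_ids = ["post-a", "post-b", "post-c", "post-d"]
y_true = [10.0, 2.0, 55.0, 7.0]
y_pred = [12.0, 2.5, 5.0, 30.0]

for pid, true, pred, err in largest_errors(post_ids, y_true, y_pred, top_n=2):
    print(pid, true, pred, err)
# prints:
# post-c 55.0 5.0 50.0
# post-d 7.0 30.0 23.0
```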
Thanks for sharing :)
RE: *TrufflePig*: Introducing the Artificial Intelligence for Content Curation and Minnow Support