To me, Reddit is a large community where people with common interests congregate and share their experiences or viewpoints on particular topics.

What I'm Trying to Resolve

The problem with Reddit has always been finding the appropriate subreddit for your post. That is why I worked with a team to build a web application that recommends the best subreddit for a post. It also predicts the number of upvotes the post would receive in that subreddit.

I was largely responsible for creating the API endpoints and connecting them to our web development team’s endpoints so that the web app could use the models built by our machine learning engineers.

Challenges

Our challenge was incorporating two models into a single endpoint so that the user's input returns not only a subreddit recommendation but an upvote prediction as well.

We had to feed the results of our first model into our second model, which we did with this function:

import pickle
from pathlib import Path

from flask import jsonify, request

# subreddit_prediction and upvote_predictor are defined elsewhere in the project.


def suggestions():
    # Reads the user inputs from the JSON request body.
    title_input = request.json['title']
    text_input = request.json['text']
    results_input = request.json['results']

    # Funnels the user inputs through the first (subreddit) model.
    sample_results = subreddit_prediction(title_input,
                                          text_input,
                                          results_input)
    sample_results = sample_results.reset_index()

    # Loads the pickled second (upvote) model.
    upvote_path = Path("Models/up_vote_model.pickle")
    with open(upvote_path, "rb") as f:
        model_uv = pickle.load(f)
    predictor_uv = upvote_predictor(model_uv)

    # Feeds each suggested subreddit from the first model into the
    # upvote model and collects the results.
    results = []
    for sub in sample_results[0]:
        results.append({
            'suggested_subreddit': sub,
            'pred_upvotes': predictor_uv.predict(title_input,
                                                 text_input, sub)})

    # Returns the results in JSON format.
    return jsonify(results)
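
To make the chaining pattern in the function above easier to see on its own, here is a minimal sketch with the web framework stripped out. Everything in it is a stand-in and an assumption: `recommend_subreddits` and `load_upvote_model` are hypothetical placeholders for our real models, and the word-count "score" is made up purely so the example runs. The sketch also caches the second model with `functools.lru_cache` so it is loaded once rather than on every call, which is one possible refinement and not what our endpoint does.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_upvote_model():
    # Hypothetical stand-in for unpickling the real upvote model once.
    # This fake "model" simply scores a post by its word count.
    return lambda title, text, sub: len((title + " " + text).split())


def recommend_subreddits(title, text, n_results):
    # Hypothetical stand-in for the first model: returns n_results names.
    candidates = ["learnpython", "datascience", "MachineLearning"]
    return candidates[:n_results]


def suggest(title, text, n_results):
    """Run the first model, then feed each of its outputs into the second."""
    predict_upvotes = load_upvote_model()
    return [
        {
            "suggested_subreddit": sub,
            "pred_upvotes": predict_upvotes(title, text, sub),
        }
        for sub in recommend_subreddits(title, text, n_results)
    ]


if __name__ == "__main__":
    print(suggest("Help with pandas", "How do I merge two frames?", 2))
```

The returned list has the same shape as the JSON our endpoint sends back: one entry per suggested subreddit, each paired with its predicted upvotes.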

Another challenge we encountered was loading one of our models. The pickled file for that model was extremely large, so it was stored bz2-compressed, and we had to write a separate helper just to decompress and unpickle it.

import bz2
import pickle


def decompress_pickle(file):
    # Opens the bz2-compressed file and unpickles its contents.
    with bz2.open(file, "rb") as f:
        return pickle.load(f)
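
The writing side is not shown in this post, so here is a minimal sketch of how a compressed pickle could be produced and read back, using only the standard library. The names `compress_pickle` and `load_compressed_pickle`, the `model.pbz2` file name, and the dictionary standing in for a model are all assumptions for illustration, not our project's actual code.

```python
import bz2
import os
import pickle
import tempfile


def compress_pickle(obj, path):
    # Pickle the object straight into a bz2-compressed file.
    with bz2.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_compressed_pickle(path):
    # Decompress and unpickle in one pass.
    with bz2.open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # A dictionary standing in for a large trained model.
    model = {"weights": [0.0] * 100_000}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pbz2")
        compress_pickle(model, path)
        print(os.path.getsize(path), "bytes on disk after compression")
        assert load_compressed_pickle(path) == model
```

How much space this saves depends heavily on the model's contents, so it is worth measuring on the real file. As with any pickle, only load files from a source you trust.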

One of the challenges the machine learning engineers told me about was how long the models took to train.

One of the models took about 5 hours to train. That meant that if you trained a model and its accuracy turned out to be very low, you had just wasted 5 hours on a useless model.

Final Result

We weren’t able to get very high accuracy from our model, but it is completely functional and gets the job done. Within our limited time, we were able to get it as high as 74% accuracy.

Repository Link

App Link