Hi, this is Koshimizu, a data scientist at AnyMind Group. We recently launched a new feature on our influencer marketing platform, AnyTag, called influencer lookalike modelling, which lets marketers search for and find similar influencers. In this article, I will introduce the algorithm behind this lookalike feature, as well as the model training and deployment workflow.
■ What is the lookalike modelling function?
The platform’s lookalike modelling function utilizes AnyTag’s influencer database to search and show influencers similar to a specific influencer. With this feature, marketers can easily search for influencers similar to those who have achieved strong performance and results from past campaigns, and assign these lookalike influencers to new campaigns. Although this function is only available for Instagram at the moment, we are building this out for YouTube and Twitter as well.
■ Recommendation algorithm for lookalike modelling function
AnyTag’s influencer database holds information on more than 200,000 influencers and their posting data. Using this data, we extract characteristic words and hashtags from each influencer’s recent posts, build a vector that represents that influencer’s characteristics, and calculate the similarity between influencers. Here I will explain the algorithm in more detail.
First, nouns, verbs, adjectives and hashtags are extracted from each influencer’s posts and aggregated per influencer. Then, we build a matrix in which each row is an influencer vector, weighted using Positive Pointwise Mutual Information (hereafter PPMI).
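The extraction and aggregation step can be sketched in Python as follows. This is a minimal sketch, assuming each post has already been split into (word, part-of-speech) pairs by a morphological analyzer; the tag names `NOUN`, `VERB` and `ADJ`, the hashtag regular expression, and the helper names are illustrative assumptions, not AnyTag’s actual code. The output is the per-influencer word count table that the PPMI weighting starts from.

```python
import re
from collections import Counter

# Assumption: tokens come from a morphological analyzer as (surface, pos) pairs;
# these part-of-speech tag names are illustrative.
KEPT_POS = {"NOUN", "VERB", "ADJ"}
HASHTAG_RE = re.compile(r"#(\w+)")  # \w is Unicode-aware, so Japanese tags match too


def extract_terms(caption, tokens):
    """Return the nouns, verbs, adjectives and hashtags of one post."""
    terms = [surface for surface, pos in tokens if pos in KEPT_POS]
    terms += ["#" + tag.lower() for tag in HASHTAG_RE.findall(caption)]
    return terms


def aggregate(posts_by_influencer):
    """Sum term counts over all posts of each influencer (hypothetical helper)."""
    return {
        influencer: Counter(
            term for caption, tokens in posts for term in extract_terms(caption, tokens)
        )
        for influencer, posts in posts_by_influencer.items()
    }


def build_count_matrix(counts):
    """Rows are influencers, columns are vocabulary terms, cells are counts."""
    influencers = sorted(counts)
    vocab = sorted({w for c in counts.values() for w in c})
    matrix = [[counts[i][w] for w in vocab] for i in influencers]  # Counter gives 0 if absent
    return influencers, vocab, matrix
```

For example, a post "Leg day #fitness #Gym" tokenized as `[("leg", "NOUN"), ("day", "NOUN"), ("the", "DET")]` contributes `leg`, `day`, `#fitness` and `#gym`, while the determiner is dropped.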
Let n(i,w) be the number of times a word w appears in the posts of influencer i, n(i) the total number of words in i’s posts, n(w) the number of occurrences of w across all posts, and N the total number of words. Then the PPMI is defined as follows:

$$
\mathrm{PPMI}(i,w) = \max\left(0,\ \log \frac{n(i,w)\, N}{n(i)\, n(w)}\right)
$$
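Compared with a simple co-occurrence matrix, the PPMI matrix reduces the impact of common, high-frequency words such as “this” and “do” and gives more weight to characteristic words: a word that is no more frequent in an influencer’s posts than across all posts gets a PMI of zero or less, which PPMI clips to zero.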
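Given the counts n(i, w), the PPMI matrix can be computed with NumPy roughly as below. This is a sketch of the standard PPMI definition using the natural logarithm (the base only rescales all values uniformly); zero counts, which would give log(0), are mapped to 0.

```python
import numpy as np


def ppmi(counts):
    """PPMI(i, w) = max(0, log(n(i, w) * N / (n(i) * n(w)))) for a count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_i = counts.sum(axis=1, keepdims=True)  # n(i): total words per influencer (row)
    n_w = counts.sum(axis=0, keepdims=True)  # n(w): total occurrences per word (column)
    total = counts.sum()                     # N: total number of words
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (n_i * n_w))
    pmi[~np.isfinite(pmi)] = 0.0             # log(0) = -inf and 0/0 = nan become 0
    return np.maximum(pmi, 0.0)              # keep only positive associations
```

For the counts `[[3, 1], [1, 3]]`, each influencer’s dominant word gets log(1.5), while the word it uses less often than average is clipped to 0.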
If we use the PPMI matrix as is, the larger the vocabulary, the more entries are zero; the matrix becomes very sparse and the similarity estimates become less robust. Therefore, we apply Singular Value Decomposition (SVD) to the PPMI matrix to obtain a matrix of lower-dimensional influencer vectors.
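The dimensionality reduction can be sketched with NumPy’s dense SVD. Keeping the top-k singular values and scaling the left singular vectors by them is one common way to form the reduced vectors; the value of k and this particular scaling are assumptions here, not details given in this article.

```python
import numpy as np


def influencer_vectors(ppmi_matrix, k):
    """Truncated SVD: rows of U_k * S_k are the k-dimensional influencer vectors."""
    # full_matrices=False returns the compact SVD; singular values come sorted descending.
    u, s, _vt = np.linalg.svd(ppmi_matrix, full_matrices=False)
    return u[:, :k] * s[:k]  # broadcasting scales each kept column by its singular value
```

At the scale of 200,000 influencers, a sparse truncated solver such as `scipy.sparse.linalg.svds` or scikit-learn’s `TruncatedSVD` would be more practical than a full dense decomposition.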
We then calculate the cosine similarity between the target influencer’s vector and every other influencer vector in this matrix, and display the influencers in order of similarity.
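The ranking step is a cosine-similarity lookup over the rows of the reduced matrix. A minimal sketch, assuming the vectors are held in memory as a NumPy array alongside a list of influencer names (the function name `most_similar` is hypothetical):

```python
import numpy as np


def most_similar(vectors, names, target, top_n=5):
    """Rank other influencers by cosine similarity to the target's vector."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    idx = names.index(target)
    sims = unit @ unit[idx]                            # cosine similarity to every row
    order = [j for j in np.argsort(-sims) if j != idx] # descending, target excluded
    return [(names[j], float(sims[j])) for j in order[:top_n]]
```

Normalizing each row once means every similarity is a single dot product, so ranking one target against all influencers is one matrix-vector product.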
■ Creating the lookalike model and API
As new influencers and posts accumulate in the AnyTag database, the lookalike model is retrained daily to reflect them. The training jobs and the API build and deployment for this model are all handled by a set of services configured on GCP.
The overall structure of the pipeline is shown below.
① Cloud Scheduler invokes a Cloud Function
② The Cloud Function submits the model creation job to AI Platform Training
③ AI Platform Training retrieves influencer data from the Influencer DB and creates the lookalike model
④ The created model is uploaded to Cloud Storage
⑤ Cloud Build downloads the model, builds the lookalike API and deploys it on Cloud Run
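Step ② could look roughly like the following HTTP-triggered Cloud Function, which builds a request body for AI Platform Training’s `projects.jobs.create` method and submits it through `google-api-python-client`. This is a hedged sketch: the bucket, package, Python module, project, region, scale tier and runtime versions are all hypothetical placeholders, since the actual AnyTag job configuration is not described here.

```python
from datetime import datetime, timezone


def build_training_job(package_uri, python_module, region, now=None):
    """Request body for AI Platform Training projects.jobs.create (placeholder values)."""
    now = now or datetime.now(timezone.utc)
    return {
        "jobId": "lookalike_" + now.strftime("%Y%m%d_%H%M%S"),  # job IDs must be unique
        "trainingInput": {
            "scaleTier": "BASIC",           # hypothetical machine tier
            "packageUris": [package_uri],   # trainer package uploaded to Cloud Storage
            "pythonModule": python_module,  # module that trains the lookalike model
            "region": region,
            "runtimeVersion": "2.11",       # example value; use a supported runtime
            "pythonVersion": "3.7",
        },
    }


def submit_training_job(request):
    """Hypothetical HTTP entry point invoked by Cloud Scheduler."""
    from googleapiclient import discovery  # provided by google-api-python-client

    job = build_training_job(
        package_uri="gs://example-bucket/lookalike-0.1.tar.gz",  # hypothetical path
        python_module="trainer.task",                            # hypothetical module
        region="asia-northeast1",                                # hypothetical region
    )
    ml = discovery.build("ml", "v1", cache_discovery=False)
    ml.projects().jobs().create(parent="projects/example-project", body=job).execute()
    return job["jobId"]
```

Keeping the request body in a separate pure function makes it easy to test without GCP credentials.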
The pipeline we have built automatically handles everything from job execution to model training and API deployment, so we can always make recommendations based on the latest information without manual operations.
■ Visualizing influencer vectors
The following is an actual example of influencer vectors created in the above manner for Japanese influencers on a given day.
This is a visualization of a subset of the influencer vectors, projected into two dimensions using t-SNE. The closer two influencers are, the more similar the characteristics of their Instagram posts.
The red boxes mark groups with a clear overall theme. For example, the “item & product” group in the lower right contains many accounts that post about items, and the “sports & activity” group in the upper center contains many accounts that post about strength training and golf.
In addition, accounts belonging to the same YouTube group are placed close to each other, and various other structures can be seen. The t-SNE layout changes every time it is run, and since the model is retrained every day, the next day’s results may also differ as influencers’ post content changes. Overall, though, the results are very satisfactory.
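A projection like the one in the figure can be sketched with scikit-learn’s `TSNE`. This is an illustrative sketch, not the script used for the figure; the perplexity cap and the fixed `random_state` are assumptions. t-SNE requires the perplexity to be smaller than the number of samples, so the cap keeps small inputs valid.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_2d(vectors, seed=0):
    """Project influencer vectors to 2-D with t-SNE for plotting."""
    vectors = np.asarray(vectors, dtype=float)
    perplexity = min(30.0, (len(vectors) - 1) / 3)  # assumption: simple cap for small inputs
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(vectors)  # array of shape (n_influencers, 2)
```

Passing a fixed `random_state` makes repeated runs on the same vectors reproducible, per the scikit-learn documentation; day-to-day differences caused by the retrained model would remain.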
■ Summary and future development of this function
The above outlines the overall structure of AnyTag’s lookalike modelling feature.
Even with the current model, platform users are getting great results. However, in order to further improve the accuracy of recommendations in the future, we need to:
- 1. Create a model that uses not just text, but also posted images, videos, influencer metadata, and more
- 2. Support similarity searches from different perspectives, such as influencers with similar hashtags or influencers with similar images
Because these additions may increase model size and training time, we will review the current pipeline and infrastructure as needed.
Influencer marketing on social media is a rapidly-evolving field, and we need to improve our machine logic on a daily basis so that we can make more appropriate recommendations. We’re definitely looking to continue to improve our products, and I will share more when we make future updates. I hope you will stay tuned for future developments!