Retrieval-Ranking Recommendation System for H&M Dataset

Modern eCommerce Solutions Based on Deep Learning Algorithms (Part 3)

17 min read · Feb 2, 2023

Series index:

How to Improve Product Search Algorithm to Meet Growing Customer Expectations
Visual Search Engine — the Future of Search
Retrieval-Ranking Recommendation System for H&M Dataset

Introduction

This article is part of our “Modern eCommerce Solutions Based on Deep Learning Algorithms” series, in which we explore how to use state-of-the-art algorithms to grow an eCommerce business, meet growing customer expectations and gain a market advantage.

The previous two articles in this series addressed the topic of searching for products based on specific requirements, such as the phrase typed in or the pictures attached. These solutions match situations in which the customers know exactly what they want to buy. However, this is only a small part of shopping. What does deep learning offer in other situations?

Let’s consider cases where the customer:

is looking for a product, but hasn’t specified its details
wants to find additional products to complete the set
browses a store’s offer with nothing specific in mind, expecting something to catch their eye

From a customer’s point of view, browsing through a store’s wide offering can be overwhelming. This is especially noticeable in the fashion industry, where we look for products taking into account a number of factors, such as how they fit our style, how consistent they are with our closet or whether they are in line with the latest trends. This is where recommendation systems come to our aid. They shorten our product search path (saving our time), suggest products we would most likely choose, and even direct us to products we wouldn’t have thought of before (taking into account our style, needs, shopping preferences and previously purchased products).

Why are they so attractive?

The reason is that they benefit both:

customers: by enriching the shopping experience and improving the decision-making process;
retailers: by helping to maintain customer relationships (which in the e-commerce market is extremely important), increasing the number of products purchased in a single shopping cart, and maximizing final profits.

Here we introduce readers to the construction of a personalized recommendation system using the fashion industry as an example. However, keep in mind that this type of system can be directly mapped to all e-commerce industries, and it is one of the most effective modern methods of determining recommendations for customers, used, for example, by Amazon or eBay.

In this article, we will briefly discuss:

the basic concept of how recommendation systems work
challenges in this field
the architecture of our recommendation system for the H&M dataset
results and examples of recommendations

How do recommendation systems work?

Recommendation systems create a personalized purchase suggestion based on a customer’s search behavior and/or purchase history. In the fashion industry, we can think of them as personal virtual sales advisors focused on discovering our preferences, suggesting items that fit our style, accessories that match the clothes we choose, and showcasing the latest trends we might be interested in.

Figure 1 below shows the general process of a recommendation system:

a customer’s action is passed to the algorithm (it can be a click, an addition to favorites or a purchase of a specific product);
based on this, recommendations are created (taking into account all the information available to the system about customers, products, trends and relationships between them all);
the algorithm then updates its results and presents the customer with a personalized list of products (the recommendations can be updated in real time or after a certain unit of time);
the cycle then repeats with each new action, so that the customer always sees the most up-to-date and relevant recommendations.

Figure 1 — Process of Recommendation System for fashion industry

Challenges

Building a recommendation system is not an easy task and there are many challenges. Let’s take a look at some of them now, and later in the article we will show how we overcame these challenges on a sample dataset.

Data scope

The approach can vary depending on the information we have:

with only customer-product interactions, we can create a basic collaborative filtering recommender that identifies similar users and makes predictions based on that;
on the other hand, using customer and/or product features (based on tabular data, images, text descriptions, …) we can create a content-based recommender that uses these additional attributes to recommend items similar to what the user liked/purchased in the past;
the combination of both collaborative filtering and content-based filtering is called a hybrid recommendation engine.

These methods can be a stand-alone recommendation system or one component of an expanded system.

Cold-start problem

The cold start problem occurs when a new customer or new item joins the system, and the recommendation system can’t use any historical customer-product interaction to provide prediction. There are many ways to solve this challenge, but it’s important to be aware of how often this happens in a dataset in order to create the right recommendation system that works in these cases.

Capturing changes in user behavior and trends

Customer behavior, trends and product popularity are constantly changing, especially in the fashion industry. A good recommendation system should identify these changes and provide accurate predictions. For this, it is crucial to determine how to use patterns of customer behavior and how far back to reach for historical data (so as not to interfere with new trends with outdated patterns).

Data size

Another very important thing is to create a solution that fits the size and dimension of the data. Some recommendation methods score every user and object and are very computationally demanding. To cope with the huge scale of data, we should consider two-stage recommendation models — retrieval ranking — which is the subject of the following discussion.

H&M recommendation system architecture

We decided to build our own fashion recommendation system that addresses all the challenges described in the section above. To do this, we used the dataset of the popular fashion brand H&M, which is available on Kaggle as part of a competition. In addition to being very diverse (tabular data, text data, images), it makes it possible to compare the effectiveness of our algorithm with the top solutions of the competition (3000 other participants).

Dataset

The dataset contains historical transactions and customer and product metadata:

transactions: nearly 32 million purchase records, listing what each customer bought on each date, along with additional information;
items: detailed metadata for each of the more than 100,000 products available for purchase;
customers: metadata for each of the nearly 1.4 million customers in the dataset;
images: a folder of images corresponding to each product.

Goal

Our goal was to predict the products (12 top recommendations) that each customer would purchase within 7 days after a certain date (so it was to be an “on-demand” system).

Solution idea

The dataset was too large to analyze every customer-article pair (1.4M customers x 100K articles, roughly 140 billion pairs), as the basic collaborative filtering approach would require. Therefore, we decided to build a recommendation system divided into two main parts: retrieval and ranking.

This approach is based on pre-selecting the items that are suitable for the customer from the store’s entire offer (retrieval) and then ordering them (ranking). This effectively reduces the computing power required and allows faster identification of the products that best fit the customer profile.

An additional step we added to the standard retrieval-ranking structure is the filtering of business-unsuitable products (e.g., products not available in the customer’s preferred sales channel, or intended for a different age group than all previous purchases). The idea is shown in Figure 2.

*Figure 2— Retrieval-ranking process for fashion dataset*
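The retrieve, filter and rank flow can be sketched in a few lines of Python. This is only an illustrative skeleton, not the code used in the project: the function `recommend`, the toy retrievers, the stock rule and the scores are all hypothetical names and values introduced for this example.

```python
from typing import Callable, Iterable, List, Set


def recommend(
    customer_id: str,
    retrievers: Iterable[Callable[[str], Set[str]]],
    is_allowed: Callable[[str, str], bool],
    score: Callable[[str, str], float],
    top_k: int = 12,
) -> List[str]:
    """Two-stage recommendation: retrieve candidates, filter them, then rank."""
    # 1. Retrieval: merge the candidate sets returned by all retrievers.
    candidates: Set[str] = set()
    for retrieve in retrievers:
        candidates |= retrieve(customer_id)
    # 2. Filtering: drop candidates that violate business rules.
    allowed = [a for a in candidates if is_allowed(customer_id, a)]
    # 3. Ranking: score only the surviving candidates, best first
    #    (ties are broken by article ID to keep the output deterministic).
    allowed.sort(key=lambda a: (-score(customer_id, a), a))
    return allowed[:top_k]


# Toy usage with hypothetical retrievers, one business rule and fixed scores.
toy_scores = {"a1": 0.2, "a3": 0.9, "a4": 0.5}
result = recommend(
    "c1",
    retrievers=[lambda c: {"a1", "a2", "a3"}, lambda c: {"a3", "a4"}],
    is_allowed=lambda c, a: a != "a2",  # pretend "a2" is out of stock
    score=lambda c, a: toy_scores[a],
    top_k=2,
)
print(result)
```

The point of the design is that the expensive scoring function runs only on the few candidates that survive retrieval and filtering, not on the whole catalog.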

The other important assumption was to train our model on the population of customers who made transactions in the last week. By doing so, we solved the problem of seasonality and changing trends in the fashion industry (which further positively affected the efficiency of the calculations).
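Restricting the training population to recently active customers amounts to a simple date filter. The sketch below assumes transactions are plain dictionaries with `customer_id` and `date` keys and uses made-up dates; the function name and the exact window boundaries are assumptions for illustration.

```python
from datetime import date, timedelta


def customers_active_in_last_week(transactions, cutoff):
    """Return IDs of customers with a purchase in the 7 days before `cutoff`."""
    start = cutoff - timedelta(days=7)
    # The window is [start, cutoff): the cutoff day itself is excluded.
    return {t["customer_id"] for t in transactions if start <= t["date"] < cutoff}


# Toy data (hypothetical customers and dates).
toy_transactions = [
    {"customer_id": "c1", "date": date(2020, 9, 20)},
    {"customer_id": "c2", "date": date(2020, 9, 1)},   # too old
    {"customer_id": "c3", "date": date(2020, 9, 16)},  # first day of the window
    {"customer_id": "c4", "date": date(2020, 9, 23)},  # on the cutoff, excluded
]
active = customers_active_in_last_week(toy_transactions, cutoff=date(2020, 9, 23))
print(sorted(active))
```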

Technical architecture

Figure 3 shows the technical architecture of our recommendation system.

After the initial data preprocessing, the key steps are:

feature engineering — initial
retrievals
filtering
feature engineering — continuing
ranking

Let’s look at them in detail.

*Figure 3 — Retrieval-ranking technical architecture*

Feature engineering — initial

In the first step of feature engineering, we prepared the set of features needed to build the retrievals. These features describe the customer and the product separately; features capturing the relationship between them are added in step 4. For consistency, all features are described together in step 4. Now let’s move to the next step: retrievals.

Retrievals

The retrieval stage is responsible for selecting an initial set of hundreds of candidates (for each customer) from all possible products. The main goal is to efficiently eliminate items that the user is not interested in. This was a necessary step in our recommendation system: without it, we would have to score all 100K products for each customer, which would be computationally overwhelming.

We considered several retrieval methods (content-based, collaborative and hybrid filtering):

1. Most popular products

We can assume that products with the highest purchase rate in the last week will also be popular in the following week. However, not every customer will be interested in the same top products, so we should take into account customer characteristics such as age or location. To do this, we built a dedicated machine learning model that, using customer and product characteristics, predicted the top 100 products per customer among the 600 most popular products of the last week. As a result, we had a diversified list of the most popular products for each customer.
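To make the idea concrete, here is a deliberately simplified sketch. It replaces the dedicated machine learning model with plain purchase counts per age group, which is a stand-in chosen for this example and not the method used in the project. The age buckets, field names and toy data are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def age_group(age):
    """Hypothetical age buckets, chosen only for this example."""
    if age < 25:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


def popular_by_age_group(transactions, customer_age, cutoff, top_n=3):
    """Top-N most purchased articles in the 7 days before `cutoff`, per age group."""
    start = cutoff - timedelta(days=7)
    counts = defaultdict(Counter)
    for t in transactions:
        if start <= t["date"] < cutoff:
            group = age_group(customer_age[t["customer_id"]])
            counts[group][t["article_id"]] += 1
    # Sort by count (descending), then by article ID for deterministic ties.
    return {
        group: [a for a, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for group, c in counts.items()
    }


customer_age = {"c1": 19, "c2": 22, "c3": 45, "c4": 67}
toy_transactions = [
    {"customer_id": "c1", "article_id": "tshirt", "date": date(2020, 9, 20)},
    {"customer_id": "c2", "article_id": "tshirt", "date": date(2020, 9, 21)},
    {"customer_id": "c2", "article_id": "leggings", "date": date(2020, 9, 21)},
    {"customer_id": "c3", "article_id": "blazer", "date": date(2020, 9, 19)},
    {"customer_id": "c3", "article_id": "blazer", "date": date(2020, 9, 20)},
    {"customer_id": "c4", "article_id": "socks", "date": date(2020, 9, 18)},
    {"customer_id": "c4", "article_id": "scarf", "date": date(2020, 9, 1)},  # outside window
]
top = popular_by_age_group(toy_transactions, customer_age, date(2020, 9, 23), top_n=1)
print(top)
```

Because the lookup depends only on customer attributes, a list like this can be produced even for a customer with no purchase history.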

This was a key retrieval in our solution because of its resistance to the new user’s cold-start problem. In the training process, the model learned which products are most frequently purchased by customers with a given set of characteristics and no purchase history. As a result, we were able to make a prediction (generate retrievals) for any customer based on their characteristics alone. Below we can see the variety of recommended products for new customers of different ages.

[Images: recommended products for new customers in three age groups: TEENAGER, MIDDLE-AGED, SENIOR]

It is worth noting that all of the above products are women’s products, since women are the main group of customers of clothing stores; the model learned this from the purchase history. In addition, most of the recommendations are basic collection products, such as basic T-shirts, leggings and socks.

2. Similar products

Another group of retrievals is based on product similarity. The algorithm proceeds as follows:

takes a list of purchased products (for each customer);
calculates the similarity between the purchased products (their vector representations!) and other products in the dataset using the cosine similarity measure;
selects the most similar products to those purchased in the past as a candidate set for each customer.
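The three steps above can be sketched as follows. The embeddings here are made-up three-dimensional vectors and the article names are hypothetical; in practice the vectors would come from one of the data sources described next, and a vectorized or approximate nearest-neighbor search would replace the nested loops.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_products(purchased, embeddings, top_k=2):
    """Return (article, score) pairs most similar to any purchased product."""
    best = {}
    for p in purchased:
        for article, vec in embeddings.items():
            if article in purchased:
                continue  # do not propose the purchased items themselves
            sim = cosine_similarity(embeddings[p], vec)
            # Keep the highest similarity to any of the purchased products.
            best[article] = max(best.get(article, -1.0), sim)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy embeddings (hypothetical values).
embeddings = {
    "black_blouse": [1.0, 0.0, 0.2],
    "black_top": [0.9, 0.1, 0.3],
    "red_dress": [0.1, 1.0, 0.0],
    "white_socks": [0.0, 0.1, 1.0],
}
candidates = similar_products({"black_blouse"}, embeddings, top_k=2)
print(candidates)
```

The similarity value returned with each candidate can later be reused as a feature for the ranking model.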

We used three different data sources to build a vector representation of each product:

product images

In the first method, we look for visually similar products. The architecture of the solution is based on a deep learning model (transformer architecture), which maps images to their numerical representation — embeddings. As a result, similar images have similar vector representations.

product attributes

The second method is based on product metadata (tabular data). The vector representing the product consists of attribute values such as color, type, pattern.

product purchase history

Using the customers’ transaction history, we created a customer-article purchase matrix. As a result, each product was represented by a vector over customers, indicating whether each of them bought the product or not. The candidate list of this retrieval is very different from the two above, because we find articles with a similar purchase history (bought by the same users) rather than products with similar characteristics.
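For binary purchase vectors, cosine similarity reduces to the number of shared buyers divided by the square root of the product of the two buyer counts. The sketch below uses that identity on toy data; representing each article as a set of buyers instead of a full matrix column is an implementation choice made for this example.

```python
import math
from collections import defaultdict


def article_buyers(transactions):
    """Map each article to the set of customers who bought it.

    This is a sparse form of one column of the customer-article matrix.
    """
    buyers = defaultdict(set)
    for customer_id, article_id in transactions:
        buyers[article_id].add(customer_id)
    return buyers


def purchase_similarity(buyers, a, b):
    """Cosine similarity of the binary purchase vectors of articles a and b."""
    shared = len(buyers[a] & buyers[b])
    if shared == 0:
        return 0.0
    return shared / math.sqrt(len(buyers[a]) * len(buyers[b]))


# Toy transactions as (customer, article) pairs (hypothetical data).
toy_transactions = [
    ("c1", "blouse"), ("c2", "blouse"), ("c3", "blouse"),
    ("c1", "skirt"), ("c2", "skirt"),
    ("c3", "socks"), ("c4", "socks"),
]
buyers = article_buyers(toy_transactions)
blouse_skirt = purchase_similarity(buyers, "blouse", "skirt")
blouse_socks = purchase_similarity(buyers, "blouse", "socks")
skirt_socks = purchase_similarity(buyers, "skirt", "socks")
print(round(blouse_skirt, 3), round(blouse_socks, 3), skirt_socks)
```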

Below we can see the results returned by the above three methods for a black crown blouse:

*Figure 7 — Retrievals: Similar products*

3. Similar customers

The idea for creating this retrieval was the popular phrase “other customers also bought.” We implemented an algorithm that searched for items that were often purchased along with the product currently under consideration. In detail, the algorithm took an item from a given customer’s purchase history, collected items from the shopping carts of other customers who also bought that product, counted their popularity and returned a list of the most frequently bought as a candidate list.

The returned products were characterized by a similar style, as in the example below:

*Figure 8— Retrieval: Other customers have also bought*
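The counting procedure described in this section can be sketched as below. The `baskets` structure (one set of purchased articles per customer) and the function name are assumptions for illustration; the source does not specify how shopping carts were stored.

```python
from collections import Counter


def also_bought(article_id, customer_id, baskets, top_k=2):
    """Articles most often bought by other customers who also bought `article_id`.

    `baskets` maps customer_id -> set of purchased article IDs.
    Returns (article, count) pairs, most frequent first.
    """
    counts = Counter()
    for other, items in baskets.items():
        if other == customer_id or article_id not in items:
            continue
        # Count everything else this customer bought alongside the article.
        counts.update(items - {article_id})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy baskets (hypothetical data).
baskets = {
    "c1": {"jacket"},
    "c2": {"jacket", "dress", "scarf"},
    "c3": {"jacket", "dress"},
    "c4": {"socks", "dress"},  # never bought the jacket, so it is ignored
}
co_purchased = also_bought("jacket", "c1", baskets)
print(co_purchased)
```

The count attached to each candidate is a natural retrieval score for this method.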

4. Repurchasing

In the clothing industry, it is common to buy certain items, such as accessories or a basic T-shirt, repeatedly. To capture such items and the customers likely to buy them again, one of the retrievals returned products from the customer’s own purchase history.

Sample results:

*Figure 9— Retrieval: Repurchased products*

The final set of candidates for each customer is the union of the products returned by all the retrievals shown above.
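A minimal sketch of this merge step follows. It assumes each retrieval returns a dictionary of article scores, which is a convenient representation chosen for the example; keeping the per-retrieval scores alongside the merged candidates makes them available later as ranking features.

```python
def merge_candidates(retrievals):
    """Union of candidates from all retrievals, keeping each retrieval's score.

    `retrievals` maps retrieval name -> {article_id: score}.
    Returns article_id -> {retrieval name: score}.
    """
    merged = {}
    for name, scored in retrievals.items():
        for article, score in scored.items():
            merged.setdefault(article, {})[name] = score
    return merged


# Toy output of three retrievals (hypothetical names and scores).
retrievals = {
    "popular": {"a1": 0.8, "a2": 0.6},
    "similar": {"a2": 0.95},
    "repurchase": {"a3": 1.0},
}
merged = merge_candidates(retrievals)
print(merged)
```

An article proposed by several retrievals, such as `a2` here, appears only once in the candidate set but carries one score per retrieval.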

Filtering

Filtering is an additional restriction that allows us to remove some “invalid candidates” produced by the retrieval stage, based on business rules relevant to our problem. This makes the predictions more accurate.

We considered eliminating the following products:

not available in the store’s current offering (out of stock);
not available in the customer’s location or in the channel (online/in-store) where they usually buy;
out of season (we do not recommend winter boots in the summer, even if the customer’s last purchase was a winter jacket);
mismatched to the price range of products the customer has bought in the past (don’t recommend luxury products to a customer who has bought lower-priced products);
mismatched to the type of customer (don’t recommend children’s products to a customer who has never bought them in the past).
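Rules like these can be expressed as a single predicate applied to every candidate. The sketch below covers four of the rules; all field names, the season encoding and the price tolerance factor are hypothetical choices for illustration, not values taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str
    in_stock: bool
    channels: frozenset  # sales channels where the article is offered
    season: str          # "summer", "winter" or "all"
    price: float


@dataclass
class CustomerProfile:
    preferred_channel: str
    max_price_paid: float


def is_valid(article, customer, current_season, price_tolerance=2.0):
    """Return True if the candidate passes all business rules."""
    if not article.in_stock:
        return False
    if customer.preferred_channel not in article.channels:
        return False
    if article.season not in ("all", current_season):
        return False
    # Assumed rule: reject items far above anything the customer has paid before.
    if article.price > price_tolerance * customer.max_price_paid:
        return False
    return True


customer = CustomerProfile(preferred_channel="online", max_price_paid=30.0)
candidates = [
    Article("tee", True, frozenset({"online", "store"}), "all", 10.0),
    Article("boots", True, frozenset({"online"}), "winter", 50.0),  # out of season
    Article("coat", True, frozenset({"online"}), "all", 200.0),     # too expensive
    Article("scarf", False, frozenset({"online"}), "all", 12.0),    # out of stock
    Article("belt", True, frozenset({"store"}), "all", 15.0),       # wrong channel
]
kept = [a.article_id for a in candidates if is_valid(a, customer, "summer")]
print(kept)
```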

Feature engineering

Proper feature engineering is key to selecting the dozen or so products that are best for the customer out of several hundred. For the model to be able to do this, it is important to capture the customer’s buying preferences and position the potential product in relation to these preferences.

Having customer-candidate (product) pairs, we defined hundreds of features based on:

customer attributes

For example, the customer’s age, postal code and club membership status.

article attributes

For example, product color, graphics, design.

statistics of historical customer transactions

Features that include the customer’s purchase history. For example, the median price of items purchased by the customer in the last month and the number of days since the customer’s last transaction.

statistics of historical article transactions

The concept is similar to the one above, but this time all the features describe the articles. For example, the number of days since the article’s last transaction or the popularity of the color/type of the product under consideration in the last month.

customer-article relation

This feature set combines a customer’s transaction history with item features over different time periods, and we consider it a key part of feature engineering for recommendation systems. It answers, in a form a ranking model can use, the question: how well does the proposed product fit the purchase preferences of a given customer?

The construction of these attributes is as follows:

For each key attribute describing a product, we calculate an individual customer preference pattern: a vector of numbers holding, for each value of the attribute, the percentage of the customer’s transactions that fall into it. The feature for a customer-candidate pair is the percentage that corresponds to the candidate’s own value of that attribute. The same customer can therefore get a different feature value for each candidate, because candidates can come from different categories, such as women’s clothing, children’s clothing and sportswear, and the customer may have made a different number of transactions in each of these categories.
This approach allowed us to avoid creating a separate column for every possible attribute value, which matters especially for attributes with high cardinality.
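A minimal sketch of such a feature, assuming the attribute is a product category and the customer’s history is a plain list of category labels (both assumptions for illustration):

```python
from collections import Counter


def preference_pattern(purchased_values):
    """Share of a customer's transactions per attribute value."""
    counts = Counter(purchased_values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def preference_feature(pattern, candidate_value):
    """Single numeric feature: the share matching the candidate's attribute value."""
    return pattern.get(candidate_value, 0.0)


# Toy history: the category of each past transaction (hypothetical labels).
history_categories = ["womens", "womens", "womens", "kids", "sport"]
pattern = preference_pattern(history_categories)

womens_fit = preference_feature(pattern, "womens")  # candidate from a favored category
mens_fit = preference_feature(pattern, "mens")      # category never bought before
print(pattern, womens_fit, mens_fit)
```

However many distinct categories exist, each customer-candidate pair gets a single number per attribute instead of one column per category.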

consistency of a given product feature with the customer’s purchase pattern

These types of features refer to the difference between customer and product attributes, such as the difference between the median price of a customer’s historical transactions and the price of an item.
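The price example can be sketched together with two of the customer statistics mentioned earlier. The function name, the feature names and the toy history are assumptions; the sketch also assumes a non-empty purchase history and does not restrict it to a particular time window.

```python
from datetime import date
from statistics import median


def customer_price_features(history, candidate_price, today):
    """Features from a customer's purchase history and one candidate's price.

    `history` is a non-empty list of (purchase_date, price) pairs.
    """
    median_price = median(price for _, price in history)
    last_purchase = max(purchase_date for purchase_date, _ in history)
    return {
        "median_price": median_price,
        "days_since_last_purchase": (today - last_purchase).days,
        # Positive values mean the candidate costs more than the customer's typical item.
        "price_diff": candidate_price - median_price,
    }


# Toy history (hypothetical dates and prices).
history = [
    (date(2020, 9, 1), 10.0),
    (date(2020, 9, 10), 20.0),
    (date(2020, 9, 15), 40.0),
]
features = customer_price_features(history, candidate_price=25.0, today=date(2020, 9, 22))
print(features)
```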

retrieval scores

Each of the retrievals presented in step 2 also produced a score reflecting the importance of the candidate in that particular retrieval. We used these scores as features. One example is the purchase frequency of a product in the “other customers also bought” retrieval.

Ranking

Finally, having a set of candidates for recommendation and the characteristics of the customer and the candidate, we want to select the most interesting products for the customer and order them, placing the most promising at the top.

Model

To select and rank candidates accurately, we used a learning-to-rank algorithm, LightGBM Ranker.
It is a tree-based model that predicts how relevant a candidate (product) is to a given customer. We measured its effectiveness using the MAP (Mean Average Precision) metric, which compares the recommended list of items with the set of products the customer actually bought in the predicted period, and rewards lists that place relevant recommendations near the top.
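To show how the metric rewards good ordering, here is a small sketch of MAP computed over the top k recommendations. The cutoff k = 12 and the normalization by min(number of purchased items, k) follow a common definition of MAP@k and are assumptions here; the article does not spell out the exact variant used.

```python
def average_precision_at_k(recommended, purchased, k=12):
    """Average precision of one ranked list against the set of purchased items."""
    purchased = set(purchased)
    if not purchased:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(recommended[:k], start=1):
        if item in purchased and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)  # a repeated item is only credited once
    return score / min(len(purchased), k)


def mean_average_precision(recommendations, truth, k=12):
    """Mean of average precision over customers with at least one purchase."""
    customers = [c for c in truth if truth[c]]
    if not customers:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(c, []), truth[c], k)
        for c in customers
    )
    return total / len(customers)


# Same two hits, placed early versus late in the list.
ap_early = average_precision_at_k(["a", "b", "c", "d"], {"a", "c"})
ap_late = average_precision_at_k(["b", "d", "a", "c"], {"a", "c"})
map_score = mean_average_precision(
    {"c1": ["a", "b", "c", "d"], "c2": ["x"]},
    {"c1": {"a", "c"}, "c2": {"y"}},
)
print(ap_early, ap_late, map_score)
```

Both lists contain the same two purchased items, but the list that places them earlier scores higher, which is the behavior the ranking model is optimized for.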

Results

Following the competition requirements, we made predictions for all customers in our dataset and selected the top 12 products (the candidates with the highest predicted scores) for each of them. Finally, we compared our results with those of the other competition participants: our solution placed in the top 1.6% of the leaderboard.

Examples

By tracking the recommendations returned by the model and comparing them with the customer’s purchase history, we drew some interesting conclusions, which we share below.

We distinguished two basic types of purchases:

according to special preferences

This type refers to customers who consistently buy clothes of a certain type, for a certain age group, in a certain style, such as a woman buying elegant clothes for work. In this group, it is relatively easy to determine buying preferences. Our recommendation model has learned this and worked very effectively. Let’s look at its results by tracking the customer’s purchase history.

Step 1. Recommendations before purchase

Initially (for a given customer, with no purchase history), our model recommends basic items popular in the customer’s age group.

*Figure 10 — A customer who buys products in a certain style: recommendations before purchases*

Step 2. Purchase #1

The customer buys a lace black top.

*Figure 11 — A customer who buys products in a certain style: first purchased product*

Step 3. Update recommendations #1

The model adapts to the style and type of product purchased by the customer. It recommends elegant tops similar to the first purchase.

*Figure 12 — A customer who buys products in a certain style: recommendations after first purchase*

Step 4. Purchase #2

During the next purchase, the customer buys a jacket. It is a different type of product, but consistent in style with the previous purchase.

*Figure 13 — A customer who buys products in a certain style: second purchased product*

Step 5. Update recommendations #2

The model offers elegant products, including dresses that match the set with the purchased jacket.

*Figure 14 — A customer who buys products in a certain style: recommendations after second purchase*

Step 6. Purchase #3

The customer makes a third purchase — the selected dress is very similar to the recommendations given earlier. We can therefore assume that the model’s recommendation would be accurate for this customer.

*Figure 15 — A customer who buys products in a certain style: third purchased products*

Step 7. Update recommendations #3

The final recommendations are elegant tops and dresses, consistent in style with all the customer’s previous purchases.

*Figure 16 — A customer who buys products in a certain style: recommendations after third purchase*

in response to a specific need

Another group is customers for whom no buying pattern can be captured. This happens when each product purchased is from a different category, for a different target group and in a different style. Such a variety of purchases suggests that the customer comes to the store with a specific need that he or she wants to fulfill, and is also likely to buy products for the whole family (people of different genders, ages and tastes). In this case, recommending products similar to historical purchases may be much less effective. It is more important to track that person’s current needs (information about what products the customer is looking at in a given session) and recommend products tailored to those he/she has recently viewed (possibly matching the historical style).

To build such a real-time recommendation system, you need data on the activity in a particular shopping session, which is unavailable in this project. However, let’s look at an example of such a customer and the recommendations returned by our model (trained on this customer’s shopping history).

Step 1. Recommendations before purchase

As in the example above, the initial recommendations are basic products that are popular with the customer’s age group.

*Figure 17 — A customer with a specific purchase need: recommendations before purchase*

Step 2. Purchase #1

The customer buys accessories for women and men. These are less popular products.

*Figure 18 — A customer with a specific purchase need: first purchased products*

Step 3. Update recommendations #1

The model adapts to non-standard shopping preferences and recommends other accessories.

*Figure 19 — A customer with a specific purchase need: recommendations after first purchase*

Step 4. Purchase #2

The customer buys lingerie — a completely different product than before.

*Figure 20— A customer with a specific purchase need: second purchased product*

Step 5. Update recommendations #2

Again — the model instantly adapts to non-standard shopping preferences and still remembers previously purchased products:

*Figure 21 — A customer with a specific purchase need: recommendations after the second purchase*

Step 6. Purchase #3

The customer’s third purchase was a T-shirt — again, a completely different product from previous purchases.

*Figure 22 — A customer with a specific purchase need: third purchased product*

Step 7. Update recommendations #3

Recommendations after the third purchase include accessories and sweatshirts that can be paired with the purchased t-shirt.

*Figure 23 — A customer with a specific purchase need: recommendations after the third purchase*

With such varied purchases, the model (trained only on transaction history) cannot be expected to predict future purchases. Therefore, a change in strategy is needed: matching recommendations to the data from the current session and using history only to capture the customer’s style.

Conclusions

After reviewing the examples, let’s highlight some important insights into our recommendation system:

The hybrid retrieval-ranking method we proposed allowed us to build a very effective recommendation system (top 1.6% of the competition leaderboard) with limited computing resources.
To build this system, it was enough to use only the latest data, which keeps the system up to date. However, to keep the ranking model valid and effective, it must be retrained periodically.
For new customers who have no history of previous purchases, the system bases its recommendations on customer attributes such as age, location, etc. Although this approach does not recognize any specific customer preferences, it solves the problem of a cold start.
The model easily learns the style of a customer who makes consistent purchases.
The model quickly adapts to the purchase of unusual products (e.g., accessories, special designs).
The system recommends products across multiple categories and product attributes, making recommendations comprehensive and allowing the customer to learn about new trends and offers. It is worth noting that the system sometimes recommends a product from a category in which the customer has not bought so far.
The system builds recommendations in line with purchase history, but it does not respond to a specific purchase need that is unrelated to the customer’s previous transactions (it is not based on session data).

It is very important to track and understand customers’ needs and use the right approach to build a recommendation system according to their movements in the online store. Should we display recommendations based on the customer’s history or only on the current session? Or maybe both? Each of these has its uses and undoubtedly affects the customer’s positive shopping experience. Thanks for reading!

Words by Magdalena Malinowska, Data Scientist at Altimetrik Poland

https://www.linkedin.com/in/malinowska-magdalena/

Editing by Kinga Kuśnierz, Content Writer at Altimetrik Poland

https://www.linkedin.com/in/kingakusnierz/