Building the First Samwise Recommendation Algorithm

2024-10-09

At Samwise, we set out to solve a common but complex problem in the real estate world: helping users find the most relevant properties quickly and efficiently. Instead of showing users a raw list of properties based on simple filters like price and location, we wanted to deliver something much more intelligent — an algorithm that would rank properties based on user preferences, predicted market dynamics, and recency. This is the story of how we built our first recommendation algorithm from scratch, the challenges we faced, and the solutions we developed along the way.

The Initial Problem

At the beginning, Samwise was straightforward: users entered a maximum price, number of bedrooms, and location. Our platform filtered the listings to show properties that met these criteria, but it lacked any kind of intelligent ranking. This led to a suboptimal experience, where users had no clear indication of which properties were the best matches for their needs.

We aimed to go beyond basic filtering and create a recommendation engine that ranks properties based on multiple key factors:

Predicted Price: What the property's market price should be, based on its features.
User Preferences: Adjusting the price based on how well the property fits the user's specific desires.
Recency: Taking into account how long a property has been listed and reducing its rank if it hasn't sold in a reasonable timeframe.
Good Deal Boost: Highlighting properties that are listed below their predicted price to capture potential "bargains."

Step 1: Predicting Property Prices

The first major step was establishing a baseline with predicted prices for each property. We built a machine learning model that considers key features of a property, including:

Location: Neighborhood desirability, proximity to amenities
Size: Square footage, number of bedrooms/bathrooms
Condition and Amenities: Newly renovated, has a balcony, parking space

This model outputs a predicted market price for every listing, giving us a fair market valuation that serves as the foundation of our ranking system. Additionally, we leverage the CogVLM2 model to process property images, extracting valuable visual information that further enhances our price prediction accuracy. This combination of textual and visual data analysis allows us to capture nuanced details about a property's condition, style, and perceived value that might not be evident from text descriptions alone.
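To make the inputs concrete, here is a minimal sketch of how a listing's features might be turned into a numeric vector before being handed to a price model. The field names, the `encode_listing` helper, and the linear stand-in `predict_price` are all assumptions made for illustration. The actual Samwise model and its CogVLM2-derived image features are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical fields; the real Samwise schema is not described in this post.
    floor_area: float
    bedrooms: int
    bathrooms: int
    neighborhood_score: float  # assumed desirability score in [0, 1]
    renovated: bool
    has_balcony: bool
    has_parking: bool


def encode_listing(listing: Listing) -> list[float]:
    """Turn a listing into a flat numeric feature vector."""
    return [
        listing.floor_area,
        float(listing.bedrooms),
        float(listing.bathrooms),
        listing.neighborhood_score,
        1.0 if listing.renovated else 0.0,
        1.0 if listing.has_balcony else 0.0,
        1.0 if listing.has_parking else 0.0,
    ]


def predict_price(features: list[float], weights: list[float], bias: float) -> float:
    """Stand-in linear model: bias + dot(weights, features).

    A trained model would supply the weights; none are given in this post.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))
```

Any regression model that accepts such a vector could take the place of `predict_price`. The rest of the ranking only needs the predicted price it returns.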

Step 2: Tailoring Prices to User Preferences

Not all users value the same property features equally. For instance, someone looking for a family home might prioritize proximity to schools and parks, while another user might care more about having a modern kitchen or a view. This personalization required us to develop a mechanism for adjusting the predicted price based on a user's individual preferences.

We achieved this by using Large Language Models (LLMs) to process users' input (such as descriptions of their ideal home) and translate it into a set of tags. These tags were then compared against the tags associated with each property, generating a preference match score. This score adjusted the predicted price up or down depending on how well the property matched the user's ideal description.

PP_user = predictedPrice × (1 + β(M_tags - 1))

Where:

PP_user is the personalized predicted price.
M_tags is a match score between 0 and 2, where 1 represents a neutral match.
β is a tuning parameter that controls the strength of the preference adjustment.

This formula ensures that a perfect match (M_tags = 2) raises the predicted price by a factor of 1 + β, a neutral match (M_tags = 1) leaves it unchanged, and a poor match (M_tags < 1) lowers it, down to a factor of 1 - β at M_tags = 0. The β parameter allows us to fine-tune the impact of user preferences on the overall ranking.
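The adjustment can be sketched in a few lines of Python. The `personalized_price` function follows the formula above directly. The `match_score` function is only an assumption: the post does not say how the tag comparison is scored, so this version simply doubles the fraction of the user's tags found on the property. The tag names and the β value are made up.

```python
def match_score(user_tags: set[str], property_tags: set[str]) -> float:
    """Map tag overlap to a score in [0, 2], with 1 as neutral.

    Assumption: twice the fraction of the user's tags present on the
    property. The real scoring method is not described in the post.
    """
    if not user_tags:
        return 1.0  # no stated preferences: treat as a neutral match
    overlap = len(user_tags & property_tags) / len(user_tags)
    return 2.0 * overlap


def personalized_price(predicted_price: float, m_tags: float, beta: float) -> float:
    """PP_user = predictedPrice * (1 + beta * (M_tags - 1))."""
    if not 0.0 <= m_tags <= 2.0:
        raise ValueError("m_tags must be between 0 and 2")
    return predicted_price * (1.0 + beta * (m_tags - 1.0))


# Illustrative tags and beta, not real Samwise data.
user = {"garden", "near_schools", "modern_kitchen", "balcony"}
home = {"garden", "near_schools", "balcony", "parking"}

m_tags = match_score(user, home)  # 3 of the 4 user tags match
pp_user = personalized_price(200_000, m_tags, beta=0.2)
```

Here three of four tags match, so the score is 1.5 and the predicted price is scaled by 1 + 0.2 × 0.5 = 1.1.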

Step 3: Accounting for Recency and Market Dynamics

The next factor we considered was recency — how long a property had been listed without selling. In real estate, a property that lingers too long on the market usually indicates a mismatch between its listed price and what buyers are willing to pay. We needed a way to reduce the ranking of properties that had been listed for an extended period while preserving the value of fresh listings.

We designed a recency decay curve that starts off gentle, reflecting the reality that properties don't sell immediately. However, after a certain period, the decay accelerates, suggesting that the price is likely too high or the property is less desirable. Our decay formula is:

R = 1 / (1 + γ · [(1 / (1 + exp(-λ · (daysListed - τ))) - 0.5) · (1 + δ · log(1 + |predictedPrice - listedPrice| / listedPrice))])

Where:

γ controls the overall decay.
λ adjusts the steepness of the decay curve.
τ sets when the decay becomes more aggressive (midpoint).
δ tunes the price difference factor.

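Because the sigmoid term is centered at τ, R equals 1 when daysListed = τ, sits above 1 for fresher listings, and drops below 1 for older ones. The drop is stronger when the listed price is far from the predicted price. In practice, γ has to be kept small enough that the denominator stays positive for fresh listings.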
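A direct transcription of the recency formula into Python makes it easy to evaluate at different listing ages. The parameter values and prices below are placeholders chosen for illustration, not tuned Samwise settings, and the guard against a non-positive denominator is an added assumption.

```python
import math


def recency_multiplier(days_listed, predicted_price, listed_price,
                       gamma, lam, tau, delta):
    """R = 1 / (1 + gamma * (sigmoid(lam * (days - tau)) - 0.5) * price_term)."""
    sigmoid = 1.0 / (1.0 + math.exp(-lam * (days_listed - tau)))
    price_gap = abs(predicted_price - listed_price) / listed_price
    price_term = 1.0 + delta * math.log(1.0 + price_gap)
    denominator = 1.0 + gamma * (sigmoid - 0.5) * price_term
    if denominator <= 0:
        # Added safety check: a large gamma can push the denominator to zero or below.
        raise ValueError("denominator is not positive; gamma is too large")
    return 1.0 / denominator


# Placeholder parameters and prices.
params = dict(gamma=0.5, lam=0.1, tau=60, delta=1.0)
r_fresh = recency_multiplier(5, 250_000, 300_000, **params)
r_mid = recency_multiplier(60, 250_000, 300_000, **params)    # days == tau, so R is 1
r_stale = recency_multiplier(150, 250_000, 300_000, **params)
```

With these placeholder values the multiplier is above 1 at five days, exactly 1 at the midpoint τ, and below 1 at 150 days.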

Step 4: Good Deal Boost

One of the key innovations in the Samwise algorithm is the Good Deal Boost. We wanted to reward properties that were listed at prices well below their predicted market value. These underpriced listings represent potential bargains, and we needed a way to boost their visibility in the ranking.

The formula for applying this boost is:

Boost = 1 + α · max(0, (predictedPrice - listedPrice) / predictedPrice)

Where:

α controls how strong the boost is (typically between 0 and 1).
The boost is only applied when the listed price is below the predicted price.
The boost increases linearly with the percentage difference between predicted and listed prices.

This mechanism ensures that properties with a substantial price gap (e.g., a property listed for €150k but predicted at €250k) get a meaningful boost, while properties listed at or above their predicted price receive no boost.
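The boost is simple enough to check by hand. The sketch below implements the formula as written and runs it on the €150k versus €250k example. The value α = 0.5 is an assumption picked for the illustration.

```python
def good_deal_boost(predicted_price: float, listed_price: float, alpha: float) -> float:
    """Boost = 1 + alpha * max(0, (predicted - listed) / predicted)."""
    if predicted_price <= 0:
        raise ValueError("predicted_price must be positive")
    discount = (predicted_price - listed_price) / predicted_price
    return 1.0 + alpha * max(0.0, discount)


# Listed at 150k, predicted at 250k: the relative gap is 100k / 250k = 0.4,
# so with alpha = 0.5 the boost is about 1.2.
bargain = good_deal_boost(250_000, 150_000, alpha=0.5)

# Listed above the predicted price: the max(0, ...) clamps the boost to 1.0.
overpriced = good_deal_boost(250_000, 300_000, alpha=0.5)
```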

Step 5: Putting it all together

After working through each of these components, the final ranking formula for Samwise looks like this:

FinalRank = PP_user × R × Boost

Here's how it all works together:

Personalized Predicted Price (PP_user) adjusts the property's predicted price based on how well it fits the user's preferences.
Recency Multiplier (R) applies a time-based decay to reflect how long the property has been listed.
Good Deal Boost rewards properties listed significantly below their predicted price.
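Put together, the three factors reduce to one scoring function and a sort. The sketch below is self-contained and mirrors the formulas from the previous steps. The `ScoredListing` class, the two sample listings, and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredListing:
    # Hypothetical container for the inputs the ranking formula needs.
    name: str
    predicted_price: float
    listed_price: float
    days_listed: int
    m_tags: float  # preference match score in [0, 2]


def final_rank(item, beta, gamma, lam, tau, delta, alpha):
    """FinalRank = PP_user * R * Boost."""
    # Step 2: personalized predicted price.
    pp_user = item.predicted_price * (1.0 + beta * (item.m_tags - 1.0))

    # Step 3: recency multiplier.
    sigmoid = 1.0 / (1.0 + math.exp(-lam * (item.days_listed - tau)))
    gap = abs(item.predicted_price - item.listed_price) / item.listed_price
    recency = 1.0 / (1.0 + gamma * (sigmoid - 0.5) * (1.0 + delta * math.log(1.0 + gap)))

    # Step 4: good deal boost.
    discount = (item.predicted_price - item.listed_price) / item.predicted_price
    boost = 1.0 + alpha * max(0.0, discount)

    return pp_user * recency * boost


# Placeholder parameters and made-up listings.
params = dict(beta=0.2, gamma=0.5, lam=0.1, tau=60, delta=1.0, alpha=0.5)
listings = [
    ScoredListing("stale_poor_match", 250_000, 250_000, 150, 0.4),
    ScoredListing("fresh_good_match", 250_000, 250_000, 5, 1.8),
]

# Highest FinalRank first.
ranked = sorted(listings, key=lambda item: final_rank(item, **params), reverse=True)
```

Both sample listings share the same predicted and listed price, so the ordering here is driven entirely by the match score and the listing age.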

This comprehensive ranking system allows Samwise to present users with a personalized, up-to-date list of properties that not only match their preferences but also represent good value in the current market.

By combining these factors, we ensure that:

Users see properties that match their specific needs and preferences.
Fresh listings are given priority, but not at the expense of older listings that might still be good deals.
Potentially underpriced properties are highlighted, giving users the opportunity to find great deals.
The overall ranking adapts to market dynamics and individual user preferences.

The beauty of this algorithm lies in its flexibility and interpretability. Each component has a clear purpose and can be adjusted independently:

The personalization factor (β) controls how much user preferences influence the ranking.
The recency decay parameters (γ, λ, τ, δ) can be tuned to reflect different market conditions and listing lifecycles.
The good deal boost (α) can be adjusted to emphasize or de-emphasize price discrepancies.
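One way to keep these knobs independent in code is to group them in a single immutable settings object and derive variants from it. This is only a sketch: the `RankingParams` name and every default value are assumptions, not Samwise's tuned configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RankingParams:
    # All defaults are placeholders, not real tuned values.
    beta: float = 0.2    # strength of the preference adjustment
    gamma: float = 0.5   # overall recency decay
    lam: float = 0.1     # steepness of the decay curve (lambda)
    tau: float = 60.0    # midpoint of the decay, in days
    delta: float = 1.0   # weight of the price-difference factor
    alpha: float = 0.5   # strength of the good deal boost


baseline = RankingParams()

# Emphasize bargains without touching any other component.
bargain_heavy = replace(baseline, alpha=0.9)
```

Because the dataclass is frozen, `replace` returns a new object and leaves `baseline` untouched, which makes it straightforward to compare two settings side by side.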

This flexible setup means we can keep tweaking and improving our algorithm based on what users tell us and what's happening in the real estate market. Our goal is simple: to help you find your dream home without the hassle. We're always working on making Samwise smarter so it can be your trusty sidekick in your house-hunting adventure!