AI Product Recommendations
AI product recommendations use collaborative filtering, content-based matching, or LLM-driven attribute understanding to surface the right next product. Here's how the engines work and the lift to expect.
Algorithmic suggestions that match shoppers to products using behavioural, content, or LLM-derived signals.
AI product recommendations are the ranked product suggestions that appear in product detail page (PDP) carousels, cart drawers, search results, and post-purchase emails, generated by an engine that scores every candidate SKU against a shopper's context in real time. The three main approaches are collaborative filtering (people who viewed this also viewed that), content-based matching (similar attributes: fabric, brand, price band), and, increasingly, LLM-driven engines that read product copy and reviews to understand intent.
Done well, the 'you might also like' rail stops being decorative and starts driving 10-30% of total revenue. Done badly, it shows the item the shopper just added to cart.
Recommendation surfaces have multiplied. A Shopify apparel store today might run rails on the home page, the PDP, the cart drawer, the empty-search state, the order-confirmation page, and the abandonment email — each with different intent and different ranking logic. Treating them as one block is the most common mistake.
Modern engines blend signals rather than picking one model. A typical hybrid score weights recent browse behaviour, attribute similarity, margin, and stock level. As a slice of AI Optimization, recommendations are where most stores see the fastest payback because the surface area is huge and the baselines (random or 'bestseller') are weak.
score(u, i) = w1 · CF(u, i) + w2 · Content(u, i) + w3 · Margin(i) - w4 · Recency_penalty(u, i)
| Term | Name | Meaning |
|---|---|---|
| CF(u, i) | Collaborative filtering score | Probability user u co-engages with item i based on similar shoppers' behaviour. |
| Content(u, i) | Content similarity score | Cosine similarity between user's recently-viewed attribute vector and item i's attributes. |
| Margin(i) | Margin uplift | Normalised gross margin of item i, used as a tiebreaker, not a primary driver. |
| Recency_penalty(u, i) | Already-seen penalty | Down-weights items the user has just viewed, added to cart, or purchased. |
| w1..w4 | Tunable weights | Learned per surface: PDP weights content high, cart weights CF and margin high. |
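The Content(u, i) term is defined above as a cosine similarity between attribute vectors. A minimal sketch of that calculation follows, assuming attributes are stored as sparse name-to-weight mappings; the attribute names and weights are hypothetical and not taken from any particular engine.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse attribute vectors (attribute -> weight)."""
    dot = sum(weight * b.get(attr, 0.0) for attr, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # one side has no attributes: treat as no similarity
    return dot / (norm_a * norm_b)


# Hypothetical vectors: what the shopper recently viewed vs. one candidate item.
user_recent = {"sensitive_skin": 1.0, "fragrance_free": 1.0, "serum": 0.5}
candidate_item = {"sensitive_skin": 1.0, "serum": 1.0}

content_score = cosine_similarity(user_recent, candidate_item)
print(round(content_score, 3))
```

With non-negative attribute weights the result stays between 0 and 1, which keeps it on the same scale as the other terms in the hybrid score.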
Beauty store ranking candidates for a PDP 'complete the routine' rail
CF score (cleanser → serum): 0.62
Content score (both 'sensitive skin'): 0.78
Margin score (normalised): 0.55
Recency penalty (not yet viewed): 0
Weights w1..w4: 0.5, 0.3, 0.1, 0.4
→ Score ≈ 0.5·0.62 + 0.3·0.78 + 0.1·0.55 - 0.4·0.0 = 0.60
A score of 0.60 on a 0-1 scale puts this serum comfortably in the top slots of the rail. Swap in a recently-purchased cleanser and the recency penalty drops it out entirely.
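The worked example can be reproduced in a few lines. This is a sketch of the formula as written above, not a production ranker; the `Weights` class and `hybrid_score` function are illustrative names, and the penalty value of 1.0 for a just-purchased item is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    cf: float       # w1
    content: float  # w2
    margin: float   # w3
    recency: float  # w4


def hybrid_score(cf: float, content: float, margin: float,
                 recency_penalty: float, w: Weights) -> float:
    """score = w1*CF + w2*Content + w3*Margin - w4*Recency_penalty"""
    return (w.cf * cf + w.content * content
            + w.margin * margin - w.recency * recency_penalty)


pdp_weights = Weights(cf=0.5, content=0.3, margin=0.1, recency=0.4)

# The serum from the example: not yet viewed, so the penalty is 0.
serum = hybrid_score(0.62, 0.78, 0.55, 0.0, pdp_weights)
print(round(serum, 2))  # 0.6

# Same inputs with an assumed full penalty of 1.0 (item just purchased).
penalised = hybrid_score(0.62, 0.78, 0.55, 1.0, pdp_weights)
print(round(penalised, 2))  # 0.2
```

Keeping the weights in their own object makes it straightforward to hold one set per surface rather than one per store.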
The weights matter more than the model choice. A PDP rail that over-weights margin shows expensive irrelevant products and tanks click-through. A cart rail that ignores margin leaves money on the table because the shopper is already buying. Tune per surface, not per store.
Typical revenue lift from AI product recommendations by surface (vs. no rail or static bestsellers)
| Surface | CTR on rail | Attributed revenue lift | Time to positive ROI |
|---|---|---|---|
| Home page rail | 3-6% | 1-3% | 2-4 weeks |
| PDP 'you might also like' | 8-14% | 4-8% | 1-3 weeks |
| PDP 'complete the look/routine' | 10-18% | 6-12% | 1-3 weeks |
| Cart drawer cross-sell | 12-22% | 5-10% | 1-2 weeks |
| Post-purchase upsell | 6-11% | 3-7% | 2-4 weeks |
| Abandonment email | 4-9% | 2-5% | 3-6 weeks |
The cart drawer and PDP 'complete the look' rails consistently outperform because they catch buying intent at its peak. Home-page rails are the lowest-leverage surface, with the weakest CTR and the weakest attributed revenue in the table, because the shopper hasn't formed intent yet. If you're running one rail, run it on the PDP.
AI product recommendations: FAQ
Collaborative filtering uses behavioural co-occurrence — shoppers who viewed A also viewed B. Content-based uses product attributes — A and B share fabric, brand, or price band. Collaborative needs traffic to work; content-based works on day one but misses non-obvious affinities. Most production engines blend both.
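Behavioural co-occurrence can be illustrated with plain counting. The sketch below assumes each session is a list of viewed SKUs and simply counts how often two SKUs appear in the same session; real engines typically normalise these counts or use matrix factorisation, so treat this as the simplest possible instance. The SKU names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions: list[list[str]]) -> dict[str, Counter]:
    """For every SKU, count how many sessions it shared with each other SKU."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for viewed in sessions:
        # De-duplicate within a session so repeat views count once.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_viewed(counts: dict[str, Counter], sku: str, k: int = 3) -> list[str]:
    """Top-k SKUs most often viewed in the same session as `sku`."""
    return [other for other, _ in counts.get(sku, Counter()).most_common(k)]


sessions = [
    ["cleanser", "serum", "spf"],
    ["cleanser", "serum"],
    ["serum", "moisturiser"],
]
counts = co_view_counts(sessions)
print(also_viewed(counts, "cleanser", k=2))  # ['serum', 'spf']
```

A SKU that appears in no sessions returns an empty list, which is the sparsity problem in miniature: without traffic there is nothing to count.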
Pure collaborative filtering needs roughly 50k+ sessions a month to escape sparsity issues. Content-based and LLM-driven engines work from session one because they read product attributes, not behaviour. If you're well short of that, say below 30k sessions a month, start content-based and layer collaborative signals as traffic grows.
A well-built engine returns ranked SKUs in 30-80ms server-side, and the rail itself loads lazily below the fold. The performance hit comes from heavy client-side widgets that re-fetch on every scroll. Audit Largest Contentful Paint before and after — if it moves more than 100ms, the integration is the problem, not recommendations.
LLMs read product titles, descriptions, and review text to understand attributes the catalogue doesn't expose — 'good for narrow feet', 'vegan', 'gift for a teen'. They're strongest on long-tail and cold-start SKUs where collaborative signals are thin. They don't replace CF; they add a semantic layer on top.
Use both on different surfaces. 'Frequently bought together' belongs on the cart and PDP near the buy button — it's a complement signal. 'You might also like' belongs lower on the PDP or empty search — it's a discovery signal. Mixing them on one rail confuses the ranking.
Track three metrics: rail CTR, attach rate (% of orders containing a recommended SKU), and incremental revenue from a holdout A/B test. CTR alone is vanity — a rail can drive clicks while cannibalising organic discovery. The holdout test is the only true measure of incrementality.
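The first two metrics are simple ratios. A rough sketch, assuming each order records both the SKUs purchased and the SKUs a rail showed that shopper beforehand; the `Order` structure and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    skus: set[str]         # SKUs in the order
    recommended: set[str]  # SKUs a rail showed this shopper before purchase


def rail_ctr(impressions: int, clicks: int) -> float:
    """Share of rail impressions that led to a click."""
    return clicks / impressions if impressions else 0.0


def attach_rate(orders: list[Order]) -> float:
    """Share of orders containing at least one recommended SKU."""
    if not orders:
        return 0.0
    attached = sum(1 for o in orders if o.skus & o.recommended)
    return attached / len(orders)


orders = [
    Order({"cleanser", "serum"}, {"serum", "spf"}),  # serum was recommended
    Order({"cleanser"}, {"serum"}),
    Order({"spf"}, set()),
    Order({"moisturiser"}, {"serum", "spf"}),
]
print(rail_ctr(impressions=2_000, clicks=180))  # 0.09
print(attach_rate(orders))                      # 0.25
```

Neither number says whether the recommended item would have been bought anyway, which is why the holdout test is still needed.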
Yes, and you should. Split traffic 50/50 between two engines (or one engine vs. static bestsellers) for at least two full weekly cycles. Measure revenue per session, not CTR. Most stores find their first big win in the test, not in switching engines later.
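Comparing the two arms on revenue per session comes down to the arithmetic below. The revenue and session figures are invented for illustration, and the sketch does not check statistical significance, which a real test would need before declaring a winner.

```python
def revenue_per_session(revenue: float, sessions: int) -> float:
    """Total revenue divided by sessions in one test arm."""
    return revenue / sessions


def relative_lift(control_rps: float, treatment_rps: float) -> float:
    """Relative change in revenue per session, treatment vs. control."""
    return (treatment_rps - control_rps) / control_rps


# Hypothetical 50/50 split over two weekly cycles.
control = revenue_per_session(42_000.0, 20_000)    # static bestsellers
treatment = revenue_per_session(45_150.0, 20_000)  # personalised engine

lift = relative_lift(control, treatment)
print(f"{lift:.1%}")  # 7.5%
```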
Stores moving from no rail (or static bestsellers) to a personalised engine typically see a 5-15% lift in revenue per session within four weeks, concentrated on PDP and cart surfaces. Lift plateaus around month three, once the model has learned the catalogue.
Apply hard filters before ranking — exclude OOS, exclude SKUs below a margin floor, exclude items the shopper just viewed or bought. Then let the model rank what's left. Trying to teach the model to avoid these via training data is slower and less reliable than a filter step.
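The filter-then-rank order described here might look like the following sketch. The `Candidate` fields, the margin floor, and the sample SKUs are assumptions made for illustration; the score is taken as already computed by whatever model the store uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sku: str
    in_stock: bool
    margin: float  # normalised gross margin, 0-1
    score: float   # model score, already computed


def rank_with_filters(candidates: list[Candidate], recently_seen: set[str],
                      margin_floor: float, k: int = 4) -> list[str]:
    """Apply hard exclusions first, then rank the survivors by model score."""
    eligible = [
        c for c in candidates
        if c.in_stock                    # exclude out-of-stock items
        and c.margin >= margin_floor     # exclude SKUs below the margin floor
        and c.sku not in recently_seen   # exclude just-viewed or just-bought items
    ]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)
    return [c.sku for c in ranked[:k]]


candidates = [
    Candidate("serum", True, 0.55, 0.60),
    Candidate("cleanser", True, 0.50, 0.72),  # just purchased
    Candidate("toner", False, 0.60, 0.65),    # out of stock
    Candidate("sample-kit", True, 0.05, 0.58),  # below margin floor
    Candidate("spf", True, 0.45, 0.51),
]
print(rank_with_filters(candidates, recently_seen={"cleanser"}, margin_floor=0.2))
# ['serum', 'spf']
```

Because the exclusions run before ranking, a high-scoring but ineligible item can never reach the rail, whatever the model has learned.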
Shopify's native 'related products' is content-based and acceptable as a baseline. You typically outgrow it once you have 500+ SKUs or want behaviour-driven ranking. The signal: if your rail CTR is below 4% and attach rate is flat, the built-in engine is the bottleneck.