Filter query - algorithm

Associate
Joined
14 Apr 2003
Posts
1,101
Hi,

I have a series of records in MongoDB (although it could be anything for the purpose of this question). Let's say I have a number of products:

Product:
id
name
price
date_added
interestingness

I want to list the 25 most interesting products ordered by date (latest added at the top). I always want to have some products listed, so if there aren't any good ones it should choose worse ones. How does this general kind of algorithm work, ideally at the database level?
 
Associate
Joined
26 Dec 2008
Posts
623
SELECT * FROM `products` ORDER BY `interestingness` DESC, `date_added` DESC LIMIT 25

Assuming 'interestingness' is a double from 0 to 5 (or whatever, as long as it's a numerical scale), this will list the 25 most interesting products, and where 2 products share the same level of "interestingness", the latest one will be ranked higher.

However, this will give a fairly stale set of results: they won't change much as time goes on, assuming that the most interesting products are also the most popular ones and therefore keep being rated as interesting by more users.

A less stale set of results (one that changes more as time goes on) could be achieved by placing more importance on the date_added field. Rounding the interestingness value to 0 decimal places creates more ties, and ties are broken by date, for example:


SELECT * FROM `products` ORDER BY ROUND(`interestingness`,0) DESC, `date_added` DESC LIMIT 25
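
To make the effect of the rounding concrete, here is a minimal sketch that runs both queries against an in-memory SQLite table. The table layout, the sample rows and the helper name top_products are assumptions made up for illustration; the queries in this post target MySQL, but ROUND behaves the same way for these values.

```python
import sqlite3

def top_products(rows, limit=25, round_scores=False):
    """Return product names ranked by interestingness, then by date_added."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (name TEXT, interestingness REAL, date_added TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    # Rounding creates ties, so date_added decides the order more often.
    score = "ROUND(interestingness, 0)" if round_scores else "interestingness"
    query = f"SELECT name FROM products ORDER BY {score} DESC, date_added DESC LIMIT ?"
    names = [row[0] for row in conn.execute(query, (limit,))]
    conn.close()
    return names

# Hypothetical sample data: ISO date strings sort correctly as text.
sample = [
    ("old_favourite", 4.4, "2012-01-01"),
    ("new_arrival", 3.6, "2013-01-01"),
    ("mediocre", 2.0, "2013-02-01"),
]
print(top_products(sample))                     # exact scores: old_favourite first
print(top_products(sample, round_scores=True))  # rounded scores: new_arrival first
```

With exact scores the older product wins on 4.4 vs 3.6; once both round to 4, the newer one moves to the top.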
 
Associate
OP
Joined
14 Apr 2003
Posts
1,101
Hi,

Thanks for your reply. There will be new 'products' added all the time and I want the most interesting recent ones to show. Being stale is a definite no-no - I don't want a really interesting product that was added a year ago to show on the front page (strange as it may seem!).

I think this is very similar to the Facebook feed?

I did think about weighting the products based on date_added: a product added today gets a weight of 1.0, falling to 0 for a product that is 7 days old. This would remove any products over a week old. Just wondered if it was a standard problem with a standard solution before I start hacking away :D
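
A rough sketch of that weighting idea, assuming a linear fall-off over 7 days and assuming the weight is simply multiplied into the interestingness value. The function names and the multiplication are my own choices for illustration, not a standard formula.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = 7.0  # assumption: the weight reaches 0 after one week

def recency_weight(date_added, now):
    """Linear decay: 1.0 for a product added right now, 0.0 at WINDOW_DAYS or older."""
    age_days = (now - date_added).total_seconds() / 86400.0
    return max(0.0, min(1.0, 1.0 - age_days / WINDOW_DAYS))

def weighted_score(interestingness, date_added, now):
    """Combine interestingness with recency (assumed here to be a simple product)."""
    return interestingness * recency_weight(date_added, now)

now = datetime(2013, 1, 8)
print(weighted_score(4.0, now, now))                        # brand new: full score
print(weighted_score(4.0, now - timedelta(days=3.5), now))  # half a week old: half score
print(weighted_score(4.0, now - timedelta(days=30), now))   # over a week old: 0.0
```

Anything with a weighted score of 0 drops out naturally if the list is ordered by this value, which matches the "remove anything over a week old" behaviour.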
 
Associate
Joined
26 Dec 2008
Posts
623
Hi,

"I don't want a really interesting product that was added a year ago to show on the front page (strange as it may seem!).

I think this is very similar to the facebook feed?"

It's not similar to Facebook, because I think you also want to show old products which are rated very highly amongst the new products?

If you don't want to do that, just add a date limit to the query:

SELECT * FROM `products` WHERE `date_added` > [last month timestamp]
ORDER BY ROUND(`interestingness`,0) DESC, `date_added` DESC LIMIT 25
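
As a sketch of how the [last month timestamp] placeholder might be filled in, the snippet below computes a cutoff in application code and passes it as a query parameter. It assumes "last month" means the last 30 days, that date_added is stored as a "YYYY-MM-DD HH:MM:SS" string, and it uses an in-memory SQLite table with made-up rows; the function name is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

def recent_top_products(conn, now, days=30, limit=25):
    """Rank only products added within the last `days` days."""
    # Assumption: date_added is text in "YYYY-MM-DD HH:MM:SS" form, so string
    # comparison matches chronological order.
    cutoff = (now - timedelta(days=days)).isoformat(sep=" ")
    cur = conn.execute(
        "SELECT name FROM products WHERE date_added > ? "
        "ORDER BY ROUND(interestingness, 0) DESC, date_added DESC LIMIT ?",
        (cutoff, limit),
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, interestingness REAL, date_added TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("last_year_hit", 5.0, "2012-03-01 09:00:00"),
        ("recent_good", 4.2, "2013-02-20 09:00:00"),
        ("recent_ok", 2.9, "2013-02-25 09:00:00"),
    ],
)
# The year-old product is filtered out despite having the highest score.
print(recent_top_products(conn, now=datetime(2013, 3, 1)))
```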

If you're generating "interestingness" by user votes, you should read this:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
You should read it anyway though.
 
Associate
OP
Joined
14 Apr 2003
Posts
1,101
Hi,

Thanks, I've read that link. Very interesting, but not what I'm after, though it might come in handy later.

What I require is a timeline of products. Let's imagine that 5000 products have been added today. I want to display a subset of those products hand-picked for a particular user (the interestingness). The most recent product should be at the top and the user should be able to 'load more', which will select another set of interesting products. An old product should not appear on the list until the user has clicked 'load more' enough times.

If no 'interesting' products have been added out of the 5000, I still want to show something, so inferior products (but still added today) will be shown.

I can't limit the products to those added today, because in the morning you would have very few products, in which case I would want the end of yesterday's to be included...
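
One possible way to sketch these requirements (not a standard solution, just an illustration): walk backwards through time in fixed-size pools of the most recent products, keep the best few from each pool, and show them newest first. Because the pool is defined by count rather than by calendar day, a quiet morning simply reaches back into yesterday, and because the best of each pool is always taken, something is shown even when nothing scores well. The names next_page, pool_size, page_size and the score callable are all hypothetical, and the sketch assumes date_added values are unique.

```python
def next_page(products, score, cursor=None, pool_size=200, page_size=25):
    """Return (page, new_cursor) for a 'load more' timeline.

    products: list of dicts with a 'date_added' key, in any order.
    score: callable mapping a product to a number (higher is more interesting).
    cursor: only products strictly older than this date_added are considered.
    """
    older = [p for p in products if cursor is None or p["date_added"] < cursor]
    older.sort(key=lambda p: p["date_added"], reverse=True)
    pool = older[:pool_size]  # the next chunk of the timeline, newest first
    if not pool:
        return [], cursor
    # Best of the pool, however weak, so the page is never empty.
    best = sorted(pool, key=score, reverse=True)[:page_size]
    best.sort(key=lambda p: p["date_added"], reverse=True)  # display newest first
    return best, pool[-1]["date_added"]

# Tiny example: integers stand in for timestamps.
items = [{"id": i, "date_added": i, "score": i % 3} for i in range(10)]
page, cursor = next_page(items, lambda p: p["score"], pool_size=5, page_size=2)
print([p["id"] for p in page], cursor)
page, cursor = next_page(items, lambda p: p["score"], cursor, pool_size=5, page_size=2)
print([p["id"] for p in page], cursor)
```

The ratio of page_size to pool_size controls how selective the list is: a larger pool means pickier results but faster travel back in time per click.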
 
Associate
OP
Joined
14 Apr 2003
Posts
1,101
Interestingness is a value that is calculated on the fly based on a number of factors.

For example, the user having shown interest in a similar product, or the product having been purchased or viewed a lot, will all increase a product's interestingness...

This will be calculated during the query. I was thinking a map-reduce job might be the way forward, but it isn't quite right.
 