mysql - Sql Popularity algorithm with weighted score -


i'm implement algorithm returns popular posts @ moment, given likes , dislikes.

to this, each post add likes (1) , dislikes (-1) score each like/dislike weighted : latest, heaviest. example, @ moment user likes post, weights 1. after 1 day, weights 0.95 (or -0.95 if it's dislike), after 2 days, 0.90, , on... minimal of 0.01 reached after 21 days. (ps: theses totally approximate values)

here how tables made :

posts table

id | title                 | user_id | ... ------------------------------------------- 1  | random post           | 10      | ... 2  | post          | 36      | ... n  | ...                   | n       | ... 

likes table

id | vote | post_id | user_id | created ---------------------------------------- 1  | 1    | 2       | 10      | 2014-08-18 15:34:20 2  | -1   | 1       | 24      | 2014-08-15 18:54:12 3  | 1    | 2       | 54      | 2014-08-17 21:12:48  

here sql query i'm using which job

select post.*, like.*,  sum(like.vote *      (1 - if((timestampdiff(minute, like.created, now()) / 60 / 24) / 21 > 0.99, 0.99, (timestampdiff(minute, like.created, now()) / 60 / 24) / 21))    ) score  posts post  left join likes on (post.id = like.post_id)  group post.id order score desc 

ps: i'm using timestampdiff minute , not day directly because i'm calculating day myself otherwise returns me integrer , want float value, in order gradually decay overtime , not day per day. timestampdiff(minute, like.created, now())/60/24 gives me number of day passed since creation decimal part.

here questions :

  1. look @ if(expr1, expr2, expr3) part : necessary in order set minimal value like's weight, not go under 0.01 , become negative (and like, older still has little weight). i'm calculating 2 times same thing : expr1 same expr2. isn't there way avoid duplicate expression ?
  2. i going cache query , update every 5 minutes, think pretty heavy on big post , like table. cache necessary or not ? i'm aiming run query on table 50 000 entries, , each 200 associated likes (that makes 10 000 000 entries like table).
  3. should create index in like table post_id ? , created ?

thank !

edit: imagine post can have multiple tags, , each tag can belong multiple posts. if want populars posts given tag or multiple tag, can't cache each query ; there amount of possible queries. query still viable ?

edit final solution: did tests. created table post 30 000 entries , 250 000 entries. without index, query incredibly long (timed out > 10mn), indexes on post.id (primary), like.id(primary) , like.post_id took ~0.5s.

so i'm not caching data, neither using update every 5mn. if table keeps growing still possible solution (over 1s it's not acceptable).

2: going cache query , update every 5 minutes, think pretty heavy on big post , table. cache necessary or not ? i'm aiming run query on table 50 000 entries, , each 200 associated likes (that makes 10 000 000 entries table).

10000 , 50000 considered small on current hardware. table sizes won't need cache, unless query run several times per second. anyway, performance test before deciding have cache.

3: should create index in table post_id ? , created ?

i create index (post_id, created, vote). way query can information index , doesn't need read table @ all.

edit (response comments):

an index slow down inserts/updates slightly. in end, path choose dictate characteristics of need in terms of cpu/ram/disk i/o. if have enough ram db expect entire like table cached in ram might better off index on post_id.

in terms of total load need consider ratio between insert , select , relative cost of insert , select or without index. gut feeling total load lower index.

regarding question on concurrency (selecting , inserting simultaneously). happens depends on isolation level. general advice keep inserts/updates short possible. if don't unneccessary things between start of insert , commit should fine.


Comments

Popular posts from this blog

java - How to specify maven bin in eclipse maven plugin? -

single sign on - Logging into Plone site with credentials passed through HTTP -

php - Why does AJAX not process login form? -