indicthreads pune12 recommenders apache mahout

Upload: indicthreads

Post on 04-Apr-2018


TRANSCRIPT

  • 7/30/2019 IndicThreads Pune12 Recommenders Apache Mahout

    1/29


    Contents

    A recommendation problem

    What is a recommender

    Building a recommender using Mahout

    Tips and tweaks

    Recommender considerations


    A book store

    Sells books:

    By various authors

    Of various categories

    On different subjects

    From various publishers

    Readers/buyers are asked to rate

    Readers/buyers can provide reviews

    You walk into the store

    (buy something for a friend)


    The store owner

    Asks you what:

    your friend reads (already owns)

    your friend usually likes more

    Has data on what:

    his customers buy

    his customers rate and review

    Uses a few strategies


    1 - Find similar books

    Depending on which books your friend has, pick books:

    by the same author

    on the same or similar subjects

    in the same category

    from the same publisher

    (those with the highest sales numbers)


    2 - Find books with similar readership

    Define some similarity

    e.g. two books are as similar as the number of readers rating both of them

    Define some limit of relevance

    e.g. only consider books that have more than 4 readers in common

    Look for all books which are similar to books your friend owns

    Pick books from this set that your friend doesn't own
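The readership strategy above can be sketched in a few lines of plain Python. This is only an illustration, not Mahout code: the ratings are the deck's five-reader example data, the function names are made up for this sketch, and the relevance limit is lowered to "more than 1 shared reader" because the data set has only five readers.

```python
from collections import defaultdict

# (user, book, rating) triples: the five-reader example data from this deck.
RATINGS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5), (3, 107, 5.0),
    (4, 101, 5.0), (4, 103, 3.0), (4, 104, 4.5), (4, 106, 4.0),
    (5, 101, 4.0), (5, 102, 3.0), (5, 103, 2.0), (5, 104, 4.0),
    (5, 105, 3.5), (5, 106, 4.0),
]


def readers_by_book(ratings):
    """Map each book to the set of readers who rated it."""
    readers = defaultdict(set)
    for user, book, _rating in ratings:
        readers[book].add(user)
    return dict(readers)


def recommend_by_readership(ratings, owned, limit=1):
    """Score each book the friend does not own by shared readership.

    Similarity of two books = number of readers who rated both.
    Only similarities strictly greater than `limit` (an assumed cut-off)
    contribute to a candidate's score.
    """
    readers = readers_by_book(ratings)
    scores = defaultdict(int)
    for candidate, candidate_readers in readers.items():
        if candidate in owned:
            continue
        for book in owned:
            shared = len(candidate_readers & readers.get(book, set()))
            if shared > limit:
                scores[candidate] += shared
    # Highest score first; ties broken by book id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend_by_readership(RATINGS, owned={101, 102, 103}))
# prints [(104, 9), (106, 4), (105, 2)]
```

Book 107 drops out because it shares only one reader with the books the friend owns, which is exactly the effect of the relevance limit.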


    3 - Find people with similar tastes

    Define some similarity

    e.g. two people are as similar as the number of books they like from the same category

    Define some limit of relevance

    e.g. only consider the top 3 people when ordered according to how similar they are to your friend

    Look for users similar to your friend and see what they read

    Pick books which these people like and your friend doesn't own
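A rough sketch of the similar-tastes strategy follows. The deck's data carries no categories, so this sketch swaps in a simpler similarity: the number of books both people like. "Like" is assumed here to mean a rating of 3.0 or more; that threshold, the function names, and the scoring rule are all illustrative choices, not part of the talk.

```python
from collections import defaultdict

# (user, book, rating) triples: the five-reader example data from this deck.
RATINGS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5), (3, 107, 5.0),
    (4, 101, 5.0), (4, 103, 3.0), (4, 104, 4.5), (4, 106, 4.0),
    (5, 101, 4.0), (5, 102, 3.0), (5, 103, 2.0), (5, 104, 4.0),
    (5, 105, 3.5), (5, 106, 4.0),
]

LIKE_THRESHOLD = 3.0  # assumption: a rating this high counts as "likes"


def build_profiles(ratings):
    """Return (books rated, books liked) per user."""
    rated, liked = defaultdict(set), defaultdict(set)
    for user, book, rating in ratings:
        rated[user].add(book)
        if rating >= LIKE_THRESHOLD:
            liked[user].add(book)
    return rated, liked


def recommend_from_similar_people(ratings, target, top_n=3):
    """Pick the top_n most similar people, then score the books they like."""
    rated, liked = build_profiles(ratings)
    # Similarity = number of books both people like (stand-in for categories).
    sims = {u: len(liked[target] & liked[u]) for u in rated if u != target}
    neighbours = sorted(
        (u for u in sims if sims[u] > 0), key=lambda u: (-sims[u], u)
    )[:top_n]
    scores = defaultdict(int)
    for u in neighbours:
        # Only books the target has not rated (does not own) are candidates.
        for book in liked[u] - rated[target]:
            scores[book] += sims[u]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return neighbours, ranked


print(recommend_from_similar_people(RATINGS, target=1))
# prints ([5, 4], [(104, 3), (106, 3), (105, 2)])
```

Readers 2 and 3 share no liked books with the friend, so only two neighbours remain even though up to three were allowed.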


    Example data (user, book, rating)

    1,101,5.0    3,101,2.5    4,106,4.0
    1,102,3.0    3,104,4.0    5,101,4.0
    1,103,2.5    3,105,4.5    5,102,3.0
    2,101,2.0    3,107,5.0    5,103,2.0
    2,102,2.5    4,101,5.0    5,104,4.0
    2,103,5.0    4,103,3.0    5,105,3.5
    2,104,2.0    4,104,4.5    5,106,4.0

    Your friend owns three books:

    Gave 5 stars to book 101 (likes hugely and talks about it all the time)

    Gave 3 stars to book 102 (has shown some liking to it)

    Gave 2.5 stars to book 103 (has read it, but didn't say bad things about it)

    Now we need to recommend books that your friend hasn't seen
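As a small sketch of the starting point, the records can be loaded into a per-user map and the books the friend (user 1) has not seen listed. The one-record-per-line text layout and the helper names are assumptions made for this example.

```python
from collections import defaultdict

# The deck's example data, one "user,book,rating" record per line.
EXAMPLE_CSV = """\
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
"""


def load_preferences(text):
    """Parse 'user,book,rating' lines into {user: {book: rating}}."""
    prefs = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        user, book, rating = line.split(",")
        prefs[int(user)][int(book)] = float(rating)
    return dict(prefs)


def unseen_books(prefs, user):
    """Books rated by anyone else but not by this user."""
    all_books = {book for rated in prefs.values() for book in rated}
    return sorted(all_books - set(prefs[user]))


prefs = load_preferences(EXAMPLE_CSV)
print(unseen_books(prefs, 1))  # prints [104, 105, 106, 107]
```

These four books are the candidate set that any of the three strategies has to rank.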


    A pictorial representation

    [Figure: users 1 to 5 plotted against books 101 to 107]


    Visualize

    [Figure: users 1 to 5 plotted against books 101 to 107]


    A (slightly) bigger example

    1,101,5.0    3,111,2.5    6,103,2.0
    1,102,3.0    4,101,5.0    6,106,4.0
    1,103,2.5    4,103,3.0    6,113,3.0
    1,109,3.5    4,104,4.5    6,115,5.0
    1,112,4.0    4,106,4.0    7,103,4.5
    2,101,2.0    4,109,2.0    7,104,2.5
    2,102,2.5    4,111,2.5    7,108,4.0
    2,103,5.0    5,101,4.0    7,109,3.5
    2,104,2.0    5,102,3.0    7,110,3.5
    2,107,4.5    5,103,2.0    7,112,2.5
    2,113,3.5    5,104,4.0    8,101,2.0
    3,101,2.5    5,105,3.5    8,105,4.0
    3,104,4.0    5,106,4.0    8,106,4.5
    3,105,4.5    5,109,3.0    8,110,3.0
    3,107,5.0    5,112,4.0    8,114,5.0
    3,115,4.0    6,101,4.5    8,115,3.5


    A pictorial representation

    [Figure: users 1 to 8 plotted against books 101 to 115]

    Clearly, not a viable option


    Mahout to the rescue


    What is Apache Mahout

    Apache Mahout

    A machine learning library

    Works with Apache Hadoop

    Use cases:

    Recommenders

    Clustering

    Classification


    Recommenders in Mahout

    Recommenders use data culled from user behavior

    Recommending using Mahout

    Similarity between users or items

    Expressed as a number between 0 and 1

    Neighborhood of users/items

    Recommendation using this info and an algorithm

    Generic

    Specialized


    Similarity

    Various algorithms:

    Euclidean distance

    Pearson correlation

    Cosine measure

    Spearman correlation

    Tanimoto coefficient

    Log-likelihood

    Effectiveness dependent on the input data

    Influences running time and memory
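To make three of the listed measures concrete, here is a minimal plain-Python sketch over the deck's five-user example data. It is not Mahout's implementation: the mapping of Euclidean distance to a similarity via 1 / (1 + d) is one common convention assumed here, the cosine measure is taken over co-rated books only, and all function names are illustrative.

```python
from math import sqrt

# {user: {book: rating}} for the deck's five-user example data.
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def shared_books(a, b):
    """Books rated by both users."""
    return sorted(PREFS[a].keys() & PREFS[b].keys())


def euclidean_similarity(a, b):
    """1 / (1 + distance) over co-rated books; 0.0 if nothing is shared."""
    books = shared_books(a, b)
    if not books:
        return 0.0
    distance = sqrt(sum((PREFS[a][k] - PREFS[b][k]) ** 2 for k in books))
    return 1.0 / (1.0 + distance)


def cosine_similarity(a, b):
    """Cosine of the angle between the two rating vectors (co-rated books)."""
    books = shared_books(a, b)
    dot = sum(PREFS[a][k] * PREFS[b][k] for k in books)
    norm_a = sqrt(sum(PREFS[a][k] ** 2 for k in books))
    norm_b = sqrt(sum(PREFS[b][k] ** 2 for k in books))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tanimoto_similarity(a, b):
    """Shared books divided by all books either user rated; ignores ratings."""
    union = PREFS[a].keys() | PREFS[b].keys()
    if not union:
        return 0.0
    return len(PREFS[a].keys() & PREFS[b].keys()) / len(union)


for other in (2, 5):
    print(other,
          round(euclidean_similarity(1, other), 4),
          round(cosine_similarity(1, other), 4),
          round(tanimoto_similarity(1, other), 4))
```

On this data the rating-aware measures place user 5 closer to user 1 than user 2, while the Tanimoto coefficient, which looks only at which books were rated, ranks them the other way round. That is one way the choice of measure depends on the input data.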


    Neighborhood

    Nearest N neighborhood (say, 4)

    Threshold neighborhood (say, > 0.8)

    [Figure: two diagrams of a user U surrounded by users 1 to 5, one per neighborhood type]
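The two neighborhood types can be sketched as simple selections over a similarity map. The similarity values below are invented purely for illustration (the slide's figure does not give numbers), and the function names are hypothetical.

```python
# Hypothetical similarities of users 1 to 5 to a target user U.
# These numbers are made up for illustration only.
SIMILARITY_TO_U = {1: 0.95, 2: 0.40, 3: 0.85, 4: 0.70, 5: 0.90}


def nearest_n(similarities, n):
    """The n most similar users, most similar first (ties broken by id)."""
    ranked = sorted(similarities, key=lambda user: (-similarities[user], user))
    return ranked[:n]


def above_threshold(similarities, threshold):
    """Every user whose similarity is strictly greater than the threshold."""
    return sorted(u for u, s in similarities.items() if s > threshold)


print(nearest_n(SIMILARITY_TO_U, 4))          # prints [1, 5, 3, 4]
print(above_threshold(SIMILARITY_TO_U, 0.8))  # prints [1, 3, 5]
```

A nearest-N neighborhood always has the same size, even if some members are only weakly similar; a threshold neighborhood guarantees quality but may come back small or empty.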


    Recommender

    Recommenders

    Generic recommender

    User based

    Item based

    Slope-one recommender

    Singular Value Decomposition based

    Linear interpolation based

    Cluster-based

    Recommender rescorer

    Recommender evaluator
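To show how similarity, neighborhood, and recommender fit together, here is a minimal sketch of a user-based recommender in plain Python. It is not Mahout's code: it assumes Pearson correlation over co-rated books, a nearest-2 neighborhood restricted to positively correlated users, and a similarity-weighted average as the estimated rating. All names are illustrative.

```python
from math import sqrt

# {user: {book: rating}} for the deck's five-user example data.
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def pearson(a, b):
    """Pearson correlation over co-rated books; None when undefined."""
    books = PREFS[a].keys() & PREFS[b].keys()
    n = len(books)
    if n < 2:
        return None
    mean_a = sum(PREFS[a][k] for k in books) / n
    mean_b = sum(PREFS[b][k] for k in books) / n
    cov = sum((PREFS[a][k] - mean_a) * (PREFS[b][k] - mean_b) for k in books)
    var_a = sum((PREFS[a][k] - mean_a) ** 2 for k in books)
    var_b = sum((PREFS[b][k] - mean_b) ** 2 for k in books)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


def nearest_neighbours(user, n):
    """Up to n positively correlated users as (similarity, user) pairs."""
    scored = [(pearson(user, other), other) for other in PREFS if other != user]
    scored = [(s, o) for s, o in scored if s is not None and s > 0]
    scored.sort(key=lambda so: (-so[0], so[1]))
    return scored[:n]


def recommend(user, n_neighbours=2, how_many=3):
    """Estimate ratings for unseen books as a similarity-weighted average."""
    totals, weights = {}, {}
    for sim, other in nearest_neighbours(user, n_neighbours):
        for book, rating in PREFS[other].items():
            if book in PREFS[user]:
                continue  # already rated by the target user
            totals[book] = totals.get(book, 0.0) + sim * rating
            weights[book] = weights.get(book, 0.0) + sim
    estimates = [(book, totals[book] / weights[book]) for book in totals]
    estimates.sort(key=lambda be: (-be[1], be[0]))
    return estimates[:how_many]


for book, estimate in recommend(1):
    print(book, round(estimate, 2))
```

For user 1 the two neighbours are users 4 and 5, and book 104 comes out on top with an estimate of about 4.26, ahead of 106 and 105.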


    A real-life Web application

    News aggregator-cum-reader

    Fetches news from a news service

    Shows the news in a uniform UI

    Lets readers read, like/dislike and comment on news

    Link social networks and share

    Make this a personalized newspaper

    Track user actions

    Derive and store preferences

    Generate recommendations

    Leverage social accounts, etc.
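One possible way to turn tracked actions into stored preferences is sketched below. The talk does not say how its application scored actions, so the action names, the score values, and the rule that an explicit dislike overrides everything else are all assumptions made for this example.

```python
# Hypothetical mapping from a tracked action to a preference value (1.0 to 5.0).
ACTION_SCORE = {"read": 3.0, "comment": 4.0, "like": 5.0, "dislike": 1.0}


def derive_preferences(events):
    """Collapse (user_id, article_id, action) events into preference triples.

    Assumed rule: keep the strongest signal per user/article pair, except
    that an explicit dislike always wins.
    """
    best = {}
    disliked = set()
    for user, article, action in events:
        key = (user, article)
        if action == "dislike":
            disliked.add(key)
        best[key] = max(best.get(key, 0.0), ACTION_SCORE[action])
    return [
        (user, article,
         ACTION_SCORE["dislike"] if (user, article) in disliked else score)
        for (user, article), score in sorted(best.items())
    ]


# Illustrative events; the ids are made up.
events = [
    (1, 9001, "read"), (1, 9001, "like"),
    (1, 9002, "read"),
    (2, 9001, "read"), (2, 9001, "comment"), (2, 9001, "dislike"),
]
for user, article, preference in derive_preferences(events):
    print(f"{user},{article},{preference}")
```

The output uses the same user,item,preference layout as the book-store example data, so the derived values can feed the same kind of recommender.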


    Overall design

    [Architecture diagram; components as labelled]

    User, application data (MySQL), accessed over REST

    News aggregation, storage (HBase), accessed over REST

    Preferences, Recommender (Mahout), accessed over REST

    Controller API (REST)

    Web application

    Phone/tablet applications

    Third-party applications


    Recommender

    [Architecture diagram; components as labelled]

    REST service (Grizzly, Tomcat): fetch recommendations, input user actions

    Recommender (offline, run periodically)

    MySQL database

    Input table dump


    How to extract data: one dimension

    [Chart: news article readership, log-scale axis from 1 to 10000, plotted against number of news articles (1 to 9); data labels as extracted: 4299, 511, 128, 51, 13, 4, 4, 1, 2]


    How to extract data: add dimensions

    [Chart: news article readership and topic readership, log-scale axis from 1 to 10000, plotted against number of news articles / topics (1 to 57)]


    How more data helps

    [Chart: number of readers with x articles each and number of readers with x topics each (axis 0 to 40), plotted against number of news articles/topics (0 to 800)]


    How more data helps

    [Chart: number of readers with x articles each and number of readers with x topics each (axis 0 to 9), plotted against number of news articles/topics (5 to 85)]


    How more data helps

    [Chart: number of readers with x articles each and number of readers with x topics each (axis 0 to 3.5), plotted against number of news articles/topics (95 to 395)]


    Learnings

    Know thy user

    Frequency of visits

    Preference logic with respect to the user

    Know thy items

    Should have enough items per user

    Maximize items per action

    Should have enough intersections

    Should not be transient

    Use tweaking abilities

    Sharpen the saw
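The points about enough items per user and enough intersections can be checked on a data set before tuning a recommender. The sketch below is illustrative only: it uses the book-store example data rather than the news application's data, and the function names and the "at least 3 shared items" cut-off are assumptions.

```python
from collections import Counter
from itertools import combinations

# {user: set of rated items} for the deck's five-user book-store example.
ITEMS_BY_USER = {
    1: {101, 102, 103},
    2: {101, 102, 103, 104},
    3: {101, 104, 105, 107},
    4: {101, 103, 104, 106},
    5: {101, 102, 103, 104, 105, 106},
}


def items_per_user_histogram(items_by_user):
    """How many users have exactly x items each, as {x: number of users}."""
    return Counter(len(items) for items in items_by_user.values())


def overlapping_pairs(items_by_user, min_shared):
    """Count user pairs sharing at least min_shared items, and all pairs."""
    pairs = list(combinations(sorted(items_by_user), 2))
    overlapping = sum(
        1 for a, b in pairs
        if len(items_by_user[a] & items_by_user[b]) >= min_shared
    )
    return overlapping, len(pairs)


print(dict(items_per_user_histogram(ITEMS_BY_USER)))  # prints {3: 1, 4: 3, 6: 1}
print(overlapping_pairs(ITEMS_BY_USER, min_shared=3))  # prints (6, 10)
```

A histogram dominated by users with one or two items, or a low share of overlapping pairs, suggests that similarity scores will rest on very little evidence.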


    Questions

    ?


    Thank you

    [email protected]

    [email protected]