How to find matching profiles?

Users have to answer 17 simple yes/no questions and the answers are stored in an column for each question as tinyint 0/1 values.

At least that's what seems reasonable to me at the moment.

The table is under my control so I could change it if needed.

Now from several tenthousend or maybe hundreds of thousends of entries I need to find those with the closest match. Of course, I need all of the entries that have the exact same answers and this is no problem. But - at least if there are not enough full matches - then I need all records that have maybe 16,15,14... matches out of the 17 answers.

I have not yet the idea on how to handle this without quering 17*16 different answer schemes.





Answer this question

How to find matching profiles?

  • Carlobas

    Hi,

    glad to hear that you picked up some ideas. There sure could be more normalized (but I do not want to exaggerate :-) ). Come back if you have any more questions, you are welcome :-)

    I keep an eye on the post I answered, anyway if I overlook your answer or rerequest, feel free to contact me through my website which is mentioned below.

    HTH, Jens Suessmeyer.

    ---
    http://www.sqlserver2005.de
    ---


  • objectref

    Jens Susmeyers answer below exactly answered my problem if I added a
    ORDER BY Numberofmatches DESC



    the following query will solve your described problem:

    SELECT

    t2.[question_user_UserId],COUNT(*) AS Numberofmatches

    FROM [dbo].[tCmsElementCustomPartnermatchQuestionUser] t1

    INNER JOIN

    tCmsElementCustomPartnermatchQuestionUser t2

    ON t1.[question_user_questionID] = t2.[question_user_questionID] AND

    t1.[question_user_questionanswer] = t2.[question_user_questionanswer] AND NOT

    t1.[question_user_userid] = t2.[question_user_userid] --to eliminate the actual user which has the best match with himself :-)

    Where T1.[question_user_userid] = 15

    Group by t2.[question_user_UserId]

    Let me know if that worked for you.

    -Jens.



  • bat313

    Hi,

    I wouldn’t store the data denormalized. The better way to store it IMHO is to normalize the data, if you want closer machtes you can also setup a score for each answered question: Here is an extract of a possible solution:

    CREATE TABLE Question
    (
    QuestionId INT
    QuestionText VARCHAR(100)
    )

    CREATE TABLE Anwers
    (
    QuestionId INT
    AnswerId INT
    AnswerValue VARCHAR(10) --indicates the right answer
    Score INT
    )

    CREATE TABLE AnweredQuestions
    (
    QuestionId INT
    AnswerId INT
    ParticipantId INT
    AnswerValue
    )

    Thats just a quick one, could be sure more normalized, but these tables could be easy joined and scored as well in one query rather than using the different columns and its even more extensible than your current one.

    HTH, Jens Suessmeyer.

    ---
    http://www.sqlserver2005.de
    ---


  • RGIMatt

    I actually have set up the tables more close to what you recommended, like this and then of course the user table. I first have to do some frontenmd stuff and then will come back to see how to handle this.

    Actually there are no right or wrong answers since questions have the form like: Iwrite down what I need to buy before I go shopping. true/false.

    CREATE TABLE [dbo].[tCmsElementCustomPartnermatchQuestion] (
    [question_id] [int] IDENTITY (1, 1) NOT NULL ,
    [question_categoryID] [int] NOT NULL ,
    [question_weight] [int] NULL ,
    [question_text_DE] [nvarchar] (255) ,
    [question_text_FR] [nvarchar] (255) ,
    [question_text_IT] [nvarchar] (255) ,
    [question_active] [tinyint] NULL
    ) ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[tCmsElementCustomPartnermatchQuestionUser] (
    [question_user_id] [int] IDENTITY (1, 1) NOT NULL ,
    [question_user_userID] [int] NOT NULL ,
    [question_user_questionID] [int] NOT NULL ,
    [question_user_questionanswer] [int] NOT NULL
    ) ON [PRIMARY]
    GO


  • drourke

    thanks, very friendly.

    I actually I became totally lost.

    I received a hint from my boss but this one needs a totally denormalized table and only allows answers as yes or no.

    Something like this:

    From this kind of table

    CREATE TABLE #test (

    [person_id] [smallint] IDENTITY(1,1),
    [person_name] [varchar] (20),
    [answer001] [tinyint],
    [answer002] [tinyint],
    [answer003] [tinyint],
    [answer004] [tinyint],
    [answer005] [tinyint],
    [answer006] [tinyint],
    [answer007] [tinyint],
    [answer008] [tinyint],
    [answer009] [tinyint]
    )

    I should do a query like this:

    SELECT t1.person_name AS p1_name, t2.person_name AS p2_name
    , '''' AS person_rank
    , t1.answer009*1+t1.answer008*2+t1.answer007*4+t1.answer006*8+t1.answer005*16+t1.answer004*32+t1.answer003*64+t1.answer002*128+t1.answer001*256 AS p1_value
    , t2.answer009*1+t2.answer008*2+t2.answer007*4+t2.answer006*8+t2.answer005*16+t2.answer004*32+t2.answer003*64+t2.answer002*128+t2.answer001*256 AS p2_value

    FROM #test t1, #test t2
    WHERE t1.person_id <> t2.person_id
    AND t1.person_name = 'Fritz' AND t2.person_name <> t1.person_name

    and then find matches with something like this (in ColdFusion since we couldn't find the respectively SQL functions)

    <cfoutput query="qgetmatchesprodandtype">
    <cfset tmp = QuerySetCell(qTest, 'person_rank', 9- Len(Replace(FormatBaseN(BitXor(qTest.p1_value,qTest.p2_value),2), "0", "", "ALL")), qTest.CurrentRow)>
    </cfoutput>

    So, I could really need some enlightening examples

    I'd prefer to match answers by counting the number of matches for each question and user undependently if the answer can only be 0/1 or any value between 0-9 (ore anything else)


  • Rashid Farooq

    I might look at doing it this way:

    Let's assume that you have a table of traits of people and you have a candidate and want to find the closest matches.... Kind of like a dating service.

    If I take the absolute value of (person1.trait1 - person2.trait1) then if that is 0 they are a match on that trait, if it is 1 then they are not a match....

    It follows that if I sum up the abs values of the subtracted trait pairs then the lower the overall sum the more "compatible" the two individual are:

    That would lead me to look at:

    Select
    p1.name,
    p2.name,
    Sum(
    abs(p1.trait1 - p2.trait1) +
    abs(p1.trait2 - p2.trait2) +
    abs(p1.trait3 - p2.trait3) +
    .....
    abs(p1.trait17 - p2.trait17)
    ) As matchfactor
    From members as p1, members.p2
    Where p1.id <> p2.id And p1.name = 'smith'
    Order by matchfactor



  • How to find matching profiles?