本文内容全部来自《集体智慧编程》一书,原书采用的是python,因为没有python编程环境,所以用PHP实现
PHP代码
- <?php
 - //filename:test_collecting_preferences
 - //数据和代码来自《集体智慧编程》
 - //原文采用python实现,尝试用PHP进行转换
 - //@description 搜集用户偏好寻找相近用户
 - $datalist = array(
 - 'Lisa Rose' => array(
 - 'Lady in the Water' => 2.5,
 - 'Snake on a Plane' => 3.5,
 - 'Just My Luck' => 3.0,
 - 'Superman Returns' => 3.5,
 - 'You, Me and Dupree' => 2.5,
 - 'The Night Listener'=> 3.0
 - ),
 - 'Gene Seymour' => array(
 - 'Lady in the Water' => 3.0,
 - 'Snake on a Plane' => 3.5,
 - 'Just My Luck' => 1.5,
 - 'Superman Returns' => 5.0,
 - 'You, Me and Dupree' => 3.5,
 - 'The Night Listener'=> 3.0
 - ),
 - 'Michael Phillips' => array(
 - 'Lady in the Water' => 2.5,
 - 'Snake on a Plane' => 3.0,
 - 'Superman Returns' => 3.5,
 - 'The Night Listener'=> 4.0
 - ),
 - 'Claudia Puig' => array(
 - 'Snake on a Plane' => 3.5,
 - 'Just My Luck' =>3.0,
 - 'Superman Returns' => 4.0,
 - 'You, Me and Dupree' => 2.5,
 - 'The Night Listener'=>4.5
 - ),
 - 'Mick LaSalle' => array(
 - 'Lady in the Water' => 3.0,
 - 'Snake on a Plane' => 4.0,
 - 'Just My Luck' => 2.0,
 - 'Superman Returns' => 3.0,
 - 'You, Me and Dupree' => 2.0,
 - 'The Night Listener'=> 3.0
 - ),
 - 'Jack Matthews' => array(
 - 'Lady in the Water' => 3.0,
 - 'Snake on a Plane' => 4.0,
 - 'Superman Returns' => 5.0,
 - 'You, Me and Dupree' => 3.5,
 - 'The Night Listener'=> 3.0
 - ),
 - 'Toby' => array(
 - 'Snake on a Plane' => 4.5,
 - 'Superman Returns' => 4.0,
 - 'You, Me and Dupree' => 1.0,
 - ),
 - );
 - //欧几里德距离
 - //它以经过人们的一致评价的物品为坐标轴,然后将参与评价的人绘制到图上,并考查他们彼此间的距离远近。
 - //偏好越相似的人,距离越近。不过我们还需要一个函数来对偏好越相近的情况给出越大的值,
 - //为此我们可以将函数值加1(这样可以避免遇到被零整除的错误),并取其倒数
 - //公式是 1 / (1 + sqrt ( pow( data[a][1] - data[b][1] .... ) ))
 - function sim_distance ( $datalist , $person1 , $person2)
 - {
 - $si = array();
 - foreach ( $datalist[$person1] as $moviename => $grade ){
 - if( array_key_exists( $moviename, $datalist[$person2] )){
 - $si[$moviename] = 1;
 - }
 - }
 - if( emptyempty( $si )){
 - return 0;
 - }
 - $powers = 0;
 - foreach ( $si as $moviename=>$val ){
 - $powers += pow( ($datalist[$person1][$moviename] - $datalist[$person2][$moviename] ), 2 );//两者影评分数相减的平方值
 - }
 - return 1 / (1+ sqrt($powers));
 - }
 - //测试 'Lisa Rose' 和 'Gene Seymour' 的相似度评价
 - //原书上求出来是 0.29429805508554946 , PHP 的结果是 0.29429805508555,默认精度没有python高
 - echo( sim_distance( $datalist , 'Lisa Rose' , 'Gene Seymour') );
 - echo( '<br/>' );
 - //皮尔逊相关系数
 - //该相关系统是判断两组数据与某一直线拟合程序的一种度量。对应的公司比欧几里德距离评价的计算公式要复杂
 - //但是它在数据不是很规范时(如影评者对影片的评价总是相对于平均水平偏离很大),会倾向于给出更好的结果
 - //皮尔逊相关度评价法首先会找出两位评论者都曾评过的物品
 - //计算两者的评分总和与平方和,并求得评分的乘积之和,最后,利用这个结果计算出相关系数
 - function sim_person ( $datalist ,$person1 , $person2)
 - {
 - $si = array();
 - foreach ( $datalist[$person1] as $moviename => $grade ){
 - if( array_key_exists( $moviename, $datalist[$person2] )){
 - $si[$moviename] = 1;
 - }
 - }
 - if( emptyempty( $si )){
 - return 1;
 - }
 - $n = count( $si );
 - $sum1 = $sum1Sq = $sum2 = $sum2Sq = $pSum = 0;
 - foreach ( $si as $moviename => $val ){
 - $sum1 += $datalist[$person1][$moviename]; //个人影评分数累加
 - $sum1Sq += pow( $datalist[$person1][$moviename], 2 );//个人影评分数平方的累加
 - $sum2 += $datalist[$person2][$moviename];
 - $sum2Sq += pow( $datalist[$person2][$moviename], 2 );
 - $pSum += ( $datalist[$person1][$moviename] * $datalist[$person2][$moviename]);//两人影评之乘积
 - }
 - $num = $pSum - ( $sum1 * $sum2 / $n); // 正常情况下,我怎么都觉得这是1吧?
 - $den = sqrt( ( $sum1Sq - pow( $sum1, 2 ) / $n) * ( $sum2Sq - pow( $sum2, 2 ) / $n) );
 - if ( $den == 0 ){
 - return 0;
 - }
 - return ($num / $den );
 - }
 - //继续测试 'Lisa Rose' 和 'Gene Seymour' 的相似度评价
 - //原书上求出来是 0.396059017191 , PHP 的结果是 0.39605901719067,这回。。。位数超过了python
 - echo( sim_person( $datalist , 'Lisa Rose' , 'Gene Seymour') );
 - ?>
 
有点长,随便看看吧

