What is Data Overfitting in Machine Learning? 机器学习中的过拟现象


My understanding of data overfitting is this: you train a model on a training set, but the model is tuned so closely to that set that it only works on data that looks like it. Apply the model to other datasets (scenarios) and the results are bad.


Data are not perfect. In most cases, the training set contains noise, which needs to be filtered out instead of taken into account in the model. 
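A minimal sketch of the effect, on made-up data (the trend, the noise level and the helper names are assumptions for illustration only). It compares a straight-line fit with a "memorizer" that simply returns the value of the nearest training point. The memorizer fits the training set perfectly, noise included, and typically does worse than the line on data it has not seen.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def memorize(xs, ys):
    """Overfitted model: answer with the y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so runs are repeatable

def sample(heights):
    # Hypothetical data: a made-up linear trend plus Gaussian noise.
    return [0.9 * h - 80 + rng.gauss(0, 5) for h in heights]

train_x = list(range(150, 200, 2))  # even heights, used for training
test_x = list(range(151, 200, 2))   # odd heights, never seen in training
train_y, test_y = sample(train_x), sample(test_x)

line = fit_line(train_x, train_y)
memo = memorize(train_x, train_y)
for name, model in (("line", line), ("memorizer", memo)):
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "test MSE:", round(mse(model, test_x, test_y), 2))
```

The memorizer's training error is exactly zero, which looks impressive until the test error is printed next to it. That gap between training and test error is the usual symptom of overfitting.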


I have also written this post:


The Machine Learning Case Study – How to Predict Weight over Height/Gender using Linear Regression? 


Based on many samples of weight/height pairs, that post arrives at two models:


Male Weight = -101.24 + 1.061 Height


Female Weight = -110.20 + 1.062 Height


I am 174 cm tall, so the model says my weight should be roughly 83 kg. I actually weigh 80.0 kg, so according to this model I am fit, which I find so much more reasonable than BMI.
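The arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are made up here, the coefficients are the rounded ones quoted above, and the units are assumed to be centimetres and kilograms. BMI is included for comparison; a BMI of 25 or more is usually classed as overweight.

```python
# Coefficients as quoted in the post: weight (kg) = intercept + slope * height (cm).
MODELS = {"male": (-101.24, 1.061), "female": (-110.20, 1.062)}

def predicted_weight(height_cm, gender):
    """Weight predicted by the linear model for the given gender."""
    intercept, slope = MODELS[gender]
    return intercept + slope * height_cm

def bmi(weight_kg, height_cm):
    """Body mass index: weight divided by height in metres squared."""
    return weight_kg / (height_cm / 100) ** 2

print(round(predicted_weight(174, "male"), 1))  # 83.4
print(round(bmi(80.0, 174), 1))                 # 26.4
```

With the rounded coefficients the male formula gives about 83.4 kg for 174 cm, while BMI puts 80.0 kg at that height slightly above 25.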


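Big data is hot these days; some say that with big data you can get rich without doing much at all. Generally speaking, once you have data you can run a learning algorithm on it to obtain a model, and then use that model to make predictions.


But your data (the training set) very likely contains some special cases, also known as noise. We need to filter these out, or ignore them during learning. Otherwise the resulting model is overfitted: it fits the current training set extremely well but does not suit other scenarios.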



// Overfitting (image from the web)





Thank you for reading my post, feel free to FOLLOW and Upvote @justyy which motivates me to create more quality posts. 




// Adapted from my blog posts.





This page is synchronized from the post: What is Data Overfitting in Machine Learning? 机器学习中的过拟现象

Just throw away the things you don't need 断舍离


My sister told me to throw away the stuff I don't need or never use. She is right. My room is full of clutter that I won't use, at least in the short term. I have many spare USB charging cables from old phones, as well as some old mice and computer keyboards. They have occupied the drawers for a very long time, and I can't even remember when I last took them out.


So I drove a carload of clutter to the nearest recycling centre. If I ever need any of it again, I can always order it online. Decluttering has made my room much tidier, and I feel more comfortable and less burdened.


If you don't need something, put it on eBay or take it to the recycling centre rather than let it occupy your life.


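The house is full of things we rarely use but never had the heart to throw out, so the rooms feel smaller and smaller, as if stuffed with rubbish. My sister says that if you haven't used something for a month, you can throw it away or sell it on eBay.


if (usefulRightNow) {
  keep();
} else if (usefulWithinAMonth) {
  keep();
} else if (sellsForAGoodPrice && effortToSell < payoff) {
  sell();
} else {
  throwAwayOrDonate();
}

I have to say, hauling a carload of rarely used "rubbish" to the nearest recycling centre today, I felt torn and a little pained on the way. After all, there were plenty of seldom-used adapters and USB charging cables, a USB fan I hardly touch, an MP3 player and so on. My sister said: throw it all out. A clean room puts you in a good mood, and the kids are happier too because they can sit and play on the floor.


I packed up three big boxes of the kids' toys; some they hardly play with, so those went too. I had planned to pay for a storage unit, but a friend said I was mad to pay to keep things that aren't worth much. Besides, many units are discounted for the first year and the price goes up in the second: after paying over £300 to store those toys for a year, would you pay another £400 to keep storing them?


My friend had a point, so out they went. My sister says the money saved by skipping one weekend outing is enough to buy those USB cables and similar gadgets again, and Amazon is so convenient, with next-day delivery. So if you don't use it, throw it out!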



// The study, full of unnecessary stuff.



Originally published on Steemit.







This page is synchronized from the post: Just throw away the things you don't need 断舍离

Microsoft Interview Question – Get the Area of the Triangle 微软面试题:三角形的面积是多少?


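The following looks like an easy question: find the area of a right triangle whose hypotenuse is 10 and whose height onto the hypotenuse is 6.


Reportedly, an Indian candidate made it to the final round of a Microsoft interview, where the interviewer posed this primary-school geometry problem.



If your answer is 30 (10×6/2), you have fallen into the trap: such a triangle doesn't exist at all!


The candidate had his doubts too, but in the end he stuck with the answer 30 (base × height / 2).


It Does Not Exist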


It is a trap: this right triangle does not exist. Label the triangle, then apply the Pythagorean theorem to each of the two small right triangles that the height cuts off. Adding the two equations and simplifying expresses the square of the height c as a function y of the point where the height meets the hypotenuse. Plotting y over the length of the hypotenuse shows where it peaks.


The maximum value for function y is 25, which means that the maximum value of c is 5. 
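The bound on c can be derived in a few lines. The labels here are assumptions chosen for this sketch: a and b are the two legs, c is the height onto the hypotenuse, and the height splits the hypotenuse into segments of length x and 10 - x, with 0 < x < 10.

```latex
\begin{align*}
a^2 &= c^2 + x^2, \\
b^2 &= c^2 + (10 - x)^2, \\
a^2 + b^2 &= 2c^2 + 2x^2 - 20x + 100. \\
\intertext{The large triangle is right-angled, so $a^2 + b^2 = 10^2 = 100$, and therefore}
c^2 &= 10x - x^2 = x(10 - x) = 25 - (x - 5)^2 \le 25 .
\end{align*}
```

So y = x(10 - x) peaks at x = 5 with value 25, and c can be at most 5.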


Such a triangle does not exist: the given height of 6 exceeds the maximum possible height.
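The same conclusion as a small check. It is a sketch under one assumption: the squared height equals x(10 - x), where x is the distance from one end of the hypotenuse to the foot of the height, so the height can never exceed half the hypotenuse. The function names are made up for illustration.

```python
def max_altitude(hypotenuse):
    """Largest possible height onto the hypotenuse of a right triangle."""
    return hypotenuse / 2

def right_triangle_area(hypotenuse, altitude):
    """Area of the triangle, or None when no such right triangle exists."""
    if altitude <= 0 or altitude > max_altitude(hypotenuse):
        return None
    return hypotenuse * altitude / 2

# Scan y = x * (10 - x) in steps of 0.01 to confirm the maximum numerically.
best = max((i / 100) * (10 - i / 100) for i in range(0, 1001))

print(best)                        # 25.0
print(right_triangle_area(10, 6))  # None: the interview triangle is impossible
print(right_triangle_area(10, 5))  # 25.0: the largest area for hypotenuse 10
```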


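The point of the question is that in real work, especially open-ended and creative work, you cannot simply be an executor of instructions. As for how to do that, I don't know either.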

 Typical Microsoft. If something is wrong then it is by design…and you are stuck with it. 




// Adapted from my blog posts.





This page is synchronized from the post: Microsoft Interview Question – Get the Area of the Triangle 微软面试题:三角形的面积是多少?

Poloniex is Not A Wallet! How to Transfer SBD from Poloniex to SteemIt? Poloniex 不是个钱包 – 从Poloniex转出SBD到SteemIt的经历


Honestly, I am not a big fan of @poloniex because it makes simple things complicated. I transferred 25 SBD from SteemIt to Poloniex, but the funds got stuck when I tried to move them from Poloniex to my Bitcoin wallet. (I mistakenly withdrew the SBD straight to a Bitcoin wallet, which was my fault, but it took more than 30 days before the funds became available in my Poloniex account again.) I also sent 3 tickets, and none of them got a response.


Later, when I was transferring the 25 SBD back to my SteemIt account, the withdrawal required “Manual Approval”, which took around 10 days and only went through after I sent another email to complain…


In the end, it is good to know that @poloniex is not a wallet. I am not trying to be aggressive, but please be careful if you want to use @poloniex.


You can always use @blocktrades instead, which is far simpler if you want to transfer SBD to your Bitcoin wallet.


How to Convert/Transfer Steem or Steem Dollars (SBD) to Bitcoins? 


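Honestly, my @poloniex experience has not been pleasant. After registering my SteemIt account last year I tried Poloniex, and gave up after two or three failed attempts. It is not friendly at all.


Once I sent 25 SBD to Poloniex and, without exchanging it first, withdrew directly to a Bitcoin wallet. The 25 SBD was only returned automatically about two months later. I filed four tickets, and only the last one got a reply (after a 10-day wait).


How to Transfer SBD from Poloniex to SteemIt?


Withdrawing SBD from Poloniex is relatively intuitive and easy to follow.


Step 1: open the SBD wallet on Poloniex and click Withdraw.


Step 2: enter your SteemIt ID and the amount.


For security, you then need to click a confirmation link that Poloniex sends by email.


But requiring manual approval for such a small amount is exasperating. I later filed a ticket, and the transfer only went out 10 days afterwards.


// Manual approval required for this tiny amount of money… can't believe it.


Later Poloniex sent me an email saying, in effect, that Poloniex is not a wallet; it is for exchanging between currencies. I hardly use it any more: with the blocktrades method there is no need for anything else. See: How to Convert/Transfer Steem or Steem Dollars (SBD) to Bitcoins?


Remember, Poloniex is NOT a wallet!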








This page is synchronized from the post: Poloniex is Not A Wallet! How to Transfer SBD from Poloniex to SteemIt? Poloniex 不是个钱包 – 从Poloniex转出SBD到SteemIt的经历

Team-building Events (Bowling) to celebrate the new release of software 公司组织到剑桥打保龄球


Every time the company releases a new version of the software, we go out to celebrate (also known as a team-building event).


Yesterday we went bowling at Cambridge Leisure Park (CB1 7YD). Here are my suggestions based on the experience:



  1. The bowling was scheduled for 5:30 and dinner for 7:30. It would be better to have dinner first and bowl afterwards, so people are not exercising on an empty stomach.

  2. The company should have paid for taxis instead of having us drive and park at our own expense.

  3. The team-building event should not be in the evening, when we should be spending time with family. It could be scheduled on a Friday afternoon so that we start the weekend happier.

  4. The company should pay for extra drinks and desserts.


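I'm actually not that keen on bowling. It is a good way to let off steam, but after throwing two balls you have to wait for the next round. The overall experience was not great, for these reasons:



  • Bowling was scheduled at dinner time (5:30), with dinner afterwards (7:30). How are you supposed to exercise without eating?

  • Bowling is actually a decent team-building activity, but HR did not assign the groups, so people who already knew each other stuck together, which defeats the purpose.

  • The company did not arrange transport, so we had to drive ourselves and pay for fuel and parking. It is not much money, but it is still annoying.

  • The company did not buy us drinks during the bowling, so we had to pay for those too!

  • At the restaurant, anyone who wanted dessert or an after-dinner coffee also had to pay for it!

  • The event was held in the evening, which is a bad idea: some colleagues who live far away (an hour's drive) did not come. Evenings are for family. Why not start in the afternoon and wrap up after an early dinner?



// Bowling



// Taking a break with some food and drinks



// Inside the Frankie & Benny's restaurant


The steak was the worst I have ever had. I ordered it Medium Red, but it was still tough and the texture was poor. It was simply Steak OverCooked!


All in all: thumbs down!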






This page is synchronized from the post: Team-building Events (Bowling) to celebrate the new release of software 公司组织到剑桥打保龄球

Learn PHPUnit by Example 通过例子学写 PHP单元测试来确保API功能正常


Yesterday, I published a handy online SteemIt tool that checks which of your followers have not voted on your post:


 SteemIt API Tool - Check If Your Followers Have Voted Your Post 


I also provide a free API. To make sure the API is working correctly on all 4 API servers, I have written the following unit test with PHPUnit. Unit tests are very important and, luckily, they are not difficult to write with modern programming languages and test frameworks.


Have you used TDD (Test Driven Development) in your daily work? Writing tests first isn't a bad thing, and I recommend the practice.


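So how do we make sure the API keeps working? A server might go down, or a later code change might break something by accident. Unit tests guard against both: they confirm that the functionality works and that previously working features and behaviour have not changed.


In particular, I provide four API servers (US East, US West, Tokyo and London), so I need to run tests on a schedule every day to make sure the API is healthy. They can be run daily via crontab, sending an email alert or logging an event whenever something fails.


PHP is the best language in the world. To test the API calls with phpunit, first install phpunit (see the installation instructions on the official site). Afterwards you can confirm the installation with the following command: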


$ which phpunit

/usr/local/bin/phpunit

 Then we can write a simple PHP unit test:


<?php
use PHPUnit\Framework\TestCase;

class SteemTestsWhoHasNotVoted extends TestCase {
  // One data set per API server; each inner array is the argument list for simpleTest.
  public function dataProvider() {
    $servers = [
        ["happyukgo.com"],
        ["uploadbeta.com"],
        ["helloacm.com"],
        ["steakovercooked.com"]
    ];
    return $servers;
  }

  /**
   * The method name does not start with "test", so @test marks it as a test.
   * Annotations are only read from doc comments that open with two asterisks.
   *
   * @test
   * @dataProvider dataProvider
   */
  public function simpleTest($domain) {
    // Call the API on the given server and decode the JSON reply into an array.
    $data = file_get_contents("https://$domain/api/steemit/who-has-not-voted/?url=https://steemit.com/steemit/@justyy/steemit-api-tool-check-if-your-followers-have-voted-your-post-api");
    $result = json_decode($data, true);
    $this->assertEquals("justyy", $result['id']);
    $this->assertTrue(is_array($result['who-has-not-voted-yet']));
  }
}

 Save it as steemit-who-has-not-voted-yet-api-test.php and run the following command in the same folder:


$ phpunit steemit-who-has-not-voted-yet-api-test.php
PHPUnit 6.0.6 by Sebastian Bergmann and contributors.

....                                                                4 / 4 (100%)

Time: 13.09 seconds, Memory: 8.00MB

OK (4 tests, 8 assertions)

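 PHPUnit loads the PHP file and runs the public test methods of the TestCase subclass. The dataProvider method defines the array of API servers, so there is no need to write a separate copy of the test for each server.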
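For readers who do not use PHP, the two assertions can be sketched in Python as a plain function that runs offline. The function name and the sample payload are made up for illustration; a real response from the API may contain other fields as well.

```python
import json

def check_response(body):
    """Return True when a response body passes the same two checks as the test."""
    result = json.loads(body)
    if not isinstance(result, dict):
        return False
    # Check 1: the id field names the expected author.
    # Check 2: the list of followers who have not voted is really a list.
    return (result.get("id") == "justyy"
            and isinstance(result.get("who-has-not-voted-yet"), list))

# Hypothetical payloads, not captured from the real API.
good = '{"id": "justyy", "who-has-not-voted-yet": ["alice", "bob"]}'
bad = '{"id": "justyy1", "who-has-not-voted-yet": []}'

print(check_response(good))  # True
print(check_response(bad))   # False
```

Keeping the checks in one function means the same code can be looped over every server, which is the job the data provider does in the PHPUnit version.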


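 Suppose we change the first argument of assertEquals in simpleTest to justyy1 and run the test again. It now fails (F means Failure, i.e. an assertion failed; E means an error in the program itself; a dot means the test passed).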


$ phpunit steemit-who-has-not-voted-yet-api-test.php
PHPUnit 6.0.6 by Sebastian Bergmann and contributors.

FFFF                                                                4 / 4 (100%)

Time: 7.83 seconds, Memory: 8.00MB

There were 4 failures:

1) SteemTestsWhoHasNotVoted::simpleTest with data set #0 ('happyukgo.com')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'justyy1'
+'justyy'

/var/www/phptests/steemit-who-has-not-voted-yet-api-test.php:19

2) SteemTestsWhoHasNotVoted::simpleTest with data set #1 ('uploadbeta.com')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'justyy1'
+'justyy'

/var/www/phptests/steemit-who-has-not-voted-yet-api-test.php:19

3) SteemTestsWhoHasNotVoted::simpleTest with data set #2 ('helloacm.com')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'justyy1'
+'justyy'

/var/www/phptests/steemit-who-has-not-voted-yet-api-test.php:19

4) SteemTestsWhoHasNotVoted::simpleTest with data set #3 ('steakovercooked.com')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'justyy1'
+'justyy'

/var/www/phptests/steemit-who-has-not-voted-yet-api-test.php:19

FAILURES!
Tests: 4, Assertions: 4, Failures: 4.

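 Simple, isn't it? Everyday code should come with unit tests; a project without them hardly counts as a serious project. You can even write the test cases first: define the interface, write the tests, and only then write the implementation. That is the famous TDD (Test Driven Development). Below are the unit tests for one of my personal projects. I open SSH and run them whenever I have a spare moment, which is oddly satisfying.



 To be continued…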






This page is synchronized from the post: Learn PHPUnit by Example 通过例子学写 PHP单元测试来确保API功能正常
