# Big Ben


AngularJS modularization relies mainly on three mechanisms: directive, component, and module.

## directive & 3 bindings

### 2-way binding “=”

```javascript
function MyDirective() {
  var ddo = {
    scope: {
      myProp: '=attributeName'  // two-way binding to the attribute-name attribute
    }
    // ...
  };
  return ddo;
}
```

```html
<!-- index.html -->
<my-directive attribute-name="outerProp"></my-directive>

<!-- directive.html -->
<p>{{myProp}}</p>
```

### 1-way binding “<” and “@”

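```javascript
// '<': one-way binding; the attribute value is evaluated as an expression in the parent scope
function MyDirective() {
  var ddo = {
    scope: {
      prop: '<'
    }
    // ...
  };
  return ddo;
}
```

```javascript
// '@': the attribute value is interpolated and passed in as a string
function MyDirective() {
  var ddo = {
    scope: {
      prop: '@myAttribute'  // read from the my-attribute attribute
    }
    // ...
  };
  return ddo;
}
```

```html
<my-directive my-attribute="{{outerProp}}"></my-directive>
```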

## A simplified directive: component

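```javascript
angular.module('App', [])
.component('myComponent', {
  templateUrl: 'template.html',
  controller: CompController,
  bindings: {
    prop1: '<',
    prop2: '@',
    onAction: '&'
  }
});

function CompController() {
  var $ctrl = this;  // bindings are bound onto the controller as $ctrl.prop1, $ctrl.prop2, $ctrl.onAction
}
```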

The ‘bindings’ object defines the isolate-scope parameter mapping.

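A component's isolate scope drops two-way binding and makes inputs and outputs explicit:

• ‘<’ and ‘@’ are inputs
• ‘&’ is the output: register a callback with ‘&’ to send output to the corresponding (parent) module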

### Basic usage of ‘&’

```html
<my-component
    prop1="val-1"
    prop2="{{parentProp}}"
    on-action="parentFunction(myArg)">
</my-component>
```

The component's template is written as follows; the component definition itself is the one from the previous section's example.

```html
<div ng-click="$ctrl.onAction({myArg: 'val'})">
  {{ $ctrl.prop1.prop }} and {{ $ctrl.prop2 }}
</div>
```

Note:

• In the tag, `onAction` is written as `on-action`.
• `myArg` is the key name used in the template. Its value can come from the controller (`$ctrl`) or from the ‘<’ and ‘@’ bindings. Data gets into the controller through injection, as follows:

```javascript
angular.module('MenuApp')
.controller('ItemListController', ItemListController);

ItemListController.$inject = ['items'];
function ItemListController(items) {
  var list = this;
  list.items = items;
}
```

### Components Lifecycle

• $onInit – controller initialization code
• $onChanges(changeObj) – called whenever one-way bindings are updated
• changeObj[bindingName].currentValue, changeObj[bindingName].previousValue
• $postLink – similar to ‘link’ in a directive
• $onDestroy – called when the scope is about to be destroyed

### Link for directive

• DOM manipulation is usually done inside the link function
• Declared on the DDO
• The link function does not support injection
• To use injected components or services, inject them into the directive
• The ‘scope’ parameter is exactly the $scope of the directive’s controller
• The ‘element’ object represents the element of the directive
• Top level element
• It’s a jqLite object, or a jQuery object if jQuery is included

## module

A module is a collection of services, directives, controllers, filters, and configuration information. `angular.module` is used to configure the `$injector`.

Create a module with no dependencies:

```javascript
angular.module('module2', []);
```

Create a module with dependencies:

```javascript
angular.module('module3', ['module1', 'module2']);
```

Retrieve an existing module (no second argument):

```javascript
angular.module('module1');
```

`ng-app` names the main module:

```html
<!DOCTYPE html>
<html ng-app='module3'>
…
</html>
```

### Config and Run

```javascript
angular.module('module1')
.config(function () {
  // Inject only providers and constants
  // …
});

angular.module('module1')
.run(function () {
  // Inject only instances (like services) and constants
  // …
});
```
• module.config method fires before module.run method
• All dependency modules get configured first
• It doesn’t matter which modules are listed first as long as module declarations are listed before artifact declarations on that module

## Modularization

Split the JavaScript into several files.


## Overview

I bought an Amazon Echo during my last trip to Boise, Idaho, in the US. It's a cool product: it responds quickly and works well even in a reasonably noisy environment. She is a good buddy for my 5-year-old boy to play with and learn English from.
And I want her to do more home automation. All my appliances are traditional ones; I bought them 2 years ago when I moved into my new condo. That's not long ago, yet technology has changed faster than I realized: every electrical thing is becoming smart, even a socket or a plug. That's amazing!
With Alexa, I think I can make my home smarter, even with old-style appliances.
The first thing that caught my eye was the Broadlink universal remote control. It's essentially a converter from WiFi to IrDA and RF signals. Here is the product link. With the app provided by Broadlink, it can learn any IrDA or RF code from a remote controller and build a virtual remote controller in the app to control the appliances within its reach. There are two such apps in the Play Store:

But using the Broadlink remote controller alone isn't ideal: you have to pick up your phone and tap several times to open the app and the corresponding virtual remote. That's not what I want.

After some time Googling, I found several ways to integrate with Alexa:

1. The appliance is smart by design, and the vendor provides an Alexa skill to control it. For example, Philips Hue lights, Belkin WeMo, etc.
2. Integrate Broadlink into Alexa. There are many solutions for this too, including:
2) Domoticz/Home Assistant + HA-Bridge + Alexa
3) Domoticz + python plugin + controlicz + Alexa
3. Domoticz/Home Assistant + HomeBridge + Siri

Domoticz and Home Assistant are both home automation systems. Many people use them and post their experiences online (they helped me a lot, and now it's my turn to share my experience with others). Each system has its own advantages and disadvantages; reference link 1 compares 5 popular home automation systems.
I chose Domoticz because it natively supports the Synology NAS OS, DSM. A home NAS runs 24/7, so I don't need to set up extra hardware for the system. With this in mind, I tried solutions 2) to 4) under item 2, and I consolidate all of them in this post. The final solution architecture is shown below:

## Domoticz into Synology

This is quite simple; see the official page “Domoticz for Synology NAS” and download “Domoticz for Synology DSM 6.1 with Python Plugin Beta”. The Python plugin support is important, so don't miss it; later sections rely on it.
The downloaded file is an .spk file, an installable Synology DSM package. Install it by clicking “Manual Install” in “Package Center”. After a successful install, it appears as shown below:

## Other Dependencies

Domoticz’s plugin system depends on Python3
HA-Bridge Server depends on Java8
You should also install git; it makes cloning GitHub projects much easier.
The 4 packages in Synology Package Center look like this:

This part caused the most trouble; I spent several nights fixing the issues, which are collected in the issue list below.

Many posts suggest installing the python-broadlink library with pip, but that caused a problem for me. python-broadlink was originally built on pycrypto, which is no longer supported, and the author later switched to pyaes. However, the dependency on pycrypto apparently wasn't removed, so installing python-broadlink on Synology with pip tries to compile pycrypto and fails, because Synology DSM has no build environment (GCC, etc.).
My suggested steps are:

• Make sure `python` is linked to Python 3 (e.g. `ln -s /usr/local/bin/python3.5 /usr/bin/python`).
• In the python-broadlink directory, run `python setup.py install`.
• Run `python -m pip install pyaes`. This is a workaround because pip3 cannot be installed on Synology DSM: even `python3.5 get-pip.py` installs pip2.

#### Install Domoticz Python Plugin

Reference link 3 is a Domoticz wiki page written by the plugin author. It is based mainly on Windows, but by cross-referencing the other chapters it still works on Synology. The installation steps are:

• The plugin folder is /usr/local/domoticz/var/plugins/BroadlinkRM2/. Copy these files into it:
• plugin.py - the main file
• plugin_send.py
• plugin_http.py
• plugin_http.sh
Note: the plugin folder is not /usr/local/domoticz/plugins/BroadlinkRM2/, even though that directory contains an example folder. When I put the plugin files there, the plugin was not loaded correctly.
• Stop and then start Domoticz in Synology DSM's Package Center. Do this every time you update the plugin's Python files.

That completes the installation. You should now see the Broadlink hardware type in the drop-down list under Setup->Hardware.

Create the hardware as shown below:

#### Import Devices from E-Control App

Refer to reference link 3 for a detailed description; it's in the User Guide->Inside Domoticz->xxx - import chapter. The author apparently didn't account for multi-byte text such as Chinese: if the imported json* files contain Chinese characters, the plugin crashes. See issue 3 for the solution.
Also note that when WebStart is triggered, Domoticz starts a small web service on port 9000 to handle files uploaded from mobile devices, so make sure the Synology firewall is configured to allow it.

### Controlicz

Controlicz seems to be a new service; I didn't see anyone mention it in my Google results and came across it by chance on the Alexa skill page. With Controlicz, you don't need HA-Bridge to emulate your appliances as Philips Hue lights. But it has its own pros and cons with the Broadlink remote controller; there is a dedicated section comparing Controlicz and HA-Bridge.
To use Controlicz:

• Enable the skill and say “discover devices” to Alexa.
• That's it.

Note that Controlicz requires:

• The Domoticz service must be reachable from outside, which means setting up port mapping and DDNS. Since I'm using Synology, I already had this in place.
• Domoticz must run over HTTPS, so you have to enable the Domoticz HTTPS port and the corresponding firewall configuration.

### Issue List

At this point, if you were lucky enough to get through all the steps above, you can say “Turn on the TV” to Alexa, and the TV should respond quickly and correctly. If it doesn't, don't worry; check the issue list to see if it helps.

You have already installed python-broadlink and can import it from the Python REPL, so why is the error still reported? The direct cause is an incorrect PYTHONPATH. Looking at plugin.py, this is how it sets up the path:

```python
if sys.platform.startswith('linux'):
    # linux specific code here
    # doesn't work even if set dist-packages => site-packages
    sys.path.append(os.path.dirname(os.__file__) + '/dist-packages')
elif sys.platform.startswith('darwin'):
    # mac
    sys.path.append(os.path.dirname(os.__file__) + '/site-packages')
elif sys.platform.startswith('win32'):
    #  win specific
    sys.path.append(os.path.dirname(os.__file__) + '\site-packages')
```

The broadlink library lives in /usr/local/lib/python3.5/site-packages/broadlink-0.5-py3.5.egg/broadlink/. Even after changing “dist-packages” to “site-packages” in the snippet above, it still didn't work. The solution, however, is simple: copy the broadlink package into the plugin folder.
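A likely reason appending the path alone doesn't help: the package sits inside an `.egg` directory, and eggs normally reach `sys.path` through `.pth` files such as `easy-install.pth`, which Python processes only for directories registered through the `site` module. If you'd rather not copy the library, the following is a minimal sketch of an alternative, assuming the Python interpreter embedded in Domoticz is the same python3.5 install that python-broadlink was installed into. `add_site_packages` is a hypothetical helper, not part of plugin.py:

```python
import os
import site
import sys
import sysconfig


def add_site_packages() -> str:
    """Hypothetical helper: register this interpreter's site-packages directory.

    site.addsitedir() appends the directory to sys.path if it is missing and
    also processes any .pth files in it (e.g. easy-install.pth listing .egg dirs).
    """
    purelib = sysconfig.get_paths()["purelib"]  # e.g. /usr/local/lib/python3.5/site-packages
    if os.path.isdir(purelib):
        site.addsitedir(purelib)
    return purelib


if __name__ == "__main__":
    path = add_site_packages()
    print(path, path in sys.path)
```

Calling it near the top of plugin.py, before `import broadlink`, would replace the platform-specific `sys.path.append` branches.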

There are two related GitHub issues for reference:

```python
# broadlink/__init__.py
def encrypt_pyaes(self, payload):
    aes = pyaes.AESModeOfOperationCBC(self.key, iv = bytes(self.iv))
    return "".join([aes.encrypt(bytes(payload[i:i+16])) for i in range(0, len(payload), 16)])  # <==== Error is here
```

In Python 3, `bytes(...)` returns an object of class `bytes`, whereas in Python 2 `bytes` is just `str` and returns a plain string. Compare:

```
Python2
>>> type(bytes([1,2,3]))
<type 'str'>

Python3
>>> type(bytes([1,2,3]))
<class 'bytes'>
>>> "".join([bytes([1,2]),bytes([2,3])])   <==== You cannot cat 2 bytes class objects as string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, bytes found
>>> b''.join([bytes([1,2]),bytes([2,3])])  <==== This is the good way
b'\x01\x02\x02\x03'
```

The correct way to concatenate is to join with a binary (bytes) string, `b''`.
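Applied to `encrypt_pyaes`, the fix is to build the ciphertext with `b"".join(...)`. A minimal sketch of the corrected loop, with the cipher passed in as a function so it runs without pyaes; `encrypt_blocks` is a hypothetical name, `encrypt_block` stands in for the `aes.encrypt` call in the snippet above, and the payload is assumed to be padded to a multiple of 16 bytes already:

```python
from typing import Callable

BLOCK_SIZE = 16  # AES block size in bytes


def encrypt_blocks(encrypt_block: Callable[[bytes], bytes], payload: bytes) -> bytes:
    """Encrypt `payload` one 16-byte block at a time and concatenate the results.

    Joining with b"" rather than "" keeps the result a bytes object in Python 3.
    Assumes len(payload) is a multiple of BLOCK_SIZE (the caller pads it).
    """
    return b"".join(
        encrypt_block(bytes(payload[i:i + BLOCK_SIZE]))
        for i in range(0, len(payload), BLOCK_SIZE)
    )


if __name__ == "__main__":
    # Toy "cipher" for demonstration only: flip every bit of the block.
    flip = lambda block: bytes(b ^ 0xFF for b in block)
    print(encrypt_blocks(flip, bytes(32)))
```

In the library itself this corresponds to replacing `"".join` with `b"".join` on the line marked as the error.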

#### Chinese characters in Domoticz imported files

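In Python 3, if a file contains Chinese characters, you cannot read it and decode the string as below. `open()` without an `encoding` argument uses the locale's default encoding, which may not be UTF-8, and a Python 3 `str` has no `decode()` method: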

```python
with open('jsonFile') as f:
    textStr = f.read()       # <=== Python3 will complain about the codec error here.
    textStr.decode('utf8')
```

The correct way is:

```python
with open('jsonFile', encoding='utf8') as f:
    textStr = f.read()       # <=== textStr is already a Unicode str here.
```

Update every line that opens the imported json* files. One more thing to watch out for is ConfigParser; set the encoding there as well:

```python
import configparser

config = configparser.ConfigParser()
config.read(path, encoding='utf8')  # path: the config file to load
```

Give it a try; it should now work without problems.

With HA-Bridge, you don't need the Domoticz Python plugin. Instead, create dummy hardware in the Domoticz web page as in the picture below.

#### Install HA-Bridge

The whole idea of HA-Bridge is to emulate any device as a Philips Hue light, which Alexa supports natively. HA-Bridge can also import all Domoticz devices easily. The downside is that it only supports on/off and dimming, as if you were operating a Hue light.
HA-Bridge is a jar package that runs on Java 8, and we need to run it as a background service on Synology. Refer to the post “Run as a service on Synology” for how to do this.
The command line is `nohup java -jar -Dserver.port=8085 /ha-bridge-4.5.6.jar &`. Put it in Synology Scheduled Tasks, and you don't need to worry about reboots.

Clone the project from GitHub - broadlink-http-rest. Like HA-Bridge, it needs to run in the background as a service; otherwise it quits as soon as the SSH connection is closed. The command line is `nohup python broadlink-http-rest/server.py &`. Note: make sure you use `nohup`; without it, the process still quits when SSH disconnects, even with the trailing `&`.

Configure the firewall:

• 8085: HA-Bridge server
• 8443: Domoticz HTTPS service

The steps are:

• In Setup->Hardware, create a Dummy Hardware
• On the same page, click “Create Virtual Sensors” for the newly created item
• In Switches tab, find the virtual sensor you just created, click Edit and configure “On Action” and “Off Action”.

#### Domoticz into HA-Bridge

Please refer to reference link 2. Starting from the chapter “Configuring HA bridge”, it describes how to configure HA-Bridge to import devices from the Domoticz web service.

#### HA-Bridge vs. Controlicz

##### General Smart Home

Controlicz can integrate all types of devices into Alexa. For example, if Alexa doesn't natively support a smart home device, you can set up a Domoticz server at home and integrate it into Alexa through Controlicz. According to the Controlicz official site, it supports not only on/off switches but also other common appliances such as TVs and ACs. And Domoticz clearly supports many more devices than Alexa does.

Broadlink is a universal remote controller, so every device on Broadlink is a push-button device with only one action: being triggered. That's even weaker than an on/off switch, which at least has two actions, on and off.
For example, suppose a fan's remote controller has a single button: press it once to turn the fan on, and again to turn it off. Through Controlicz, you say “Turn on the fan” to Alexa to turn it on, and you have to say the same sentence to turn it off. Weird, isn't it?
With HA-Bridge, you can say “Turn on …” to turn it on and “Turn off …” to turn it off. Better, right?
The biggest benefit of using Controlicz with Domoticz is that it can import all the devices predefined in the E-Control app, making full use of E-Control's GUI. You don't need to add virtual sensors one by one by hand.

After all the work above, I happened to find a native Broadlink Alexa skill called “broadlink”. It is supposed to work with the latest version of the IHC app. The bad news is that both the skill and the app have very low review scores.

• IHC on Play Store: 2.2 stars

According to the official examples, it only supports TVs. In my own test, I added a TV and an AC, and Alexa discovered only the TV, not the AC.


## Overview

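1. The appliance is already smart and the vendor provides a corresponding Skill; just enable that Skill. The benefit is access to native settings, e.g. setting the AC temperature or choosing the light color.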
2) Domoticz/Home Assistant + HA-Bridge + Alexa
3) Domoticz + python plugin + controlicz + Alexa
3. Domoticz/Home Assistant + HomeBridge + Siri

## Other Dependencies

Domoticz's Python plugin system depends on Python3
HA-Bridge Server depends on Java8

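• Make sure you are using Python 3
• Change to the python-broadlink directory and run `python setup.py install`
• Run `python -m pip install pyaes`. pip3 can't be installed on Synology, so this is a workaround; running `python get-pip.py` still installs pip2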

#### Install Domoticz Python Plugin

• Download the plugin files
• The Domoticz plugin folder is /usr/local/domoticz/var/plugins/BroadlinkRM2/. Copy the plugin files into it, including:
• plugin.py - the plugin's main file
• plugin_send.py
• plugin_http.py
• plugin_http.sh
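Note that the plugin folder is not /usr/local/domoticz/plugins/BroadlinkRM2/. Although that directory contains an example folder and folders for other plugins, when I put the plugin files there, the plugin was not loaded.
• Restart Domoticz in Synology DSM's Package Center. Note that you must restart Domoticz every time you modify the plugin afterwards.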

### Controlicz

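• Register for Controlicz with your Alexa account
• Just enable the skill in the Alexa app
• Say “discover devices” to Alexa

Note: Controlicz requires:

• The Domoticz service must be reachable from the Internet, so you need to configure port mapping and DDNS for your home network. Since I use Synology, I already had this set up
• The Domoticz service must run over HTTPS, so port 8443 must be enabled; mind the NAS firewall configuration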

### Issue List

```python
if sys.platform.startswith('linux'):
    # linux specific code here
    # doesn't work even if set dist-packages => site-packages
    sys.path.append(os.path.dirname(os.__file__) + '/dist-packages')
elif sys.platform.startswith('darwin'):
    # mac
    sys.path.append(os.path.dirname(os.__file__) + '/site-packages')
elif sys.platform.startswith('win32'):
    #  win specific
    sys.path.append(os.path.dirname(os.__file__) + '\site-packages')
```

```python
# broadlink/__init__.py
def encrypt_pyaes(self, payload):
    aes = pyaes.AESModeOfOperationCBC(self.key, iv = bytes(self.iv))
    return "".join([aes.encrypt(bytes(payload[i:i+16])) for i in range(0, len(payload), 16)])  # <==== Error is here
```

In Python 3, `bytes` returns an object of class `bytes`, whereas in Python 2 it returns a string:

```
Python2
>>> type(bytes([1,2,3]))
<type 'str'>

Python3
>>> type(bytes([1,2,3]))
<class 'bytes'>
>>> "".join([bytes([1,2]),bytes([2,3])])   <==== You cannot cat 2 bytes class objects as string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, bytes found
>>> b''.join([bytes([1,2]),bytes([2,3])])  <==== This is the good way
b'\x01\x02\x02\x03'
```

#### Chinese characters in Domoticz imported files

```python
with open('jsonFile') as f:
    textStr = f.read()       # <=== Python3 will complain about the codec error here.
    textStr.decode('utf8')
```

```python
with open('jsonFile', encoding='utf8') as f:
    textStr = f.read()       # <=== textStr is already a Unicode str here.
```

```python
import configparser

config = configparser.ConfigParser()
config.read(path, encoding='utf8')  # path: the config file to load
```

#### Install HA-Bridge

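HA-Bridge emulates devices that Alexa doesn't support as Philips Hue lights, which Alexa supports natively. HA-Bridge can also import all devices from Domoticz directly. The downside is that it only supports on/off and brightness adjustment.
HA-Bridge is a Java jar package. We need to run it as a service in the Synology background; see Run as a service on Synology for how.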

#### Scheduled Tasks & Firewall

• 8085: HA-Bridge server
• 8443: Domoticz HTTPS service

#### HA-Bridge vs. Controlicz

##### General Smart Home

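Controlicz can integrate all Domoticz devices into Alexa. For example, if Alexa doesn't natively support a smart home device, you can first set up a Domoticz server at home and then integrate it into Alexa through Controlicz. According to the support list on the Controlicz official site, it isn't limited to switch devices: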

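• IHC on Play Store: 2.2 stars
Judging from the official examples, only TVs are supported for now. My own testing confirmed this: I added a TV and an AC, but Alexa discovered only the TV, not the AC.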


## Overview

### SyntaxNet

#### Installation

```shell
echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
```

```
Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
```

SyntaxNet ships with a pre-trained English parser called Parsey McParseface, which we can use to analyze sentences. As described in How to Install and Use SyntaxNet and Parsey McParseface, Parsey McParseface's output is actually a CoNLL table. The table format is defined in models/syntaxnet/syntaxnet/text_formats.cc, as follows:

```cpp
// CoNLL document format reader for dependency annotated corpora.
// The expected format is described e.g. at http://ilk.uvt.nl/conll/#dataformat
//
// Data should adhere to the following rules:
//   - Data files contain sentences separated by a blank line.
//   - A sentence consists of one or tokens, each one starting on a new line.
//   - A token consists of ten fields described in the table below.
//   - Fields are separated by a single tab character.
//   - All data files will contains these ten fields, although only the ID
//     column is required to contain non-dummy (i.e. non-underscore) values.
// Data files should be UTF-8 encoded (Unicode).
//
// Fields:
// 1  ID:      Token counter, starting at 1 for each new sentence and increasing
//             by 1 for every new token.
// 2  FORM:    Word form or punctuation symbol.
// 3  LEMMA:   Lemma or stem.
// 4  CPOSTAG: Coarse-grained part-of-speech tag or category.
// 5  POSTAG:  Fine-grained part-of-speech tag. Note that the same POS tag
//             cannot appear with multiple coarse-grained POS tags.
// 6  FEATS:   Unordered set of syntactic and/or morphological features.
// 7  HEAD:    Head of the current token, which is either a value of ID or '0'.
// 8  DEPREL:  Dependency relation to the HEAD.
// 9  PHEAD:   Projective head of current token.
// 10 PDEPREL: Dependency relation to the PHEAD.
```

```
INFO:tensorflow:Processed 1 documents
1       What    _       PRON    WP      _       0       ROOT    _       _
2       is      _       VERB    VBZ     _       1       cop     _       _
3       a       _       DET     DT      _       5       det     _       _
4       control _       NOUN    NN      _       5       nn      _       _
5       panel   _       NOUN    NN      _       1       nsubj   _       _
```

The meanings of all tag abbreviations in the CoNLL table are listed in Universal Dependency Relations.
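To post-process Parsey McParseface's output in Python, the ten fields described in text_formats.cc map naturally onto one dict per token. A minimal sketch, assuming tab-separated input as the format comment specifies (the terminal output above shows the tabs as spaces); `parse_conll` and `CONLL_FIELDS` are hypothetical names, not part of SyntaxNet:

```python
from typing import Dict, List

# Column names in the order listed in text_formats.cc
CONLL_FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def parse_conll(text: str) -> List[List[Dict[str, str]]]:
    """Parse CoNLL text into sentences, each a list of {field: value} tokens."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLL_FIELDS):
            continue                  # skip non-token lines such as TensorFlow log output
        current.append(dict(zip(CONLL_FIELDS, fields)))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = "1\tWhat\t_\tPRON\tWP\t_\t0\tROOT\t_\t_\n2\tis\t_\tVERB\tVBZ\t_\t1\tcop\t_\t_\n"
    for token in parse_conll(sample)[0]:
        print(token["ID"], token["FORM"], "head =", token["HEAD"], token["DEPREL"])
```

The `HEAD` value of each token is the `ID` of its parent, with `0` marking the root, which is how the tree printed by demo.sh can be rebuilt.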

### NLTK

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

#### Stemming vs. Lemmatization

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

In computational linguistics, lemmatisation is the algorithmic process of determining the lemma for a given word. Since the process may involve complex tasks such as understanding context and determining the part of speech of a word in a sentence (requiring, for example, knowledge of the grammar of a language) it can be a hard task to implement a lemmatiser for a new language.

NLTK supports several stemmers, including but not limited to the Porter stemmer, Lancaster stemmer, and Snowball stemmer.

```
>>> from nltk.stem import SnowballStemmer
>>> snowball_stemmer = SnowballStemmer("english")
>>> snowball_stemmer.stem('maximum')
u'maximum'
>>> snowball_stemmer.stem('presumably')
u'presum'
>>> snowball_stemmer.stem('multiply')
u'multipli'
```

Lemmatization in NLTK:

```
>>> from nltk.stem import WordNetLemmatizer
>>> wordnet_lemmatizer = WordNetLemmatizer()
>>> wordnet_lemmatizer.lemmatize('dogs')
u'dog'
>>> wordnet_lemmatizer.lemmatize('churches')
u'church'
>>> wordnet_lemmatizer.lemmatize('is', pos='v')
u'be'
>>> wordnet_lemmatizer.lemmatize('are', pos='v')
u'be'
```
• pos = Part Of Speech (e.g. 'v' for verb; the default is 'n', noun)


## Setting up a Hexo blog on GitHub

2. Use the hexo tool to initialize this repo
3. All blog configuration, including the deploy settings, lives in _config.yml:
```yaml
deploy:
  type: git
  repo: https://github.com/zhougy0717/zhougy0717.github.io.git
  branch: master
```
4. All posts live in source/_posts

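Hexo's workflow: write posts in Markdown under _posts, run `hexo g` to render the Markdown posts into HTML, preview them locally with `hexo s`, and then run `hexo d` to publish the generated pages to GitHub, after which the blog is reachable at the .github.io domain.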

## Everblog

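1. `npm install everblog -g`
2. Add .everblogrc in your home directory, with the fields:
• token
• noteStoreUrl
• notebook
3. Add index.js in the blog root directory:
```javascript
module.exports = require('everblog-adaptor-hexo-html')
```
4. Run `everblog build` in the blog root directory
5. Test with `hexo s`; deploy with `hexo d`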

### Pitfalls

#### Alignment problem in quoted text

This is a Hexo theme problem. I tried several themes and ended up with freemind, which doesn't have this issue.


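m is the number of samples. When m is very large, e.g. 1,000,000, every step of batch gradient descent has to run all of the samples through the computation, which is a staggering amount of work.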

1. Randomly shuffle dataset
2. Repeat {
• for i = 1, …, m {
• for j = 0, …, n: $\theta_j := \theta_j - \alpha\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
• }
• }

Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration

Say b = 10, m = 1000
Repeat{

• for i = 1, 11, 21, 31,…,991{
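• for j = 0,…, n: $\theta_j := \theta_j - \alpha\frac{1}{10}\sum_{k=i}^{i+9}\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_j^{(k)}$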
• }
• }

The learning rate α is typically held constant. It can be slowly decreased over time if we want θ to converge.
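The three variants differ only in how many examples feed each update, so one function can cover all of them. A minimal plain-Python sketch for linear regression, assuming each x already includes the bias feature $x_0 = 1$; the function names, default hyperparameters and toy data are illustrative, not from the course:

```python
import random
from typing import List, Sequence


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Linear hypothesis h_theta(x) = theta^T x (x[0] is the bias feature 1)."""
    return sum(t * xj for t, xj in zip(theta, x))


def minibatch_gd(X: List[List[float]], y: List[float], alpha: float = 0.1,
                 b: int = 10, epochs: int = 100, seed: int = 0) -> List[float]:
    """Mini-batch gradient descent for linear regression.

    b = 1 gives stochastic gradient descent; b = len(X) gives batch gradient descent.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    order = list(range(m))
    for _ in range(epochs):
        rng.shuffle(order)                      # 1. randomly shuffle the dataset
        for start in range(0, m, b):            # 2. use b examples per update
            batch = order[start:start + b]
            errors = [predict(theta, X[k]) - y[k] for k in batch]
            # simultaneous update of every theta_j, averaged over the batch
            theta = [theta[j] - alpha / len(batch) *
                     sum(e * X[k][j] for e, k in zip(errors, batch))
                     for j in range(n)]
    return theta


if __name__ == "__main__":
    X = [[1.0, i / 20] for i in range(21)]      # x_0 = 1 plus one feature in [0, 1]
    y = [2.0 + 3.0 * x[1] for x in X]           # noise-free y = 2 + 3x
    print(minibatch_gd(X, y, alpha=0.5, b=5, epochs=500))  # close to [2.0, 3.0]
```

The last batch of an epoch may be smaller than b, which is why the gradient is divided by `len(batch)` rather than by b.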


## Clustering - K-means Algorithm

### Steps

1. Randomly initialize K cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$.
1) Should have $K < m$
2) Randomly pick K training examples
3) Set $\mu_1, \dots, \mu_K$ equal to these K examples.
2. Repeat
for i = 1 to m
$c^{(i)}$ := index (from 1 to K) of cluster centroid closest to $x^{(i)}$
(assign each sample to its nearest centroid)
for k = 1 to K
$\mu_k$ := average (mean) of points assigned to cluster k.
(move each centroid to the mean of all the samples assigned to it)
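The two alternating steps above can be sketched in plain Python. This is a minimal illustration under simple assumptions (points as tuples, a fixed number of iterations instead of a convergence check, an empty cluster's centroid left where it is); `kmeans` and `sq_dist` are illustrative names:

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X: List[Point], K: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain K-means. Returns (centroids mu, assignments c); c[i] is 0-based."""
    assert K < len(X), "should have K < m"
    rng = random.Random(seed)
    mu = rng.sample(X, K)          # initialize centroids to K random training examples
    c = [0] * len(X)
    for _ in range(iters):
        # Cluster assignment: index of the closest centroid for each x^(i)
        c = [min(range(K), key=lambda k: sq_dist(x, mu[k])) for x in X]
        # Move centroids: mean of the points assigned to each cluster
        for k in range(K):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:            # leave an empty cluster's centroid where it is
                mu[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return mu, c
```

Since the result depends on the random initialization, in practice one runs it several times with different seeds and keeps the clustering with the lowest distortion.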

### Choosing the number of clusters

Elbow method

## Dimensionality Reduction - Principal Component Analysis (PCA)

Dimensionality Reduction = 降维, 3D->2D

### Steps

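1. Feature scaling/mean normalization.
1) $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$
2) Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$
3) (optional) Replace it with $\frac{x_j^{(i)} - \mu_j}{s_j}$, where $s_j$ is the standard deviation
2. Compute the “covariance matrix”: $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}(x^{(i)})^T$
3. Compute the “eigenvectors” of matrix Σ with a library routine: `[U,S,V] = svd(Sigma);`
4. `Ureduce = U(:,1:k); z = Ureduce'*x;`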

### Reconstruct

`x_approx = Ureduce * z`

### Choose k

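Typically, choose k to be the smallest value so that

$$\frac{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)} - x_{approx}^{(i)}\rVert^2}{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}\rVert^2} \le 0.01$$

0.01 means 99% of the variance is retained.

1. Start from k = 1.
2. `[U,S,V] = svd(Sigma)`
3. Pick the smallest k for which $\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$.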
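Because the diagonal of S from `svd(Sigma)` measures the variance along each principal component, the check reduces to a cumulative sum, so svd only has to run once. A minimal sketch, assuming the diagonal entries are passed in as a list sorted largest first (as svd returns them); `choose_k` is an illustrative name:

```python
from typing import Sequence


def choose_k(singular_values: Sequence[float], retain: float = 0.99) -> int:
    """Smallest k whose top-k diagonal entries of S keep `retain` of the variance.

    singular_values: diagonal of S from [U, S, V] = svd(Sigma), largest first.
    """
    total = sum(singular_values)
    kept = 0.0
    for k, s in enumerate(singular_values, start=1):
        kept += s
        if kept / total >= retain:
            return k
    return len(singular_values)


if __name__ == "__main__":
    print(choose_k([50.0, 30.0, 15.0, 4.5, 0.5]))        # 4: the top 4 keep 99.5%
    print(choose_k([50.0, 30.0, 15.0, 4.5, 0.5], 0.9))   # 3: the top 3 keep 95%
```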

### Application of PCA

• Compression: saves storage space and speeds up computation
• Visualization: k=2 or k = 3

Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that doesn't do what you want, implement PCA and consider using $z^{(i)}$.

## Anomaly detection algorithm

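1. Choose features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$: $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$, $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$
3. Given a new example $x$, compute the probability $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$
4. Anomaly if $p(x) < \varepsilon$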
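The four steps above map onto three small functions. A minimal plain-Python sketch, assuming the features are independent (the per-feature Gaussian model) and that ε has already been chosen, e.g. on the cross validation set; the function names are illustrative:

```python
import math
from typing import List, Sequence, Tuple


def fit_gaussian(X: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean mu_j and variance sigma_j^2 (1/m normalization)."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, var


def p(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """p(x) = prod_j N(x_j; mu_j, sigma_j^2), assuming independent features."""
    prob = 1.0
    for xj, mj, vj in zip(x, mu, var):
        prob *= math.exp(-(xj - mj) ** 2 / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return prob


def is_anomaly(x: Sequence[float], mu: Sequence[float], var: Sequence[float],
               epsilon: float) -> bool:
    """Flag x as anomalous when p(x) < epsilon."""
    return p(x, mu, var) < epsilon
```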

### Evaluating the results

Aircraft engines motivating example
10000 good(normal) engines
20 flawed engines (anomalous)

Training set: 6000 good engines
CV: 2000 good engines(y=0), 10 anomalous(y=1)
Test: 2000 good engines(y=0), 10 anomalous(y=1)

### Choosing features

In the video, Andrew gave a live demo in the Octave command line:

### Multivariate Gaussian distribution

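Andrew gives the vectorized probability density of the multivariate Gaussian:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right)$$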

## Recommender Systems

### Content-based recommender systems

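• $n_u$: number of users
• $n_m$: number of movies
• $r(i,j) = 1$: user j has rated movie i
• $y^{(i,j)}$: the actual rating given by user j to movie i (defined only if $r(i,j) = 1$)
• $(\theta^{(j)})^T x^{(i)}$: hypothesis, the predicted rating

Gradient descent update:

for k = 0: $\theta_k^{(j)} := \theta_k^{(j)} - \alpha\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}$

for k ≠ 0: $\theta_k^{(j)} := \theta_k^{(j)} - \alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)} + \lambda\theta_k^{(j)}\right)$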

### Collaborative filtering

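1. Initialize $x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}$ to small random values.
2. Minimize $J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$ using gradient descent.
3. For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^T x$.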

• No need to divide by the standard deviation, because the ratings are already on a similar scale

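Mean normalization here means subtracting each movie's average rating before learning and adding it back when predicting, so a user with no ratings (θ = 0) is predicted each movie's mean rating. A minimal plain-Python sketch, assuming ratings are stored as a movies × users matrix with `None` for unrated entries; `mean_normalize` and `predict` are hypothetical helper names:

```python
from typing import List, Optional, Sequence, Tuple

Ratings = List[List[Optional[float]]]  # Y[i][j]: rating of movie i by user j, None if unrated


def mean_normalize(Y: Ratings) -> Tuple[List[float], Ratings]:
    """Subtract each movie's mean rating, computed over rated entries only (r(i,j) = 1)."""
    mu: List[float] = []
    Y_norm: Ratings = []
    for row in Y:
        rated = [r for r in row if r is not None]
        mean = sum(rated) / len(rated) if rated else 0.0
        mu.append(mean)
        Y_norm.append([None if r is None else r - mean for r in row])
    return mu, Y_norm


def predict(theta_j: Sequence[float], x_i: Sequence[float], mu_i: float) -> float:
    """Predicted rating (theta^(j))^T x^(i) + mu_i, adding the movie mean back."""
    return sum(t * x for t, x in zip(theta_j, x_i)) + mu_i
```

Only the mean is subtracted, not divided by a standard deviation, matching the note above.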

The vectorized form of collaborative filtering: the matrix of predicted ratings is $X\Theta^T$ (low rank matrix factorization).


SVM (Support Vector Machine), Chinese name: 支持向量机

## How it works

### Hypothesis

The hypothesis of logistic regression is $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

SVM Decision Boundary

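Gaussian Kernel: $f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\lVert x - l^{(i)}\rVert^2}{2\sigma^2}\right)$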

### Steps

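1. Given $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$
2. Choose $l^{(1)} = x^{(1)}, \dots, l^{(m)} = x^{(m)}$
3. x->f: $f_i = \text{similarity}(x, l^{(i)})$, with $f_0 = 1$
4. Predict “y = 1” if $\theta^T f \ge 0$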

Note: Do perform feature scaling before using the Gaussian kernel.
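The steps above can be sketched in plain Python. This covers only the kernel, the mapping x->f, and the prediction rule; learning θ itself is left to an SVM package such as LIBSVM. The function names are illustrative, and the θ in the usage note is hand-picked, not trained:

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], l: Sequence[float], sigma: float) -> float:
    """similarity(x, l) = exp(-||x - l||^2 / (2 sigma^2))."""
    sq = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-sq / (2 * sigma ** 2))


def features(x: Sequence[float], landmarks: List[Sequence[float]],
             sigma: float) -> List[float]:
    """Map x to f = [f_0 = 1, f_1, ..., f_m], using the training examples as landmarks."""
    return [1.0] + [gaussian_kernel(x, l, sigma) for l in landmarks]


def predict(theta: Sequence[float], x: Sequence[float],
            landmarks: List[Sequence[float]], sigma: float) -> int:
    """Predict y = 1 when theta^T f >= 0, otherwise y = 0."""
    f = features(x, landmarks, sigma)
    return 1 if sum(t * fi for t, fi in zip(theta, f)) >= 0 else 0
```

With a single landmark at the origin and θ = [-0.5, 1.0], points near the origin (f_1 close to 1) are predicted as y = 1 and points far away (f_1 close to 0) as y = 0, which is the "closeness to landmarks" intuition behind the Gaussian kernel.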

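Multiclass classification:
Use the one-vs.-all method: train K SVMs, one to distinguish y = i from the rest, for i = 1, 2, …, K, to get $\theta^{(1)}, \dots, \theta^{(K)}$. Then pick the class i with the largest $(\theta^{(i)})^T x$.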

### Parameters

$C = \frac{1}{\lambda}$
- Large C (small λ): lower bias, high variance.
- Small C (large λ): higher bias, low variance.

• Large $\sigma^2$: Features $f_i$ vary more smoothly. Higher bias, lower variance.

• Small $\sigma^2$: Features $f_i$ vary less smoothly. Lower bias, higher variance.

Andrew mentioned that there are many other kernels, but they are rarely useful, including:

• Polynomial kernel
• String kernel
• Chi-square kernel
• Histogram intersection kernel

## Logistic regression vs. SVMs

n = number of features ($x \in \mathbb{R}^{n+1}$), m = number of training examples

• If n is large (relative to m): Use logistic regression, or SVM without a kernel (“linear kernel”).
• If n is small, m is intermediate: Use SVM with Gaussian kernel.
• If n is small, m is large: Create/add more features, then use logistic regression or SVM without a kernel.
• Neural network likely to work well for most of these settings, but may be slower to train.

LIBSVM


## Model

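$a_i^{(j)}$: activation of unit i in layer j
$\Theta^{(j)}$: matrix of weights controlling function mapping from layer j to layer j+1

The sigmoid function is $g(z) = \frac{1}{1 + e^{-z}}$

Note:

• $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$, where $s_j$ is the number of units in layer j, so its size depends on the number of nodes in this layer and the next. The +1 is the bias node, i.e. the constant (+1) node, which is not counted in the node count.
• The number of Θ matrices depends on the number of layers: a 3-layer network has only 2 matrices.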
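To make the Θ dimensions concrete, here is a minimal plain-Python forward propagation sketch (the function names are illustrative). It prepends the bias unit before each matrix multiply, so each weight matrix needs one more column than the layer it reads from. The weights in the usage note are hand-picked (the classic AND/XNOR construction), not trained:

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas: List[List[List[float]]], x: Sequence[float]) -> List[float]:
    """Forward propagation through a fully connected sigmoid network.

    Each matrix in `thetas` has one row per unit in the next layer and
    (units in the current layer + 1) columns; column 0 multiplies the bias unit.
    """
    a = list(x)
    for theta in thetas:
        a = [1.0] + a                                   # add bias unit a_0 = 1
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a))) for row in theta]
    return a
```

For example, `forward([[[-30.0, 20.0, 20.0]]], [1.0, 1.0])` computes x1 AND x2 with one 1×3 matrix (2-layer network), while an XNOR network with a hidden layer of 2 units needs two matrices, of sizes 2×3 and 1×3, matching "3 layers, 2 matrices".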

## Cost Function

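$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)})\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$

• The first part of the formula is the original (logistic regression) cost, summed over all K output classes.
• The second part is the regularization term: the squares of all the parameters, summed up. There are L-1 weight matrices, $\Theta^{(1)}, \dots, \Theta^{(L-1)}$, each contributing $s_l \times s_{l+1}$ non-bias weights.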

### Backpropagation Algorithm

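Given training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
Set $\Delta_{ij}^{(l)} = 0$ for all (l,i,j). size($\Delta^{(l)}$) = size($\Theta^{(l)}$).
For i = 1 to m
1. Set $a^{(1)} := x^{(i)}$; both a and x are vectors.
2. Perform forward propagation to compute $a^{(l)}$ for l = 2, 3, …, L (L = number of layers in the network):
$z^{(l)} = \Theta^{(l-1)}a^{(l-1)}$ (remember to add the bias unit $a_0^{(l-1)} = 1$ first)

$a^{(l)} = g(z^{(l)})$ (remember to add the bias unit here too)

3. Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$. $y^{(i)}$ is the measured output and $a^{(L)}$ is the model's estimate of it, so $\delta^{(L)}$ is the model's error.
4. Use backpropagation to compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$. The input layer has no error term, i.e. there is no $\delta^{(1)}$.
(There is likely an error at this step; see the related notes for details.)

5. Compute $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$

Vectorized equation: $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$; in the element-wise form, j is the node index within each layer.

6. Final step, compute the partial derivatives of $J(\Theta)$:
$D_{ij}^{(l)} := \frac{1}{m}\left(\Delta_{ij}^{(l)} + \lambda\Theta_{ij}^{(l)}\right)$ if $j \ne 0$; $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}$ if $j = 0$; then $\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)}$.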

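### Computing δ

$\delta_j^{(l)}$: an error term that measures how much node j in layer l was responsible for any errors in our output.

$\delta^{(l)} = \left((\Theta^{(l)})^T\delta^{(l+1)}\right) .* g'(z^{(l)})$, where g = sigmoid function, so $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$.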

## 算法调优

• Overfitting (High Variance)
• Underfitting (High Bias)
For these two problems, we need to:
1. Identify them
2. Take the corresponding measures

### How to choose the right model

• Training set: 60%
• Cross validation set: 20%
• Test set: 20%
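Use the training set to fit the parameters θ, the cross validation set to choose the polynomial degree d, and finally the test set error to evaluate how good the algorithm is.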

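#### Regularization - choosing λ

1. Create a list of lambdas (i.e. λ∈{0,0.01,0.02,0.04,…10.24});
2. For each λ, learn θ, then compute the training error and cross validation error without the regularization term
3. Plot $J_{train}(\theta)$ and $J_{cv}(\theta)$ against λ
4. As with choosing d, pick the λ that gives the lowest cross validation error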

High Bias:

High Variance:

### What to try next?

• Getting more training examples: Fixes high variance
• Trying smaller sets of features: Fixes high variance
• Adding features: Fixes high bias
• Adding polynomial features: Fixes high bias
• Decreasing λ: Fixes high bias
• Increasing λ: Fixes high variance.

1. Using many features/parameters => high variance, low bias
2. Providing a large number of training samples => low variance

### Error Analysis

• Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
• Plot learning curves to decide if more data, more features, etc. are likely to help.
• Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.

### Special case: skewed data

$F_1$ Score (F Score) $= 2\frac{PR}{P + R}$, where P is precision and R is recall
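With skewed classes, accuracy is misleading, so precision, recall and $F_1$ are computed from the counts of true positives, false positives and false negatives. A minimal plain-Python sketch, assuming y = 1 is the rare class and labels are 0/1; `f1_score` is an illustrative name:

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R), with y = 1 as the rare (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # P = TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # R = TP / (TP + FN)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A classifier that always predicts y = 0 gets F1 = 0, even though its accuracy on heavily skewed data can be very high.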


## Linear Regression

Linear Regression = 线性回归。

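Single Feature Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

### Cost Function & Gradient Descent

Multiple Feature Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$, with $x_0 = 1$

Multiple Feature Cost Function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Repeat until convergence: $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

• j := 0…n (update all $\theta_j$ simultaneously)
• $\alpha$ - Learning rate.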

### How to choose learning rate - α?

• If α is too small: slow convergence
• If α is too large: J(θ) may not decrease on every iteration; may not converge.

To choose α, try:
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …

### Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

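$x_i := \frac{x_i - \mu_i}{s_i}$

• $\mu_i$ is the average of all the values for feature (i)
• $s_i$ is the range of values (max - min)
• or $s_i$ is the standard deviation.

* standard deviation $= \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(x_i^{(k)} - \mu_i\right)^2}$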
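Mean normalization with either choice of $s_i$ fits in a few lines of plain Python. A minimal sketch for a single feature column; `scale_features` is an illustrative name, and it uses the 1/m (population) standard deviation:

```python
import math
from typing import List


def scale_features(column: List[float], use_std: bool = True) -> List[float]:
    """Mean-normalize one feature: (x - mu) / s, with s = std dev or max - min."""
    mu = sum(column) / len(column)
    if use_std:
        s = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
    else:
        s = max(column) - min(column)
    if s == 0:                       # constant feature: nothing to scale
        return [0.0] * len(column)
    return [(x - mu) / s for x in column]
```

The μ and s computed on the training set must be stored and reused to scale any new example before prediction.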

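### Normal Equation (正规解)

$\theta = (X^T X)^{-1}X^T y$

• Note 1: the number of training examples must be greater than the number of features; otherwise $X^T X$ is non-invertible and there is no (unique) solution
• Note 2: the normal equation needs no feature scaling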

## Logistic Regression

Hypothesis: $h_\theta(x) = g(\theta^T x)$

Sigmoid Function / Logistic Function: $g(z) = \frac{1}{1 + e^{-z}}$

The following image shows us what the sigmoid function looks like:

$h_\theta(x) = P(y = 1 \mid x; \theta)$: the probability that y = 1, given x and parameterized by θ.

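### Cost Function

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$

Vectorized implementation: $h = g(X\theta)$, $J(\theta) = \frac{1}{m}\left(-y^T\log(h) - (1 - y)^T\log(1 - h)\right)$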

## Overfitting (过拟合)

### Regularized Linear Regression

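$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.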

### Regularized Logistic Regression

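$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

The second sum, $\sum_{j=1}^{n}\theta_j^2$, explicitly excludes the bias term $\theta_0$.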

!! Note: the regularization term does not include $\theta_0$, which in Matlab/Octave is `theta(1)` (indices start at 1).
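Putting the logistic regression cost and its regularization together, here is a minimal plain-Python sketch (function names are illustrative). The penalty sums over `theta[1:]`, which is the Python (0-based) counterpart of skipping `theta(1)` in Octave:

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cost(theta: Sequence[float], X: Sequence[Sequence[float]],
                  y: Sequence[int], lam: float = 0.0) -> float:
    """Regularized logistic regression cost J(theta).

    Each row of X starts with the bias feature x_0 = 1. The penalty
    lam / (2m) * sum(theta_j^2) skips j = 0, so theta_0 is not regularized.
    """
    m = len(X)
    total = 0.0
    for x, yi in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    penalty = lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    return -total / m + penalty
```

With θ = 0 every prediction is 0.5, so the unregularized cost is log 2 ≈ 0.693 regardless of the data; this is a handy sanity check for an implementation.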