Monday, August 12, 2013

What is the Kernel trick?

The kernel trick arises from the need to speed up SVM learning. In the dual version of the SVM optimization problem:

Saturday, July 6, 2013

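Stealing half a day of leisure from a busy life

I actually found the time to read a bit of The Glory and the Dream (光荣与梦想). Not bad. It tells the story of America's comeback, starting from the 1930s and 1940s.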

Wednesday, March 13, 2013

Basic Plotting One -- y = sqrt(x) * 10

After grading the homework, I realized that a curve on the scores is a must...

The curve function is designed to be y = sqrt(x) * 10, 0 <= x <= 100. What does it look like? 

1) Google is always the best friend: Type y = sqrt(x) * 10 in the search box, you will see the plot.

2) An alternative is to use R plot function and save the plot in eps:

curve_func <- function(x) { return(10 * sqrt(x)) }  # return() needs parentheses in R
setEPS()
postscript('curve.eps')
plot(curve_func, 0, 100, main = "curve_func(x) = 10 * sqrt(x)", xlab = 'original score', ylab = 'curved score')
dev.off()





3) Another way of doing it in Python:
import matplotlib.pyplot as plt
from math import sqrt
x = [0.01 * z for z in range(10001)]  # original scores 0, 0.01, ..., 100
y = [10 * sqrt(v) for v in x]  # curved scores, y = sqrt(x) * 10
plt.plot(x, y)
plt.show()

 

Tuesday, January 18, 2011

Learning Python: map/reduce

from functools import reduce  # reduce is not a builtin in Python 3

a = [1, 2, 3]
b = [4, 5, 6, 7]
c = [8, 9, 1, 2, 3]
L = list(map(len, [a, b, c]))  # map returns an iterator in Python 3

# L == [3, 4, 5]
N = reduce(lambda x, y: x + y, L)
# N == 12
# Or, if we want to be fancy and do it in one line
N = reduce(lambda x, y: x + y, map(len, [a, b, c]))


I am going to implement a generic MapReduce framework using Python's multiprocessing module, which should exploit multi-core machines better.
For data exchange, I will use temporary files.
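
As a starting point, the idea can be sketched with a process pool. This is only a minimal sketch under my own assumptions: the name map_reduce and its parameters are made up for illustration, the mapper and reducer must be picklable top-level functions, and results come back through the pool rather than through temporary files.

```python
from functools import reduce
from multiprocessing import Pool
import operator


def map_reduce(mapper, reducer, items, processes=2):
    """Apply mapper to every item in parallel, then fold the results with reducer."""
    with Pool(processes=processes) as pool:
        mapped = pool.map(mapper, items)  # "map" phase runs in worker processes
    return reduce(reducer, mapped)        # "reduce" phase runs in the parent


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [4, 5, 6, 7]
    c = [8, 9, 1, 2, 3]
    print(map_reduce(len, operator.add, [a, b, c]))  # prints 12
```

The reduce step stays in the parent process here because the list of mapped results is small; a real framework would probably reduce in parallel as well.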

Saturday, December 25, 2010

Christmas, new hope!

It is Christmas now.
I want to form some new habits in this coming year.

1. sleep early ( set auto-shutdown in both windows and Linux), read books after the computer is closed
2. research and paper reading. form a plan
3. keep positive
4. the accumulation of the confidence and courage

Wednesday, November 24, 2010

cut and gawk

cut is one of the most useful commands for text processing.

cut -d. -f1 file # print the first '.'-delimited field of each line of the file
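
For comparison, roughly the same thing can be written in Python. This is just a sketch; cut_first_field is a name I made up, and it works on a list of lines instead of a file.

```python
def cut_first_field(lines, delimiter="."):
    """Return the first delimiter-separated field of each line, like `cut -d. -f1`."""
    # A line without the delimiter is returned whole, which is also what cut does.
    return [line.rstrip("\n").split(delimiter)[0] for line in lines]


print(cut_first_field(["archive.tar.gz\n", "README\n"]))  # ['archive', 'README']
```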

sed:

Friday, October 29, 2010

grep inside vim

Vim has a built-in :grep command. To see all the results in the quickfix window, just type :copen

Sunday, October 10, 2010

guess the awk(gawk) command

gawk '{ if ( $1 ~ /start/) { print "process start!" ; for ( i in freq) {print i, freq[i] } delete freq } else freq[$1]++ }' tmp.out > new.out

What did it do?
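
One way to read the command is to spell out the same logic in Python. This is a sketch of my reading, not a drop-in replacement: the function name summarize is made up, it returns the output lines instead of writing new.out, and it prints the counts in insertion order while awk's "for (i in freq)" order is unspecified.

```python
from collections import Counter


def summarize(lines):
    """Mimic the gawk one-liner: flush the first-field counts at each 'start' line."""
    out = []
    freq = Counter()
    for line in lines:
        fields = line.split()
        first = fields[0] if fields else ""   # $1 is empty on a blank line
        if "start" in first:                  # $1 ~ /start/ is an unanchored match
            out.append("process start!")
            for key, count in freq.items():
                out.append(f"{key} {count}")  # print i, freq[i]
            freq.clear()                      # delete freq
        else:
            freq[first] += 1                  # freq[$1]++
    return out
```

So the command counts how often each first field occurs between two "start" lines and dumps the counts whenever the next "start" line shows up. Since there is no END block, the counts collected after the last "start" line are never printed.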

Wednesday, September 29, 2010

i am not doing the right thing

What is the right thing to do?

Sunday, September 26, 2010

emacs programming

How to insert/delete comment?

Select a block of text and press Alt+; (M-;) to comment or uncomment the region.

Monday, August 9, 2010

emacs programming

C-M-h : select the whole function

M C-\ indent region between cursor and mark
M-m move to first (non-space) char in this line
M-^ attach this line to previous
M-; format and indent comment
C, C++ and Java Modes
M-a beginning of statement
M-e end of statement
M C-a beginning of function
M C-e end of function
C-c RETURN Set cursor to beginning of function and mark at the end
C-c C-q indent the whole function according to indentation style
C-c C-a toggle the mode in which emacs re-indents after electric characters (like {}:';./*)
C-c C-d toggle auto-hungry mode, in which emacs deletes groups of spaces with one del-press
C-c C-u go to beginning of this preprocessor statement
C-c C-c comment out marked area
More general (I guess)
M-x outline-minor-mode collapses function definitions in a file to a mere {...}
M-x show-subtree If you are in one of the collapsed functions, this un-collapses it
In order to achieve some of the feats coming up now, you have to run etags *.c *.h *.cpp (or whatever extensions your source files have) in the source directory
M-. (that's Meta-dot) If you are on a function call, this will take you to its definition
M-x tags-search ENTER Searches through all the files you have tagged
M-, (Meta-comma) jumps to the next occurrence for tags-search
M-x tags-query-replace yum. This lets you replace some text in all the tagged files


C-M-n
Move forward over a parenthetical group (forward-list).
C-M-p
Move backward over a parenthetical group (backward-list).
C-M-u
Move up in parenthesis structure (backward-up-list).
C-M-d
Move down in parenthesis structure (down-list).

Monday, August 2, 2010

regular expression

Regular Expression  Class     Type           Meaning
------------------  --------  -------------  -----------------------------------
.                   all       Character Set  A single character (except newline)
^                   all       Anchor         Beginning of line
$                   all       Anchor         End of line
[...]               all       Character Set  Range of characters
*                   all       Modifier       Zero or more duplicates
\<                  Basic     Anchor         Beginning of word
\>                  Basic     Anchor         End of word
\(..\)              Basic     Backreference  Remembers pattern
\1..\9              Basic     Reference      Recalls pattern
+                   Extended  Modifier       One or more duplicates
?                   Extended  Modifier       Zero or one duplicate
\{M,N\}             Extended  Modifier       M to N duplicates
(...|...)           Extended  Anchor         Shows alternation
\(...\|...\)        EMACS     Anchor         Shows alternation
\w                  EMACS     Character set  Matches a letter in a word
\W                  EMACS     Character set  Opposite of \w

POSIX character sets


POSIX added newer and more portable ways to search for character sets. Instead of using [a-zA-Z] you can replace 'a-zA-Z' with [:alpha:], or, to be more complete, replace [a-zA-Z] with [[:alpha:]]. The advantage is that this will match international character sets. You can mix the old style and new POSIX styles, such as
grep '[1-9[:alpha:]]'
Here is the full list:

Character Group  Meaning
---------------  --------------------------------
[:alnum:]        Alphanumeric
[:cntrl:]        Control character
[:lower:]        Lower case character
[:space:]        Whitespace
[:alpha:]        Alphabetic
[:digit:]        Digit
[:print:]        Printable character
[:upper:]        Upper case character
[:blank:]        Space and tab
[:graph:]        Printable and visible characters
[:punct:]        Punctuation
[:xdigit:]       Hexadecimal digit
Note that some people use [[:alpha:]] as a notation, but the outer '[...]' specifies a character set.

Saturday, July 31, 2010

M-C-\ indent region
C-s C-w search word under cursor

C-M-@
Set mark after end of following balanced expression (mark-sexp). This does not move point.

C-M-h c-mark-function

Wednesday, July 28, 2010

grep excludes files, directories

grep -Ir --exclude="*\.svn*" "pattern" *


note that the grep path is the full path, not just the file names!

grep regular exp.

Special Characters

Here we outline the special characters for grep. Note that egrep uses extended regular expressions (which, with GNU grep, are actually no more functional than standard regular expressions), so its list of special characters is longer: | in grep is the same as \| in egrep, and vice versa. There are other differences as well; check the man page for details. The following characters are considered special and need to be "escaped":
?  \  .  [  ]  ^  $
Note that a $ sign loses its meaning if characters follow it (I think) and the caret ^ loses its meaning if other characters precede it.
Square brackets behave a little differently. The rules for square brackets go as follows:
  • A closing square bracket loses its special meaning if placed first in a list. For example, []12] matches ], 1, or 2.
  • A dash - loses its usual meaning inside lists if it is placed last.
  • A caret ^ loses its special meaning if it is not placed first.
  • Most special characters lose their meaning inside square brackets.
  • A * at the beginning of a regular expression loses its special meaning.

A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{n,m} The preceding item is matched at least n times, but not more than m times.

In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions
\?, \+, \{, \|, \(, and \).
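
To see the repetition operators in action, here is a small check in Python. Python's re module writes these operators unescaped, like extended regular expressions. The patterns and strings are examples I made up, and check is just a helper name.

```python
import re

EXAMPLES = [
    # (pattern, string, should the whole string match?)
    (r"ab?c", "ac", True),            # ?     : "b" is optional
    (r"ab?c", "abbc", False),         #         and matched at most once
    (r"ab*c", "abbbc", True),         # *     : zero or more "b"
    (r"ab+c", "ac", False),           # +     : at least one "b" is required
    (r"ab{2}c", "abbc", True),        # {n}   : exactly two "b"
    (r"ab{2,}c", "abbbbc", True),     # {n,}  : two or more "b"
    (r"ab{2,3}c", "abbbbc", False),   # {n,m} : two to three "b", four is too many
]


def check(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, text, expected in EXAMPLES:
    assert check(pattern, text) == expected
```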

Monday, July 19, 2010

sed summary (cont)

Some basic POSIX groups:

\d = [[:digit:]], \D = [^[:digit:]].
\s = [[:space:]], including space, tab, ...; \S = [^[:space:]]
\w = [[:alnum:]_], including 0-9, a-z, A-Z and _; \W = [^[:alnum:]_]

executing multiple commands with sed -e command

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new

A "-e" isn't needed in the earlier examples because sed knows that there must always be one command. If you give sed one argument, it must be a command, and sed will edit the data read from standard input.
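
The effect of chaining commands with -e can be sketched in Python as applying each substitution in turn to every line. This is an illustration only: apply_commands is a made-up name, and it treats the patterns as plain strings rather than regular expressions.

```python
def apply_commands(line, commands):
    """Apply each (old, new) pair to the line in order, like chained sed -e 's/old/new/'."""
    for old, new in commands:
        # Without the g flag, sed's s command changes only the first match on a line.
        line = line.replace(old, new, 1)
    return line


print(apply_commands("banana bread", [("a", "A"), ("b", "B")]))  # BAnana bread
```

Note that the second command sees the output of the first one, not the original line.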

Reversing the restriction with !
Sometimes you need to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. The "!" character, which often means not in Unix utilities, inverts the address restriction. You remember that


sed -n '/match/ p'

acts like the grep command. The "-v" option to grep prints all lines that don't contain the pattern. Sed can do this with

sed -n '/match/ !p'
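
In Python terms, the "!" simply flips the test that decides whether a line is printed. A small sketch (print_matching and its invert parameter are names I chose for illustration):

```python
import re


def print_matching(lines, pattern, invert=False):
    """Keep lines that match pattern; with invert=True keep the lines that do not."""
    regex = re.compile(pattern)
    return [line for line in lines if bool(regex.search(line)) != invert]


lines = ["a match", "nothing here", "rematch"]
print(print_matching(lines, "match"))               # ['a match', 'rematch']
print(print_matching(lines, "match", invert=True))  # ['nothing here']
```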


Ranges by Line Number

You can specify a range on line numbers by inserting a comma between the numbers. To restrict a substitution to the first 100 lines, you can use:

sed '1,100 s/A/a/'

If you know exactly how many lines are in a file, you can explicitly state that number to perform the substitution on the rest of the file. In this case, assume you used wc to find out there are 532 lines in the file:

sed '101,532 s/A/a/'

An easier way is to use the special character "$", which means the last line in the file.

sed '101,$ s/A/a/'

The "$" is one of those conventions that mean "last" in utilities like cat -e, vi, and ed. Line numbers are cumulative if several files are edited. That is,

sed '200,300 s/A/a/' f1 f2 f3 >new

is the same as

cat f1 f2 f3 | sed '200,300 s/A/a/' >new
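
The cumulative line numbering can be sketched in Python as follows. This is only an illustration: substitute_in_range is a made-up name, the files are plain lists of lines, the pattern is a plain string, and last=None stands in for "$".

```python
from itertools import chain


def substitute_in_range(files, first, last, old, new):
    """Replace the first `old` on lines first..last, counting lines across all files."""
    out = []
    # chain.from_iterable concatenates the files, so numbering does not restart per file.
    for number, line in enumerate(chain.from_iterable(files), start=1):
        if first <= number and (last is None or number <= last):
            line = line.replace(old, new, 1)
        out.append(line)
    return out


f1 = ["A1", "A2"]
f2 = ["A3", "A4"]
print(substitute_in_range([f1, f2], 2, 3, "A", "a"))  # ['A1', 'a2', 'a3', 'A4']
```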


Transform with Y

If you wanted to change a word from lower case to upper case, you could write 26 character substitutions, converting "a" to "A," etc. Sed has a command that operates like the tr program. It is called the "y" command. For instance, to change the letters "a" through "f" into their upper case form, use:

sed 'y/abcdef/ABCDEF/' file
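
Python has a close analogue of the "y" command in str.translate. A minimal sketch (the names HEX_UPPER and transliterate are my own):

```python
# Map each character of "abcdef" to the character at the same position in "ABCDEF".
HEX_UPPER = str.maketrans("abcdef", "ABCDEF")


def transliterate(line, table=HEX_UPPER):
    """Replace characters one for one, like sed 'y/abcdef/ABCDEF/'."""
    return line.translate(table)


print(transliterate("0x1aff feed"))  # 0x1AFF FEED
```

As with "y", every listed character on the line is converted, not just those inside the hexadecimal number.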

I could have used an example that converted all 26 letters into upper case, and while this column covers a broad range of topics, the "column" prefers a narrower format.

If you wanted to convert a line that contained a hexadecimal number (e.g. 0x1aff) to upper case (0x1AFF), you could use:

sed '/0x[0-9a-zA-Z]*/ y/abcdef/ABCDEF/' file

This works fine if there are only numbers in the file. If you wanted to change the second word in a line to upper case, you are out of luck - unless you use multi-line editing. (Hey - I think there is some sort of theme here!)



Thursday, July 15, 2010

sed summary

sed -n: with -n, sed prints nothing by default, so add the p command after the pattern (as in sed -n '/pattern/p') to print the matching lines.

Thursday, July 8, 2010

How to manage experimental data

The amount of data generated by our experiments is growing significantly, and managing it has become a huge issue. In this post we propose some best practices for successfully managing and indexing the datasets.
We will follow these practices in the future.

We will use Excel or OpenOffice to store the data.
The format is:
The first table is an overall index of the tables needed in the experiment.
Each of the following tables stores only one closely related set of data. For example, when you measure the performance of an algorithm, you may want to record both its running time and its space efficiency. The first page then holds the names of these two tables and a brief introduction to each; it is also a good place to list the names of the datasets. The second page is the time table, which could hold the running time of the algorithm and the running times of some other strawman algorithms. It could also include the preprocessing time of the algorithm...

Another question is how to manage the raw data, that is, data that has not yet been processed.
For each raw data table, we need to record the name of the data and its original source.

Sunday, July 4, 2010

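Why are we not successful today?

Question: why are we not successful today?

1. First, we have not defined what our own standard of success is (money, a car, a house, a spouse?), and we are not clear about our real goal: is it ideals, hobbies, money, career, family, power, desire, or the realization of our life's value? So every day we keep repeating the same muddled existence. Life is so monotonous and dull.

2. We do not know our own strengths and weaknesses and do not understand ourselves at all: what we lack, what we need to make up for, what we are good at, and what resources we have, whether knowledge, money, relationships, projects, connections, or timing. We lack core competitiveness and irreplaceability (that is, uniqueness), so we often do not know what we should and should not do. We are forever blind and hesitant.

3. We easily revolve around other people and are influenced by them, instead of having others revolve around us and influencing them. So we are destined to put other people's ideas into our own heads and our own wallets into other people's pockets. Our fate is firmly held by others, our souls and thoughts were stolen clean long ago, and what remains is a walking corpse. That being so, what can we still expect to achieve?

4. We have grown used to superficial things: reading shallow articles, watching utterly boring television that all looks the same, idling on QQ, playing games, playing mahjong, hanging out in restaurants, bars, and leisure venues, and discussing hyped-up news on which everyone has a different opinion. Yet we have read too few valuable books, met too few valuable people, and given ourselves too few minutes of quiet reflection at night, and we seriously lack the ability to see through to the essence of things. So today it is this expert, tomorrow that master, the day after some celebrity; from every side, media advertising, newspapers, magazines, television, and the internet bombard us in turn. In this confusing environment we lack basic judgment, analysis, and the ability to generalize, and we often lose our direction and ourselves.

5. We lack courage and boldness. We are used to a life that shuttles between the same three points, we have lost the reckless courage of our earlier years, and we dare not turn back or take a road of no return. We feel that life is helpless, work is monotonous, and our development is limited, yet we hesitate and dare not change ourselves: we are too lazy to learn, and we dare not lead, change positions, change jobs, start a business, question, resist, express our own opinions, take the initiative to communicate, or innovate. So we go on living the bland, flavorless life of an ordinary person, because our life's journey lacks process and lacks the courage to truly taste the sour, the sweet, the bitter, and the spicy.

6. We lack trust, cooperation, and the integration of resources. We always live in suspicion and contradiction, and we are still learning to conquer the world alone. We rarely have real friends: friends who can actually help, who will lend us money, who are steadfastly loyal, who are in tune with us and often think of us. We do not burn incense in ordinary times but clasp the Buddha's feet in an emergency. We do not really understand mutual tolerance, understanding, complementarity, balance, sharing, and mutual benefit. We have too many so-called brothers and fair-weather friends; in times of crisis, the people we can trust are too few, the degree of trust is too low, and the cost of trust is too high. We are all suspicious of one another, and our strength is consumed in internal friction. We cannot find the point at which resources come together, nor do we know how to use them sensibly, and we keep sighing about what we can do and what on earth to do. The people we know are at too low a level and our minds are too narrow, so there are many truths we never fully understand, and we silently serve as nothing more than stepping stones.

7. We lack the ability to act and to execute, and we lack methods for conducting ourselves and handling matters, so we still pass our days blandly and ignorantly, day after day, year after year. At every moment we have wonderful ideas, but never a way to carry them out, and we lack the confidence and stamina to persevere. We do not look in the mirror often enough to examine ourselves and reflect.

8. We lack the ability to summarize and to correct. When we fail, we keep failing; when we are wrong, we stay wrong. Our habits remain unchanged, which forms our character, which in the end decides our fate.

9. We do not know how to weave a network of relationships. A network really is a net-like structure: start with the people you know and understand, then the people who know you, and finally the friends of your friends, and so on. Remember to be attentive and sincere. People are in fact equal, with no distinction of high or low, noble or humble (unless you really need something from him or her), and no one is all that special. Paying attention to other people's backgrounds and organizing that information is important.

10. We lack financial management. We often do not know what to buy or what to sell, what is income and what is expenditure, what is a liability and what is an asset, what investment is, or how to increase income and cut spending. We neglect the details, quantitative change turns into qualitative change, and so our cash figures remain embarrassing. We do not know how to find, earn, save, borrow, repay, and spend money.

11. We seriously lack knowledge, both basic knowledge and social knowledge; that is, our education is too low and our experience too little. We lack the ability to keep learning, to ask for advice humbly, and to learn from a master. We lack the quality of embracing all things, combining Chinese and Western, and blending the humanities and the sciences; we lack a specialty, and we lack both depth and breadth. We still hold outdated ideas and clumsy methods, and we dare not doubt, challenge, or think in new ways.

12. We have long been made restless by this world of bright lights and material desire. We cannot calm down to think about our lives repeatedly and seriously and to take each step steadily. We do not know how to manage time, use it sensibly, or be punctual, so that in old age we grieve in vain.

13. We lack a sense of happiness, well-being, and security. People are too cold and too pragmatic toward one another. Many families are broken, many social interactions are prejudiced, many circles are beyond the reach of outsiders, many marriages involve a transaction, many loves are not love, much family affection lacks care, and many brothers stab each other in the back. We are afraid of being slaves to a mortgage or a car, of marriage, having children, illness, unemployment, social obligations, and accidents, and we are anxious all day. We do not know what happiness is, how to look for it, or how to adjust our attitude and position, and we do not understand taking and giving up, contentment, transcendence, sharing, and letting things take their course.

14. We do not know how to grasp the times, do not understand what is at stake in politics and economics, and do not understand that fortune rotates, that heaven and earth are one, and that benefit is mutual. We do not understand red ocean and blue ocean strategy, that water can carry a boat and can also capsize it, or that there are no absolute friends and no absolute enemies. We do not know how to follow the trend or create it, and we remain complacent and stagnant.

15. Finally, once we have identified the direction and prepared fully (burning our boats), let us begin to act immediately, and persevere, persevere, persevere! Get through today, and tomorrow will be beautiful! Along the way we keep improving and adjusting ourselves. May everyone with the will succeed! Heaven rewards the diligent!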

Monday, June 21, 2010

Notes for Using Imported Graphics in Latex

1. Some tools to generate the eps files.
a. ImageMagick and GraphicsMagick
The ImageMagick program convert can convert a BMP, CGM, FIG, FITS, GIF, JPG, PBM, PDF, PGM, PNG, PNM, PPM, PS, RGB, TIF, XBM or XPM file to EPS format.
b. jpeg2eps

Wide figures in two column documents
If you are writing a document using two columns (i.e. you started your document with something like \documentclass[twocolumn]{article}), you might have noticed that you can't use floating elements that are wider than the width of a column (using a LaTeX notation, wider than 0.5\textwidth), otherwise you will see the image overlapping with text. If you really have to use such wide elements, the only solution is to use the "starred" variants of the floating environments, that are {figure*} and {table*}. Those "starred" versions work exactly like the standard ones, but they will be as wide as the page, so you will get no overlapping.

A bad point of those environments is that they can be placed only at the top of the page or on their own page. If you try to specify their position using modifiers like b or h they will be ignored. Add \usepackage{stfloats} to the preamble in order to alleviate this problem with regard to placing these floats at the bottom of a page, using the optional specifier [b]. Default is [tbp]. However, h still does not work.

To prevent the figures from being placed out-of-order with respect to their "non-starred" counterparts, the package fixltx2e [2] should be used (e.g. \usepackage{fixltx2e}).

\wide?
using figure* environment.

c. inserting subfigs
Subfloats
A useful extension is the subfig package [3], which uses subfloats within a single float. This gives the author the ability to have subfigures within figures, or subtables within table floats. Subfloats have their own caption, and an optional global caption. An example will best illustrate the usage of this package:

\usepackage{subfig}

\begin{figure}
\centering
\subfloat[A gull]{\label{fig:gull}\includegraphics[width=0.3\textwidth]{gull}}
\subfloat[A tiger]{\label{fig:tiger}\includegraphics[width=0.3\textwidth]{tiger}}
\subfloat[A mouse]{\label{fig:mouse}\includegraphics[width=0.3\textwidth]{mouse}}
\caption{Pictures of animals}
\label{fig:animals}
\end{figure}

d. Dia is a good tool for generating vector-based figures.