tiny thoughts: August 2009

Monday, August 31, 2009

the groups in Ubuntu

Here are the most commonly used group management command-line tools:

groupadd: This command creates and adds a new group.
groupdel: This command removes an existing group.
groupmod: This command changes a group's name or GID but doesn't add or delete members from a group.
gpasswd: This command creates a group password. Every group can have a group password and an administrator. Use the -A argument to assign a user as group administrator.
useradd -G: The -G argument adds a user to a group during the initial user creation. (More arguments are used to create a user.)
usermod -G: This command sets a user's supplementary groups. Used alone it replaces the existing list, so combine it with -a (usermod -a -G) to add a group without dropping the others. The new membership takes effect at the user's next login.
grpck: A command for checking the /etc/group file for typos.

As an example, there is a DVD-RW device (/dev/scd0) on our computer that the sysadmin wants a regular user named john to have access to. To grant john that access, we would follow these steps:

1.	Add a new group with the `groupadd` command: # groupadd dvdrw
2.	Change the group ownership of the device to the new group with the `chgrp` command: # chgrp dvdrw /dev/scd0
3.	Add the approved user to the group with the `usermod` command (the -a flag appends, so john keeps his other groups): # usermod -a -G dvdrw john
4.	Make user `john` the group administrator with the `gpasswd` command so that he can add new users to the group: # gpasswd -A john dvdrw
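
To check the result of the steps above, one option is to look at the group's entry in /etc/group, which has four colon-separated fields (name, password placeholder, GID, comma-separated member list). The following is only a small Python sketch of that check; the function names and the GID 1001 in the sample line are made up for illustration.

```python
def parse_group_line(line):
    """Split one /etc/group-style line into its four fields."""
    name, password, gid, members = line.rstrip("\n").split(":")
    return {
        "name": name,
        "password": password,
        "gid": int(gid),
        # An empty last field means the group has no listed members.
        "members": [m for m in members.split(",") if m],
    }


def is_member(line, user):
    """Return True if user appears in the member list of the group line."""
    return user in parse_group_line(line)["members"]


# Hypothetical entry as it might look after step 3; the GID is invented.
entry = "dvdrw:x:1001:john"
print(is_member(entry, "john"))
```

On a real system you would read the line from /etc/group instead of hard-coding it.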

Friday, August 28, 2009

About convenience variables in GDB and how to print arrays in GDB

GDB provides convenience variables that you can use within GDB to hold on to a value and refer to it later. These variables exist entirely within GDB; they are not part of your program, and setting a convenience variable has no direct effect on further execution of your program. That is why you can use them freely.

Convenience variables are prefixed with `$'. Any name preceded by `$' can be used for a convenience variable, unless it is one of the predefined machine-specific register names (see section Registers). (Value history references, in contrast, are numbers preceded by `$'. See section Value history.)

You can save a value in a convenience variable with an assignment expression, just as you would set a variable in your program. For example:

set $foo = *object_ptr

would save in $foo the value contained in the object pointed to by object_ptr.

Using a convenience variable for the first time creates it, but its value is void until you assign a new value. You can alter the value with another assignment at any time.

Convenience variables have no fixed types. You can assign a convenience variable any type of value, including structures and arrays, even if that variable already has a value of a different type. The convenience variable, when used as an expression, has the type of its current value.

show convenience: Print a list of convenience variables used so far, and their values. Abbreviated show con.

One of the ways to use a convenience variable is as a counter to be incremented or a pointer to be advanced. For example, to print a field from successive elements of an array of structures:

set $i = 0
print bar[$i++]->contents

Repeat that command by typing RET.

How to print Arrays in GDB?

You can do this by referring to a contiguous span of memory as an artificial array, using the binary operator `@'. The left operand of `@' should be the first element of the desired array and be an individual object. The right operand should be the desired length of the array. The result is an array value whose elements are all of the type of the left argument. The first element is actually the left argument; the second element comes from bytes of memory immediately following those that hold the first element, and so on. Here is an example. If a program says

int *array = (int *) malloc (len * sizeof (int));

you can print the contents of array with

p *array@len

investigating a doubling algorithm

Hope this is helpful.

Tuesday, August 25, 2009

some plans in the following days

I will write a summary of stream/one-pass clustering algorithms in the next few days.

Monday, August 17, 2009

Two ways to get the index and the value from a list in Python

There are two ways to get both the index and the value from a list.
One is:

    for index, item in enumerate(L):
        print(index, item)

The other is:

    for index in range(len(L)):
        print(index, L[index])

Still need to work hard on research. Seems the proof is wrong.

The index method does a linear search, and stops at the first matching item. If no matching item is found, it raises a ValueError exception.

    try:
        i = L.index(value)
    except ValueError:
        i = -1  # no match

To get the index for all matching items, you can use a loop, and pass in a start index:

    i = -1
    try:
        while True:
            # search again, starting just past the previous match
            i = L.index(value, i + 1)
            print("match at", i)
    except ValueError:
        pass
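
As an alternative to the loop above, the same list of positions can be collected with enumerate in a single pass. This is just a sketch; the helper name all_indexes and the sample list are made up for illustration.

```python
def all_indexes(items, value):
    """Return the index of every element equal to value, in order."""
    return [i for i, item in enumerate(items) if item == value]


L = ["a", "b", "a", "c", "a"]
print(all_indexes(L, "a"))  # [0, 2, 4]
print(all_indexes(L, "z"))  # [] (no exception when nothing matches)
```

Unlike repeated calls to index, this version never raises ValueError, so no try/except is needed.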

Thursday, August 13, 2009

proximity

Main Entry: prox·im·i·ty
Pronunciation: \präk-ˈsi-mə-tē\

Tuesday, August 11, 2009

the locality sensitive hashing method for nearest neighbor search

The locality sensitive hashing (LSH) method for NNS:
1. It is an approximate method, not an exact one.
2. It can be used when an approximate answer is acceptable.
3. A very simplified description of the LSH algorithm:
A hash function is called locality sensitive if for any two points p, q, Pr(h[p] = h[q]) is strictly decreasing as the distance between p and q increases.
For an example, see the CACM paper on LSH.

1. Generate a set of vectors. The dimension of the vectors should be the same as that of the data points. Each value in these vectors follows a certain distribution (e.g., the normal distribution), and the values are independent of each other.
2. Compute a fingerprint for each data point using the above vectors. We need to make sure that the closer two points are, the more likely their fingerprints are to be the same.

To answer a nearest neighbor query,
collect all the data points that have the same hash values as the query, and calculate the real distances to check whether they are nearest neighbors or not.
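
The two steps and the final distance check can be sketched in Python as follows. This assumes one particular hash family, h(p) = floor((a·p + b) / w) with a Gaussian vector a and a random offset b, which is one common choice for Euclidean distance. The bucket width w, the number of vectors, and all function names are illustrative assumptions; the notes above do not specify them.

```python
import math
import random
from collections import defaultdict


def make_hashes(num_vectors, dim, w, rng):
    """Step 1: draw random vectors with independent normal entries.

    Each hash is a pair (a, b): a is the random vector and b is a random
    offset between 0 and w. Both w and num_vectors are tuning parameters.
    """
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(num_vectors)]


def fingerprint(point, hashes, w):
    """Step 2: project the point on every vector and quantize the result."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
        for a, b in hashes
    )


def build_table(points, hashes, w):
    """Group the data points by fingerprint."""
    table = defaultdict(list)
    for p in points:
        table[fingerprint(p, hashes, w)].append(p)
    return table


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_nearest(q, table, hashes, w):
    """Check real distances only among points sharing q's fingerprint."""
    candidates = table.get(fingerprint(q, hashes, w), [])
    if not candidates:
        return None  # empty bucket: LSH may miss the true neighbor
    return min(candidates, key=lambda p: euclidean(p, q))


rng = random.Random(0)
w = 4.0
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8)]
hashes = make_hashes(num_vectors=3, dim=2, w=w, rng=rng)
table = build_table(points, hashes, w)
print(approx_nearest((10.0, 10.0), table, hashes, w))
```

Real systems usually keep several such tables with independent hash sets, so that a near neighbor missed by one table is likely to be found in another.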

Continuing the summary in 8.10

The R-tree spatial indexing.
This is similar to B/B+-tree indexing. The difference is that in a B+-tree the keys are one-dimensional values, while in an R-tree the keys are bounding boxes. All the data in an R-tree is in the leaf nodes.

Insertion in an R-tree: if a region is not contained in any of the current bounding boxes, insert it into the bounding box that would change the least. When a node becomes too full, split it (there are many variants here).
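
The "least change" rule can be sketched in Python as below. This assumes that change is measured as growth in bounding-box area, with ties broken by the smaller box, which is one common reading. The 2-D rectangle layout (xmin, ymin, xmax, ymax) and the function names are assumptions for illustration.

```python
def area(rect):
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


def union(a, b):
    """Smallest rectangle that covers both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box, region):
    """Extra area the box needs in order to cover the region."""
    return area(union(box, region)) - area(box)


def choose_box(boxes, region):
    """Index of the box that grows the least; ties go to the smaller box."""
    return min(range(len(boxes)),
               key=lambda i: (enlargement(boxes[i], region), area(boxes[i])))


boxes = [(0, 0, 2, 2), (5, 5, 9, 9)]
region = (2, 2, 3, 3)
print(choose_box(boxes, region))  # 0: growing by 5 beats growing by 33
```

Node splitting is left out here, since the variants differ a lot.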

The R+-tree is better for querying. An insertion may go down many paths, as a region R must be inserted into every bounding box that overlaps with it. However, searching can be much faster, as you can choose any one of those paths to check.

plan 8.11

1. finish the research summary on the NNS.

2. Continue on the outlier detection, prove that the lazy calculation would generate better results than the proposed one.

3. Finish the English recording, the feedback, the class observation and so on.

4. Update the code for the backup&recovery system.

5. Read three news articles in English.

Monday, August 10, 2009

youth without regret

Youth, no regret

Youth is not a time of life; it is a state of mind. It is not a matter of rosy cheeks, red lips and supple knees. It is a matter of the will, a quality of the imagination, vigor of the emotions; it is the freshness of the deep spring of life.

Youth means a temperamental predominance of courage over timidity, of the appetite for adventure over the love of ease. This often exists in a man of 60 more than a boy of 20. Nobody grows old merely by a number of years; we grow old by deserting our ideals. Years may wrinkle the skin, but to give up enthusiasm wrinkles the soul. Worry, fear, and self-distrust bow the heart and turn the spirit back to dust.

Whether 60 or 16, there is in every human being’s heart the lure of wonders, the unfailing childlike appetite for what’s next and the joy of the game of living. In the center of your heart and my heart there is a wireless station; so long as it receives messages of beauty, hope, cheer, courage and power from men and from the infinite, so long are you young.

When the aerials are down, and your spirit is covered with the snows of cynicism and the ice of pessimism, then you’ve grown old, even at 20, but as long as your aerials are up, to catch waves of optimism, there’s hope you may die young at 80.

a summary about nearest neighbor search

As I planned yesterday, here is a summary of nearest neighbor search.

Problem statement: Given a dataset, find the nearest neighbor(s) for a certain point q.

1. Linear search
Scan the dataset linearly, and find the point that has the shortest distance to q. Although the time complexity is linear in the data size, this does not scale when the dataset is extremely large, e.g., billions of webpages.
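
A minimal Python sketch of the linear scan, assuming points are tuples of numbers and distance is Euclidean (the squared distance is compared, since it gives the same ordering). The function names are illustrative.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_linear(points, q):
    """Scan every point once and keep the closest one seen so far."""
    best = None
    best_d = float("inf")
    for p in points:
        d = squared_distance(p, q)
        if d < best_d:
            best, best_d = p, d
    return best


points = [(1, 1), (4, 4), (6, 1)]
print(nearest_linear(points, (5, 2)))  # (6, 1)
```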

2. The use of tree structures
Two methods are covered here. Let k be the dimension of the data:

kd-tree: short for k-dimensional tree.

The construction of the kd-tree is quite similar to that of a binary search tree, except that each node is a k-dimensional data point. When we split the data points at depth i, we select the point with the median value in dimension (i mod k) as the node, and put the points that are smaller than the node in that dimension to the left and the others to the right. We keep splitting until there are no points left to split.

Adding and deleting elements and updating the tree are omitted for simplicity.

When the tree is used for NNS, a pruning technique is employed. The search proceeds in a depth-first manner. At each node we check whether the data points under it could be closer to the query point q than the best point found so far. If not, we prune the whole subtree.

It has been pointed out that the kd-tree is not good for NNS in high dimensions.
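
The construction and the pruned depth-first search described above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: it sorts at every level to find the median, uses squared Euclidean distance, and all names are made up for this example.

```python
class Node:
    def __init__(self, point, left, right):
        self.point = point
        self.left = left
        self.right = right


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_kdtree(points, depth=0):
    """Split on dimension (depth mod k), using the median point as the node."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, q, depth=0, best=None):
    """Depth-first search that skips subtrees which cannot beat best."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, q) < sq_dist(best, q):
        best = node.point
    axis = depth % len(q)
    diff = q[axis] - node.point[axis]
    # Visit the side of the splitting plane that contains q first.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, depth + 1, best)
    # The far side can only help if the plane is closer than the best point.
    if diff * diff < sq_dist(best, q):
        best = nearest(far, q, depth + 1, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

The pruning test compares the distance from q to the splitting plane with the best distance so far. In high dimensions that test rarely succeeds, which is one way to see why the kd-tree degrades there.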

R-tree: it is similar to the B-tree. All the data points are stored in the leaf nodes, while in a kd-tree the internal nodes are also data points. It also has an efficient update algorithm, which makes it suitable for dynamically changing data. Each non-leaf node stores two pieces of data: pointers to its child nodes, and the bounding box that covers the data points under them.

The creation of the R-tree: to be continued.

Reference one: http://en.wikipedia.org/wiki/Nearest_neighbor_search

Friday, August 7, 2009

the friday panic

Friday is coming...

Panic...

This week is so terrible.

Wednesday, August 5, 2009

read in English

– Read and listen to more English news and newspapers.
– If you are still reading and writing in your mother tongue more than 50% of the time, your English will hardly improve.

Monday, August 3, 2009

my 100th post

Congrats! I have finished 100 posts already!

This is a fruitful day, I like it.

whole word matching in vim

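Whole word matching can be done as follows.
Suppose you want to match fname in a C file, but not fnames.
You could try /fname\W, where \W matches any non-word character (anything other than a letter, digit, or underscore).
Note that this pattern also includes the following character in the match and misses fname at the end of a line. The pattern /\<fname\> uses Vim's word-boundary atoms and avoids both problems.

If you want to match fnames rather than fname, use /fnames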

A link about the vim re can be found here:
http://www.geocities.com/volontir/#substitute

Sunday, August 2, 2009

everyone is unique

Saturday, August 1, 2009

didn't expect that I'm still a Zeng fan

Even though her voice is that bad..

iterating over datastructures in python

1. To iterate over a dictionary in Python:
>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> for (u, v) in params.items():
...     print(u + " " + v)
...
server mpilgrim
database master
uid sa
pwd secret

2. To iterate over a list:

>>> params = ["server", "mpilgrim", "database", "master", "uid", "sa", "pwd", "secret"]
>>> for elem in params:
...     print(elem)
...
server
mpilgrim
database
master
uid
sa
pwd
secret

To get the index of the element at the same time, use the following format:
>>> params = ["server", "mpilgrim", "database", "master", "uid", "sa", "pwd", "secret"]
>>> for i in range(len(params)):
...     print(i, params[i])
...
0 server
1 mpilgrim
2 database
3 master
4 uid
5 sa
6 pwd
7 secret

3. To iterate over a tuple:
the same as the list

4. To iterate over a set:
The same as the list
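
A short sketch of points 3 and 4, with made-up sample values. The one difference worth noting is that a set has no defined order, so the example sorts it to get a stable result.

```python
params_tuple = ("server", "mpilgrim", "database", "master")
for elem in params_tuple:
    print(elem)  # tuples keep their order, just like lists

params_set = {"server", "database", "uid"}
# Plain "for elem in params_set" works too, but the order is not guaranteed.
for elem in sorted(params_set):
    print(elem)
```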

lack of mathematics and statistics knowledge

This morning I reviewed hypothesis testing and confidence intervals, and I felt really upset. Next quarter I will register for 620 and study hard for it!
