Seems I waste too much time every day!
Today's to do list:
1. Finish reading the paper "Efficiently Mining Frequent Trees in a Forest: Algorithms
and Applications"
2. Review the 621 and STAT620 class.
3. Write some code to test the effectiveness of various pruning techniques based
on clustering.
Wednesday, September 30, 2009
Sunday, September 27, 2009
how to shuffle data in an out-of-core manner?
When the data set is too large to fit in main memory, the usual shuffle methods,
which assume that all the data resides in memory, can no longer be used directly.
Here we use a method similar to shuffling a deck of cards (and similar in structure to mergesort):
while (iteration < setvalue)
{
    tmpfile_set = split(datafile)
    datafile = merge(tmpfile_set)
    delete(tmpfile_set)
    iteration = iteration + 1
}
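The loop above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the original notes: one record per line, every line ending with a newline, and a split step that sends each record to a randomly chosen temporary file. The names `shuffle_pass`, `out_of_core_shuffle`, `num_parts`, and `passes` are hypothetical. Only one record is held in memory at a time.

```python
import os
import random
import tempfile


def shuffle_pass(path, num_parts, rng):
    """One split/merge pass: scatter lines into temp files, then concatenate."""
    tmp_dir = tempfile.mkdtemp()
    part_paths = [os.path.join(tmp_dir, "part%d" % i) for i in range(num_parts)]
    parts = [open(p, "w") for p in part_paths]
    try:
        # split: send each record to a randomly chosen temporary file
        with open(path) as src:
            for line in src:
                rng.choice(parts).write(line)
    finally:
        for f in parts:
            f.close()
    # merge: append the temporary files back in a random order
    rng.shuffle(part_paths)
    with open(path, "w") as dst:
        for p in part_paths:
            with open(p) as part:
                for line in part:
                    dst.write(line)
            os.remove(p)  # corresponds to delete(tmpfile_set)
    os.rmdir(tmp_dir)


def out_of_core_shuffle(path, num_parts=8, passes=3, seed=None):
    """Repeat the split/merge pass; 'passes' plays the role of setvalue."""
    rng = random.Random(seed)
    for _ in range(passes):
        shuffle_pass(path, num_parts, rng)
```

Within a single pass, records that land in the same temporary file keep their relative order, which is why the pass is repeated, much like repeated riffle shuffles of a deck.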
Another way to do this is to use the fopen64 function. This function opens files
larger than 2 GB on systems where the default file offset is 32 bits, so a file that
cannot be loaded into memory at once can still be read and written by position. We can
shuffle the data as follows:
1. Find a random permutation of [1...N], where N is the total number of records.
Suppose the permutation is [n1, n2, n3, ..., nN].
2. Put the i-th record of the original file in the ni-th position of the new file.
This method raises the following issues:
a. Step 2 needs the fseek function. If missing values exist, so that records vary in length, the starting point of a record is difficult to find.
b. The running time of this algorithm may be a problem, because each fseek may jump across several blocks. I do not know how to analyze the time yet.
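Steps 1 and 2 can be sketched in Python as follows. This sketch assumes fixed-length records, so that record i starts at byte offset i * record_size; that assumption sidesteps issue (a) and is not part of the original notes. It also assumes the permutation itself (N integers) fits in memory. The names `permutation_shuffle` and `record_size` are hypothetical. Python file objects already support large offsets, so no separate fopen64 call appears here.

```python
import random


def permutation_shuffle(src_path, dst_path, record_size, seed=None):
    """Write record i of src_path to slot perm[i] of dst_path."""
    with open(src_path, "rb") as src:
        src.seek(0, 2)                      # jump to the end to measure the file
        total_bytes = src.tell()
        if total_bytes % record_size != 0:
            raise ValueError("file size is not a multiple of record_size")
        num_records = total_bytes // record_size

        # step 1: a random permutation of [0, N)
        perm = list(range(num_records))
        random.Random(seed).shuffle(perm)

        # step 2: read sequentially, write each record at its target offset
        src.seek(0)
        with open(dst_path, "wb") as dst:
            for target in perm:
                record = src.read(record_size)
                dst.seek(target * record_size)
                dst.write(record)
    return perm
```

The reads are sequential, but each write seeks to an arbitrary position, so consecutive writes may touch different disk blocks. That is the cost raised in issue (b).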
Friday, September 18, 2009
vim advanced
1. How to move the windows
CTRL-W r *CTRL-W_r* *CTRL-W_CTRL-R* *E443*
CTRL-W CTRL-R Rotate windows downwards/rightwards. The first window becomes
the second one, the second one becomes the third one, etc.
The last window becomes the first window. The cursor remains
in the same window.
This only works within the row or column of windows that the
current window is in.
*CTRL-W_R*
CTRL-W R Rotate windows upwards/leftwards. The second window becomes
the first one, the third one becomes the second one, etc. The
first window becomes the last window. The cursor remains in
the same window.
This only works within the row or column of windows that the
current window is in.
CTRL-W x *CTRL-W_x* *CTRL-W_CTRL-X*
CTRL-W CTRL-X Without count: Exchange current window with next one. If there
is no next window, exchange with previous window.
With count: Exchange current window with Nth window (first
window is 1). The cursor is put in the other window.
When vertical and horizontal window splits are mixed, the
exchange is only done in the row or column of windows that the
current window is in.
The following commands can be used to change the window layout. For example,
when there are two vertically split windows, CTRL-W K will change that in
horizontally split windows. CTRL-W H does it the other way around.
*CTRL-W_K*
CTRL-W K Move the current window to be at the very top, using the full
width of the screen. This works like closing the current
window and then creating another one with ":topleft split",
except that the current window contents is used for the new
window.
*CTRL-W_J*
CTRL-W J Move the current window to be at the very bottom, using the
full width of the screen. This works like closing the current
window and then creating another one with ":botright split",
except that the current window contents is used for the new
window.
*CTRL-W_H*
CTRL-W H Move the current window to be at the far left, using the
full height of the screen. This works like closing the
current window and then creating another one with
":vert topleft split", except that the current window contents
is used for the new window.
{not available when compiled without the +vertsplit feature}
*CTRL-W_L*
CTRL-W L Move the current window to be at the far right, using the full
height of the screen. This works like closing the
current window and then creating another one with
":vert botright split", except that the current window
contents is used for the new window.
{not available when compiled without the +vertsplit feature}
Some notes about tabs
Move between tabs:
use "gt"
Create new tabs:
:tabedit
Let's say you're editing six or seven files in Vim and realize that you need to replace a variable name with a new one. Using the :tabdo command, you can run a search and replace through all of the tabs at once rather than changing each file individually. For instance, if you want to replace foo with bar, you'd run this:
:tabdo %s/foo/bar/g
That will run through each open tab and run the search and replace command (%s/foo/bar/g) in each one.
Tabs can be extremely useful, and it only takes a short while to become proficient with them. For more on working with tabs in Vim, run :help tab-page-intro within Vim.
Monday, September 14, 2009
Some key concepts in information theory
- Entropy:
- Joint Entropy:
- Mutual Information:
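The three quantities can be estimated from samples with a short Python sketch. It assumes the standard definitions, H(X) = -sum p(x) log2 p(x), H(X, Y) defined the same way over pairs (x, y), and I(X; Y) = H(X) + H(Y) - H(X, Y), with probabilities estimated by relative frequencies. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy H(X) in bits."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def joint_entropy(xs, ys):
    """Empirical joint entropy H(X, Y) in bits, treating (x, y) as one symbol."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)
```

For two independent fair bits the mutual information is zero, and for two identical fair bits it equals the entropy of one of them.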
Sunday, September 13, 2009
plan 9.13
1. Read some papers on outlier detection, and the survey.
2. Try to integrate the code of B&R. Almost done!
Friday, September 11, 2009
the check archive command for tar and gzip
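TAR:
tar -tvf file.tar lists all the files in the file.tar archive.
GZIP:
gzip -l file.gz lists the compressed size, uncompressed size, compression ratio, and original name for file.gz.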
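The same checks can be done from a script. The following is a minimal Python sketch using the standard tarfile and gzip modules; the function names `list_tar` and `gzip_sizes` are hypothetical. Unlike gzip -l, this sketch finds the uncompressed size by decompressing the whole stream, which also confirms that the file can be read to the end.

```python
import gzip
import os
import tarfile


def list_tar(path):
    """Return (name, size) for every member, similar to the listing from tar -tvf."""
    with tarfile.open(path) as archive:
        return [(m.name, m.size) for m in archive.getmembers()]


def gzip_sizes(path, chunk_size=1 << 20):
    """Return (compressed, uncompressed) sizes in bytes."""
    uncompressed = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # stream in chunks to keep memory small
            if not chunk:
                break
            uncompressed += len(chunk)
    return os.path.getsize(path), uncompressed
```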
After several days' testing and running
The backup system works smoothly now. I made several changes to the system, including storing the file list on the server and retrieving the file list from the server to the client for comparison.
Tuesday, September 8, 2009
Monday, September 7, 2009
the to do list for 9/7
1. A little nervous today. The task seems difficult.
2. The job this week is to implement the LSH scheme.
3. Today I need to figure out how to generate samples from a Gaussian distribution.
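One common way to handle item 3 is the Box-Muller transform, which turns two uniform random numbers into two independent standard normal samples. The sketch below is only an illustration of that technique, not necessarily what the LSH code ended up using; the name `box_muller` is hypothetical. Python's standard library also provides random.gauss for the same purpose.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent standard normal samples built from two uniforms."""
    u1 = 1.0 - rng.random()          # in (0, 1], so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

A sample with mean mu and standard deviation sigma is then mu + sigma * z for a standard normal z.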
Saturday, September 5, 2009
python exceptions
python: try except raise
java : try catch throw finally?
The exception mechanisms are the same.
If we want to catch multiple exceptions:
A try statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same try statement. An except clause may name multiple exceptions as a parenthesized tuple, for example:
... except (RuntimeError, TypeError, NameError):
...     pass
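A small Python 3 sketch of both forms, separate except clauses and one clause naming a tuple of exceptions. The function `parse_ratio` and its messages are hypothetical, chosen only to trigger different exception types.

```python
def parse_ratio(text):
    """Return a/b for text like "a/b", or a short message describing the failure."""
    try:
        numerator, denominator = text.split("/")
        return int(numerator) / int(denominator)
    except ZeroDivisionError:
        # first handler: only this clause runs for a zero denominator
        return "division by zero"
    except (ValueError, AttributeError) as err:
        # one handler for several exception types, named as a tuple
        return "bad input: %s" % type(err).__name__
```

At most one of the handlers runs for any call, which matches the rule quoted above.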
How to print the exception info inside the except statement
>>> try:
...     raise Exception('spam', 'eggs')
... except Exception as inst:
...     print type(inst)     # the exception instance
...     print inst.args      # arguments stored in .args
...     print inst           # __str__ allows args to be printed directly
...     x, y = inst          # __getitem__ allows args to be unpacked directly
...     print 'x =', x
...     print 'y =', y
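The session above uses Python 2 syntax. A rough Python 3 equivalent is sketched below: print is a function, and exception instances no longer support unpacking through __getitem__, so the arguments are taken from .args instead. The helper `describe` is hypothetical.

```python
def describe(exc):
    """Collect the same details the session above prints, using Python 3 syntax."""
    return {
        "type": type(exc).__name__,
        "args": exc.args,
        "text": str(exc),
    }


try:
    raise Exception("spam", "eggs")
except Exception as inst:
    info = describe(inst)
    x, y = inst.args          # unpack from .args, not from the instance itself
    print(info["type"], info["args"], info["text"])
    print("x =", x)
    print("y =", y)
```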