Tuesday, July 21, 2009

Summary of the backup & recovery system

I think I have finished about 70% of the system. Along the way I have also gotten a feel for Python, which is a really good thing.

The overall picture for this system:
1. On the client, the backup system runs every day at a given time. It stores all the changes in the system (or in a given location), compresses them, and sends them to the cse stdsun server.

2. On stdsun, a recovery module also runs regularly. After a certain amount of time, it sets a checkpoint (recovery point). It can also remove compressed files that are too old to be of any use. Finally, it can restore the file system structure as it was at any given time.

The modules that have been finished:

The file system for this module:
All the information in the project is stored in files.
a. The file list of the previous day. Each record in this list contains the following attributes: document type (f means file, d means directory), file location (the absolute location of the file; this could be optimized by using relative locations), and last access time (for directories, this attribute can be omitted).

b. We also need to write a change log for the system every day. This change log is stored together with the backup files, and it contains the information needed to recover each file to its latest version. Its format differs from that of the file list. It has five attributes:
document type (f or d), file location (the absolute location of the file), last access time, renaming, and the operation (U updated, N newly added, D deleted). Unchanged documents do not need a record in the change log.
The renaming is needed because we put all the backed-up files into a single directory, which destroys the original structure. Two files with the same name in different directories would then have a name conflict. So we rename them so that all files with the same name can sit in a single directory.



c. The targzfile list on the server side (the place where you store the backups; in my case, the cse stdsun server).
This file lists all the compressed files, ordered by time. We regularly set checkpoints on the backups. Each checkpoint is a recovery point. After we make a checkpoint, we write a line "checkpoint" in this targzfile list.

d. The file list on the server. This can have the same format as the file list on the client. In fact, the last access time attribute is not needed here, but we keep it in case we need it in the future.

The size of the file is not needed here. The main purpose of maintaining this list is to compare the current list with the previous one and decide whether each file is newly added, updated, or unchanged. The comparison also has to identify removed files.

For data files, we say two records are the same if they have the same location and the same last access time. If not, the two data files are not the same. Directories are identical if they have the same location. Updated and newly added files are identified in a similar way, using the location and last access time attributes. For removals, we look for files or directories that appear in the previous list but not in the current list.

Be sure to compare the file type! For example, suppose the previous list has a record (f, /home/ye/aaa, 111) and the current list has a record (d, /home/ye/aaa, 222). Then the data file at that location was deleted and a new directory with the same name was created.

For performance, we make frequent use of hash tables.
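The comparison described above can be sketched roughly as follows. This is only an illustration, not the actual code of the system: the function names load_list and diff_lists, and the use of plain tuples for records, are assumptions made for the example. A Python dict plays the role of the hash table, keyed by file location.

```python
def load_list(records):
    """Index (type, location, atime) records by location (a dict is a hash table)."""
    return {loc: (ftype, atime) for ftype, loc, atime in records}


def diff_lists(previous, current):
    """Compare two file lists and return (type, location, atime, operation) tuples.

    Operations use the change log letters: N = newly added, U = updated,
    D = deleted. Unchanged entries produce no record.
    """
    prev, curr = load_list(previous), load_list(current)
    changes = []
    # Deletions first: present before, and now missing or of a different type.
    for loc, (ftype, atime) in prev.items():
        if loc not in curr or curr[loc][0] != ftype:
            changes.append((ftype, loc, atime, "D"))
    for loc, (ftype, atime) in curr.items():
        if loc not in prev or prev[loc][0] != ftype:
            changes.append((ftype, loc, atime, "N"))
        elif ftype == "f" and prev[loc][1] != atime:
            # Same location and type, different time: an updated data file.
            changes.append((ftype, loc, atime, "U"))
    return changes


if __name__ == "__main__":
    # The type-change example from the text: a file replaced by a directory.
    previous = [("f", "/home/ye/aaa", 111)]
    current = [("d", "/home/ye/aaa", 222)]
    print(diff_lists(previous, current))
    # [('f', '/home/ye/aaa', 111, 'D'), ('d', '/home/ye/aaa', 222, 'N')]
```

Emitting deletions before additions is a choice made for this sketch, so that a later replay of the log removes the old file before creating the directory with the same name.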

1. The automatic backup module
This module runs daily at midnight.
It compares the current file list with the previous one, writes the change log, and copies all the changed files into a directory named with the timestamp. Finally, it sends the compressed files to the backup directory on the stdsun server.
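A minimal sketch of the staging and compression step might look like the code below. Several details are assumptions made for illustration and are not taken from the real system: the function name stage_changes, the numeric prefix used as the renaming scheme, the file name changelog.txt, and the tab-separated log format (which would break on paths that contain tabs). Sending the archive to stdsun is left out.

```python
import os
import shutil
import tarfile
import time


def stage_changes(changes, staging_root, stamp=None):
    """Copy changed files into one flat directory, log them, and compress.

    `changes` holds (type, location, atime, operation) tuples.
    Returns the path of the resulting .tar.gz archive.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    stage_dir = os.path.join(staging_root, stamp)
    os.makedirs(stage_dir)
    log_lines = []
    counter = 0
    for ftype, loc, atime, op in changes:
        stored = "-"  # directories and deletions carry no file content
        if ftype == "f" and op in ("U", "N"):
            # Hypothetical renaming scheme: a running number in front of the
            # base name keeps files with the same name from colliding.
            stored = "%06d_%s" % (counter, os.path.basename(loc))
            counter += 1
            shutil.copy2(loc, os.path.join(stage_dir, stored))
        # Five attributes: type, location, time, renaming, operation.
        log_lines.append("\t".join([ftype, loc, str(atime), stored, op]))
    with open(os.path.join(stage_dir, "changelog.txt"), "w") as log:
        log.write("\n".join(log_lines) + "\n")
    archive = stage_dir + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(stage_dir, arcname=stamp)
    return archive
```

Because the change log records both the original location and the stored name, the flat directory can later be mapped back to the original structure.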

2. The recovery module
It has three tasks:
a. Remove the old compressed files
b. Set checkpoints for the system regularly
c. Reconstruct the file system structure when required.
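As a rough sketch of tasks a and c, the code below replays change logs on top of a file list and splits the targzfile list at the last "checkpoint" line. The function names, the in-memory record layout, and the idea that archives listed before the last checkpoint are the candidates for removal are all assumptions for this example; the real retention rule may differ.

```python
def replay(base, change_logs):
    """Rebuild the file system structure by applying change logs, oldest first.

    `base` maps location -> (type, stored_name), for example the state at the
    last checkpoint. Each log is a list of
    (type, location, atime, stored_name, operation) records.
    """
    state = dict(base)
    for log in change_logs:
        for ftype, loc, atime, stored, op in log:
            if op == "D":
                # Compare the type too, so that deleting an old file does not
                # remove a new directory created at the same location.
                if loc in state and state[loc][0] == ftype:
                    del state[loc]
            else:  # "N" or "U"
                state[loc] = (ftype, stored)
    return state


def split_at_last_checkpoint(lines):
    """Split the targzfile list into archives before and after the last checkpoint."""
    names = [line.strip() for line in lines if line.strip()]
    if "checkpoint" not in names:
        return [], names
    last = len(names) - 1 - names[::-1].index("checkpoint")
    before = [n for n in names[:last] if n != "checkpoint"]
    return before, names[last + 1:]
```

To restore the structure at a given time, one would replay only the logs up to that time. The stored name kept in each entry is what links a restored location back to the renamed file inside the archive.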

1 comment:

  1. Something is wrong with the format of the checkpoint file list on the server (file system part d). The renaming must be recorded.
