Thursday, March 23, 2006

Using the split command to break up large files

The split command allows you to break up a large file into several smaller files. A file can be split by size or by number of lines. The split command is available on UNIX, Linux, and Windows (through Cygwin). Below is an example of a large file a user may want to split:


/cygdrive/c/temp/split> ls -l
total 1571886

-rw-r--r-- 1 userid group 1609611264 Mar 23 10:39 test.xml

The following command will split up the file into 300 MB chunks:

/cygdrive/c/temp/split> split -b300m test.xml
/cygdrive/c/temp/split> ls -lrt
total 3143772
-rw-r--r-- 1 userid group 1609611264 Mar 23 10:39 test.xml
-rw-r--r-- 1 userid group 314572800 Mar 23 10:43 xaa
-rw-r--r-- 1 userid group 314572800 Mar 23 10:43 xab
-rw-r--r-- 1 userid group 314572800 Mar 23 10:44 xac
-rw-r--r-- 1 userid group 314572800 Mar 23 10:44 xad
-rw-r--r-- 1 userid group 314572800 Mar 23 10:45 xae
-rw-r--r-- 1 userid group 36747264 Mar 23 10:45 xaf
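
The sizes in this listing can be checked with a little shell arithmetic. This is only a sketch: it assumes the m suffix means 1024 x 1024 bytes (which matches the 314572800-byte chunks shown above), and the variable names are made up for illustration.

```shell
# Assumption: -b300m means 300 * 1024 * 1024 bytes per chunk.
chunk=$((300 * 1024 * 1024))   # 314572800 bytes
total=1609611264               # size of test.xml from the listing

full=$((total / chunk))        # number of full-size chunks
rest=$((total % chunk))        # bytes left over for the last chunk

echo "$full full chunks of $chunk bytes, plus one chunk of $rest bytes"
```

That works out to five full chunks plus a 36747264-byte remainder, which agrees with the six files in the listing.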


In this example, the large file was separated into six smaller files (xaa, xab, xac, xad, xae, xaf). The length of the suffix in the generated file names is configurable with the -a option. Note that the split command keeps the original file intact. When splitting larger files, the --verbose option is useful for monitoring progress.
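
Splitting by lines works the same way. The sketch below tries it on a small throwaway file and then checks that the pieces add up to the original. It assumes GNU split (the --verbose option is a GNU extension), and the scratch directory and the names sample.txt and rejoined.txt are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 250-line sample file (hypothetical stand-in for a large file).
seq 1 250 > sample.txt

# Split every 100 lines, using 3-character suffixes.
# --verbose prints each output file as it is created.
split -l 100 -a 3 --verbose sample.txt

# Show the line count of each piece: xaaa, xaab and xaac.
wc -l xaa?

# split leaves sample.txt untouched; joining the pieces
# in name order should reproduce it byte for byte.
cat xaa? > rejoined.txt
cmp sample.txt rejoined.txt && echo "pieces match the original"
```

The first two pieces hold 100 lines each and the last one holds the remaining 50.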

2 Comments:

Blogger Doug said...

How do you configure the filenames of the split?

2:35 PM  
Blogger Java Man said...

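I don't think it's possible to choose the suffixes (aa, ab, ...) that split generates, but you can specify their length. The leading x is only the default prefix; split accepts an optional prefix argument after the input file name. The example below creates filenames with a suffix length of 5: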

/cygdrive/c/temp/split> split -b300m --verbose -a5 test.xml
creating file `xaaaaa'
creating file `xaaaab'
creating file `xaaaac'
creating file `xaaaad'
creating file `xaaaae'
creating file `xaaaaf'

/cygdrive/c/temp/split> ls -l
total 3143772
-rw-r--r-- 1 userid group 1609611264 Mar 23 10:39 test.xml
-rw-r--r-- 1 userid group 314572800 Mar 23 17:04 xaaaaa
-rw-r--r-- 1 userid group 314572800 Mar 23 18:01 xaaaab
-rw-r--r-- 1 userid group 314572800 Mar 23 18:01 xaaaac
-rw-r--r-- 1 userid group 314572800 Mar 23 18:02 xaaaad
-rw-r--r-- 1 userid group 314572800 Mar 23 18:03 xaaaae
-rw-r--r-- 1 userid group 36747264 Mar 23 18:03 xaaaaf
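
As a side note on the naming question, the x at the start of each name is just the default prefix, and split takes an optional prefix as a second argument after the input file. Here is a small sketch on a throwaway file; the scratch directory, sample.txt, and the part_ prefix are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-byte sample file (hypothetical stand-in for test.xml).
head -c 2500 /dev/zero | tr '\0' 'x' > sample.txt

# Split into 1000-byte chunks with 5-character suffixes,
# using "part_" as the prefix instead of the default "x".
split -b 1000 -a 5 sample.txt part_

# Lists part_aaaaa, part_aaaab and part_aaaac.
ls part_*
```

The suffix letters are still generated by split; only the prefix and the suffix length are under your control here.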

4:29 PM  
