Tool for compressing files within a time frame into Year-Month Archives.

bisscay/archiving-script

Python script for lossless compression of processed files into archives using Gzip

Description

Create a script which will be run weekly.

Script should:

Find all files created during the previous month in the source location, compress them into a single file named YYYY_MM.gz, and place it in the output location.

Source files should then be deleted.

The source and output directories already contain some files for testing.
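As a minimal sketch of the naming step described above, the previous calendar month and the matching YYYY_MM.gz name could be derived as follows. The function names `previous_month` and `archive_name` are hypothetical and are not taken from the script itself.

```python
import datetime


def previous_month(today):
    """Return (year, month) of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st always lands in the previous month,
    # which also handles the January -> December rollover.
    last_of_previous = first_of_this_month - datetime.timedelta(days=1)
    return last_of_previous.year, last_of_previous.month


def archive_name(year, month):
    """Build an archive name in the YYYY_MM.gz form."""
    return "%04d_%02d.gz" % (year, month)


if __name__ == "__main__":
    year, month = previous_month(datetime.date.today())
    print(archive_name(year, month))
```

Because the script is meant to run weekly, several runs fall in the same month; how repeated runs for an already archived month are handled is left open in this sketch.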

Note:

Script runs on Python 2.6.6 and above.

Gzip uses the DEFLATE algorithm (exposed in Python's zipfile module as ZIP_DEFLATED).

A runtime error results if the zlib module is absent.

POSIX does not mandate a birth (creation) time, so modification time (mtime) is used instead.

If the permissions and ownership of a file are certain to remain the same, ctime would be a good option.

mtime is currently used; ctime will be adopted to improve security.

Copy files with the -p (--preserve) flag to retain timestamps. Use cp -a for recursive copying.
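The notes above can be illustrated with a rough sketch that selects files by mtime and compresses them with DEFLATE. It assumes the zipfile module with ZIP_DEFLATED is used, as the note hints, so the result is a ZIP container even though it carries the .gz name. The function name `archive_month` and its parameters are hypothetical, not the script's actual interface.

```python
import datetime
import os
import zipfile


def archive_month(source_dir, output_dir, year, month):
    """Archive files in source_dir modified in (year, month), then delete them.

    Returns the archive path, or None when no file matches.
    """
    matches = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories such as the archive folder
        modified = datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
        if (modified.year, modified.month) == (year, month):
            matches.append(path)
    if not matches:
        return None
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    archive_path = os.path.join(output_dir, "%04d_%02d.gz" % (year, month))
    # ZIP_DEFLATED needs zlib; zipfile raises an error here if it is missing.
    archive = zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED)
    try:
        for path in matches:
            archive.write(path, os.path.basename(path))
    finally:
        archive.close()
    # Remove the sources only after the archive has been closed successfully.
    for path in matches:
        os.remove(path)
    return archive_path
```

The try/finally form is used instead of a with statement because ZipFile only became a context manager after Python 2.6.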

Usage:

Place script in directory of choice, then run:

./compressionScript.py -s <source-location> -o <output-location> -m <start-month> -y <start-year> --endMonth <end-month> --endYear <end-year>

Source location is mandatory

Compression destination defaults to an archive folder in the source directory

By default compression starts from April, 2021

Months and years are one-based, i.e. January == 1

End-year, if specified, must be greater than or equal to start-year; otherwise the current year is used

If end-year equals start-year, end-month must be greater than or equal to start-month; otherwise the current month is used

Negative months or years are treated as invalid options

Months greater than 12 fall back to their default values
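The range rules above can be sketched as a small pure function. This is an illustration under assumptions, not the script's actual code: the name `resolve_range` is hypothetical, the default for an out-of-range end month is assumed to be the current month, and a value of zero (which the rules above do not mention) is not handled.

```python
import datetime

DEFAULT_START_MONTH = 4   # April, per the usage notes
DEFAULT_START_YEAR = 2021


def resolve_range(start_month=None, start_year=None,
                  end_month=None, end_year=None, today=None):
    """Return ((start_year, start_month), (end_year, end_month))."""
    today = today or datetime.date.today()
    for value in (start_month, start_year, end_month, end_year):
        if value is not None and value < 0:
            raise ValueError("negative months or years are invalid")
    if start_year is None:
        start_year = DEFAULT_START_YEAR
    if start_month is None or start_month > 12:
        start_month = DEFAULT_START_MONTH
    # An end year before the start year falls back to the current year.
    if end_year is None or end_year < start_year:
        end_year = today.year
    # Assumption: the "default" for an end month above 12 is the current month.
    if end_month is None or end_month > 12:
        end_month = today.month
    # Within the same year, the end month may not precede the start month.
    if end_year == start_year and end_month < start_month:
        end_month = today.month
    return (start_year, start_month), (end_year, end_month)
```

Passing `today` explicitly keeps the function deterministic, which makes the fallback rules easy to check without depending on the system clock.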

Knowledge Base:

A volume is obtained by formatting a partition with a filesystem.

Ideally:

Windows Filesystem (NTFS) has a max of 2^32 - 1, i.e. 4,294,967,295 files per folder or volume

Linux Filesystem (ext4) has a max of 2,796,202 files per folder or 4,294,967,285 files per volume

There is really no limit on the aggregate size of the files in a folder, though there are limits on the number of files in a folder.

Partition Table Schemes:

(Consideration is based on primary and not logical disks) bAe's deduction

MBR - Master Boot Record has a max volume size of 2TB

GPT - GUID Partition Table supports volume sizes of 2TB and above
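The figures quoted above follow from 32-bit counters, which a short calculation can confirm. The sketch assumes traditional 512-byte sectors for MBR; the constant names are illustrative only.

```python
# NTFS file limit: the largest value a 32-bit counter can hold.
NTFS_MAX_FILES = 2 ** 32 - 1

# MBR stores sector counts in 32 bits; with 512-byte sectors (an assumption
# here, larger sectors raise the ceiling) that caps a volume at 2 TiB.
SECTOR_BYTES = 512
MBR_MAX_BYTES = 2 ** 32 * SECTOR_BYTES

if __name__ == "__main__":
    print(NTFS_MAX_FILES)
    print(MBR_MAX_BYTES // 1024 ** 4)  # size in TiB
```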

Timestamps in Unix:

Run stat my_file_name to see all of a file's timestamps, or use the corresponding commands below to view each one individually.

Access time - atime: ls -lu

When a file or directory was last read (accessed).

Change time - ctime: ls -lc

When a file's metadata last changed: ownership (user and/or group) or access permissions. Content changes update it as well.

Modify time - mtime: ls -l

When a file was last written to (content changes).

Source
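The same three timestamps can be read from Python through os.stat. This is a small sketch; the helper name `describe_timestamps` is hypothetical.

```python
import os
import time


def describe_timestamps(path):
    """Return the access, change and modify times of path as epoch seconds."""
    info = os.stat(path)
    return {
        "atime": info.st_atime,  # last access
        "ctime": info.st_ctime,  # last metadata change on Unix
        "mtime": info.st_mtime,  # last content modification
    }


if __name__ == "__main__":
    for label, stamp in sorted(describe_timestamps(__file__).items()):
        print("%s: %s" % (label, time.ctime(stamp)))
```

Note that st_ctime is the metadata change time on Unix only; on Windows it has historically reported the creation time.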

Shebang:

The script was created on a Windows machine with the editor's end-of-line setting on MSDOS/Windows.

The dos2unix tool was used to strip the trailing ^M (carriage return) characters that Windows line endings leave behind.

This way, a shebang can be used to control how the script is executed in Linux environments.

Ensure the script has execute permission and the shebang matches your Python location.

Use ls -l to verify permissions, chmod +x compressionScript.py to add execute permission, and which python to verify the location.
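Where dos2unix is not available, the line-ending cleanup described above can be approximated in Python. This is only a sketch of the idea, not a replacement for the tool; the function names `strip_carriage_returns` and `shebang_line` are hypothetical.

```python
def strip_carriage_returns(path):
    """Rewrite path with Unix line endings; return True if it was changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = original.replace(b"\r\n", b"\n")
    if converted == original:
        return False
    with open(path, "wb") as handle:
        handle.write(converted)
    return True


def shebang_line(path):
    """Return the raw first line if it is a shebang, otherwise None."""
    with open(path, "rb") as handle:
        first = handle.readline()
    if first.startswith(b"#!"):
        return first
    return None
```

A shebang that still ends in a carriage return makes the kernel look for an interpreter whose name includes that character, which is why the conversion has to happen before the script is run directly.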

File Copy:

-p same as --preserve=mode,ownership,timestamps

--preserve[=attr_list] preserves the specified attributes, separated by a comma

(Default: mode,ownership,timestamps).

Additional options: links, context, xattr, all.

For recursive copy: cp -a is the same as cp --archive or cp -dR --preserve=all, which additionally preserves symbolic links.

Source
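When files are copied from Python rather than with cp, shutil.copy2 is the closest standard-library counterpart to cp -p: it attempts to carry over permission bits and timestamps, though unlike cp -p it does not preserve ownership. A minimal sketch, with the hypothetical helper name `copy_preserving_times`:

```python
import os
import shutil


def copy_preserving_times(source, destination):
    """Copy source to destination, keeping its mode and timestamps.

    Returns the modification time of the copy so callers can verify it.
    """
    shutil.copy2(source, destination)
    return os.stat(destination).st_mtime
```

Keeping the original mtime matters here because the archiving step selects files by modification time; a plain copy would stamp every file with the time of the copy.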
