11/12/2023

Extract tar xz

Try setting the compression levels in the macOS command line. I know you are asking about xz, but as explained in a related answer, on older versions of GZip you can set the compression level with an environment variable like this:

    GZIP=-9 tar cf folderpath

That said, that only seems to work with GZip 1.8 and is deprecated in later versions. So use the -I / --use-compress-program=COMMAND option for tar instead. Note that this option might not work on macOS, but I am placing it here anyway just in case.

I think the underlying issue is that BSD tar and GNU tar without any sort options put the files in the archive in an undefined order. GNU tar documents its --sort=ORDER option as:

Sort directory entries according to ORDER, which is one of none, name, or inode. The default is --sort=none, which stores archive members in the same order as returned by the operating system.

To test this I installed GNU tar on my Mac with:

    brew install gnu-tar

And then tarred the same folder, but with the --sort option:

    gtar --sort='name' -cJf /Users/user/Desktop/temp/tar/

The archive is 1.5 MB, equal to the size of the archive created by the Python library.

The effect sorting has on the final archive size is further demonstrated by first concatenating all the JSON files sorted by name (each name begins with the creation Unix time) and then tarring with BSD tar:

    cat *.json > all.txt

Python tarfile sorting

Finally, the documentation of the Python TarFile.add function confirms that Python tarfile sorts by default:

Directories are added recursively by default. This can be avoided by setting recursive to False.

The same documentation notes that, since Python 3.7, recursion adds entries in sorted order.

I think the reason sorting has such an impact in my case is as follows. My JSON files contain locations of hundreds of vehicles. Every minute I read out all the locations, but only a few of these locations have a different value from minute to minute. When the files are sorted by name, two consecutive files differ in only a few characters. Apparently this is very favourable for the compression efficiency.
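The effect of member order described above can be tried on synthetic data. The following is a minimal sketch, not the original poster's script: the function names (`make_snapshots`, `archive_size`, `compare_orders`) and the generated vehicle data are hypothetical, and the only library features relied on are `tarfile.open` with mode `"w:xz"` (which accepts a `preset` keyword) and `TarFile.add`. Files are added one by one so the order is fully under the caller's control.

```python
from __future__ import annotations

import json
import random
import tarfile
import tempfile
from pathlib import Path


def make_snapshots(folder: Path, count: int = 200, vehicles: int = 300) -> None:
    """Write synthetic JSON snapshots named after a fake Unix time (hypothetical data).

    Between two consecutive snapshots only a handful of positions change.
    """
    rng = random.Random(0)
    positions = [[rng.uniform(-90, 90), rng.uniform(-180, 180)] for _ in range(vehicles)]
    for i in range(count):
        for v in rng.sample(range(vehicles), 5):  # only a few vehicles move
            positions[v][0] += rng.uniform(-0.01, 0.01)
        (folder / f"{1_700_000_000 + 60 * i}.json").write_text(json.dumps(positions))


def archive_size(files: list[Path], out: Path, preset: int = 6) -> int:
    """Add files to a tar.xz archive in exactly the given order and return its size."""
    with tarfile.open(out, "w:xz", preset=preset) as tar:
        for f in files:
            tar.add(f, arcname=f.name)
    return out.stat().st_size


def compare_orders(folder: Path, workdir: Path) -> tuple[int, int]:
    """Return (size with name-sorted members, size with shuffled members)."""
    files = sorted(folder.glob("*.json"))
    shuffled = files[:]
    random.Random(1).shuffle(shuffled)
    return (
        archive_size(files, workdir / "sorted.tar.xz"),
        archive_size(shuffled, workdir / "shuffled.tar.xz"),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        make_snapshots(data)
        sorted_size, shuffled_size = compare_orders(data, Path(tmp))
        print(f"sorted: {sorted_size} bytes, shuffled: {shuffled_size} bytes")
```

How large the gap is depends on the data and on how much of it fits in the xz dictionary at the chosen preset, so a small synthetic set like this one may show a far smaller difference than the 15-fold one reported for 1.3 GB folders.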
Short answer: yes, it is safe to use Python's tarfile library to compress the data; nothing is lost compared to BSD tar.

The original question:

I'm compressing ~1.3 GB folders, each filled with 1440 JSON files, and find that there's a 15-fold difference between using the tar command and Python's built-in tarfile library on macOS or Raspbian 10 (Buster).

Minimal working example

This script compares both methods:

    #!/usr/bin/env python3
    fullpath = Path("/Users/user/Desktop/temp/tar/")
    ...
    with tarfile.open(py_out, "w:xz") as tar:
        ...

tar on Raspbian 10: xz (XZ Utils) 5.2.4, liblzma 5.2.4

After compression, I've extracted both archives and compared the resulting folders with:

    diff -r py-archive-expanded zsh-archive-expanded

If I compare the two tar archives directly, they seem different:

    ➜ diff
    Binary files and differ

If I inspect the archives with Quick Look (and the BetterZip plugin), I see that the files in the archive are ordered in a different way: the zsh archive uses an unknown order, and the Python archive orders the files by modification date.

What is going on? Am I losing something by using the Python library to compress my data? Is the 15-fold difference in size an indicator of some issue? Or can I safely go ahead and use the efficient Python implementation?
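The claim that nothing is lost can also be checked without extracting to disk. The sketch below compares two archives by member name and content hash, so differences in member order or compressed bytes are ignored. It assumes both archives store the same relative member names; `member_digests`, `same_content`, and `write_archive` are hypothetical names introduced for illustration, and `write_archive` exists only to build demo input.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def member_digests(archive_path):
    """Return {member name: SHA-256 hex digest} for every regular file in the archive."""
    digests = {}
    # "r:*" lets tarfile detect the compression (xz, gz, bz2 or none) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                content = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(content).hexdigest()
    return digests


def same_content(archive_a, archive_b):
    """True when both archives hold the same names with byte-identical content."""
    return member_digests(archive_a) == member_digests(archive_b)


def write_archive(path, items):
    """Hypothetical demo helper: store (name, bytes) pairs in the given order."""
    with tarfile.open(path, "w:xz") as tar:
        for name, payload in items:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    items = [("a.json", b'{"x": 1}'), ("b.json", b'{"x": 2}')]
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.tar.xz")
        second = os.path.join(tmp, "second.tar.xz")
        write_archive(first, items)
        write_archive(second, items[::-1])  # same files, reversed order
        print(same_content(first, second))
```

Comparing digests rather than the archive files themselves is the design choice here: two archives with identical members in a different order are expected to differ byte for byte, which is why a plain diff of the .tar.xz files reports a difference.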