module Crystar
Overview
The Crystar module contains readers and writers for tar archives.
Tape archives (tar) are a file format for storing a sequence of files that can be read and written in a streaming manner.
This module aims to cover most variations of the format, including those produced by GNU and BSD tar tools.
Example
require "crystar"

files = [
  {"readme.txt", "This archive contains some text files."},
  {"minerals.txt", "Mineral names:\nalunite\nchromium\nvlasovite"},
  {"todo.txt", "Get crystal mining license."},
]

# Write each file into an in-memory archive
buf = IO::Memory.new
Crystar::Writer.open(buf) do |tw|
  files.each do |f|
    hdr = Crystar::Header.new(
      name: f[0],
      mode: 0o600_i64,
      size: f[1].bytesize.to_i64 # tar sizes are byte counts, not character counts
    )
    tw.write_header(hdr)
    tw.write(f[1].to_slice)
  end
end

# Open and iterate through the files in the archive
buf.pos = 0
Crystar::Reader.open(buf) do |tar|
  tar.each_entry do |entry|
    puts "Contents of #{entry.name}"
    IO.copy entry.io, STDOUT
    puts
  end
end
Extended Modules
Defined in:
tar/format.cr
tar/helper.cr
tar/header.cr
tar/reader.cr
tar/writer.cr
crystar.cr
Constant Summary
-
BASIC_KEYS =
{PAX_PATH => true, PAX_LINK_PATH => true, PAX_SIZE => true, PAX_UID => true, PAX_GID => true, PAX_UNAME => true, PAX_GNAME => true, PAX_MTIME => true, PAX_ATIME => true, PAX_CTIME => true}
-
BLOCK =
'4'
-
Block device node
-
BLOCK_SIZE =
512
-
Size of each block in a tar stream
-
CHAR =
'3'
-
Character device node
-
CONT =
'7'
-
Reserved
-
DIR =
'5'
-
Directory
-
FIFO =
'6'
-
FIFO
-
GNU_LONGLINK =
'K'
-
GNU_LONGNAME =
'L'
-
'L' and 'K' are used by the GNU format for a meta file used to store the path or link name for the next file.
-
GNU_SPARSE =
'S'
-
Sparse file in the GNU format.
-
ISBLK =
24576
-
Block special file
-
ISCHR =
8192
-
Character special file
-
ISDIR =
16384
-
Directory. Common Unix mode constants; these are not defined in any common tar standard. Header.FileInfo understands these, but file_info_header will never produce these.
-
ISFIFO =
4096
-
FIFO
-
ISGID =
1024
-
Set gid
-
ISLINK =
40960
-
Symbolic link
-
ISREG =
32768
-
Regular file
-
ISSOCK =
49152
-
Socket
-
ISUID =
2048
-
Set uid. Mode constants from USTAR spec: See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
-
ISVTX =
512
-
Save text (sticky bit)
-
LINK =
'1'
-
Hard link. '1' to '6' are header-only flags and may not have a data body.
-
MAGIC_GNU =
"ustar "
-
Magics used to identify various formats.
-
MAGIC_USTAR =
"ustar\u0000"
-
MAX_NANO_SECOND_DIGITS =
9
-
NAME_SIZE =
100
-
Max length of the name field in USTAR format
-
PADDING =
3
-
PAX_ATIME =
"atime"
-
PAX_CHARSET =
"charset"
-
Currently unused
-
PAX_COMMENT =
"comment"
-
Currently unused
-
PAX_CTIME =
"ctime"
-
Removed from a later revision of the PAX spec, but was valid
-
PAX_GID =
"gid"
-
PAX_GNAME =
"gname"
-
PAX_GNU_SPARSE =
"GNU.sparse."
-
Keywords for GNU sparse files in a PAX extended header.
-
PAX_GNU_SPARSE_MAJOR =
"GNU.sparse.major"
-
PAX_GNU_SPARSE_MAP =
"GNU.sparse.map"
-
PAX_GNU_SPARSE_MINOR =
"GNU.sparse.minor"
-
PAX_GNU_SPARSE_NAME =
"GNU.sparse.name"
-
PAX_GNU_SPARSE_NUMBLOCKS =
"GNU.sparse.numblocks"
-
PAX_GNU_SPARSE_NUMBYTES =
"GNU.sparse.numbytes"
-
PAX_GNU_SPARSE_OFFSET =
"GNU.sparse.offset"
-
PAX_GNU_SPARSE_REALSIZE =
"GNU.sparse.realsize"
-
PAX_GNU_SPARSE_SIZE =
"GNU.sparse.size"
-
PAX_LINK_PATH =
"linkpath"
-
PAX_MTIME =
"mtime"
-
PAX_NONE =
""
-
Keywords for PAX extended header records. PAX_NONE indicates that no PAX key is suitable.
-
PAX_PATH =
"path"
-
PAX_SCHILY_XATTR =
"SCHILY.xattr."
-
PAX_SIZE =
"size"
-
PAX_UID =
"uid"
-
PAX_UNAME =
"uname"
-
PREFIX_SIZE =
155
-
Max length of the prefix field in USTAR format
-
REG =
'0'
-
Regular file. Type flags for Header#flag
-
REGA =
'\u{0}'
-
SYMLINK =
'2'
-
Symbolic link
-
TRAILER_STAR =
"tar\u0000"
-
VERSION =
"0.1.0"
-
VERSION_GNU =
" \u0000"
-
VERSION_USTAR =
"00"
-
XGLOBAL_HEADER =
'g'
-
Used by PAX format to store key-value records that are relevant to all subsequent files.
-
XHEADER =
'x'
-
Used by PAX format to store key-value records that are only relevant to the next file.
Class Method Summary
-
.parse_pax(r : IO) : Hash(String, String)
parse_pax parses PAX headers. If an extended header (type 'x') is invalid, an exception is raised.
-
.read_gnu_sparse_map0x1(pax_hdrs : Hash(String, String))
read_gnu_sparse_map0x1 reads the sparse map as stored in GNU's PAX sparse format version 0.1.
-
.read_gnu_sparse_map1x0(r : IO)
read_gnu_sparse_map1x0 reads the sparse map as stored in GNU's PAX sparse format version 1.0.
-
.try_read_full(r : IO, b : Bytes)
try_read_full is like read_fully except it returns EOF when it is hit before b.size bytes are read.
Instance Method Summary
-
#align_sparse_entries(src : Array(SparseEntry), size : Int64)
align_sparse_entries mutates src and returns dst where each fragment's starting offset is aligned up to the nearest block edge, and each ending offset is aligned down to the nearest block edge.
-
#block_padding(offset : Int)
block_padding computes the number of bytes needed to pad offset up to the nearest block edge where 0 <= n < BLOCK_SIZE.
- #byte_index(s : String, c : Char)
- #byte_index(bytes : Bytes, b : Int)
-
#file_info_header(fi : File, link : String)
file_info_header creates a partially-populated Header from fi.
- #fits_in_base256(n : Int32, x : Int64)
-
#fits_in_octal(n : Int32, x : Int64)
fits_in_octal reports whether the integer x fits in a field n-bytes long using octal encoding with the appropriate NUL terminator.
-
#format_pax_record(k : String, v : String)
format_pax_record formats a single PAX record, prefixing it with the appropriate length
-
#format_pax_time(ts : Time)
format_pax_time converts ts into a time of the form %d.%d as described in the PAX specification.
-
#has_nul(s : String)
checks whether NUL character exists within s
- #header_only_type?(flag)
-
#invert_sparse_entries(src : Array(SparseEntry), size : Int64)
invert_sparse_entries converts a sparse map from one form to the other.
- #ltrim(b : Bytes, s : String)
-
#merge_pax(hdr : Header, pax_hdrs : Hash(String, String))
merge_pax merges pax_hdrs into hdr for all relevant fields of Header.
-
#parse_pax_record(s : String) : ::Tuple(String, String, String)
parse_pax_record parses the input PAX record string into a key-value pair.
-
#parse_pax_time(s : String)
parse_pax_time takes a string of the form %d.%d as described in the PAX specification.
- #rtrim(b : Bytes, s : String)
- #split_ustar_path(name : String)
-
#to_ascii(s : String)
to_ascii converts the input to an ASCII C-style string.
- #trim_bytes(b : Bytes, s : String)
- #unix_time(sec : Int, nsec : Int)
- #unix_time(sec, nsec)
-
#valid_pax_record(k : String, v : String)
valid_pax_record reports whether the key-value pair is valid where each record is formatted as: "%d %s=%s\n" % (size, key, value)
- #validate_sparse_entries(sp : Array(SparseEntry), size : Int64)
Class Method Detail
parse_pax parses PAX headers. If an extended header (type 'x') is invalid, an exception is raised.
read_gnu_sparse_map0x1 reads the sparse map as stored in GNU's PAX sparse format version 0.1. The sparse map is stored in the PAX headers.
read_gnu_sparse_map1x0 reads the sparse map as stored in GNU's PAX sparse format version 1.0. The format of the sparse map consists of a series of newline-terminated numeric fields. The first field is the number of entries and is always present. Following this are the entries, consisting of two fields (offset, length). This function must stop reading at the end boundary of the block containing the last newline.
Note that the GNU manual says that numeric values should be encoded in octal format. However, the GNU tar utility itself outputs these values in decimal. As such, this library treats values as being encoded in decimal.
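The field layout described above can be sketched as follows. This is a minimal illustration, not Crystar's implementation: `sketch_parse_sparse_map` is a hypothetical helper that takes the payload as a String and ignores the block-boundary rule, and it returns plain tuples rather than SparseEntry values.

```crystal
# Hypothetical helper: parses an already-extracted sparse map payload.
# Fields are newline-terminated and, as the note above says, read as decimal.
def sketch_parse_sparse_map(data : String) : Array(Tuple(Int64, Int64))
  fields = data.split('\n')
  count = fields.shift.to_i64 # first field: number of entries
  entries = [] of Tuple(Int64, Int64)
  count.times do
    offset = fields.shift.to_i64
    length = fields.shift.to_i64
    entries << {offset, length}
  end
  entries
end

p sketch_parse_sparse_map("2\n0\n512\n1024\n512\n") # => [{0, 512}, {1024, 512}]
```

A missing field raises IndexError and a non-numeric field raises ArgumentError, which stands in for the header error a real reader would report.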
try_read_full is like read_fully except it returns EOF when it is hit before b.size bytes are read.
Instance Method Detail
align_sparse_entries mutates src and returns dst where each fragment's starting offset is aligned up to the nearest block edge, and each ending offset is aligned down to the nearest block edge.
Even though the Crystar Reader and the BSD tar utility can handle entries with arbitrary offsets and lengths, the GNU tar utility can only handle offsets and lengths that are multiples of BLOCK_SIZE.
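The alignment rule can be illustrated with a small standalone sketch. The names `SketchEntry` and `sketch_align` are hypothetical, the block size of 512 is taken from BLOCK_SIZE, and the assumption that an end offset equal to the total size is left unrounded is mine. Unlike the documented method, this version builds a new array instead of mutating its input.

```crystal
# Hypothetical stand-in for SparseEntry.
record SketchEntry, offset : Int64, length : Int64

def sketch_align(src : Array(SketchEntry), size : Int64, block = 512_i64) : Array(SketchEntry)
  dst = [] of SketchEntry
  src.each do |e|
    pos = (e.offset + block - 1) // block * block # round the start up
    fin = e.offset + e.length
    fin = fin // block * block unless fin == size # round the end down, except at end of file
    dst << SketchEntry.new(pos, fin - pos) if pos < fin # drop fragments that vanish
  end
  dst
end

src = [SketchEntry.new(1_i64, 1500_i64), SketchEntry.new(1536_i64, 464_i64)]
p sketch_align(src, 2000_i64)
```

Here the first fragment shrinks to offset 512, length 512, while the second is kept as is because it already starts on a block edge and ends at the total size.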
block_padding computes the number of bytes needed to pad offset up to the nearest block edge where 0 <= n < BLOCK_SIZE.
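As a rough illustration of the arithmetic, assuming non-negative offsets: `SKETCH_BLOCK_SIZE` and `sketch_block_padding` are hypothetical names used only for this example, and the library may compute the value differently.

```crystal
# Hypothetical stand-in for BLOCK_SIZE (512 in the constant summary).
SKETCH_BLOCK_SIZE = 512_i64

# Bytes needed to pad offset up to the next block edge, always in 0...512.
def sketch_block_padding(offset : Int64) : Int64
  (SKETCH_BLOCK_SIZE - offset % SKETCH_BLOCK_SIZE) % SKETCH_BLOCK_SIZE
end

puts sketch_block_padding(0_i64)   # => 0
puts sketch_block_padding(1_i64)   # => 511
puts sketch_block_padding(512_i64) # => 0
puts sketch_block_padding(600_i64) # => 424
```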
file_info_header creates a partially-populated Header from fi. If fi describes a symlink, this records link as the link target. If fi describes a directory, a slash is appended to the name.
fits_in_octal reports whether the integer x fits in a field n-bytes long using octal encoding with the appropriate NUL terminator.
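One way such a check could be written is shown below. It assumes one byte of the field is reserved for the NUL terminator and that each remaining byte holds one octal digit (3 bits); `sketch_fits_in_octal` is a hypothetical name, not the library method.

```crystal
def sketch_fits_in_octal(n : Int32, x : Int64) : Bool
  return false if x < 0
  oct_bits = (n - 1) * 3 # n - 1 digits, 3 bits each
  # With 63 or more bits available, every non-negative Int64 fits.
  oct_bits >= 63 || x < (1_i64 << oct_bits)
end

puts sketch_fits_in_octal(8, 0o7777777_i64)  # => true
puts sketch_fits_in_octal(8, 0o10000000_i64) # => false
puts sketch_fits_in_octal(12, -1_i64)        # => false
```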
format_pax_record formats a single PAX record, prefixing it with the appropriate length
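The length prefix is slightly tricky because it counts its own digits. The sketch below shows one way to compute it, assuming the "%d %s=%s\n" layout and the three extra bytes (space, '=', newline) suggested by the PADDING constant; `sketch_format_pax_record` is a hypothetical name.

```crystal
def sketch_format_pax_record(k : String, v : String) : String
  padding = 3 # ' ', '=' and '\n'
  size = k.bytesize + v.bytesize + padding
  size += size.to_s.bytesize # add the digits of the length itself
  record = "#{size} #{k}=#{v}\n"
  # Adding the digits may have pushed the length to one more digit; adjust once.
  if record.bytesize != size
    size = record.bytesize
    record = "#{size} #{k}=#{v}\n"
  end
  record
end

puts sketch_format_pax_record("path", "a.txt").inspect # => "14 path=a.txt\n"
```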
format_pax_time converts ts into a time of the form %d.%d as described in the PAX specification. This function can handle negative timestamps.
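The negative-timestamp case deserves a short example. The sketch below takes Unix seconds and nanoseconds directly instead of a Time, which is a simplification for illustration; `sketch_format_pax_time` is a hypothetical name, and trimming trailing zeros is an assumption about the output form.

```crystal
# secs is the floored Unix second, nsecs the nanoseconds past it (0...1_000_000_000).
def sketch_format_pax_time(secs : Int64, nsecs : Int32) : String
  return secs.to_s if nsecs == 0
  sign = ""
  if secs < 0
    sign = "-"
    secs = -(secs + 1)             # move one second over to the fraction
    nsecs = 1_000_000_000 - nsecs  # and flip the fraction's direction
  end
  "#{sign}#{secs}.#{nsecs.to_s.rjust(9, '0')}".rstrip('0')
end

puts sketch_format_pax_time(1350244992_i64, 23960108) # => 1350244992.023960108
puts sketch_format_pax_time(1_i64, 500_000_000)       # => 1.5
puts sketch_format_pax_time(-2_i64, 500_000_000)      # => -1.5
```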
invert_sparse_entries converts a sparse map from one form to the other. If the input is a list of sparse holes, the output is a list of data fragments, and vice versa. The input must have been already validated.
This function mutates src and returns a normalized map where:
- adjacent fragments are coalesced together
- only the last fragment may be empty
- the end offset of the last fragment is the total size
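The inversion and the normalization rules listed above can be sketched as follows. `SketchFragment` and `sketch_invert` are hypothetical names, and this version returns a new array rather than mutating its input.

```crystal
# Hypothetical stand-in for SparseEntry.
record SketchFragment, offset : Int64, length : Int64

def sketch_invert(src : Array(SketchFragment), size : Int64) : Array(SketchFragment)
  dst = [] of SketchFragment
  pos = 0_i64 # start of the next gap
  src.each do |cur|
    next if cur.length == 0 # skip empty fragments
    gap = cur.offset - pos
    dst << SketchFragment.new(pos, gap) if gap > 0 # adjacent fragments leave no gap
    pos = cur.offset + cur.length
  end
  dst << SketchFragment.new(pos, size - pos) # the last fragment may be empty
  dst
end

data = [SketchFragment.new(0_i64, 2_i64), SketchFragment.new(5_i64, 3_i64)]
p sketch_invert(data, 10_i64)
```

With data at bytes 0..1 and 5..7 of a 10-byte file, the result describes the holes at offset 2 (length 3) and offset 8 (length 2).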
merge_pax merges pax_hdrs into hdr for all relevant fields of Header.
parse_pax_record parses the input PAX record string into a key-value pair. If parsing is successful, it slices off the currently read record and returns the remainder alongside the key and value.
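A minimal sketch of this kind of parsing, assuming the "%d %s=%s\n" record layout and a {key, value, remainder} result order; `sketch_parse_pax_record` is a hypothetical name, and the error messages are illustrative only.

```crystal
def sketch_parse_pax_record(s : String) : Tuple(String, String, String)
  # The length field is everything before the first space.
  sp = s.byte_index(' '.ord) || raise ArgumentError.new("missing length field")
  n = s.byte_slice(0, sp).to_i? || raise ArgumentError.new("invalid length field")
  raise ArgumentError.new("length out of range") if n <= sp + 1 || n > s.bytesize
  raise ArgumentError.new("record must end with a newline") unless s.byte_at(n - 1) == '\n'.ord

  # The body sits between the space and the trailing newline.
  body = s.byte_slice(sp + 1, n - sp - 2)
  key, eq, value = body.partition('=')
  raise ArgumentError.new("missing '='") if eq.empty?
  {key, value, s.byte_slice(n, s.bytesize - n)}
end

p sketch_parse_pax_record("14 path=a.txt\n12 uid=1000\n")
```

Calling the helper again on the returned remainder yields the next record, which is how a caller could walk a whole extended header.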
parse_pax_time takes a string of the form %d.%d as described in the PAX specification. Note that this implementation allows negative timestamps, which the PAX specification permits but which are not always portable.
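The sketch below shows one plausible way to split such a string. It returns a {seconds, nanoseconds} tuple instead of a Time to stay self-contained; `sketch_parse_pax_time` and `SKETCH_MAX_NANO_DIGITS` are hypothetical names (the latter mirrors MAX_NANO_SECOND_DIGITS), and giving the nanoseconds the same sign as a negative seconds part is an assumption.

```crystal
SKETCH_MAX_NANO_DIGITS = 9

def sketch_parse_pax_time(s : String) : Tuple(Int64, Int64)
  ss, _, sn = s.partition('.') # seconds part, separator, sub-second part
  secs = ss.to_i64? || raise ArgumentError.new("invalid seconds: #{ss}")
  return {secs, 0_i64} if sn.empty?
  unless sn.each_char.all?(&.ascii_number?)
    raise ArgumentError.new("invalid fraction: #{sn}")
  end
  # Right-pad or truncate the fraction to exactly nine digits.
  digits = sn.ljust(SKETCH_MAX_NANO_DIGITS, '0')[0, SKETCH_MAX_NANO_DIGITS]
  nsecs = digits.to_i64
  nsecs = -nsecs if ss.starts_with?('-') # negative correction
  {secs, nsecs}
end

p sketch_parse_pax_time("1350244992.023960108") # => {1350244992, 23960108}
p sketch_parse_pax_time("-1.5")                 # => {-1, -500000000}
```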
to_ascii converts the input to an ASCII C-style string. This is a best-effort conversion, so invalid characters are dropped.
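A best-effort conversion of this kind might look like the following. It assumes "invalid" means non-ASCII characters and NUL, which is my reading of "ASCII C-style string"; `sketch_to_ascii` is a hypothetical name.

```crystal
def sketch_to_ascii(s : String) : String
  # Fast path: nothing to drop.
  return s if s.ascii_only? && !s.includes?('\0')
  String.build do |io|
    s.each_char do |c|
      io << c if c.ascii? && c != '\0'
    end
  end
end

puts sketch_to_ascii("héllo\0") # => hllo
```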
valid_pax_record reports whether the key-value pair is valid where each record is formatted as: "%d %s=%s\n" % (size, key, value)
Keys and values should be UTF-8, but the number of bad writers out there forces us to be more liberal. Thus, we reject only keys that contain NUL, and reject NULs in values only for the PAX version of the USTAR string fields. The key must not contain an '=' character.
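The rules above could be expressed roughly as follows. Which keys count as "the PAX version of the USTAR string fields" is an assumption here (path, linkpath, uname, gname), as is rejecting an empty key; `SKETCH_STRING_KEYS` and `sketch_valid_pax_record` are hypothetical names.

```crystal
# Assumed set of keys whose values map onto USTAR string fields.
SKETCH_STRING_KEYS = {"path", "linkpath", "uname", "gname"}

def sketch_valid_pax_record(k : String, v : String) : Bool
  return false if k.empty? || k.includes?('=')
  if SKETCH_STRING_KEYS.includes?(k)
    !v.includes?('\0') # string fields cannot carry NUL in the value
  else
    !k.includes?('\0') # other records only need a NUL-free key
  end
end

puts sketch_valid_pax_record("path", "a.txt")     # => true
puts sketch_valid_pax_record("path", "a\0.txt")   # => false
puts sketch_valid_pax_record("comment", "a\0b")   # => true
puts sketch_valid_pax_record("bad=key", "value")  # => false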