module Crystar

Overview

The Crystar module contains readers and writers for tar archives. Tape archives (tar) are a file format for storing a sequence of files that can be read and written in a streaming manner. This module aims to cover most variations of the format, including those produced by the GNU and BSD tar tools.

Example

files = [
  {"readme.txt", "This archive contains some text files."},
  {"minerals.txt", "Mineral names:\nalunite\nchromium\nvlasovite"},
  {"todo.txt", "Get crystal mining license."},
]
buf = IO::Memory.new
Crystar::Writer.open(buf) do |tw|
  files.each do |f|
    hdr = Crystar::Header.new(
      name: f[0],
      mode: 0o600_i64,
      size: f[1].bytesize.to_i64 # length in bytes, not characters
    )
    tw.write_header(hdr)
    tw.write(f[1].to_slice)
  end
end

# Open and iterate through the files in the archive
buf.pos = 0
Crystar::Reader.open(buf) do |tar|
  tar.each_entry do |entry|
    puts "Contents of #{entry.name}"
    IO.copy entry.io, STDOUT
    puts
  end
end

Extended Modules

Crystar

Defined in:

tar/format.cr
tar/helper.cr
tar/header.cr
tar/reader.cr
tar/writer.cr
crystar.cr

Constant Summary

BASIC_KEYS = {PAX_PATH => true, PAX_LINK_PATH => true, PAX_SIZE => true, PAX_UID => true, PAX_GID => true, PAX_UNAME => true, PAX_GNAME => true, PAX_MTIME => true, PAX_ATIME => true, PAX_CTIME => true}
BLOCK = '4': Block device node
BLOCK_SIZE = 512: Size of each block in a tar stream
CHAR = '3': Character device node
CONT = '7': reserved
DIR = '5': Directory
FIFO = '6': FIFO node
GNU_LONGLINK = 'K'
GNU_LONGNAME = 'L': 'L' and 'K' are used by the GNU format for a meta file used to store the path or link name for the next file.
GNU_SPARSE = 'S': Indicates a sparse file in the GNU format.
ISBLK = 24576: Block special file
ISCHR = 8192: Character special file
ISDIR = 16384: Directory. Common Unix mode constants; these are not defined in any common tar standard. Header.FileInfo understands these, but FileInfoHeader will never produce these.
ISFIFO = 4096: FIFO
ISGID = 1024: Set gid
ISLINK = 40960: Symbolic link
ISREG = 32768: Regular file
ISSOCK = 49152: Socket
ISUID = 2048: Set uid. Mode constants from USTAR spec: See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
ISVTX = 512: Save text (sticky bit)
LINK = '1': Hard link. '1' to '6' are header-only flags and may not have a data body.
MAGIC_GNU = "ustar ": Magics used to identify various formats.
MAGIC_USTAR = "ustar\u0000"
MAX_NANO_SECOND_DIGITS = 9
NAME_SIZE = 100: Max length of the name field in USTAR format
PADDING = 3
PAX_ATIME = "atime"
PAX_CHARSET = "charset": Currently unused
PAX_COMMENT = "comment": Currently unused
PAX_CTIME = "ctime": Removed from later revision of PAX spec, but was valid
PAX_GID = "gid"
PAX_GNAME = "gname"
PAX_GNU_SPARSE = "GNU.sparse.": Keywords for GNU sparse files in a PAX extended header.
PAX_GNU_SPARSE_MAJOR = "GNU.sparse.major"
PAX_GNU_SPARSE_MAP = "GNU.sparse.map"
PAX_GNU_SPARSE_MINOR = "GNU.sparse.minor"
PAX_GNU_SPARSE_NAME = "GNU.sparse.name"
PAX_GNU_SPARSE_NUMBLOCKS = "GNU.sparse.numblocks"
PAX_GNU_SPARSE_NUMBYTES = "GNU.sparse.numbytes"
PAX_GNU_SPARSE_OFFSET = "GNU.sparse.offset"
PAX_GNU_SPARSE_REALSIZE = "GNU.sparse.realsize"
PAX_GNU_SPARSE_SIZE = "GNU.sparse.size"
PAX_LINK_PATH = "linkpath"
PAX_MTIME = "mtime"
PAX_NONE = "": Indicates that no PAX key is suitable
PAX_PATH = "path": Keywords for PAX extended header records
PAX_SCHILY_XATTR = "SCHILY.xattr."
PAX_SIZE = "size"
PAX_UID = "uid"
PAX_UNAME = "uname"
PREFIX_SIZE = 155: Max length of the prefix field in USTAR format
REG = '0': Type flags for Header#flag
REGA = '\u{0}'
SYMLINK = '2': Symbolic link
TRAILER_STAR = "tar\u0000"
VERSION = "0.1.0"
VERSION_GNU = " \u0000"
VERSION_USTAR = "00"
XGLOBAL_HEADER = 'g': Used by PAX format to store key-value records that are relevant to all subsequent files.
XHEADER = 'x': Used by PAX format to store key-value records that are only relevant to the next file.
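
The note on LINK, that flags '1' to '6' are header-only, can be illustrated with a small predicate. This is only a sketch: header_only? is a hypothetical name, and it may differ from what Crystar's own header_only_type? does.

```crystal
# Hypothetical helper: true for the type flags '1' through '6'
# (hard link, symlink, char device, block device, directory, FIFO),
# which describe entries that carry no data body.
def header_only?(flag : Char) : Bool
  ('1'..'6').includes?(flag)
end

puts header_only?('5') # directory
puts header_only?('0') # regular file
```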

Class Method Summary

.parse_pax(r : IO) : Hash(String, String)
parse_pax parses PAX headers. If an extended header (type 'x') is invalid, an exception is raised.
.read_gnu_sparse_map0x1(pax_hdrs : Hash(String, String))
read_gnu_sparse_map0x1 reads the sparse map as stored in GNU's PAX sparse format version 0.1.
.read_gnu_sparse_map1x0(r : IO)
read_gnu_sparse_map1x0 reads the sparse map as stored in GNU's PAX sparse format version 1.0.
.try_read_full(r : IO, b : Bytes)
try_read_full is like read_fully, except that it returns EOF if the end of the stream is hit before b.size bytes are read.

Instance Method Summary

#align_sparse_entries(src : Array(SparseEntry), size : Int64)
align_sparse_entries mutates src and returns dst where each fragment's starting offset is aligned up to the nearest block edge, and each ending offset is aligned down to the nearest block edge.
#block_padding(offset : Int)
block_padding computes the number of bytes n needed to pad offset up to the nearest block edge, where 0 <= n < BLOCK_SIZE.
#byte_index(s : String, c : Char)
#byte_index(bytes : Bytes, b : Int)
#file_info_header(fi : File, link : String)
file_info_header creates a partially-populated Header from fi.
#fits_in_base256(n : Int32, x : Int64)
#fits_in_octal(n : Int32, x : Int64)
fits_in_octal reports whether the integer x fits in a field n-bytes long using octal encoding with the appropriate NUL terminator.
#format_pax_record(k : String, v : String)
format_pax_record formats a single PAX record, prefixing it with the appropriate length
#format_pax_time(ts : Time)
format_pax_time converts ts into a time of the form %d.%d as described in the PAX specification.
#has_nul(s : String)
has_nul checks whether a NUL character exists within s.
#header_only_type?(flag)
#invert_sparse_entries(src : Array(SparseEntry), size : Int64)
invert_sparse_entries converts a sparse map from one form to the other.
#ltrim(b : Bytes, s : String)
#merge_pax(hdr : Header, pax_hdrs : Hash(String, String))
merge_pax merges pax_hdrs into hdr for all relevant fields of Header.
#parse_pax_record(s : String) : ::Tuple(String, String, String)
parse_pax_record parses the input PAX record string into a key-value pair.
#parse_pax_time(s : String)
parse_pax_time takes a string of the form %d.%d as described in the PAX specification.
#rtrim(b : Bytes, s : String)
#split_ustar_path(name : String)
#to_ascii(s : String)
to_ascii converts the input to an ASCII C-style string.
#trim_bytes(b : Bytes, s : String)
#unix_time(sec : Int, nsec : Int)
#unix_time(sec, nsec)
#valid_pax_record(k : String, v : String)
valid_pax_record reports whether the key-value pair is valid where each record is formatted as: "%d %s=%s\n" % (size, key, value)
#validate_sparse_entries(sp : Array(SparseEntry), size : Int64)

Class Method Detail

def self.parse_pax(r : IO) : Hash(String, String)

parse_pax parses PAX headers. If an extended header (type 'x') is invalid, an exception is raised.

def self.read_gnu_sparse_map0x1(pax_hdrs : Hash(String, String))

read_gnu_sparse_map0x1 reads the sparse map as stored in GNU's PAX sparse format version 0.1. The sparse map is stored in the PAX headers.

def self.read_gnu_sparse_map1x0(r : IO)

read_gnu_sparse_map1x0 reads the sparse map as stored in GNU's PAX sparse format version 1.0. The format of the sparse map consists of a series of newline-terminated numeric fields. The first field is the number of entries and is always present. Following this are the entries, consisting of two fields (offset, length). This function must stop reading at the end boundary of the block containing the last newline.

Note that the GNU manual says that numeric values should be encoded in octal format. However, the GNU tar utility itself outputs these values in decimal. As such, this library treats values as being encoded in decimal.
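
The field layout described above can be sketched on an in-memory string. This is a simplified illustration, not the library's reader: Fragment and parse_sparse_map are hypothetical names, the input is a String rather than an IO, and the block-boundary handling is left out.

```crystal
# Hypothetical stand-in for SparseEntry.
record Fragment, offset : Int64, length : Int64

# Parses "count\noffset\nlength\n..." with every field in decimal.
def parse_sparse_map(text : String) : Array(Fragment)
  lines = text.lines
  raise ArgumentError.new("missing entry count") if lines.empty?
  count = lines[0].to_i64 # decimal, matching what GNU tar writes
  if count < 0 || lines.size < count * 2 + 1
    raise ArgumentError.new("truncated sparse map")
  end
  Array(Fragment).new(count.to_i) do |i|
    Fragment.new(lines[1 + 2 * i].to_i64, lines[2 + 2 * i].to_i64)
  end
end

p parse_sparse_map("2\n0\n512\n1024\n512\n")
```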

def self.try_read_full(r : IO, b : Bytes)

try_read_full is like read_fully, except that it returns EOF if the end of the stream is hit before b.size bytes are read.

Instance Method Detail

def align_sparse_entries(src : Array(SparseEntry), size : Int64)

align_sparse_entries mutates src and returns dst where each fragment's starting offset is aligned up to the nearest block edge, and each ending offset is aligned down to the nearest block edge.

Even though the Crystar Reader and the BSD tar utility can handle entries with arbitrary offsets and lengths, the GNU tar utility can only handle offsets and lengths that are multiples of BLOCK_SIZE.
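
A minimal sketch of this alignment follows. Fragment and align_fragments are hypothetical names. Unlike the documented method, the sketch builds a new array instead of mutating src, and it assumes that an end offset equal to the total size is left unaligned and that fragments which become empty are dropped.

```crystal
BLOCK_SIZE = 512_i64

# Hypothetical stand-in for SparseEntry.
record Fragment, offset : Int64, length : Int64

def align_fragments(src : Array(Fragment), size : Int64) : Array(Fragment)
  dst = [] of Fragment
  src.each do |frag|
    pos = frag.offset
    stop = frag.offset + frag.length
    pos += (BLOCK_SIZE - pos % BLOCK_SIZE) % BLOCK_SIZE # round start up
    stop -= stop % BLOCK_SIZE unless stop == size       # round end down
    dst << Fragment.new(pos, stop - pos) if pos < stop
  end
  dst
end

p align_fragments([Fragment.new(100_i64, 1000_i64)], 2000_i64)
```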

def block_padding(offset : Int)

block_padding computes the number of bytes n needed to pad offset up to the nearest block edge, where 0 <= n < BLOCK_SIZE.
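
As a rough illustration of the arithmetic, assuming non-negative offsets (padding_for is a hypothetical name, not the library's method):

```crystal
BLOCK_SIZE = 512_i64

# Bytes needed to reach the next multiple of BLOCK_SIZE;
# zero when offset is already on a block edge.
def padding_for(offset : Int64) : Int64
  (BLOCK_SIZE - offset % BLOCK_SIZE) % BLOCK_SIZE
end

puts padding_for(1_i64)   # 511
puts padding_for(512_i64) # 0
puts padding_for(600_i64) # 424
```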

def byte_index(s : String, c : Char)

def byte_index(bytes : Bytes, b : Int)

def file_info_header(fi : File, link : String)

file_info_header creates a partially-populated Header from fi. If fi describes a symlink, this records link as the link target. If fi describes a directory, a slash is appended to the name.

def fits_in_base256(n : Int32, x : Int64)

def fits_in_octal(n : Int32, x : Int64)

fits_in_octal reports whether the integer x fits in a field n-bytes long using octal encoding with the appropriate NUL terminator.
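
One way to read the octal check: an n-byte field has room for n - 1 octal digits plus the NUL, and each digit carries 3 bits. The sketch below encodes that reading; octal_fits? is a hypothetical name and the treatment of negative values as never fitting is an assumption.

```crystal
def octal_fits?(n : Int32, x : Int64) : Bool
  return false if x < 0
  bits = (n - 1) * 3
  # 63 bits already cover every non-negative Int64.
  bits >= 63 || x < (1_i64 << bits)
end

puts octal_fits?(8, 2097151_i64) # 0o7777777, the largest 7-digit value
puts octal_fits?(8, 2097152_i64)
```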

def format_pax_record(k : String, v : String)

format_pax_record formats a single PAX record, prefixing it with the appropriate length.
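
The length prefix counts its own digits, which makes the computation slightly circular. A sketch of one way to resolve it, using the "%d %s=%s\n" layout given for valid_pax_record (pax_record is a hypothetical name; the constant 3 mirrors PADDING):

```crystal
def pax_record(key : String, value : String) : String
  # 3 extra bytes: the space, the '=' and the trailing newline.
  size = key.bytesize + value.bytesize + 3
  size += size.to_s.bytesize
  record = "#{size} #{key}=#{value}\n"
  # Adding the digits can itself push the length past a power of ten.
  if record.bytesize != size
    size = record.bytesize
    record = "#{size} #{key}=#{value}\n"
  end
  record
end

puts pax_record("path", "/etc/hosts").inspect # "19 path=/etc/hosts\n"
```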

def format_pax_time(ts : Time)

format_pax_time converts ts into a time of the form %d.%d as described in the PAX specification. This function is capable of handling negative timestamps.
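
The negative-timestamp case is the subtle part. The sketch below works on a pair of integers rather than a Time, assuming the seconds are floored and the nanoseconds lie in 0...1_000_000_000 (as Time#to_unix and Time#nanosecond report them). pax_time is a hypothetical name, and trimming trailing zeros is an assumption about the output style.

```crystal
def pax_time(secs : Int64, nsecs : Int32) : String
  return secs.to_s if nsecs == 0
  sign = ""
  if secs < 0
    # -1.5 s arrives as secs = -2, nsecs = 500_000_000; flip both parts.
    sign = "-"
    secs = -(secs + 1)
    nsecs = 1_000_000_000 - nsecs
  end
  "#{sign}#{secs}.#{nsecs.to_s.rjust(9, '0')}".rstrip('0')
end

puts pax_time(1350244992_i64, 23960108) # 1350244992.023960108
puts pax_time(-2_i64, 500_000_000)      # -1.5
```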

def has_nul(s : String)

has_nul checks whether a NUL character exists within s.

def header_only_type?(flag)

def invert_sparse_entries(src : Array(SparseEntry), size : Int64)

invert_sparse_entries converts a sparse map from one form to the other. If the input is a list of holes, the output is the list of data fragments, and vice versa. The input must have been already validated.

This function mutates src and returns a normalized map where:

adjacent fragments are coalesced together
only the last fragment may be empty
the end offset of the last fragment is the total size
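
The inversion can be pictured as walking the fragments and emitting the gaps between them. A minimal sketch under that reading follows; Fragment and invert_fragments are hypothetical names, and unlike the documented method it returns a fresh array rather than mutating src.

```crystal
# Hypothetical stand-in for SparseEntry.
record Fragment, offset : Int64, length : Int64

def invert_fragments(src : Array(Fragment), size : Int64) : Array(Fragment)
  dst = [] of Fragment
  cursor = 0_i64 # start of the gap currently being measured
  src.each do |frag|
    next if frag.length == 0 # empty fragments contribute nothing
    gap = frag.offset - cursor
    dst << Fragment.new(cursor, gap) if gap > 0
    cursor = frag.offset + frag.length
  end
  # The final fragment always ends at size and is the only one that may be empty.
  dst << Fragment.new(cursor, size - cursor)
  dst
end

data = [Fragment.new(0_i64, 2_i64), Fragment.new(5_i64, 3_i64)]
p invert_fragments(data, 10_i64) # gaps at 2..5 and 8..10
```

Inverting the result again yields the original data fragments plus one empty fragment at offset 10, which is the normalized form listed above.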

def ltrim(b : Bytes, s : String)

def merge_pax(hdr : Header, pax_hdrs : Hash(String, String))

merge_pax merges pax_hdrs into hdr for all relevant fields of Header.

def parse_pax_record(s : String) : ::Tuple(String, String, String)

parse_pax_record parses the input PAX record string into a key-value pair. If parsing is successful, it slices off the record it has just read and returns the remainder of the input alongside the key and value.
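
A sketch of such a parser for the "%d %s=%s\n" layout is shown below. parse_record is a hypothetical name, and both the {key, value, remainder} ordering and the use of ArgumentError are assumptions for illustration, not the library's documented behaviour.

```crystal
def parse_record(s : String) : Tuple(String, String, String)
  sp = s.index(' ') || raise ArgumentError.new("missing length field")
  len = s[0, sp].to_i? || raise ArgumentError.new("invalid length field")
  if len < 5 || len > s.bytesize || sp + 2 > len
    raise ArgumentError.new("length out of range")
  end

  bytes = s.to_slice
  raise ArgumentError.new("missing newline") unless bytes[len - 1] == '\n'.ord

  # Drop the "<len> " prefix and the trailing newline, then split on '='.
  body = String.new(bytes[sp + 1, len - sp - 2])
  key, eq, value = body.partition('=')
  raise ArgumentError.new("missing '='") if eq.empty?
  {key, value, String.new(bytes[len..])}
end

p parse_record("19 path=/etc/hosts\n12 uid=1000\n")
```

Feeding the returned remainder back into the function walks through every record in an extended header.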

def parse_pax_time(s : String)

parse_pax_time takes a string of the form %d.%d as described in the PAX specification. Note that this implementation accepts negative timestamps, which the PAX specification permits but which are not always portable.
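
To illustrate the %d.%d form, the sketch below splits the string and normalizes the fraction to nine digits (compare MAX_NANO_SECOND_DIGITS). parse_time and MAX_NANO_DIGITS are hypothetical names, and returning a {seconds, nanoseconds} pair that both carry the sign of the input is a choice made for this example, not the library's return type.

```crystal
MAX_NANO_DIGITS = 9

def parse_time(s : String) : Tuple(Int64, Int64)
  whole, _, frac = s.partition('.')
  secs = whole.to_i64 # raises ArgumentError on malformed input
  return {secs, 0_i64} if frac.empty?
  unless frac.each_char.all?(&.ascii_number?)
    raise ArgumentError.new("invalid fraction")
  end
  # Pad or truncate the fraction to exactly nine digits.
  nsecs = frac.ljust(MAX_NANO_DIGITS, '0')[0, MAX_NANO_DIGITS].to_i64
  # Check the string, not secs, so that "-0.5" keeps its sign.
  nsecs = -nsecs if whole.starts_with?('-')
  {secs, nsecs}
end

p parse_time("1350244992.023960108")
p parse_time("-1.5")
```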

def rtrim(b : Bytes, s : String)

def split_ustar_path(name : String)

def to_ascii(s : String)

to_ascii converts the input to an ASCII C-style string. This is a best-effort conversion, so invalid characters are dropped.
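
A best-effort conversion of this kind might look like the following. ascii_only is a hypothetical name, and treating NUL as invalid (because a C-style string cannot contain it) is an assumption.

```crystal
def ascii_only(s : String) : String
  # Fast path: nothing to drop.
  return s if s.ascii_only? && !s.includes?('\0')
  String.build do |io|
    s.each_char { |c| io << c if c.ascii? && c != '\0' }
  end
end

puts ascii_only("héllo") # hllo
```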

def trim_bytes(b : Bytes, s : String)

def unix_time(sec : Int, nsec : Int)

def unix_time(sec, nsec)

def valid_pax_record(k : String, v : String)

valid_pax_record reports whether the key-value pair is valid where each record is formatted as: "%d %s=%s\n" % (size, key, value)

Keys and values should be UTF-8, but the number of bad writers out there forces us to be more liberal. Thus, we only reject keys that contain NUL, and only reject NULs in values for the PAX version of the USTAR string fields. The key must not contain an '=' character.
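
These rules can be condensed into a short predicate. In the sketch below, record_valid? and NUL_SENSITIVE_KEYS are hypothetical names; taking "the PAX version of the USTAR string fields" to mean path, linkpath, uname and gname, and rejecting an empty key, are both assumptions.

```crystal
# Assumed set of keys whose values map onto USTAR string fields.
NUL_SENSITIVE_KEYS = {"path", "linkpath", "uname", "gname"}

def record_valid?(key : String, value : String) : Bool
  return false if key.empty? || key.includes?('=')
  if NUL_SENSITIVE_KEYS.includes?(key)
    !value.includes?('\0') # NUL would truncate the USTAR field
  else
    !key.includes?('\0') # values of other keys may hold arbitrary bytes
  end
end

puts record_valid?("path", "a\0b")    # false
puts record_valid?("comment", "a\0b") # true
```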

def validate_sparse_entries(sp : Array(SparseEntry), size : Int64)