module Crystar

Overview

The Crystar module contains readers and writers for tar archives. Tape archives (tar) are a file format for storing a sequence of files that can be read and written in a streaming manner. This module aims to cover most variations of the format, including those produced by the GNU and BSD tar tools.

Example

require "crystar"

# Each entry is a {name, contents} pair.
files = [
  {"readme.txt", "This archive contains some text files."},
  {"minerals.txt", "Mineral names:\nalunite\nchromium\nvlasovite"},
  {"todo.txt", "Get crystal mining license."},
]
buf = IO::Memory.new
Crystar::Writer.open(buf) do |tw|
  files.each do |f|
    hdr = Crystar::Header.new(
      name: f[0],
      mode: 0o600_i64,
      size: f[1].bytesize.to_i64 # size in bytes, not characters
    )
    tw.write_header(hdr)
    tw.write(f[1].to_slice)
  end
end

# Open and iterate through the files in the archive
buf.pos = 0
Crystar::Reader.open(buf) do |tar|
  tar.each_entry do |entry|
    puts "Contents of #{entry.name}"
    IO.copy entry.io, STDOUT
    puts
  end
end

Defined in:

tar/format.cr
tar/helper.cr
tar/header.cr
tar/reader.cr
tar/writer.cr
crystar.cr

Constant Summary

BASIC_KEYS = {PAX_PATH => true, PAX_LINK_PATH => true, PAX_SIZE => true, PAX_UID => true, PAX_GID => true, PAX_UNAME => true, PAX_GNAME => true, PAX_MTIME => true, PAX_ATIME => true, PAX_CTIME => true}
BLOCK = '4'

Block device node

BLOCK_SIZE = 512

Size of each block in a tar stream

CHAR = '3'

Character device node

CONT = '7'

Reserved

DIR = '5'

Directory

FIFO = '6'

FIFO node

GNU_LONGLINK = 'K'
GNU_LONGNAME = 'L'

'L' and 'K' are used by the GNU format for a meta file that stores the path or link name for the next file.

GNU_SPARSE = 'S'

Indicates a sparse file in the GNU format.

ISBLK = 24576

Block special file

ISCHR = 8192

Character special file

ISDIR = 16384

Directory. Common Unix mode constants; these are not defined in any common tar standard. Header.FileInfo understands these, but file_info_header will never produce these.

ISFIFO = 4096

FIFO

ISGID = 1024

Set gid

ISLINK = 40960

Symbolic link

ISREG = 32768

Regular file

ISSOCK = 49152

Socket

ISUID = 2048

Set uid. Mode constants from the USTAR spec; see http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

ISVTX = 512

Save text (sticky bit)

LINK = '1'

Hard link. Type flags '1' to '6' are header-only flags and may not have a data body.

MAGIC_GNU = "ustar "

Magics used to identify various formats.

MAGIC_USTAR = "ustar\u0000"
MAX_NANO_SECOND_DIGITS = 9
NAME_SIZE = 100

Max length of the name field in USTAR format

PADDING = 3
PAX_ATIME = "atime"
PAX_CHARSET = "charset"

Currently unused

PAX_COMMENT = "comment"

Currently unused

PAX_CTIME = "ctime"

Removed from a later revision of the PAX spec, but was valid

PAX_GID = "gid"
PAX_GNAME = "gname"
PAX_GNU_SPARSE = "GNU.sparse."

Keywords for GNU sparse files in a PAX extended header.

PAX_GNU_SPARSE_MAJOR = "GNU.sparse.major"
PAX_GNU_SPARSE_MAP = "GNU.sparse.map"
PAX_GNU_SPARSE_MINOR = "GNU.sparse.minor"
PAX_GNU_SPARSE_NAME = "GNU.sparse.name"
PAX_GNU_SPARSE_NUMBLOCKS = "GNU.sparse.numblocks"
PAX_GNU_SPARSE_NUMBYTES = "GNU.sparse.numbytes"
PAX_GNU_SPARSE_OFFSET = "GNU.sparse.offset"
PAX_GNU_SPARSE_REALSIZE = "GNU.sparse.realsize"
PAX_GNU_SPARSE_SIZE = "GNU.sparse.size"
PAX_LINK_PATH = "linkpath"
PAX_MTIME = "mtime"
PAX_NONE = ""

Keywords for PAX extended header records. PAX_NONE indicates that no PAX key is suitable.

PAX_PATH = "path"

PAX_SCHILY_XATTR = "SCHILY.xattr."
PAX_SIZE = "size"
PAX_UID = "uid"
PAX_UNAME = "uname"
PREFIX_SIZE = 155

Max length of the prefix field in USTAR format

REG = '0'

Type flags for Header#flag. '0' indicates a regular file.

REGA = '\u{0}'
SYMLINK = '2'

Symbolic link

TRAILER_STAR = "tar\u0000"
VERSION = "0.1.0"
VERSION_GNU = " \u0000"
VERSION_USTAR = "00"
XGLOBAL_HEADER = 'g'

Used by PAX format to store key-value records that are relevant to all subsequent files.

XHEADER = 'x'

Used by PAX format to store key-value records that are only relevant to the next file.

Class Method Detail

def self.parse_pax(r : IO) : Hash(String, String) #

parse_pax parses PAX headers. If an extended header (type 'x') is invalid, an exception is raised.


[View source]
def self.read_gnu_sparse_map0x1(pax_hdrs : Hash(String, String)) #

read_gnu_sparse_map0x1 reads the sparse map as stored in GNU's PAX sparse format version 0.1. The sparse map is stored in the PAX headers.
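
As a rough illustration of the version 0.1 layout, the sketch below assumes the map lives in the GNU.sparse.numblocks and GNU.sparse.map records (the constants PAX_GNU_SPARSE_NUMBLOCKS and PAX_GNU_SPARSE_MAP above), with the map being a comma-separated list of offset,length pairs. The method name sparse_map_0x1 and the tuple return type are hypothetical and are not Crystar's own API.

```crystal
# Hypothetical stand-alone parser for a 0.1-style sparse map held in PAX records.
# Returns {offset, length} pairs; this is not Crystar's actual implementation.
def sparse_map_0x1(pax_hdrs : Hash(String, String)) : Array(Tuple(Int64, Int64))
  count = pax_hdrs["GNU.sparse.numblocks"]?.try(&.to_i64?) ||
          raise ArgumentError.new("missing or invalid GNU.sparse.numblocks")

  raw = pax_hdrs["GNU.sparse.map"]? || ""
  fields = raw.empty? ? Array(String).new : raw.split(',')

  # Every entry contributes exactly two fields: offset and length.
  unless fields.size == count * 2
    raise ArgumentError.new("sparse map does not match numblocks")
  end

  entries = [] of Tuple(Int64, Int64)
  fields.each_slice(2) do |pair|
    offset = pair[0].to_i64? || raise ArgumentError.new("invalid offset")
    length = pair[1].to_i64? || raise ArgumentError.new("invalid length")
    entries << {offset, length}
  end
  entries
end
```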


[View source]
def self.read_gnu_sparse_map1x0(r : IO) #

read_gnu_sparse_map1x0 reads the sparse map as stored in GNU's PAX sparse format version 1.0. The format of the sparse map consists of a series of newline-terminated numeric fields. The first field is the number of entries and is always present. Following this are the entries, consisting of two fields (offset, length). This function must stop reading at the end boundary of the block containing the last newline.

Note that the GNU manual says that numeric values should be encoded in octal format. However, the GNU tar utility itself outputs these values in decimal. As such, this library treats values as being encoded in decimal.
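
The field layout described above can be sketched as follows. This simplified version reads newline-terminated decimal fields from any IO and deliberately omits the block-boundary handling that the real method needs; the names SparseMapEntry, read_decimal_field and parse_sparse_map_1x0 are hypothetical.

```crystal
# Hypothetical entry type used only for this sketch.
record SparseMapEntry, offset : Int64, length : Int64

# Reads one newline-terminated field and parses it as a decimal number.
def read_decimal_field(io : IO) : Int64
  line = io.gets || raise IO::EOFError.new
  line.to_i64? || raise ArgumentError.new("invalid numeric field: #{line.inspect}")
end

# First field: number of entries. Then (offset, length) pairs, one field per line.
# Unlike the real reader, this does not skip to the end of the enclosing block.
def parse_sparse_map_1x0(io : IO) : Array(SparseMapEntry)
  count = read_decimal_field(io)
  entries = [] of SparseMapEntry
  count.times do
    offset = read_decimal_field(io)
    length = read_decimal_field(io)
    entries << SparseMapEntry.new(offset, length)
  end
  entries
end
```

For instance, the text "2\n0\n512\n4096\n1024\n" would describe two fragments: 512 bytes at offset 0 and 1024 bytes at offset 4096.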


[View source]
def self.try_read_full(r : IO, b : Bytes) #

is like read_fully except it returns EOF when it is hit before b.size bytes are read


[View source]

Instance Method Detail

def align_sparse_entries(src : Array(SparseEntry), size : Int64) #

align_sparse_entries mutates src and returns dst where each fragment's starting offset is aligned up to the nearest block edge, and each ending offset is aligned down to the nearest block edge.

Even though the Crystar Reader and the BSD tar utility can handle entries with arbitrary offsets and lengths, the GNU tar utility can only handle offsets and lengths that are multiples of BLOCK_SIZE.
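
A minimal sketch of this alignment rule, under a few assumptions: Fragment is a hypothetical stand-in for SparseEntry, the method returns a new array instead of mutating src, and an end offset equal to the total size is left unaligned so that the final fragment can still reach the end of the file.

```crystal
BLOCK = 512_i64 # same value as BLOCK_SIZE above

# Hypothetical stand-in for SparseEntry.
record Fragment, offset : Int64, length : Int64

def align_fragments(src : Array(Fragment), size : Int64) : Array(Fragment)
  dst = [] of Fragment
  src.each do |s|
    pos = s.offset
    fin = s.offset + s.length
    pos = (pos + BLOCK - 1) // BLOCK * BLOCK   # round the start up
    fin = fin // BLOCK * BLOCK unless fin == size # round the end down
    # Fragments that collapse to nothing after alignment are dropped.
    dst << Fragment.new(pos, fin - pos) if pos < fin
  end
  dst
end
```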


[View source]
def block_padding(offset : Int) #

block_padding computes the number of bytes n needed to pad offset up to the nearest block edge, where 0 <= n < BLOCK_SIZE.
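
The arithmetic can be sketched as below for non-negative offsets. The name padding_for is hypothetical, and the constant is redefined locally so the snippet stands alone.

```crystal
BLOCK_BYTES = 512_i64 # same value as BLOCK_SIZE above

# Bytes needed to reach the next block edge; 0 when already aligned.
# Assumes offset >= 0.
def padding_for(offset : Int64) : Int64
  (BLOCK_BYTES - offset % BLOCK_BYTES) % BLOCK_BYTES
end
```

Under this definition an offset of 1000 needs 24 bytes of padding, while 0 and 512 need none.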


[View source]
def byte_index(s : String, c : Char) #

[View source]
def byte_index(bytes : Bytes, b : Int) #

[View source]
def file_info_header(fi : File, link : String) #

file_info_header creates a partially-populated Header from fi. If fi describes a symlink, this records link as the link target. If fi describes a directory, a slash is appended to the name.


[View source]
def fits_in_base256(n : Int32, x : Int64) #

[View source]
def fits_in_octal(n : Int32, x : Int64) #

fits_in_octal reports whether the integer x fits in a field n-bytes long using octal encoding with the appropriate NUL terminator.
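
One way to express this check is sketched here: a field of n bytes leaves n - 1 octal digits after the NUL terminator, and each digit carries 3 bits. The name octal_fits? is hypothetical, and treating fields of 22 bytes or more as always large enough for a non-negative Int64 is an assumption of this sketch.

```crystal
# Reports whether x can be written with (n - 1) octal digits plus a NUL.
def octal_fits?(n : Int32, x : Int64) : Bool
  oct_bits = (n - 1) * 3
  # 21 octal digits already hold 63 bits, so skip the shift for n >= 22.
  x >= 0 && (n >= 22 || x < (1_i64 << oct_bits))
end
```

For an 8-byte field the largest value that fits is 0o7777777 (2097151).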


[View source]
def format_pax_record(k : String, v : String) #

format_pax_record formats a single PAX record, prefixing it with the appropriate length.
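
The subtle part is that the length prefix counts its own digits. A minimal sketch, assuming the record layout "%d %s=%s\n" and using the hypothetical name pax_record:

```crystal
RECORD_PADDING = 3 # the ' ', '=' and '\n' bytes; same value as PADDING above

def pax_record(k : String, v : String) : String
  size = k.bytesize + v.bytesize + RECORD_PADDING
  size += size.to_s.bytesize
  record = "#{size} #{k}=#{v}\n"
  # Adding the prefix may have pushed the total into one more digit.
  if record.bytesize != size
    size = record.bytesize
    record = "#{size} #{k}=#{v}\n"
  end
  record
end
```

With this sketch, the pair path/foo becomes "12 path=foo\n", and a/bcdef becomes "11 a=bcdef\n" after the final adjustment.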


[View source]
def format_pax_time(ts : Time) #

format_pax_time converts ts into a time of the form %d.%d as described in the PAX specification. This function can handle negative timestamps.
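
To show how negative timestamps might be handled, the sketch below works on a (seconds, nanoseconds) pair rather than a Time, where nanoseconds is always in 0...1_000_000_000 as in a Unix timestamp. The name format_pax_seconds is hypothetical.

```crystal
def format_pax_seconds(secs : Int64, nsecs : Int32) : String
  return secs.to_s if nsecs == 0

  sign = ""
  if secs < 0
    # -1.5s is stored as secs = -2, nsecs = 500_000_000; undo that split.
    sign = "-"
    secs = -(secs + 1)
    nsecs = 1_000_000_000 - nsecs
  end
  # Nine fractional digits, with trailing zeros removed.
  ("%s%d.%09d" % {sign, secs, nsecs}).rstrip('0')
end
```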


[View source]
def has_nul(s : String) #

has_nul reports whether a NUL character exists within s.


[View source]
def header_only_type?(flag) #

[View source]
def invert_sparse_entries(src : Array(SparseEntry), size : Int64) #

invert_sparse_entries converts a sparse map from one form to the other. If the input is sparseHoles, then it will output sparseDatas and vice-versa. The input must have been already validated.

This function mutates src and returns a normalized map where:

  • adjacent fragments are coalesced together
  • only the last fragment may be empty
  • the endOffset of the last fragment is the total size
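
The inversion can be sketched as a single pass that emits the gaps between input fragments. Piece is a hypothetical stand-in for SparseEntry, and unlike the documented method this version builds a new array instead of mutating src.

```crystal
# Hypothetical stand-in for SparseEntry.
record Piece, offset : Int64, length : Int64

def invert_pieces(src : Array(Piece), size : Int64) : Array(Piece)
  dst = [] of Piece
  pre_offset = 0_i64 # start of the gap currently being measured
  src.each do |cur|
    next if cur.length == 0 # skip empty fragments
    gap = cur.offset - pre_offset
    dst << Piece.new(pre_offset, gap) if gap > 0
    pre_offset = cur.offset + cur.length
  end
  # The final fragment always ends at size and may be empty.
  dst << Piece.new(pre_offset, size - pre_offset)
  dst
end
```

Inverting data fragments (0, 2) and (5, 3) in a 10-byte file yields the holes (2, 3) and (8, 2); inverting those again returns the data fragments plus an empty trailing fragment at offset 10.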

[View source]
def ltrim(b : Bytes, s : String) #

[View source]
def merge_pax(hdr : Header, pax_hdrs : Hash(String, String)) #

merge_pax merges pax_hdrs into hdr for all relevant fields of Header.


[View source]
def parse_pax_record(s : String) : ::Tuple(String, String, String) #

parse_pax_record parses the input PAX record string into a key-value pair. If parsing is successful, it slices off the record that was just read and returns the remainder of the input alongside the key and value.
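
A minimal sketch of such a parser, assuming the record layout "%d %s=%s\n" where the leading number is the byte length of the whole record. The name split_pax_record, the {key, value, rest} ordering and the use of ArgumentError are assumptions of this example.

```crystal
def split_pax_record(s : String) : Tuple(String, String, String)
  sp = s.byte_index(' '.ord) || raise ArgumentError.new("missing length prefix")
  n = s.byte_slice(0, sp).to_i? || raise ArgumentError.new("invalid length prefix")
  # The shortest possible record is "5 a=\n".
  raise ArgumentError.new("length out of range") if n < 5 || n > s.bytesize

  rec = s.byte_slice(sp + 1, n - sp - 1) # "key=value\n"
  rest = s.byte_slice(n)                 # everything after this record
  raise ArgumentError.new("record must end with a newline") unless rec.ends_with?('\n')

  eq = rec.byte_index('='.ord) || raise ArgumentError.new("missing '='")
  key = rec.byte_slice(0, eq)
  value = rec.byte_slice(eq + 1, rec.bytesize - eq - 2)
  {key, value, rest}
end
```

Calling it repeatedly on the returned remainder walks through every record in an extended header body.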


[View source]
def parse_pax_time(s : String) #

parse_pax_time takes a string of the form %d.%d as described in the PAX specification. Note that this implementation allows negative timestamps, which the PAX specification permits but which are not always portable.
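
The parsing steps might look like the sketch below, which returns a {seconds, nanoseconds} pair instead of a Time to stay self-contained. The fractional part is right-padded or truncated to nine digits (MAX_NANO_SECOND_DIGITS above), and the nanoseconds take the sign of the seconds part. The name parse_pax_seconds is hypothetical.

```crystal
NANO_DIGITS = 9 # same value as MAX_NANO_SECOND_DIGITS above

def parse_pax_seconds(s : String) : Tuple(Int64, Int64)
  parts = s.split('.', 2)
  ss = parts[0]
  sn = parts.size > 1 ? parts[1] : ""

  secs = ss.to_i64? || raise ArgumentError.new("invalid seconds: #{ss.inspect}")
  return {secs, 0_i64} if sn.empty?

  unless sn.each_char.all?(&.ascii_number?)
    raise ArgumentError.new("invalid fraction: #{sn.inspect}")
  end
  # Right-pad short fractions with zeros, truncate long ones.
  sn = sn.ljust(NANO_DIGITS, '0')[0, NANO_DIGITS]
  nsecs = sn.to_i64
  nsecs = -nsecs if ss.starts_with?('-') # negative correction
  {secs, nsecs}
end
```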


[View source]
def rtrim(b : Bytes, s : String) #

[View source]
def split_ustar_path(name : String) #

[View source]
def to_ascii(s : String) #

to_ascii converts the input to an ASCII C-style string. This is a best-effort conversion, so invalid characters are dropped.
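
A best-effort conversion of this kind could be sketched as follows, assuming that "invalid" means any non-ASCII character or a NUL (which would terminate a C-style string early). The name ascii_only_copy is hypothetical.

```crystal
def ascii_only_copy(s : String) : String
  String.build do |io|
    s.each_char do |c|
      # Keep plain ASCII, but drop NUL and anything outside the ASCII range.
      io << c if c.ascii? && c != '\u0000'
    end
  end
end
```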


[View source]
def trim_bytes(b : Bytes, s : String) #

[View source]
def unix_time(sec : Int, nsec : Int) #

[View source]
def unix_time(sec, nsec) #

[View source]
def valid_pax_record(k : String, v : String) #

valid_pax_record reports whether the key-value pair is valid where each record is formatted as: "%d %s=%s\n" % (size, key, value)

Keys and values should be UTF-8, but the number of bad writers out there forces us to be more liberal. Thus, we only reject keys that contain a NUL, and we only reject NULs in values for the PAX versions of the USTAR string fields. The key must not contain an '=' character.
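
These rules can be sketched as below. The set of keys treated as "PAX versions of the USTAR string fields" (path, linkpath, uname, gname) is an assumption of this example, as are the name pax_record_ok? and the rejection of empty keys.

```crystal
def pax_record_ok?(k : String, v : String) : Bool
  return false if k.empty? || k.includes?('=')
  case k
  when "path", "linkpath", "uname", "gname"
    # These map onto NUL-terminated USTAR fields, so the value must be NUL-free.
    !v.includes?('\u0000')
  else
    # Other values may contain arbitrary bytes; only the key is checked.
    !k.includes?('\u0000')
  end
end
```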


[View source]
def validate_sparse_entries(sp : Array(SparseEntry), size : Int64) #

[View source]