2 Minutes to tfs-pandas

Yes, 2 minutes. That’s how little it takes!

Hint

You can click the function names in the code examples below to go directly to their documentation.

Basic Usage

The package is imported as tfs, and exports top-level functions for reading and writing:

import tfs

# Loading a TFS file is simple
df = tfs.read("path_to_input.tfs", index="index_column")

# Writing out to disk is simple too
tfs.write("path_to_output.tfs", df, save_index="index_column")

Once loaded, you get your data in a TfsDataFrame, which is a pandas.DataFrame with a dict of headers attached to it. You can access and manipulate all data as you would with a DataFrame:

# Access and modify the headers with the .headers attribute
useful_variable = data_frame.headers["SOME_KEY"]
data_frame.headers["NEW_KEY"] = some_variable

# Manipulate data as you do with pandas DataFrames
data_frame["NEWCOL"] = data_frame.COLUMN_A * data_frame.COLUMN_B

# You can check the TfsDataFrame validity, and choose the behavior in case of errors
tfs.frame.validate(data_frame, non_unique_behavior="raise")  # or choose "warn"
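
You can also create a TfsDataFrame yourself and attach headers at construction time. Here is a minimal sketch, with made-up column names and header values:

# Build a TfsDataFrame from scratch, providing headers directly
new_frame = tfs.TfsDataFrame(
    data=[[1.0, 2.0], [3.0, 4.0]],
    columns=["COLUMN_A", "COLUMN_B"],
    headers={"TITLE": "My Table", "ORIGIN": "example"},
)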

Compression

Since TFS files are text-based, they benefit heavily from compression. Thankfully, tfs-pandas supports automatic reading and writing of various compression formats. Just use the API as you would normally, and the compression will be handled automatically:

# Compression format is inferred from the file extension
df = tfs.read("filename.tfs.gz", index="index_column")

# Same thing when writing to disk
tfs.write("path_to_output.tfs.zip", df)

A special module is provided to interface with the HDF5 format. First, though, one needs to install the package with the hdf5 extra dependencies:

python -m pip install --upgrade tfs-pandas[hdf5]

Then, access the functionality from tfs.hdf.

from tfs.hdf import read_hdf, write_hdf

# Read a TfsDataFrame from an HDF5 file
df = read_hdf("path_to_input.hdf5", key="key_in_hdf5_file")

# Write a TfsDataFrame to an HDF5 file
write_hdf("path_to_output.hdf5", df, key="key_in_hdf5_file")

Compatibility

Finally, replacement functions are provided for certain pandas operations which, if used directly, would return a pandas.DataFrame instead of a TfsDataFrame.

df1 = tfs.read("file1.tfs")
df2 = tfs.read("file2.tfs")

# This returns a pandas.DataFrame and makes you lose the headers
result = pd.concat([df1, df2])

# Instead, use our own
result = tfs.frame.concat([df1, df2])  # you can choose how to merge headers too
assert isinstance(result, tfs.TfsDataFrame)  # that's ok!
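
To control which headers end up in the merged frame, tfs.frame.concat takes dedicated keyword arguments. The exact parameter names below are an assumption; check the function's documentation to confirm them:

# Keep the headers of the left-most frame (parameter name assumed, see the docs)
result = tfs.frame.concat([df1, df2], how_headers="left")

# Or provide the headers of the result explicitly (also an assumed parameter)
result = tfs.frame.concat([df1, df2], new_headers={"TITLE": "Merged Table"})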

That’s it! Happy using :)