2 Minutes to tfs-pandas
Yes, 2 minutes. That’s how little it takes!
Hint
You can click the function names in the code examples below to go directly to their documentation.
Basic Usage
The package is imported as tfs, and exports top-level functions for reading and writing:
import tfs
# Loading a TFS file is simple
df = tfs.read("path_to_input.tfs", index="index_column")
# Writing out to disk is simple too
tfs.write("path_to_output.tfs", df, save_index="index_column")
Once loaded, you get your data in a TfsDataFrame, which is a pandas.DataFrame with a dict of headers attached to it.
You can access and manipulate all data as you would with a DataFrame:
# Access and modify the headers with the .headers attribute
useful_variable = data_frame.headers["SOME_KEY"]
data_frame.headers["NEW_KEY"] = some_variable
# Manipulate data as you do with pandas DataFrames
data_frame["NEWCOL"] = data_frame.COLUMN_A * data_frame.COLUMN_B
# You can check the TfsDataFrame validity, and choose the behavior in case of errors
tfs.frame.validate(data_frame, non_unique_behavior="raise") # or choose "warn"
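Since the headers are just a dict, a TfsDataFrame can also be created from scratch the same way as a regular DataFrame, with the headers passed at construction. Here is a minimal sketch, where the column names, header key and output path are only placeholders:
# Create a TfsDataFrame from scratch, attaching headers on construction
new_frame = tfs.TfsDataFrame(
    {"NAME": ["BPM1", "BPM2"], "X": [0.1, 0.2]},
    headers={"TITLE": "Example Table"},
)
# It can be written to disk just like any other TfsDataFrame
tfs.write("path_to_new_output.tfs", new_frame)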
Compression
A TFS file being text-based, it benefits heavily from compression.
Thankfully, tfs-pandas
supports automatic reading and writing of various compression formats.
Just use the API as you would normally, and the compression will be handled automatically:
# Compression format is inferred from the file extension
df = tfs.read("filename.tfs.gz", index="index_column")
# Same thing when writing to disk
tfs.write("path_to_output.tfs.zip", df)
A special module is provided to interface with the HDF5 format.
First though, one needs to install the package with the hdf5 extra requirements:
python -m pip install --upgrade tfs-pandas[hdf5]
Then, access the functionality from tfs.hdf:
from tfs.hdf import read_hdf, write_hdf
# Read a TfsDataFrame from an HDF5 file
df = read_hdf("path_to_input.hdf5")
# Write a TfsDataFrame to an HDF5 file
write_hdf("path_to_output.hdf5", df)
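The headers travel with the frame through the HDF5 file as well, so a round trip preserves them. A small sketch with a placeholder file name:
# The headers dict is stored in the HDF5 file and restored on reading
write_hdf("roundtrip.hdf5", df)
df_back = read_hdf("roundtrip.hdf5")
print(df_back.headers)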
Compatibility
Finally, replacement functions are provided for some pandas operations which, if used directly, would return a pandas.DataFrame instead of a TfsDataFrame (and lose the headers in the process):
import pandas as pd
df1 = tfs.read("file1.tfs")
df2 = tfs.read("file2.tfs")
# This returns a pandas.DataFrame and makes you lose the headers
result = pd.concat([df1, df2])
# Instead, use our own
result = tfs.frame.concat([df1, df2]) # you can choose how to merge headers too
assert isinstance(result, tfs.TfsDataFrame) # that's ok!
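The merging of the inputs' headers can be controlled too. The argument name and option value below are assumptions based on recent versions of the package, so check the tfs.frame.concat documentation for the exact interface:
# Keep the headers of the left-most frame in case of conflicts
# (the "left" value is assumed here, see tfs.frame.concat for the valid options)
result = tfs.frame.concat([df1, df2], how_headers="left")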
That’s it! Happy using it :)