API Reference

Collection

Advanced reading and writing functionality for TFS files.

class tfs.collection.Tfs(*args, **kwargs)[source]

Class to mark attributes as TFS attributes.

Any parameter given to this class will be passed to the _get_filename() method, together with the plane unless two_planes=False is given.

class tfs.collection.TfsCollection(directory: Path, allow_write: bool | None = None)[source]

Abstract class to lazily load and write TFS files.

Classes inheriting from this abstract class will be able to define TFS files as readable or writable, and read or write them through simple attribute access or assignment. All attributes will be read and written as TfsDataFrame objects.

Example

If ./example is a directory that contains two TFS files beta_phase_x.tfs and beta_phase_y.tfs with BETX and BETY columns respectively:

>>> # All TFS attributes must be marked with the Tfs(...) class,
... # and generated attribute names will be appended with _x / _y
... # depending on files found in "./example"
... class ExampleCollection(TfsCollection):
...     beta = Tfs("beta_phase_{}.tfs")  # A TFS attribute
...     other_value = 7  # A traditional attribute.
...
...     def get_filename(self, template: str, plane: str) -> str:
...         return template.format(plane)

>>> example = ExampleCollection("./example")

>>> # Get the BETX column from "beta_phase_x.tfs":
>>> beta_x_column = example.beta_x.BETX

>>> # Get the BETY column from "beta_phase_y.tfs":
>>> beta_y_column = example.beta_y.BETY

>>> # The planes can also be accessed as items (both examples below work):
>>> beta_y_column = example.beta["y"].BETY
>>> beta_y_column = example.beta["Y"].BETY

>>> # This will write an empty DataFrame to "beta_phase_y.tfs":
>>> example.allow_write = True
>>> example.beta["y"] = DataFrame()

If the file to be loaded is not defined for two planes then the attribute can be declared and accessed as:

>>> coupling = Tfs("getcouple.tfs", two_planes=False)  # declaration
>>> f1001w_column = example.coupling.F1001W  # access

No file will be loaded until the corresponding attribute is accessed, and the loaded TfsDataFrame will then be buffered. The user should thus expect an IOError if the requested file is not in the provided directory (it is only raised on first access, but it is better to always take it into account!).

When a TfsDataFrame is assigned to one attribute, it will be set as the buffer value. If the self.allow_write attribute is set to True, an assignment on one of the attributes will trigger the corresponding file write.

clear()[source]

Clear the file buffer.

Any subsequent attribute access will try to load the corresponding file again.

flush()[source]

Write the current state of the TfsDataFrames into their respective files.
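
For illustration, a minimal sketch of the buffering workflow, assuming the ExampleCollection defined above and an existing "./example" directory (the column name is illustrative):

>>> example = ExampleCollection("./example")
>>> example.allow_write = True
>>> beta_x = example.beta_x              # loads and buffers "beta_phase_x.tfs"
>>> beta_x["BETX"] = beta_x["BETX"] * 2  # modify the buffered frame in place
>>> example.flush()                      # write all buffered frames to their files
>>> example.clear()                      # drop the buffer; the next access re-reads from disk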

get_filename(name: str) → str[source]

Return the actual filename of the property name.

Parameters:

name (str) -- Property name of the file.

Returns:

A str of the actual name of the file in directory. The path to the file is then self.directory / filename.

get_path(name: str) → Path[source]

Return the actual file path of the property name (convenience function).

Parameters:

name (str) -- Property name of the file.

Returns:

A pathlib.Path of the actual name of the file in directory. The path to the file is then self.directory / filename.
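
As a sketch, still assuming the ExampleCollection defined above (the property name and the resulting filenames are illustrative):

>>> example = ExampleCollection("./example")
>>> example.get_filename("beta_x")  # e.g. "beta_phase_x.tfs"
>>> example.get_path("beta_x")      # e.g. Path("example/beta_phase_x.tfs")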

read_tfs(filename: str) → TfsDataFrame[source]

Reads the TFS file from self.directory with the given filename.

This function can be overridden to use something other than tfs-pandas to load the files. It does not set the TfsDataFrame into the buffer (that is the job of _load_tfs)!

Parameters:

filename (str) -- The name of the file to load.

Returns:

A TfsDataFrame built from reading the requested file.

write_tfs(filename: str, data_frame: DataFrame)[source]

Write the TFS file to self.directory with the given filename.

This function can be overridden to use something other than tfs-pandas to write out the files. It does not check for allow_write and does not set the DataFrame into the buffer (that is the job of _write_tfs)!

Parameters:
  • filename (str) -- The name of the file to write.

  • data_frame (TfsDataFrame) -- TfsDataFrame to write.
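
As a hedged sketch, both hooks can be overridden in a subclass to customize how files are read and written; the index column used here is purely illustrative:

>>> import tfs
>>> class MyCollection(ExampleCollection):
...     def read_tfs(self, filename: str) -> tfs.TfsDataFrame:
...         # still use tfs-pandas, but always set a specific index column
...         return tfs.read(self.directory / filename, index="NAME")
...
...     def write_tfs(self, filename: str, data_frame: tfs.TfsDataFrame) -> None:
...         tfs.write(self.directory / filename, data_frame)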

Constants

General constants used throughout tfs-pandas, relating to the standard of TFS files.

Errors

Errors that can be raised during the handling of TFS files.

exception tfs.errors.TfsFormatError[source]

Raised when a wrong format is detected in the TFS file.
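
For illustration, a minimal sketch of catching this error when reading a file that does not follow the TFS standard (the filename is hypothetical):

>>> import tfs
>>> from tfs.errors import TfsFormatError
>>> try:
...     df = tfs.read("maybe_malformed.tfs")
... except TfsFormatError as error:
...     print(f"Not a valid TFS file: {error}")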

Frame

Contains the class definition of a TfsDataFrame, inherited from the pandas DataFrame, as well as a utility function to validate the correctness of a TfsDataFrame.

class tfs.frame.TfsDataFrame(*args, **kwargs)[source]

Class to hold the information of an extended pandas DataFrame, together with a way of getting the headers of the TFS file. The file headers are stored in a dictionary upon read. To get a header value use data_frame.headers["header_name"], or data_frame["header_name"] if it does not conflict with a column name in the dataframe.
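
A minimal sketch of building a TfsDataFrame with headers and accessing them (column and header names are illustrative):

>>> import tfs
>>> df = tfs.TfsDataFrame({"A": [1, 2], "B": [3.5, 4.5]}, headers={"TITLE": "example"})
>>> df.headers["TITLE"]
'example'
>>> df["TITLE"]  # also works here, as "TITLE" does not clash with a column name
'example'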

merge(right: TfsDataFrame | DataFrame, how_headers: str | None = None, new_headers: dict | None = None, **kwargs) → TfsDataFrame[source]

Merge TfsDataFrame objects with a database-style join. Data manipulation is done by the pandas.DataFrame method of the same name. Resulting headers are either merged according to the provided how_headers method or set as given via new_headers.

Parameters:
  • right (TfsDataFrame | pd.DataFrame) -- The TfsDataFrame to merge with the caller.

  • how_headers (str) -- Type of merge to be performed for the headers. Either left or right. Refer to tfs.frame.merge_headers() for behavior. If None is provided and new_headers is not provided, the final headers will be empty. Case insensitive, defaults to None.

  • new_headers (dict) -- If provided, will be used as headers for the merged TfsDataFrame. Otherwise these are determined by merging the headers from the caller and the other TfsDataFrame according to the method defined by the how_headers argument.

Keyword Arguments:

Any keyword argument is given to pandas.DataFrame.merge(). The default values for all these parameters are left as set in the pandas codebase. To see these, refer to the DataFrame.merge documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

Returns:

A new TfsDataFrame with the merged data and merged headers.
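
As a hedged sketch, merging two TfsDataFrames on a shared column while keeping the caller's headers on duplicate keys (all names and values are illustrative):

>>> import tfs
>>> df_a = tfs.TfsDataFrame({"NAME": ["BPM1", "BPM2"], "X": [0.1, 0.2]}, headers={"TITLE": "a"})
>>> df_b = tfs.TfsDataFrame({"NAME": ["BPM1", "BPM2"], "Y": [1.1, 1.2]}, headers={"TITLE": "b"})
>>> merged = df_a.merge(df_b, how_headers="left", on="NAME")
>>> merged.headers["TITLE"]  # duplicate header key resolved from the caller
'a'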

tfs.frame.concat(objs: Sequence[TfsDataFrame | pd.DataFrame], how_headers: str | None = None, new_headers: dict | None = None, **kwargs) → TfsDataFrame[source]

Concatenate TfsDataFrame objects along a particular axis with optional set logic along the other axes. Data manipulation is done by the pandas.concat function. Resulting headers are either merged according to the provided how_headers method or as given via new_headers.

Warning

Please note that when using this function on many TfsDataFrames, leaving the contents of the final headers dictionary to the automatic merger can become unpredictable. In this case it is recommended to provide the new_headers argument to ensure the final result, or leave both how_headers and new_headers as None (their defaults) to end up with empty headers.

Parameters:
  • objs (Sequence[TfsDataFrame | pd.DataFrame]) -- the TfsDataFrame objects to be concatenated.

  • how_headers (str) -- Type of merge to be performed for the headers. Either left or right. Refer to tfs.frame.merge_headers() for behavior. If None is provided and new_headers is not provided, the final headers will be empty. Case insensitive, defaults to None.

  • new_headers (dict) -- If provided, will be used as headers for the merged TfsDataFrame. Otherwise these are determined by successively merging the headers from all concatenated TfsDataFrames according to the method defined by the how_headers argument.

Keyword Arguments:

Any keyword argument is given to pandas.concat(). The default values for all these parameters are left as set in the pandas codebase. To see these, refer to the pandas.concat documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

Returns:

A new TfsDataFrame with the merged data and merged headers.
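
For illustration, a short sketch (the filenames are hypothetical):

>>> import tfs
>>> dframes = [tfs.read("part1.tfs"), tfs.read("part2.tfs")]
>>> merged = tfs.frame.concat(dframes, how_headers="left")            # left-most headers win on duplicates
>>> fixed = tfs.frame.concat(dframes, new_headers={"TITLE": "full"})  # or set the final headers explicitly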

tfs.frame.merge_headers(headers_left: dict, headers_right: dict, how: str) → dict[source]

Merge headers of two TfsDataFrames together.

Parameters:
  • headers_left (dict) -- Headers of the caller (left) TfsDataFrame when calling .append, .join or .merge. Headers of the left (preceding) TfsDataFrame when calling tfs.frame.concat.

  • headers_right (dict) -- Headers of the other (right) TfsDataFrame when calling .append, .join or .merge. Headers of the right (subsequent) TfsDataFrame when calling tfs.frame.concat.

  • how (str) -- Type of merge to be performed, either left or right. If left, prioritize keys from headers_left in case of duplicate keys. If right, prioritize keys from headers_right in case of duplicate keys. Case-insensitive. If None is given, an empty dictionary will be returned.

Returns:

A new dictionary as the merge of the two provided dictionaries.
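
For illustration (keys and values are arbitrary):

>>> from tfs.frame import merge_headers
>>> combined = merge_headers({"TITLE": "left", "A": 1}, {"TITLE": "right", "B": 2}, how="left")
>>> combined["TITLE"]  # the duplicate key takes the value from headers_left
'left'
>>> sorted(combined.keys())  # all keys from both dictionaries are kept
['A', 'B', 'TITLE']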

tfs.frame.validate(data_frame: TfsDataFrame | DataFrame, info_str: str = '', non_unique_behavior: str = 'warn') → None[source]

Check if a data frame contains finite values only, strings as column names and no empty headers or column names.

Methodology

This function performs several different checks on the provided dataframe:
  1. Checking no single element is a list or tuple, which is done with a custom vectorized function applied column-by-column on the dataframe.

  2. Checking for non-physical values in the dataframe, which is done by applying the isna function with the right option context.

  3. Checking for duplicates in either indices or columns.

  4. Checking for column names that are not strings.

  5. Checking for column names including spaces.

Parameters:
  • data_frame (TfsDataFrame | pd.DataFrame) -- the dataframe to check on.

  • info_str (str) -- additional information to include in logging statements.

  • non_unique_behavior (str) -- behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.
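
A minimal usage sketch (the filename and info string are illustrative):

>>> import tfs
>>> df = tfs.read("filename.tfs", validate=False)  # skip validation at read time
>>> tfs.frame.validate(df, info_str="my frame", non_unique_behavior="raise")  # validate explicitly later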

HDF5 I/O

Additional tools for reading and writing TfsDataFrames into hdf5 files.

tfs.hdf.read_hdf(path: Path | str) → TfsDataFrame[source]

Read TfsDataFrame from hdf5 file. The DataFrame needs to be stored in a group named data, while the headers are stored in headers.

Parameters:

path (Path, str) -- Path of the file to read.

Returns:

A TfsDataFrame object with the loaded data from the file.

tfs.hdf.write_hdf(path: Path | str, df: TfsDataFrame, **kwargs) → None[source]

Write TfsDataFrame to hdf5 file. The dataframe will be written into the group data, the headers into the group headers. Only one dataframe per file is allowed.

Parameters:
  • path (Path, str) -- Path of the output file.

  • df (TfsDataFrame) -- TfsDataFrame to write.

  • kwargs -- kwargs to be passed to pandas DataFrame.to_hdf(). key is not allowed and mode needs to be w if the output file already exists (w will be used in any case, even if the file does not exist, but only a warning is logged in that case).
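
A hedged round-trip sketch; the filenames are illustrative and the HDF5 functionality may require optional dependencies (pandas' HDF support):

>>> import tfs
>>> from tfs.hdf import read_hdf, write_hdf
>>> df = tfs.read("filename.tfs")
>>> write_hdf("filename.h5", df)  # stores the data in group "data" and the headers in "headers"
>>> df_loaded = read_hdf("filename.h5")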

Reader

Reading functionality for TFS files.

tfs.reader.read_headers(tfs_file_path: Path | str) → dict[source]

Parses the top of the tfs_file_path and returns the headers.

Parameters:

tfs_file_path (pathlib.Path | str) -- Path object to the TFS file to read. Can be a string, in which case it will be cast to a Path object.

Returns:

A dictionary with the headers read from the file.

Examples

>>> headers = read_headers("filename.tfs")

Just as with the read_tfs function, it is possible to load from compressed files if the compression format is supported by pandas. The compression format detection is handled automatically from the extension of the provided tfs_file_path suffix. For instance:

>>> headers = read_headers("filename.tfs.gz")

tfs.reader.read_tfs(tfs_file_path: Path | str, index: str | None = None, non_unique_behavior: str = 'warn', validate: bool = True) → TfsDataFrame[source]

Parses the TFS table present in tfs_file_path and returns a TfsDataFrame.

Note

Loading and reading compressed files is possible. Any compression format supported by pandas is accepted, which includes: .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, .tar.xz or .tar.bz2. See below for examples.

Warning

Through the validate argument, one can skip dataframe validation after loading it from a file. While this can speed up the execution time of this function, it is not recommended and is not the default behavior of this function. The option, however, is left for the user to use at their own risk should they wish to avoid lengthy validation of large TfsDataFrames (such as, for instance, a sliced FCC lattice).

Methodology

This function first calls a helper which parses and returns all metadata from the file (headers content, column names & types, number of lines parsed). The rest of the file (the dataframe part) is handed to pandas.read_csv with the right options to make use of its C engine’s speed. After this, conversion to TfsDataFrame is made, proper types are applied to columns, the index is set and the frame is eventually validated before being returned.

Parameters:
  • tfs_file_path (pathlib.Path | str) -- Path object to the TFS file to read. Can be a string, in which case it will be cast to a Path object.

  • index (str) -- Name of the column to set as index. If not given, looks in tfs_file_path for a column starting with INDEX&&&.

  • non_unique_behavior (str) -- behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.

  • validate (bool) -- Whether to validate the dataframe after reading it. Defaults to True.

Returns:

A TfsDataFrame object with the loaded data from the file.

Examples

Reading from a file is simple, as most arguments have sane default values. The simplest usage goes as follows:

>>> tfs.read("filename.tfs")

One can also pass a Path object to the function:

>>> tfs.read(pathlib.Path("filename.tfs"))

If one wants to set a specific column as index, this is done as:

>>> tfs.read("filename.tfs", index="COLUMN_NAME")

If one wants to, for instance, raise an error on non-unique indices or columns, one can do so as:

>>> tfs.read("filename.tfs", non_unique_behavior="raise")

One can choose to skip dataframe validation at one’s own risk after reading from file. This is done as:

>>> tfs.read("filename.tfs", validate=False)

It is possible to load compressed files if the compression format is supported by pandas (see above). The compression format detection is handled automatically from the suffix of the provided tfs_file_path. For instance:

>>> tfs.read("filename.tfs.gz")
>>> tfs.read("filename.tfs.bz2")
>>> tfs.read("filename.tfs.zip")

Testing

Testing functionality for TfsDataFrames.

tfs.testing.assert_tfs_frame_equal(df1: TfsDataFrame, df2: TfsDataFrame, compare_keys: bool = True, **kwargs)[source]

Compare two TfsDataFrame objects, with df1 being the reference that df2 is compared to. This is mostly intended for unit tests. Comparison is done on both the contents of the headers dictionaries (with pandas’s assert_dict_equal) as well as the data itself (with pandas’s assert_frame_equal).

Note

The compare_keys argument is inherited from pandas’s assert_dict_equal function and is quite unintuitive. It means to check that both dictionaries have the exact same set of keys.

Whether this is given as True or False, the values are compared anyway for all keys in the first (reference) dict. In the case of this helper function, all keys present in df1’s headers will be checked for in df2’s headers and their corresponding values compared. If given as True, then both headers should be the exact same dictionary.

Parameters:
  • df1 (TfsDataFrame) -- The first TfsDataFrame to compare.

  • df2 (TfsDataFrame) -- The second TfsDataFrame to compare.

  • compare_keys (bool) -- If True, checks that both headers have the exact same set of keys. See the above note for exact meaning and caveat. Defaults to True.

  • **kwargs -- Additional keyword arguments are transmitted to pandas.testing.assert_frame_equal for the comparison of the dataframe parts themselves.

Example

>>> reference_df = tfs.read("path/to/file.tfs")
>>> new_df = some_function(*args, **kwargs)
>>> assert_tfs_frame_equal(reference_df, new_df)

Tools

Additional functions to modify TFS files.

tfs.tools.remove_header_comments_from_files(list_of_files: list[str | Path]) → None[source]

Checks the files in the provided list for invalid headers (no type defined) and removes those in place when found.

Parameters:

list_of_files (list[str | Path]) -- list of Paths to TFS files meant to be checked. The entries of the list can be strings or Path objects.

tfs.tools.remove_nan_from_files(list_of_files: list[str | Path], replace: bool = False) → None[source]

Remove NaN entries from files in list_of_files.

Parameters:
  • list_of_files (list[str | Path]) -- list of Paths to TFS files meant to be sanitized. The elements of the list can be strings or Path objects.

  • replace (bool) -- if True, the provided files will be overwritten. Otherwise new files with dropna appended to the original filenames will be written to disk. Defaults to False.
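
For illustration (the filename is hypothetical):

>>> from tfs.tools import remove_nan_from_files
>>> remove_nan_from_files(["measurement.tfs"])                # writes a new file with "dropna" appended to the name
>>> remove_nan_from_files(["measurement.tfs"], replace=True)  # overwrites the original file instead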

tfs.tools.significant_digits(value: float, error: float, return_floats: bool = False) → tuple[str, str] | tuple[float, float][source]

Computes value and its error properly rounded with respect to the size of error.

Parameters:
  • value (float) -- a number.

  • error (float) -- the error on the number.

  • return_floats (bool) -- if True, returns significant digits as floats. Otherwise as strings. Defaults to False.

Returns:

A tuple of the rounded value and error with regards to the size of the error.
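
For illustration (the exact rounding depends on the size of the error, so no output is shown here):

>>> from tfs.tools import significant_digits
>>> value_str, error_str = significant_digits(3.14159, 0.01234)               # rounded strings
>>> value, error = significant_digits(3.14159, 0.01234, return_floats=True)   # same, but as floats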

Writer

Writing functionality for TFS files.

tfs.writer.write_tfs(tfs_file_path: Path | str, data_frame: TfsDataFrame | DataFrame, headers_dict: dict | None = None, save_index: str | bool = False, colwidth: int = 20, headerswidth: int = 20, non_unique_behavior: str = 'warn', validate: bool = True) → None[source]

Writes the provided DataFrame to disk at tfs_file_path, optionally with the headers_dict as headers dictionary.

Note

Compression of the output file is possible, by simply providing a valid compression extension as the tfs_file_path suffix. Any compression format supported by pandas is accepted, which includes: .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, .tar.xz or .tar.bz2. See below for examples.

Warning

Through the validate argument, one can skip dataframe validation before writing it to file. While this can speed up the execution time of this function, it is not recommended and is not the default behavior of this function. The option, however, is left for the user to use at their own risk should they wish to avoid lengthy validation of large TfsDataFrames (such as, for instance, a sliced FCC lattice).

Parameters:
  • tfs_file_path (pathlib.Path | str) -- Path object to the output TFS file. Can be a string, in which case it will be cast to a Path object.

  • data_frame (TfsDataFrame | pd.DataFrame) -- TfsDataFrame or pandas.DataFrame to write to file.

  • headers_dict (dict) -- Headers for the data_frame. If not provided, assumes a TfsDataFrame was given and tries to use data_frame.headers.

  • save_index (str | bool) -- Defaults to False. If True, saves the index of data_frame to a column identifiable by INDEX&&&. If given as a string, saves the index of data_frame to a column named by the provided value.

  • colwidth (int) -- Column width, can not be smaller than MIN_COLUMN_WIDTH.

  • headerswidth (int) -- Used to format the header width for both keys and values.

  • non_unique_behavior (str) -- behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.

  • validate (bool) -- Whether to validate the dataframe before writing it to file. Defaults to True.

Examples

Writing to file is simple, as most arguments have sane default values. The simplest usage goes as follows:

>>> tfs.write("filename.tfs", dataframe)

If one wants to, for instance, raise an error on non-unique indices or columns, one can do so as:

>>> tfs.write("filename.tfs", dataframe, non_unique_behavior="raise")

One can choose to skip dataframe validation at one’s own risk before writing it to file. This is done as:

>>> tfs.write("filename.tfs", dataframe, validate=False)

It is possible to directly have the output file be compressed, by specifying a valid compression extension as the tfs_file_path suffix. The detection and compression is handled automatically. For instance:

>>> tfs.write("filename.tfs.gz", dataframe)