API Reference
Collection
Advanced reading and writing functionality for TFS files.
- class tfs.collection.Tfs(*args, **kwargs)[source]
Class to mark attributes as TFS attributes.
Any parameter given to this class will be passed to the _get_filename() method, together with the plane if two_planes=False is not present.
- class tfs.collection.TfsCollection(directory: Path, allow_write: bool | None = None)[source]
Abstract class to lazily load and write TFS files.
Classes inheriting from this abstract class will be able to define TFS files as readable or writable, and read or write them just as attribute access or assignments. All attributes will be read and written as TfsDataFrame objects.
Example
If ./example is a directory that contains two TFS files beta_phase_x.tfs and beta_phase_y.tfs with BETX and BETY columns respectively:
>>> # All TFS attributes must be marked with the Tfs(...) class,
... # and generated attribute names will be appended with _x / _y
... # depending on files found in "./example"
... class ExampleCollection(TfsCollection):
...     beta = Tfs("beta_phase_{}.tfs")  # A TFS attribute
...     other_value = 7  # A traditional attribute.
...     def get_filename(template: str, plane: str) -> str:
...         return template.format(plane)
>>> example = ExampleCollection("./example")
>>> # Get the BETX / BETY column from "beta_phase_x.tfs":
>>> beta_x_column = example.beta_x.BETX  # / example.beta_x.BETY
>>> # Get the BETY column from "beta_phase_y.tfs":
>>> beta_y_column = example.beta_y.BETY
>>> # The planes can also be accessed as items (both examples below work):
>>> beta_y_column = example.beta["y"].BETY
>>> beta_y_column = example.beta["Y"].BETY
>>> # This will write an empty DataFrame to "beta_phase_y.tfs":
>>> example.allow_write = True
>>> example.beta["y"] = DataFrame()
If the file to be loaded is not defined for two planes then the attribute can be declared and accessed as:
>>> coupling = Tfs("getcouple.tfs", two_planes=False)  # declaration
>>> f1001w_column = example.coupling.F1001W  # access
No file will be loaded until the corresponding attribute is accessed, and the loaded TfsDataFrame will be buffered. The user should thus expect an IOError if the requested file is not in the provided directory (only the first time, but it is better to always take it into account!).
When a TfsDataFrame is assigned to one attribute, it will be set as the buffer value. If the self.allow_write attribute is set to True, an assignment on one of the attributes will trigger the corresponding file write.
- clear()[source]
Clear the file buffer.
Any subsequent attribute access will try to load the corresponding file again.
- flush()[source]
Write the current state of the TfsDataFrames into their respective files.
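A minimal sketch, assuming the ExampleCollection defined above and a hypothetical modified_beta_x dataframe:
>>> example = ExampleCollection("./example", allow_write=True)
>>> example.beta_x = modified_beta_x  # buffered (and written, since allow_write is True)
>>> example.flush()                   # re-write the current state of all buffered dataframes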
- get_path(name: str) Path [source]
Return the actual file path of the property name (convenience function).
- Parameters:
name (str) -- Property name of the file.
- Returns:
A pathlib.Path of the actual name of the file in directory. The path to the file is then self.directory / filename.
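A short illustration, assuming the ExampleCollection defined above (the resolved filename depends on the collection's get_filename):
>>> example = ExampleCollection("./example")
>>> example.get_path("beta_x")  # e.g. Path("example/beta_phase_x.tfs")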
- read_tfs(filename: str) TfsDataFrame [source]
Reads the TFS file from self.directory with the given filename.
This function can be overwritten to use something instead of tfs-pandas to load the files. It does not set the TfsDataFrame into the buffer (that is the job of _load_tfs)!
- Parameters:
filename (str) -- The name of the file to load.
- Returns:
A TfsDataFrame built from reading the requested file.
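As a hedged sketch, overriding this hook in a subclass could look as follows; TaggedCollection and the LOADED_BY header key are purely illustrative:
>>> class TaggedCollection(ExampleCollection):
...     def read_tfs(self, filename: str) -> TfsDataFrame:
...         df = super().read_tfs(filename)  # default tfs-pandas read
...         df.headers["LOADED_BY"] = "TaggedCollection"  # illustrative extra header
...         return df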
- write_tfs(filename: str, data_frame: DataFrame)[source]
Write the TFS file to self.directory with the given filename.
This function can be overwritten to use something instead of tfs-pandas to write out the files. It does not check for allow_write and does not set the DataFrame into the buffer (that is the job of _write_tfs)!
- Parameters:
filename (str) -- The name of the file to write.
data_frame (TfsDataFrame) -- TfsDataFrame to write.
Constants
General constants used throughout tfs-pandas, relating to the standard of TFS files.
Errors
Errors that can be raised during the handling of TFS files.
- exception tfs.errors.TfsFormatError[source]
Raised when a wrong format is detected in the TFS file.
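A hedged sketch of handling this error, assuming it is raised while reading a malformed file:
>>> try:
...     df = tfs.read("possibly_malformed.tfs")
... except tfs.errors.TfsFormatError as error:
...     print(f"Could not parse the TFS file: {error}")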
Frame
Contains the class definition of a TfsDataFrame, inherited from the pandas DataFrame, as well as a utility function to validate the correctness of a TfsDataFrame.
- class tfs.frame.TfsDataFrame(*args, **kwargs)[source]
Class to hold the information of the TFS file as an extended pandas DataFrame, together with a way of getting the headers of the TFS file. The file headers are stored in a dictionary upon read. To get a header value use data_frame.headers["header_name"], or data_frame["header_name"] if it does not conflict with a column name in the dataframe.
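A short illustration of header access; the ENERGY header name is only an assumption about the file's contents:
>>> data_frame = tfs.read("filename.tfs")
>>> data_frame.headers["ENERGY"]  # explicit lookup in the headers dictionary
>>> data_frame["ENERGY"]          # shortcut, valid as long as no column is named ENERGY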
- merge(right: TfsDataFrame | DataFrame, how_headers: str | None = None, new_headers: dict | None = None, **kwargs) TfsDataFrame [source]
Merge TfsDataFrame objects with a database-style join. Data manipulation is done by the pandas.DataFrame method of the same name. Resulting headers are either merged according to the provided how_headers method or as given via new_headers.
- Parameters:
right (TfsDataFrame | pd.DataFrame) -- The TfsDataFrame to merge with the caller.
how_headers (str) -- Type of merge to be performed for the headers. Either left or right. Refer to tfs.frame.merge_headers() for behavior. If None is provided and new_headers is not provided, the final headers will be empty. Case-insensitive, defaults to None.
new_headers (dict) -- If provided, will be used as headers for the merged TfsDataFrame. Otherwise these are determined by merging the headers from the caller and the other TfsDataFrame according to the method defined by the how_headers argument.
- Keyword Arguments:
Any keyword argument is given to pandas.DataFrame.merge(). The default values for all these parameters are left as set in the pandas codebase. To see these, refer to the DataFrame.merge documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html).
- Returns:
A new TfsDataFrame with the merged data and merged headers.
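A hedged usage sketch, assuming both files contain a NAME column to join on; further keyword arguments (here on) are forwarded to pandas.DataFrame.merge:
>>> df_a = tfs.read("file_a.tfs")
>>> df_b = tfs.read("file_b.tfs")
>>> merged = df_a.merge(df_b, how_headers="left", on="NAME")
>>> # header keys from df_a win on duplicates; the data is joined on the NAME column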
- tfs.frame.concat(objs: Sequence[TfsDataFrame | pd.DataFrame], how_headers: str | None = None, new_headers: dict | None = None, **kwargs) TfsDataFrame [source]
Concatenate TfsDataFrame objects along a particular axis with optional set logic along the other axes. Data manipulation is done by the pandas.concat function. Resulting headers are either merged according to the provided how_headers method or as given via new_headers.
Warning
Please note that when using this function on many TfsDataFrames, leaving the contents of the final headers dictionary to the automatic merger can become unpredictable. In this case it is recommended to provide the new_headers argument to ensure the final result, or leave both how_headers and new_headers as None (their defaults) to end up with empty headers.
- Parameters:
objs (Sequence[TfsDataFrame | pd.DataFrame]) -- The TfsDataFrame objects to be concatenated.
how_headers (str) -- Type of merge to be performed for the headers. Either left or right. Refer to tfs.frame.merge_headers() for behavior. If None is provided and new_headers is not provided, the final headers will be empty. Case-insensitive, defaults to None.
new_headers (dict) -- If provided, will be used as headers for the merged TfsDataFrame. Otherwise these are determined by successively merging the headers from all concatenated TfsDataFrames according to the method defined by the how_headers argument.
- Keyword Arguments:
Any keyword argument is given to pandas.concat(). The default values for all these parameters are left as set in the pandas codebase. To see these, refer to the pandas.concat documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html).
- Returns:
A new TfsDataFrame with the merged data and merged headers.
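A hedged sketch of concatenating several files, providing new_headers explicitly to keep the final headers predictable (file names are placeholders):
>>> dfs = [tfs.read(name) for name in ("part1.tfs", "part2.tfs", "part3.tfs")]
>>> full = tfs.frame.concat(dfs, new_headers={"TITLE": "Concatenated parts"})
>>> # further keyword arguments, e.g. ignore_index=True, are forwarded to pandas.concat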
- tfs.frame.merge_headers(headers_left: dict, headers_right: dict, how: str) dict [source]
Merge headers of two TfsDataFrames together.
headers_left (dict) -- Headers of the caller (left) TfsDataFrame when calling .append, .join or .merge. Headers of the left (preceding) TfsDataFrame when calling tfs.frame.concat.
headers_right (dict) -- Headers of the other (right) TfsDataFrame when calling .append, .join or .merge. Headers of the right (following) TfsDataFrame when calling tfs.frame.concat.
how (str) -- Type of merge to be performed, either left or right. If left, prioritize keys from headers_left in case of duplicate keys. If right, prioritize keys from headers_right in case of duplicate keys. Case-insensitive. If None is given, an empty dictionary will be returned.
- Returns:
A new dictionary as the merge of the two provided dictionaries.
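A short illustration of the priority rule; the header names are only examples:
>>> left = {"TITLE": "Left frame", "ENERGY": 6500}
>>> right = {"TITLE": "Right frame", "TUNE": 0.31}
>>> merged = tfs.frame.merge_headers(left, right, how="left")
>>> # TITLE keeps the value from left; ENERGY and TUNE are both carried over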
- tfs.frame.validate(data_frame: TfsDataFrame | DataFrame, info_str: str = '', non_unique_behavior: str = 'warn') None [source]
Check if a data frame contains finite values only, strings as column names and no empty headers or column names.
Methodology
This function performs several different checks on the provided dataframe:
Checking no single element is a list or tuple, which is done with a custom vectorized function applied column-by-column on the dataframe.
Checking for non-physical values in the dataframe, which is done by applying the isna function with the right option context.
Checking for duplicates in either indices or columns.
Checking for column names that are not strings.
Checking for column names including spaces.
- Parameters:
data_frame (TfsDataFrame | pd.DataFrame) -- The dataframe to check on.
info_str (str) -- Additional information to include in logging statements.
non_unique_behavior (str) -- Behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.
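A small usage sketch, for instance re-checking a dataframe that was loaded without validation:
>>> df = tfs.read("filename.tfs", validate=False)
>>> # ... manipulate df ...
>>> tfs.frame.validate(df, info_str="after manipulation", non_unique_behavior="raise")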
HDF5 I/O
Additional tools for reading and writing TfsDataFrames into hdf5 files.
- tfs.hdf.read_hdf(path: Path | str) TfsDataFrame [source]
Read TfsDataFrame from hdf5 file. The DataFrame needs to be stored in a group named data, while the headers are stored in headers.
- Parameters:
path (Path, str) -- Path of the file to read.
- Returns:
A TfsDataFrame object with the loaded data from the file.
- tfs.hdf.write_hdf(path: Path | str, df: TfsDataFrame, **kwargs) None [source]
Write TfsDataFrame to hdf5 file. The dataframe will be written into the group data, the headers into the group headers. Only one dataframe per file is allowed.
- Parameters:
path (Path, str) -- Path of the output file.
df (TfsDataFrame) -- TfsDataFrame to write.
kwargs -- kwargs to be passed to pandas DataFrame.to_hdf(). key is not allowed and mode needs to be w if the output file already exists (w will be used in any case, even if the file does not exist, but only a warning is logged in that case).
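A hedged round-trip sketch; an HDF5 backend (e.g. pytables) is assumed to be installed, and the file names are placeholders:
>>> df = tfs.read("filename.tfs")
>>> tfs.hdf.write_hdf("filename.h5", df)
>>> restored = tfs.hdf.read_hdf("filename.h5")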
Reader
Reading functionality for TFS files.
- tfs.reader.read_headers(tfs_file_path: Path | str) dict [source]
Parses the top of the tfs_file_path and returns the headers.
- Parameters:
tfs_file_path (pathlib.Path | str) -- Path object to the TFS file to read. Can be a string, in which case it will be cast to a Path object.
- Returns:
A dictionary with the headers read from the file.
Examples
>>> headers = read_headers("filename.tfs")
Just as with the read_tfs function, it is possible to load from compressed files if the compression format is supported by pandas. The compression format detection is handled automatically from the extension of the provided tfs_file_path suffix. For instance:
>>> headers = read_headers("filename.tfs.gz")
- tfs.reader.read_tfs(tfs_file_path: Path | str, index: str | None = None, non_unique_behavior: str = 'warn', validate: bool = True) TfsDataFrame [source]
Parses the TFS table present in tfs_file_path and returns a TfsDataFrame.
Note
Loading and reading compressed files is possible. Any compression format supported by pandas is accepted, which includes: .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, .tar.xz or .tar.bz2. See below for examples.
Warning
Through the validate argument, one can skip dataframe validation after loading it from a file. While this can speed up the execution time of this function, it is not recommended and is not the default behavior of this function. The option, however, is left for the user to use at their own risk should they wish to avoid lengthy validation of large TfsDataFrames (such as for instance a sliced FCC lattice).
Methodology
This function first calls a helper which parses and returns all metadata from the file (headers content, column names & types, number of lines parsed). The rest of the file (the dataframe part) is handed to pandas.read_csv with the right options to make use of its C engine's speed. After this, conversion to TfsDataFrame is made, proper types are applied to columns, the index is set and the frame is eventually validated before being returned.
tfs_file_path (pathlib.Path | str) -- Path object to the TFS file to read. Can be a string, in which case it will be cast to a Path object.
index (str) -- Name of the column to set as index. If not given, looks in tfs_file_path for a column starting with INDEX&&&.
non_unique_behavior (str) -- Behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.
validate (bool) -- Whether to validate the dataframe after reading it. Defaults to True.
- Returns:
A TfsDataFrame object with the loaded data from the file.
Examples
Reading from a file is simple, as most arguments have sane default values. The simplest usage goes as follows:
>>> tfs.read("filename.tfs")
One can also pass a Path object to the function:
>>> tfs.read(pathlib.Path("filename.tfs"))
If one wants to set a specific column as index, this is done as:
>>> tfs.read("filename.tfs", index="COLUMN_NAME")
If one wants to, for instance, raise an error on non-unique indices or columns, one can do so as:
>>> tfs.read("filename.tfs", non_unique_behavior="raise")
One can choose to skip dataframe validation at one’s own risk after reading from file. This is done as:
>>> tfs.read("filename.tfs", validate=False)
It is possible to load compressed files if the compression format is supported by pandas (see above). The compression format detection is handled automatically from the extension of the provided tfs_file_path suffix. For instance:
>>> tfs.read("filename.tfs.gz") >>> tfs.read("filename.tfs.bz2") >>> tfs.read("filename.tfs.zip")
Testing
Testing functionality for TfsDataFrames.
- tfs.testing.assert_tfs_frame_equal(df1: TfsDataFrame, df2: TfsDataFrame, compare_keys: bool = True, **kwargs)[source]
Compare two TfsDataFrame objects, with df1 being the reference that df2 is compared to. This is mostly intended for unit tests. Comparison is done on both the contents of the headers dictionaries (with pandas's assert_dict_equal) as well as the data itself (with pandas's assert_frame_equal).
Note
The compare_keys argument is inherited from pandas's assert_dict_equal function and is quite unintuitive. It means to check that both dictionaries have the exact same set of keys. Whether this is given as True or False, the values are compared anyway for all keys in the first (reference) dict. In the case of this helper function, all keys present in df1's headers will be checked for in df2's headers and their corresponding values compared. If given as True, then both headers should be the exact same dictionary.
- Parameters:
df1 (TfsDataFrame) -- The first TfsDataFrame to compare.
df2 (TfsDataFrame) -- The second TfsDataFrame to compare.
compare_keys (bool) -- If True, checks that both headers have the exact same set of keys. See the above note for exact meaning and caveat. Defaults to True.
**kwargs -- Additional keyword arguments are transmitted to pandas.testing.assert_frame_equal for the comparison of the dataframe parts themselves.
Example
reference_df = tfs.read("path/to/file.tfs")
new_df = some_function(*args, **kwargs)
assert_tfs_frame_equal(reference_df, new_df)
Tools
Additional functions to modify TFS files.
- tfs.tools.remove_header_comments_from_files(list_of_files: list[str | Path]) None [source]
Checks the files in the provided list for invalid headers (no type defined) and removes those in place when found.
- Parameters:
list_of_files (list[str | Path]) -- List of paths to TFS files meant to be checked. The entries of the list can be strings or Path objects.
- tfs.tools.remove_nan_from_files(list_of_files: list[str | Path], replace: bool = False) None [source]
Remove NaN entries from files in list_of_files.
- Parameters:
list_of_files (list[str | Path]) -- List of paths to TFS files meant to be sanitized. The elements of the list can be strings or Path objects.
replace (bool) -- If True, the provided files will be overwritten. Otherwise new files with dropna appended to the original filenames will be written to disk. Defaults to False.
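A short usage sketch; the file names are placeholders:
>>> tfs.tools.remove_nan_from_files(["measurement_1.tfs", "measurement_2.tfs"])
>>> # with replace=False (the default), sanitized copies with "dropna" appended to
>>> # their names are written to disk instead of overwriting the originals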
- tfs.tools.significant_digits(value: float, error: float, return_floats: bool = False) tuple[str, str] | tuple[float, float] [source]
Computes value and its error properly rounded with respect to the size of error.
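A hedged usage sketch; per the signature, the rounded value and error are returned as strings by default, while return_floats=True yields floats:
>>> value_str, error_str = tfs.tools.significant_digits(3.14159, 0.01)
>>> value_flt, error_flt = tfs.tools.significant_digits(3.14159, 0.01, return_floats=True)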
Writer
Writing functionality for TFS files.
- tfs.writer.write_tfs(tfs_file_path: Path | str, data_frame: TfsDataFrame | DataFrame, headers_dict: dict | None = None, save_index: str | bool = False, colwidth: int = 20, headerswidth: int = 20, non_unique_behavior: str = 'warn', validate: bool = True) None [source]
Writes the provided DataFrame to disk at tfs_file_path, optionally with headers_dict as the headers dictionary.
Note
Compression of the output file is possible, by simply providing a valid compression extension as the tfs_file_path suffix. Any compression format supported by pandas is accepted, which includes: .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, .tar.xz or .tar.bz2. See below for examples.
Warning
Through the validate argument, one can skip dataframe validation before writing it to file. While this can speed up the execution time of this function, it is not recommended and is not the default behavior of this function. The option, however, is left for the user to use at their own risk should they wish to avoid lengthy validation of large TfsDataFrames (such as for instance a sliced FCC lattice).
- Parameters:
tfs_file_path (pathlib.Path | str) -- Path object to the output TFS file. Can be a string, in which case it will be cast to a Path object.
data_frame (TfsDataFrame | pd.DataFrame) -- TfsDataFrame or pandas.DataFrame to write to file.
headers_dict (dict) -- Headers for the data_frame. If not provided, assumes a TfsDataFrame was given and tries to use data_frame.headers.
save_index (str | bool) -- Bool or string. Defaults to False. If True, saves the index of data_frame to a column identifiable by INDEX&&&. If given as string, saves the index of data_frame to a column named by the provided value.
colwidth (int) -- Column width, cannot be smaller than MIN_COLUMN_WIDTH.
headerswidth (int) -- Used to format the header width for both keys and values.
non_unique_behavior (str) -- Behavior to adopt if non-unique indices or columns are found in the dataframe. Accepts warn and raise as values, case-insensitively, which dictates to respectively issue a warning or raise an error if non-unique elements are found.
validate (bool) -- Whether to validate the dataframe before writing it to file. Defaults to True.
Examples
Writing to file is simple, as most arguments have sane default values. The simplest usage goes as follows:
>>> tfs.write("filename.tfs", dataframe)
If one wants to, for instance, raise an error on non-unique indices or columns, one can do so as:
>>> tfs.write("filename.tfs", dataframe, non_unique_behavior="raise")
One can choose to skip dataframe validation at one’s own risk before writing it to file. This is done as:
>>> tfs.write("filename.tfs", dataframe, validate=False)
It is possible to directly have the output file be compressed, by specifying a valid compression extension as the tfs_file_path suffix. The detection and compression is handled automatically. For instance:
>>> tfs.write("filename.tfs.gz", dataframe)