Submitter

HTCondor Utilities

This module provides functionality to create HTCondor jobs and submit them to HTCondor.

write_bash creates bash scripts that execute either a python or madx script. It takes as input a DataFrame, the job type, and optional additional command-line arguments for the script. A shell script is created in each job directory listed in the dataframe.

make_subfile takes the job dataframe and creates the .sub file required for submission to HTCondor. The .sub file will be put in the working directory. The maximum runtime of one job can be specified; the default is 8h.
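For orientation, a minimal sketch of this module's flow, with hypothetical paths and a placeholder dataframe (in practice the dataframe comes from the job-creation step in iotools):

   from pathlib import Path

   import pandas as pd

   from pylhc_submitter.submitter.htc_utils import make_subfile, submit_jobfile, write_bash

   cwd = Path("/afs/cern.ch/work/u/user/study")  # hypothetical working directory
   job_df = pd.DataFrame()  # placeholder: use the dataframe from iotools.create_jobs

   # Create one shell script per job directory; their paths are added to the dataframe.
   job_df = write_bash(job_df, executable="madx")

   # Write the .sub file into the working directory (kwargs as documented below).
   subfile = make_subfile(cwd, job_df, jobflavour="workday")

   # Hand the .sub file to HTCondor.
   submit_jobfile(subfile, ssh=None)  # or an ssh target host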

pylhc_submitter.submitter.htc_utils.create_multijob_for_bashfiles(job_df: DataFrame, **kwargs) → str

Creates the HTCondor submission content for all job-scripts, i.e. bash-files, in the job_df.

Keyword Arguments:
  • output_dir (str) -- output directory that will be transferred. Defaults to None.

  • jobflavour (str) -- max duration of the job. Needs to be one of the HTCondor Jobflavours. Defaults to workday.

  • group (str) -- force use of accounting group. Defaults to None.

  • retries (int) -- maximum amount of retries. Defaults to 3.

  • notification (str) -- Notify under certain conditions. Defaults to error.

  • priority (int) -- Priority to order your jobs. Defaults to None.

Returns:

HTCondor submission definition.

Return type:

str
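A direct-call sketch, assuming a job_df whose rows already reference the bash-files created by write_bash (normally make_subfile drives this function):

   from pylhc_submitter.submitter.htc_utils import create_multijob_for_bashfiles

   submission = create_multijob_for_bashfiles(
       job_df,                 # dataframe holding the bash-file paths from write_bash
       jobflavour="espresso",  # one of the HTCondor job flavours
       retries=3,
       notification="error",
   )
   print(submission)  # plain-text HTCondor submission definition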

pylhc_submitter.submitter.htc_utils.create_subfile_from_job(cwd: Path, submission: str | Submit) → Path

Write file to submit to HTCondor.

Parameters:
  • cwd (Path) -- working directory

  • submission (str, htcondor.Submit) -- HTCondor submission definition (i.e. content of the file)

Returns:

path to sub-file

Return type:

Path

pylhc_submitter.submitter.htc_utils.make_subfile(cwd: Path, job_df: DataFrame, **kwargs) → Path

Creates the submit-file for HTCondor. For the kwargs, see create_multijob_for_bashfiles.

Parameters:
  • cwd (Path) -- working directory

  • job_df (DataFrame) -- DataFrame containing all the job-information

Returns:

path to the submit-file

Return type:

Path

pylhc_submitter.submitter.htc_utils.map_kwargs(add_dict: Dict[str, Any]) → Dict[str, Any]

Maps the kwargs for the job-file. Some arguments have pre-defined choices and defaults; the remaining ones are passed on unchanged.

Parameters:

add_dict (Dict[str, Any]) -- additional kwargs to add to the defaults.

Returns:

The mapped kwargs.

Return type:

Dict[str, Any]
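The pattern is the usual defaults-then-overrides merge; a standalone sketch with hypothetical defaults (the actual choices live in map_kwargs itself):

   from typing import Any, Dict

   HYPOTHETICAL_DEFAULTS = {"jobflavour": "workday", "retries": 3, "notification": "error"}

   def map_kwargs_sketch(add_dict: Dict[str, Any]) -> Dict[str, Any]:
       mapped = dict(HYPOTHETICAL_DEFAULTS)  # start from the pre-defined defaults
       mapped.update(add_dict)  # user values win; unknown keys are just passed on
       return mapped

   print(map_kwargs_sketch({"retries": 5, "priority": 10}))
   # {'jobflavour': 'workday', 'retries': 5, 'notification': 'error', 'priority': 10}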

pylhc_submitter.submitter.htc_utils.submit_jobfile(jobfile: Path, ssh: str) → None

Submit subfile to HTCondor via subprocess.

Parameters:
  • jobfile (Path) -- path to sub-file

  • ssh (str) -- ssh target
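A generic sketch of what submitting via subprocess typically looks like; this illustrates the pattern, it is not the module's actual implementation:

   import subprocess
   from pathlib import Path

   def submit_sketch(jobfile: Path, ssh: str | None = None) -> None:
       cmd = ["condor_submit", str(jobfile)]
       if ssh:  # run the submission on a remote target instead
           cmd = ["ssh", ssh] + cmd
       subprocess.run(cmd, check=True)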

pylhc_submitter.submitter.htc_utils.write_bash(job_df: DataFrame, output_dir: Path | None = None, executable: str = 'madx', cmdline_arguments: dict | None = None, mask: str | Path | None = None) → DataFrame

Write the bash-files to be called by HTCondor, which in turn call the executable. Takes as input a DataFrame, the job type, and optional additional command-line arguments for the script. A shell script is created in each job directory listed in the dataframe.

Parameters:
  • job_df (DataFrame) -- DataFrame containing all the job-information

  • output_dir (str) -- output directory that will be transferred. Defaults to None.

  • executable (str) -- name of the executable. Defaults to madx.

  • cmdline_arguments (dict) -- additional command-line arguments for the executable.

  • mask (Union[str, Path]) -- string or path to the mask-file. Defaults to None.

Returns:

The provided job_df but with added path to the scripts.

Return type:

DataFrame
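A hedged usage sketch with hypothetical mask and arguments (job_df must already contain the per-job directories, see iotools.create_folders):

   from pathlib import Path

   from pylhc_submitter.submitter.htc_utils import write_bash

   job_df = write_bash(
       job_df,  # dataframe with the per-job directories
       output_dir=Path("Outputdata"),  # hypothetical directory to transfer back
       executable="python3",
       cmdline_arguments={"verbose": True},  # hypothetical extra arguments
       mask="my_script.mask.py",  # hypothetical mask file
   )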

Job Submitter IO-Tools

Tools for input and output for the job-submitter.

class pylhc_submitter.submitter.iotools.CreationOpts(working_directory: Path, mask: Path | str, jobid_mask: str, replace_dict: Dict[str, Any], output_dir: Path, output_destination: Path | str, append_jobs: bool, resume_jobs: bool, executable: str, check_files: Sequence[str], script_arguments: Dict[str, Any], script_extension: str)

Options for creating jobs.

should_drop_jobs() → bool

Check if jobs should be dropped after creating the whole parameter space, e.g. because they already exist.

pylhc_submitter.submitter.iotools.create_folders(job_df: TfsDataFrame, working_directory: Path, destination_directory: str | Path | None = None) → TfsDataFrame

Create the folder-structure in the given working directory and, if given, in the destination directory. This creates one folder per job, in which the job-scripts and bash-scripts can later be stored.

Parameters:
  • job_df (tfs.TfsDataFrame) -- DataFrame containing all the job-information

  • working_directory (Path) -- Path to the working directory

  • destination_directory (Path, optional) -- Path to the destination directory, i.e. the directory to copy the outputs to manually. Defaults to None.

Returns:

The job-dataframe again, but with the added paths to the job-dirs.

Return type:

tfs.TfsDataFrame

pylhc_submitter.submitter.iotools.create_jobs(opt: CreationOpts) → TfsDataFrame

Main function to prepare all the jobs and the folder structure. This creates the value-grid based on the replace-dict and checks for existing jobs (if so desired). A job-dataframe is created and written out, containing all the information, and its values are used to generate the job-scripts. It also creates bash-scripts to call the executable for the job-scripts.

Parameters:

opt (CreationOpts) -- Options for creating jobs

Returns:

The job-dataframe containing information for all jobs.

Return type:

tfs.TfsDataFrame
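A sketch assembling CreationOpts from the documented fields (all paths and replace-dict values are hypothetical):

   from pathlib import Path

   from pylhc_submitter.submitter.iotools import CreationOpts, create_jobs

   opts = CreationOpts(
       working_directory=Path("/afs/cern.ch/work/u/user/study"),
       mask=Path("my_mask.madx"),
       jobid_mask="job.seed_%(SEED)d",
       replace_dict={"SEED": [1, 2, 3]},
       output_dir=Path("Outputdata"),
       output_destination=None,  # or e.g. an EOS URI to copy outputs to
       append_jobs=False,
       resume_jobs=False,
       executable="madx",
       check_files=["twiss.tfs"],  # hypothetical output files to check
       script_arguments={},
       script_extension=".madx",
   )
   job_df = create_jobs(opts)  # writes the job-dataframe and all job-scripts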

pylhc_submitter.submitter.iotools.get_server_from_uri(path: Path | str) → str

Get server information from a path. E.g.: root://eosuser.cern.ch//eos/user/a/ -> root://eosuser.cern.ch/

pylhc_submitter.submitter.iotools.is_eos_uri(path: Path | str | None) → bool

Check if the given path is an EOS-URI as eos cp only works with those. E.g.: root://eosuser.cern.ch//eos/user/a/anabramo/banana.txt

This function does not check the double slashes, so that a URI the user malformed by accident is not silently assumed to be just a plain path. This is tested for in pylhc_submitter.job_submitter.check_opts().

pylhc_submitter.submitter.iotools.print_stats(new_jobs: Sequence[str | int], finished_jobs: Sequence[str | int])

Print some quick statistics.

pylhc_submitter.submitter.iotools.uri_to_path(path: Path | str) → Path

Strip the EOS-server information from a path. EOS paths for HTCondor can be given as URIs; strip the server part for direct writing. E.g.: root://eosuser.cern.ch//eos/user/a/anabramo/banana.txt -> /eos/user/a/anabramo/banana.txt
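The three URI helpers, applied to the example from the docstrings:

   from pylhc_submitter.submitter.iotools import get_server_from_uri, is_eos_uri, uri_to_path

   uri = "root://eosuser.cern.ch//eos/user/a/anabramo/banana.txt"
   print(is_eos_uri(uri))           # True
   print(get_server_from_uri(uri))  # root://eosuser.cern.ch/
   print(uri_to_path(uri))          # /eos/user/a/anabramo/banana.txt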

Mask Resolver

This module provides functionality to resolve and write script masks for HTCondor job submission.
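A mask is a script template whose placeholders are filled per job. For illustration, a hypothetical MAD-X mask using the %(KEY)s-style replacement variables this module resolves (all names made up):

   ! hypothetical mask content, stored e.g. in my_mask.madx
   call, file = "model.madx";
   seed = %(SEED)d;                ! filled per job from the replace-dict
   exec, do_tracking(%(NTURNS)d);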

pylhc_submitter.submitter.mask.check_percentage_signs_in_mask(mask: str) → None

Checks for ‘%’ signs in the mask that are not replacement variables.

pylhc_submitter.submitter.mask.create_job_scripts_from_mask(job_df: DataFrame, maskfile: Path, replace_keys: dict, file_ext: str) → DataFrame

Takes the path to a mask file, a list of parameters to be replaced, and a pandas DataFrame containing, per job, the job directory where the processed mask is to be put, as well as columns with the parameter values, named like the replace-parameters. Job directories have to be created beforehand. The processed (madx) mask has the same filename as the mask, but with the given file extension. The input DataFrame is returned with an additional column containing the paths to the processed script files.

Parameters:
  • job_df (pd.DataFrame) -- Job parameters as defined in description.

  • maskfile -- Path object to the mask file.

  • replace_keys -- keys to be replaced (must correspond to columns in job_df).

  • file_ext -- file extension to use (defaults to madx).

Returns:

The provided job_df but with added path to the scripts.
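A hedged usage sketch (job_df must come from iotools.create_folders, so that every row already carries its job directory; parameter columns are named like the replace keys):

   from pathlib import Path

   from pylhc_submitter.submitter.mask import create_job_scripts_from_mask

   job_df = create_job_scripts_from_mask(
       job_df,  # one row per job, directories already created
       Path("my_mask.madx"),  # hypothetical mask file
       replace_keys=["SEED"],  # must correspond to columns in job_df
       file_ext="madx",  # processed scripts get this extension
   )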

pylhc_submitter.submitter.mask.find_named_variables_in_mask(mask: str) → Set[str]

Find all variable-names in the mask.

pylhc_submitter.submitter.mask.generate_jobdf_index(old_df: DataFrame, jobid_mask: str, keys: Sequence[str], values: ArrayLike) → List[str] | Iterable[int]

Generates the index for the jobdf from the mask for job-id naming.

Parameters:
  • old_df (pd.DataFrame) -- Existing jobdf.

  • jobid_mask (str) -- Mask for naming the jobs.

  • keys (Sequence[str]) -- Keys to be replaced in the mask.

  • values (np.array_like) -- Values-Grid to be replaced in the mask.

Returns:

Index for the jobdf, either a list of strings (the filled jobid_masks) or an integer range.

Return type:

List[str] | Iterable[int]
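A standalone illustration of the jobid-mask idea (not the module's code): fill a naming mask with each row's parameter values to build the index:

   import itertools

   jobid_mask = "job.seed_%(SEED)d.tune_%(TUNE)s"  # hypothetical naming mask
   keys = ["SEED", "TUNE"]
   values = list(itertools.product([1, 2], ["62.31", "62.32"]))

   index = [jobid_mask % dict(zip(keys, v)) for v in values]
   print(index[0])  # job.seed_1.tune_62.31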

pylhc_submitter.submitter.mask.is_mask_file(mask: str) → bool

Check if the given string points to a file.

pylhc_submitter.submitter.mask.is_mask_string(mask: str) → bool

Checks that the given string does not point to a file.
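A short sketch of the two checks (hypothetical mask values):

   from pylhc_submitter.submitter.mask import is_mask_file, is_mask_string

   print(is_mask_file("my_mask.madx"))        # True if this file exists on disk
   print(is_mask_string("seed = %(SEED)d;"))  # True, as this is not an existing file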

Job Submitter Runners

Defines the methods to run the job-submitter, locally or on HTC.

class pylhc_submitter.submitter.runners.RunnerOpts(working_directory: Path, jobflavour: str | None = None, output_dir: str | None = None, ssh: str | None = None, dryrun: bool | None = False, htc_arguments: Dict[str, Any] | None = <factory>, run_local: bool | None = False, num_processes: int | None = 4)

Options for running the submission.

pylhc_submitter.submitter.runners.run_htc(job_df: TfsDataFrame, opt: RunnerOpts) → None

Create submission file and submit the jobs to HTCondor.

Parameters:
  • job_df (tfs.TfsDataFrame) -- DataFrame containing all the job-information

  • opt (RunnerOpts) -- Parameters for the runner

pylhc_submitter.submitter.runners.run_jobs(job_df: TfsDataFrame, opt: RunnerOpts) → None

Selects how to run the jobs.

Parameters:
  • job_df (tfs.TfsDataFrame) -- DataFrame containing all the job-information

  • opt (RunnerOpts) -- Parameters for the runner

pylhc_submitter.submitter.runners.run_local(job_df: TfsDataFrame, opt: RunnerOpts) → None

Run all jobs locally.

Parameters:
  • job_df (tfs.TfsDataFrame) -- DataFrame containing all the job-information

  • opt (RunnerOpts) -- Parameters for the runner
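A closing sketch: run the created jobs either locally or on HTCondor, driven by RunnerOpts (hypothetical values; job_df is the dataframe from iotools.create_jobs):

   from pathlib import Path

   from pylhc_submitter.submitter.runners import RunnerOpts, run_jobs

   opt = RunnerOpts(
       working_directory=Path("/afs/cern.ch/work/u/user/study"),
       jobflavour="workday",  # forwarded to the HTCondor submission
       run_local=False,  # True would presumably use local processes instead
       num_processes=4,  # only relevant for local runs
   )
   run_jobs(job_df, opt)  # dispatches to run_htc or run_local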