API Reference
This page provides an auto-generated summary of ecgtools’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.
Builder
- class ecgtools.Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)
Generates a catalog from a list of netCDF files or zarr stores.
- Parameters:
paths (list of str) – List of paths to crawl for assets/files.
storage_options (dict, optional) – Parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.
depth (int, optional) – Maximum depth to crawl for assets. Default is 0.
exclude_patterns (list of str, optional) – List of glob patterns to exclude from crawling.
include_patterns (list of str, optional) – List of glob patterns to include in crawling.
joblib_parallel_kwargs (dict, optional) – Parameters passed to joblib.Parallel. Default is {}.
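The include_patterns and exclude_patterns arguments take glob patterns. The sketch below shows one plausible way such patterns can filter a list of crawled paths using the standard library's fnmatch. The helper name filter_assets is hypothetical, and the exact matching rules inside ecgtools (for example, whether a pattern is matched against the full path or only the file name, and in which order the two lists are applied) are assumptions here.

```python
import fnmatch


def filter_assets(paths, include_patterns=None, exclude_patterns=None):
    """Hypothetical helper: keep paths that match any include pattern
    and no exclude pattern (glob syntax, matched against the full path)."""
    include_patterns = include_patterns or []
    exclude_patterns = exclude_patterns or []
    kept = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        included = not include_patterns or any(
            fnmatch.fnmatch(path, pattern) for pattern in include_patterns
        )
        # Any exclude match removes the path, even if it was included.
        excluded = any(fnmatch.fnmatch(path, pattern) for pattern in exclude_patterns)
        if included and not excluded:
            kept.append(path)
    return kept


paths = [
    "/data/run1/tas_day.nc",
    "/data/run1/pr_day.nc",
    "/data/run1/tmp/tas_day.nc",
    "/data/run1/store.zarr",
]
print(filter_assets(paths, include_patterns=["*.nc"], exclude_patterns=["*/tmp/*"]))
```

Note that in fnmatch a * also matches path separators, so a pattern such as *.nc matches files at any depth below the root.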
- build(*, parsing_func, parsing_func_kwargs=None, postprocess_func=None, postprocess_func_kwargs=None)
Builds a catalog from a list of netCDF files or zarr stores.
- Parameters:
parsing_func (callable) – Function that parses the asset and returns a dictionary of metadata.
parsing_func_kwargs (dict, optional) – Parameters passed to the parsing function. Default is {}.
postprocess_func (callable, optional) – Function that post-processes the built dataframe and returns a pandas dataframe. Default is None.
postprocess_func_kwargs (dict, optional) – Parameters passed to the post-processing function. Default is {}.
- Returns:
Builder – The builder object.
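A parsing function receives one asset path and returns a dictionary whose keys become catalog columns. Below is a minimal sketch of such a parser, assuming a hypothetical file-naming convention <variable>_<frequency>_<experiment>_<start>-<end>.nc. The function name parse_filename, its path_column keyword, the column names, and the INVALID_ASSET/TRACEBACK marker keys are illustrative assumptions; it also assumes that parsing_func_kwargs is forwarded to the parser as keyword arguments.

```python
import os
import traceback

# Local stand-ins for marker keys that flag assets which failed to parse
# (assumption: the real builder uses its own constants for this).
INVALID_ASSET = "INVALID_ASSET"
TRACEBACK = "TRACEBACK"


def parse_filename(path, path_column="path"):
    """Hypothetical parser for names like 'tas_day_historical_20000101-20001231.nc'."""
    try:
        stem, _ = os.path.splitext(os.path.basename(path))
        # Raises ValueError if the name does not have exactly four parts.
        variable, frequency, experiment, time_range = stem.split("_")
        start, end = time_range.split("-")
        return {
            path_column: path,
            "variable": variable,
            "frequency": frequency,
            "experiment": experiment,
            "time_range": time_range,
            "start": start,
            "end": end,
        }
    except Exception:
        # Record the failure instead of raising, so one bad file does not abort the crawl.
        return {INVALID_ASSET: path, TRACEBACK: traceback.format_exc()}


print(parse_filename("/data/tas_day_historical_20000101-20001231.nc"))
print(parse_filename("/data/README.nc"))
```

With the documented signature, such a function would be passed as parsing_func=parse_filename, and parsing_func_kwargs={'path_column': 'uri'} could then select a different path column name.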
- clean_dataframe()
Clean the dataframe by excluding invalid assets and removing duplicate entries.
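Conceptually, this cleaning step makes two passes over the parsed rows: it sets aside rows that a parsing function flagged as invalid, then drops duplicates. The sketch below works on a plain list of dictionaries rather than the pandas dataframe the builder actually holds. The INVALID_ASSET marker key, the hypothetical helper name clean_rows, and the choice to treat two rows as duplicates only when all their fields match are assumptions for illustration.

```python
INVALID_ASSET = "INVALID_ASSET"  # assumed marker key for rows that failed to parse


def clean_rows(rows):
    """Hypothetical helper: split out invalid rows, then drop duplicate valid rows."""
    invalid = [row for row in rows if row.get(INVALID_ASSET) is not None]
    valid = [row for row in rows if row.get(INVALID_ASSET) is None]
    seen = set()
    cleaned = []
    for row in valid:
        # Identify a row by the sorted tuple of its items (values must be hashable).
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)  # keeps the first occurrence, preserving order
    return cleaned, invalid


rows = [
    {"path": "a.nc", "variable": "tas"},
    {"path": "a.nc", "variable": "tas"},
    {INVALID_ASSET: "b.nc", "TRACEBACK": "ValueError: ..."},
    {"path": "c.nc", "variable": "pr"},
]
cleaned, invalid = clean_rows(rows)
print(len(cleaned), len(invalid))
```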
- save(*, name, path_column_name, variable_column_name, data_format, groupby_attrs=None, aggregations=None, esmcat_version='0.0.1', description=None, directory=None, catalog_type='file', to_csv_kwargs=None, json_dump_kwargs=None)
Persist catalog contents to files.
- Parameters:
name (str) – The name of the file to save the catalog to.
path_column_name (str) – The name of the column containing the path to the asset. Must be in the header of the CSV file.
variable_column_name (str) – Name of the attribute column in the CSV file that contains the variable name.
data_format (str) – The data format. Valid values are netcdf and zarr.
groupby_attrs (list of str, optional) – Column names whose values determine which assets are grouped into a single dataset during aggregation. Default is None.
aggregations (List[dict], optional) – List of aggregations to apply to query results. Default is None.
esmcat_version (str, optional) – The ESM Catalog version the collection implements. Default is '0.0.1'.
description (str, optional) – Detailed multi-line description to fully explain the collection. Default is None.
directory (str, optional) – The directory to save the catalog to. If None, the current directory is used.
catalog_type (str, optional) – Whether to embed the catalog table as a dictionary in the JSON file ('dict') or write it to a separate CSV file ('file'). Default is 'file'.
to_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the pandas.DataFrame.to_csv() method.
json_dump_kwargs (dict, optional) – Additional keyword arguments passed through to the json.dump() function.
- Returns:
Builder – The builder object.
Notes
See https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md for more details.
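The JSON file written by save() follows the ESM collection specification referenced in the note. The sketch below assembles a descriptor of roughly that shape using only the standard library, to show where the save() arguments end up. The helper name make_esm_collection, the example columns, and the exact key layout (attributes, assets, aggregation_control, catalog_file or catalog_dict) are assumptions based on the specification as consumed by intake-esm, not a copy of what ecgtools writes.

```python
import json


def make_esm_collection(*, name, columns, path_column_name, variable_column_name,
                        data_format, groupby_attrs=None, aggregations=None,
                        esmcat_version="0.0.1", description=None,
                        catalog_type="file", rows=None):
    """Hypothetical helper that lays out an ESM collection descriptor as a dict."""
    collection = {
        "esmcat_version": esmcat_version,
        "id": name,
        "description": description,
        # Every catalog column is listed as a searchable attribute.
        "attributes": [{"column_name": col, "vocabulary": ""} for col in columns],
        "assets": {"column_name": path_column_name, "format": data_format},
        "aggregation_control": {
            "variable_column_name": variable_column_name,
            "groupby_attrs": groupby_attrs or [],
            "aggregations": aggregations or [],
        },
    }
    if catalog_type == "file":
        # The table lives in a separate CSV file next to the JSON file.
        collection["catalog_file"] = f"{name}.csv"
    elif catalog_type == "dict":
        # The table is embedded in the JSON as a list of row records.
        collection["catalog_dict"] = rows or []
    else:
        raise ValueError("catalog_type must be 'file' or 'dict'")
    return collection


aggregations = [
    {"type": "union", "attribute_name": "variable"},
    {"type": "join_existing", "attribute_name": "time_range",
     "options": {"dim": "time"}},
]
spec = make_esm_collection(
    name="my-catalog",
    columns=["path", "variable", "experiment", "time_range"],
    path_column_name="path",
    variable_column_name="variable",
    data_format="netcdf",
    groupby_attrs=["experiment"],
    aggregations=aggregations,
)
print(json.dumps(spec, indent=2))
```

The aggregations list illustrates the kind of dictionaries the aggregations parameter expects: a union over variables, and a concatenation of time chunks along a time dimension.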