5. Client Tools

The PDC Client Tools provide a set of command-line utilities for interacting with PDC metadata and datasets. These tools allow users to inspect metadata checkpoints, import data from HDF5 files into PDC, and export PDC checkpoints to external formats. They are intended to simplify common client-side operations and to provide insight into the contents and organization of PDC-managed objects. This section describes how to build the tools and details the primary commands available to users.

5.1 Building PDC Tools

$ cd tools
$ cmake .
$ make

5.2 PDC Tool Commands

  • pdc_ls - List objects in a PDC metadata checkpoint or directory of checkpoints.

  • pdc_import - Import HDF5 files into PDC by converting them to a PDC checkpoint.

  • pdc_export - Export a PDC metadata checkpoint to a specified file format (currently HDF5).

pdc_ls

Takes in a directory containing PDC metadata checkpoints or an individual metadata checkpoint file and outputs information on objects saved in the checkpoint(s).

Usage: ./pdc_ls <checkpoint> <additional arguments>:

Arguments:

  • -json <filename>: Save the output to a specified file <filename> in JSON format.

  • -n <objectname>: Only display objects with a specific object name <objectname>. Regex matching of object names is supported.

  • -i <objectid>: Only display objects with a specific object ID <objectid>. Regex matching of object IDs is supported.

  • -ln: List all object names as an additional field in the output.

  • -li: List all object IDs as an additional field in the output.

  • -s: Display summary statistics (number of objects, containers, regions) as an additional field in the output.

Examples:

$ ./pdc_ls pdc_tmp -n id.*
[INFO] File [pdc_tmp/metadata_checkpoint.0] last modified at: Fri Mar 11 14:19:13 2022

{
    "cont_id: 1000000": [
        {
            "obj_id": 1000007,
            "app_name": "VPICIO",
            "obj_name": "id11",
            "user_id": 0,
            "tags": "ag0=1",
            "data_type": "PDC_INT",
            "num_dims": 1,
            "dims": [8388608],
            "time_step": 0,
            "region_list_info": [
                {
                    "storage_loc": "/user/pdc_data/1000007/server0/s0000.bin",
                    "offset": 33554432,
                    "num_dims": 1,
                    "start": [0],
                    "count": [8388608],
                    "unit_size": 4,
                    "data_loc_type": "PDC_NONE"
                }
            ]
        },
        {
            "obj_id": 1000008,
            "app_name": "VPICIO",
            "obj_name": "id22",
            "user_id": 0,
            "tags": "ag0=1",
            "data_type": "PDC_INT",
            "num_dims": 1,
            "dims": [8388608],
            "time_step": 0,
            "region_list_info": [
                {
                    "storage_loc": "/user/pdc_data/1000008/server0/s0000.bin",
                    "offset": 33554432,
                    "num_dims": 1,
                    "start": [0],
                    "count": [8388608],
                    "unit_size": 4,
                    "data_loc_type": "PDC_NONE"
                }
            ]
        }
    ]
}
$ ./pdc_ls pdc_tmp -n obj-var-p.* -ln -li
[INFO] File [pdc_tmp/metadata_checkpoint.0] last modified at: Fri Mar 11 14:19:13 2022

{
    "cont_id: 1000000": [
        {
            "obj_id": 1000004,
            "app_name": "VPICIO",
            "obj_name": "obj-var-pxx",
            "user_id": 0,
            "tags": "ag0=1",
            "data_type": "PDC_FLOAT",
            "num_dims": 1,
            "dims": [8388608],
            "time_step": 0,
            "region_list_info": [
                {
                    "storage_loc": "/user/pdc_data/1000004/server0/s0000.bin",
                    "offset": 33554432,
                    "num_dims": 1,
                    "start": [0],
                    "count": [8388608],
                    "unit_size": 4,
                    "data_loc_type": "PDC_NONE"
                }
            ]
        },
        {
            "obj_id": 1000005,
            "app_name": "VPICIO",
            "obj_name": "obj-var-pyy",
            "user_id": 0,
            "tags": "ag0=1",
            "data_type": "PDC_FLOAT",
            "num_dims": 1,
            "dims": [8388608],
            "time_step": 0,
            "region_list_info": [
                {
                    "storage_loc": "/user/pdc_data/1000005/server0/s0000.bin",
                    "offset": 33554432,
                    "num_dims": 1,
                    "start": [0],
                    "count": [8388608],
                    "unit_size": 4,
                    "data_loc_type": "PDC_NONE"
                }
            ]
        },
        {
            "obj_id": 1000006,
            "app_name": "VPICIO",
            "obj_name": "obj-var-pzz",
            "user_id": 0,
            "tags": "ag0=1",
            "data_type": "PDC_FLOAT",
            "num_dims": 1,
            "dims": [8388608],
            "time_step": 0,
            "region_list_info": [
                {
                    "storage_loc": "/user/pdc_data/1000006/server0/s0000.bin",
                    "offset": 33554432,
                    "num_dims": 1,
                    "start": [0],
                    "count": [8388608],
                    "unit_size": 4,
                    "data_loc_type": "PDC_NONE"
                }
            ]
        }
    ],
    "all_obj_names": ["obj-var-pxx", "obj-var-pyy", "obj-var-pzz"],
    "all_obj_ids": [1000004, 1000005, 1000006]
}

pdc_import

Takes in file containing line-separated paths to HDF5 files and converts those HDF5 files to a PDC checkpoint.

Usage: ./pdc_import <file_list> <additional arguments>:

Arguments:

  • -a <appname>: Uses the specified <appname> as application name when creating PDC objects.

  • -o: Specifies whether or not to overwrite pre-existing PDC objects when writing a PDC object that already exists.

Examples:

$ srun -N 1 -n 1 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap /path/to/pdc_server &
==PDC_SERVER[0]: using [./pdc_tmp/] as tmp dir, 1 OSTs, 1 OSTs per data file, 0% to BB
==PDC_SERVER[0]: using ofi+tcp
==PDC_SERVER[0]: without multi-thread!
==PDC_SERVER[0]: Read cache enabled!
==PDC_SERVER[0]: Successfully established connection to 0 other PDC servers
==PDC_SERVER[0]: Server ready!

$ srun -N 1 -n 1 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap ./pdc_import file_names_list
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 1 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 1 clients per server
Running with 1 clients, 1 files
Importer 0: I will import 1 files
Importer 0: [../../test.h5]
Importer 0: processing [../../test.h5]
Importer 0: Created container [/]

==PDC_SERVER[0]: Checkpoint file [./pdc_tmp/metadata_checkpoint.0]
Import 8 datasets with 1 ranks took 0.93 seconds.

pdc_export

Converts PDC metadata checkpoint to a file of specified format. Currently only HDF5 is supported.

Usage: ./pdc_export <checkpoint> <additional arguments>:

Arguments:

  • -f <format>: Uses the specified export <format>. Currently only supports HDF5 exports.

Examples:

$ srun -N 1 -n 1 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap /path/to/pdc_server &
==PDC_SERVER[0]: using [./pdc_tmp/] as tmp dir, 1 OSTs, 1 OSTs per data file, 0% to BB
==PDC_SERVER[0]: using ofi+tcp
==PDC_SERVER[0]: without multi-thread!
==PDC_SERVER[0]: Read cache enabled!
==PDC_SERVER[0]: Successfully established connection to 0 other PDC servers
==PDC_SERVER[0]: Server ready!

$ srun -N 1 -n 1 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap ./pdc_export pdc_tmp
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 1 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 1 clients per server
[INFO] File [pdc_tmp/metadata_checkpoint.0] last modified at: Mon May  9 06:17:18 2022

POSIX read from file offset 117478480, region start = 0, region size = 8388608
POSIX read from file offset 130024208, region start = 0, region size = 8388608
POSIX read from file offset 130057104, region start = 0, region size = 8388608
POSIX read from file offset 130056720, region start = 0, region size = 8388608
POSIX read from file offset 130023696, region start = 0, region size = 8388608
POSIX read from file offset 130056592, region start = 0, region size = 8388608
POSIX read from file offset 130056720, region start = 0, region size = 8388608
POSIX read from file offset 130023696, region start = 0, region size = 8388608

Warning

PDC tools currently do not support compound data types and will have unexpected behavior when attempting to work with compound data types.