
Configs

hgp_lib.configs.boolean_gp_config.BooleanGPConfig dataclass

Configuration for BooleanGP.

Attributes:

Name Type Description
score_fn Callable

Fitness function (predictions, labels) -> float.

complexity_penalty float

Penalty coefficient applied to rule complexity when scoring. Default: 0.0.

train_data ndarray | None

Training data (2-D boolean array). Can be None when used as a template in BenchmarkerConfig (data provided at benchmarker level). Default: None.

train_labels ndarray | None

Training labels (1-D integer array). Can be None when used as a template in BenchmarkerConfig. Default: None.

population_factory PopulationGeneratorFactory

Factory that creates the PopulationGenerator at runtime. Override create_strategies on the factory to use custom strategies (e.g., BestLiteralStrategy). Default: PopulationGeneratorFactory() (population_size=100, RandomStrategy).

mutation_factory MutationExecutorFactory

Factory that creates the MutationExecutor at runtime. Override create_literal_mutations / create_operator_mutations on the factory to use custom mutations. Default: MutationExecutorFactory() (mutation_p=0.1, num_tries=1, operator_p=0.5).

crossover_factory CrossoverExecutorFactory

Factory that creates the CrossoverExecutor at runtime. Default: CrossoverExecutorFactory() (crossover_p=0.7, crossover_strategy="random", num_tries=1, operator_p=0.9).

selection BaseSelection | None

Selection operator; when None, TournamentSelection() is used. Default: None.

optimize_scorer bool

Whether to optimize scorer via data deduplication and sample weights. Default: True.

regeneration bool

Whether to regenerate population on plateau. Default: False.

regeneration_patience int

Epochs without improvement before regeneration. Default: 100.

check_valid Callable[[Rule], bool] | None

Optional rule validator for mutation/crossover; if provided, it is called once to validate each candidate rule. Default: None.

num_child_populations int

Number of child populations for hierarchical GP. Default: 0.

max_depth int

Maximum hierarchical depth; 0 means no children. Root population has current_depth=0, its children have current_depth=1, etc. Default: 0.

sampling_strategy SamplingStrategy | None

Strategy for sampling data/features for children. Required when max_depth > 0. Default: None.

top_k_transfer int

Number of top rules to transfer from each child to parent. Default: 10.

feedback_type str

How to apply parent feedback: "additive" or "multiplicative". Default: "multiplicative".

feedback_strength float

Coefficient for feedback signal. Must be > 0. Default: 0.1.

Examples:

>>> import numpy as np
>>> from hgp_lib.configs import BooleanGPConfig
>>> data = np.array([[True, False], [False, True], [True, True], [False, False]])
>>> labels = np.array([1, 0, 1, 0])
>>> def accuracy(p, l): return float((p == l).mean())
>>> config = BooleanGPConfig(score_fn=accuracy, train_data=data, train_labels=labels)
>>> config.train_data.shape
(4, 2)
>>> config.optimize_scorer
True
>>> config.population_factory.population_size
100
>>> config.mutation_factory.mutation_p
0.1
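The interplay of `feedback_type` and `feedback_strength` can be sketched in plain Python. This is a hypothetical illustration of what the two mode names suggest, not BooleanGP's actual feedback rule; `apply_feedback`, `fitness`, and `signal` exist solely for this example:

```python
def apply_feedback(fitness, signal, mode="multiplicative", strength=0.1):
    """Hypothetical sketch of the two feedback_type modes.

    `fitness` stands for a child rule's score and `signal` for a parent
    feedback value; both names are illustrative, not part of hgp_lib's API.
    """
    if mode == "additive":
        # additive: shift the fitness by a scaled feedback term
        return fitness + strength * signal
    # multiplicative: scale the fitness by (1 + strength * signal)
    return fitness * (1.0 + strength * signal)

print(apply_feedback(0.8, 0.5))                   # multiplicative mode
print(apply_feedback(0.8, 0.5, mode="additive"))  # additive mode
```

With `strength=0.1` (the default `feedback_strength`), the multiplicative mode leaves a zero signal neutral, while the additive mode shifts every fitness by the same scaled amount.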
Source code in hgp_lib\configs\boolean_gp_config.py
@dataclass
class BooleanGPConfig:
    """
    Configuration for BooleanGP.

    Attributes:
        score_fn (Callable): Fitness function `(predictions, labels) -> float`.
        complexity_penalty (float): Penalty coefficient applied to rule complexity
            when scoring. Default: `0.0`.
        train_data (ndarray | None): Training data (2-D boolean array). Can be `None` when
            used as a template in `BenchmarkerConfig` (data provided at benchmarker level).
            Default: `None`.
        train_labels (ndarray | None): Training labels (1-D integer array). Can be `None`
            when used as a template in `BenchmarkerConfig`. Default: `None`.
        population_factory (PopulationGeneratorFactory): Factory that creates the
            `PopulationGenerator` at runtime. Override `create_strategies` on the
            factory to use custom strategies (e.g., `BestLiteralStrategy`).
            Default: `PopulationGeneratorFactory()` (population_size=100, RandomStrategy).
        mutation_factory (MutationExecutorFactory): Factory that creates the
            `MutationExecutor` at runtime. Override `create_literal_mutations` /
            `create_operator_mutations` on the factory to use custom mutations.
            Default: `MutationExecutorFactory()` (`mutation_p=0.1`, `num_tries=1`,
            `operator_p=0.5`).
        crossover_factory (CrossoverExecutorFactory): Factory that creates the
            `CrossoverExecutor` at runtime. Default: `CrossoverExecutorFactory()`
            (`crossover_p=0.7`, `crossover_strategy="random"`, `num_tries=1`, `operator_p=0.9`).
        selection (BaseSelection | None): Selection operator; when `None`,
            `TournamentSelection()` is used. Default: `None`.
        optimize_scorer (bool): Whether to optimize scorer via data deduplication and
            sample weights. Default: `True`.
        regeneration (bool): Whether to regenerate population on plateau. Default: `False`.
        regeneration_patience (int): Epochs without improvement before regeneration.
            Default: `100`.
        check_valid (Callable[[Rule], bool] | None): Optional rule validator for
            mutation/crossover; if provided, it is called once to validate each
            candidate rule. Default: `None`.
        num_child_populations (int): Number of child populations for hierarchical GP.
            Default: `0`.
        max_depth (int): Maximum hierarchical depth; `0` means no children.
            Root population has current_depth=0, its children have current_depth=1, etc.
            Default: `0`.
        sampling_strategy (SamplingStrategy | None): Strategy for sampling data/features
            for children. Required when `max_depth > 0`. Default: `None`.
        top_k_transfer (int): Number of top rules to transfer from each child to parent.
            Default: `10`.
        feedback_type (str): How to apply parent feedback: `"additive"` or
            `"multiplicative"`. Default: `"multiplicative"`.
        feedback_strength (float): Coefficient for feedback signal. Must be > 0.
            Default: `0.1`.

    Examples:
        >>> import numpy as np
        >>> from hgp_lib.configs import BooleanGPConfig
        >>> data = np.array([[True, False], [False, True], [True, True], [False, False]])
        >>> labels = np.array([1, 0, 1, 0])
        >>> def accuracy(p, l): return float((p == l).mean())
        >>> config = BooleanGPConfig(score_fn=accuracy, train_data=data, train_labels=labels)
        >>> config.train_data.shape
        (4, 2)
        >>> config.optimize_scorer
        True
        >>> config.population_factory.population_size
        100
        >>> config.mutation_factory.mutation_p
        0.1
    """

    # TODO: We should reconsider the ordering of the arguments for score fn. Pred, GT or GT, Pred?
    score_fn: Callable[[ndarray, ndarray], float]
    complexity_penalty: float = 0.0
    train_data: ndarray | None = None
    train_labels: ndarray | None = None
    population_factory: PopulationGeneratorFactory = field(
        default_factory=PopulationGeneratorFactory
    )
    mutation_factory: MutationExecutorFactory = field(
        default_factory=MutationExecutorFactory
    )
    crossover_factory: CrossoverExecutorFactory = field(
        default_factory=CrossoverExecutorFactory
    )
    selection: BaseSelection | None = None
    optimize_scorer: bool = True
    regeneration: bool = False
    regeneration_patience: int = 100
    check_valid: Callable[[Rule], bool] | None = None
    num_child_populations: int = 0
    max_depth: int = 0
    sampling_strategy: SamplingStrategy | None = None
    top_k_transfer: int = 10
    feedback_type: str = "multiplicative"
    feedback_strength: float = 0.1

hgp_lib.configs.trainer_config.TrainerConfig dataclass

Configuration for GPTrainer. Wraps BooleanGPConfig.

Attributes:

Name Type Description
gp_config BooleanGPConfig

Configuration for the underlying BooleanGP.

num_epochs int

Number of training epochs.

val_data ndarray | None

Validation data; optional.

val_labels ndarray | None

Validation labels; optional.

val_every int

Validate every N epochs.

progress_bar bool

Whether to show a progress bar during training.

leave_progress_bar bool

Whether to leave the progress bar visible after training completes.

progress_callback Callable[[int], None] | None

Optional callback for progress updates. Called every progress_update_interval epochs with the number of epochs completed. Useful for external progress tracking (e.g., multiprocessing progress bars).

progress_update_interval int

How often to call progress_callback (in epochs).

Examples:

>>> import numpy as np
>>> from hgp_lib.configs import BooleanGPConfig, TrainerConfig
>>> data = np.array([[True, False], [False, True], [True, True], [False, False]])
>>> labels = np.array([1, 0, 1, 0])
>>> def accuracy(p, l): return float((p == l).mean())
>>> gp_config = BooleanGPConfig(score_fn=accuracy, train_data=data, train_labels=labels)
>>> config = TrainerConfig(gp_config=gp_config, num_epochs=10)
>>> config.num_epochs
10
>>> config.val_every
100
Source code in hgp_lib\configs\trainer_config.py
@dataclass
class TrainerConfig:
    """
    Configuration for GPTrainer. Wraps BooleanGPConfig.

    Attributes:
        gp_config (BooleanGPConfig): Configuration for the underlying BooleanGP.
        num_epochs (int): Number of training epochs.
        val_data (ndarray | None): Validation data; optional.
        val_labels (ndarray | None): Validation labels; optional.
        val_every (int): Validate every N epochs.
        progress_bar (bool): Whether to show a progress bar during training.
        leave_progress_bar (bool): Whether to leave the progress bar visible after
            training completes.
        progress_callback (Callable[[int], None] | None): Optional callback for progress updates.
            Called every `progress_update_interval` epochs with the number of epochs completed.
            Useful for external progress tracking (e.g., multiprocessing progress bars).
        progress_update_interval (int): How often to call progress_callback (in epochs).

    Examples:
        >>> import numpy as np
        >>> from hgp_lib.configs import BooleanGPConfig, TrainerConfig
        >>> data = np.array([[True, False], [False, True], [True, True], [False, False]])
        >>> labels = np.array([1, 0, 1, 0])
        >>> def accuracy(p, l): return float((p == l).mean())
        >>> gp_config = BooleanGPConfig(score_fn=accuracy, train_data=data, train_labels=labels)
        >>> config = TrainerConfig(gp_config=gp_config, num_epochs=10)
        >>> config.num_epochs
        10
        >>> config.val_every
        100
    """

    gp_config: BooleanGPConfig
    num_epochs: int
    val_data: ndarray | None = None
    val_labels: ndarray | None = None
    val_every: int = 100
    progress_bar: bool = True
    leave_progress_bar: bool = True
    progress_callback: Callable[[int], None] | None = None
    progress_update_interval: int = 100
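The `progress_callback` contract can be sketched without a trainer. The loop below merely simulates the documented call pattern (no `GPTrainer` involved): the callback is invoked every `progress_update_interval` epochs with the cumulative number of epochs completed.

```python
completed = []

def on_progress(epochs_done: int) -> None:
    # receives the cumulative number of completed epochs,
    # e.g. to feed an external (multiprocessing) progress bar
    completed.append(epochs_done)

# simulated trainer loop calling back every `progress_update_interval` epochs
num_epochs, progress_update_interval = 300, 100
for epoch in range(1, num_epochs + 1):
    if epoch % progress_update_interval == 0:
        on_progress(epoch)

print(completed)
```

A callback like `on_progress` would be passed as `TrainerConfig(..., progress_callback=on_progress, progress_update_interval=100)`.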

hgp_lib.configs.benchmarker_config.BenchmarkerConfig dataclass

Configuration for GPBenchmarker. Used for multi-run benchmarking with k-fold CV.

Contains a TrainerConfig template that specifies all training settings. The benchmarker will create TrainerConfig instances for each fold, replacing the data with fold-specific train/validation splits.

The benchmarker handles binarization internally: for each fold, it fits a fresh copy of the binarizer on the training fold and transforms validation/test data with it. This avoids data leakage (the binarizer never sees validation or test data during fitting). Pass raw (non-binarized) data as a pandas.DataFrame.

Attributes:

Name Type Description
data DataFrame

Full dataset as a pandas.DataFrame. The benchmarker will binarize it per-fold using the configured binarizer, then convert to a boolean numpy array for the GP algorithm. Columns can be boolean, categorical, or numeric.

labels ndarray

Labels for the full dataset (1-D numpy array).

trainer_config TrainerConfig

Template configuration for training. The nested gp_config does not need train_data/train_labels (they will be set per fold by the benchmarker).

binarizer StandardBinarizer | KBinsDiscretizer | None

Binarizer to transform features into boolean columns. A fresh deepcopy is fitted per fold so the original stays unfitted. When None (default), a StandardBinarizer() with default settings is used, which applies one-hot-encoding to categorical features and decision-tree-based binarization (5 bins) to numerical features. The binarizer must not be already fitted. Default: None.

num_runs int

Number of benchmark runs with different random seeds. Default: 30.

test_size float

Fraction of data to hold out for testing. Default: 0.2.

n_folds int

Number of folds for k-fold cross-validation. Default: 5.

n_jobs int

Number of parallel jobs (-1 = all CPUs, 1 = sequential). Default: -1.

base_seed int

Base random seed; each run uses base_seed + run_id. Default: 0.

show_run_progress bool

Show progress bar for runs. Default: True.

show_fold_progress bool

Show progress bar for folds within each run. Default: True.

show_epoch_progress bool

Show progress bar for epochs within each fold. Default: True.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> from hgp_lib.configs import BooleanGPConfig, TrainerConfig, BenchmarkerConfig
>>> data = pd.DataFrame({
...     'feature1': [True, False, True, False],
...     'feature2': [False, True, True, False],
... })
>>> labels = np.array([1, 0, 1, 0])
>>> def accuracy(p, l): return float((p == l).mean())
>>> gp_config = BooleanGPConfig(score_fn=accuracy)
>>> trainer_config = TrainerConfig(gp_config=gp_config, num_epochs=10)
>>> config = BenchmarkerConfig(
...     data=data, labels=labels, trainer_config=trainer_config, n_folds=2
... )
>>> config.num_runs
30
>>> config.n_folds
2
Source code in hgp_lib\configs\benchmarker_config.py
@dataclass
class BenchmarkerConfig:
    """
    Configuration for GPBenchmarker. Used for multi-run benchmarking with k-fold CV.

    Contains a TrainerConfig template that specifies all training settings. The benchmarker
    will create TrainerConfig instances for each fold, replacing the data with fold-specific
    train/validation splits.

    The benchmarker handles binarization internally: for each fold, it fits a fresh copy
    of the binarizer on the training fold and transforms validation/test data with it.
    This avoids data leakage (the binarizer never sees validation or test data during fitting).
    Pass raw (non-binarized) data as a `pandas.DataFrame`.

    Attributes:
        data (DataFrame): Full dataset as a `pandas.DataFrame`. The benchmarker will
            binarize it per-fold using the configured `binarizer`, then convert to a
            boolean numpy array for the GP algorithm. Columns can be boolean, categorical,
            or numeric.
        labels (ndarray): Labels for the full dataset (1-D numpy array).
        trainer_config (TrainerConfig): Template configuration for training. The nested
            `gp_config` does not need `train_data`/`train_labels` (they will be set
            per fold by the benchmarker).
        binarizer (StandardBinarizer | KBinsDiscretizer | None): Binarizer to transform
            features into boolean columns. A fresh `deepcopy` is fitted per fold so
            the original stays unfitted. When `None` (default), a
            `StandardBinarizer()` with default settings is used, which applies
            one-hot-encoding to categorical features and decision-tree-based binarization
            (5 bins) to numerical features. The binarizer must **not** be already fitted.
            Default: `None`.
        num_runs (int): Number of benchmark runs with different random seeds. Default: `30`.
        test_size (float): Fraction of data to hold out for testing. Default: `0.2`.
        n_folds (int): Number of folds for k-fold cross-validation. Default: `5`.
        n_jobs (int): Number of parallel jobs (`-1` = all CPUs, `1` = sequential).
            Default: `-1`.
        base_seed (int): Base random seed; each run uses `base_seed + run_id`.
            Default: `0`.
        show_run_progress (bool): Show progress bar for runs. Default: `True`.
        show_fold_progress (bool): Show progress bar for folds within each run.
            Default: `True`.
        show_epoch_progress (bool): Show progress bar for epochs within each fold.
            Default: `True`.

    Examples:
        >>> import numpy as np
        >>> import pandas as pd
        >>> from hgp_lib.configs import BooleanGPConfig, TrainerConfig, BenchmarkerConfig
        >>> data = pd.DataFrame({
        ...     'feature1': [True, False, True, False],
        ...     'feature2': [False, True, True, False],
        ... })
        >>> labels = np.array([1, 0, 1, 0])
        >>> def accuracy(p, l): return float((p == l).mean())
        >>> gp_config = BooleanGPConfig(score_fn=accuracy)
        >>> trainer_config = TrainerConfig(gp_config=gp_config, num_epochs=10)
        >>> config = BenchmarkerConfig(
        ...     data=data, labels=labels, trainer_config=trainer_config, n_folds=2
        ... )
        >>> config.num_runs
        30
        >>> config.n_folds
        2
    """

    data: DataFrame
    labels: ndarray
    trainer_config: TrainerConfig
    binarizer: StandardBinarizer | KBinsDiscretizer | None = None
    num_runs: int = 30
    test_size: float = 0.2
    n_folds: int = 5
    n_jobs: int = -1
    base_seed: int = 0
    show_run_progress: bool = True
    show_fold_progress: bool = True
    show_epoch_progress: bool = True
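Under the defaults above, each run derives its seed as `base_seed + run_id`, and every run performs k-fold cross-validation, so the benchmarker trains one model per run-fold pair. A quick arithmetic sanity check of the resulting workload:

```python
base_seed, num_runs, n_folds = 0, 30, 5

# one seed per run, as documented: base_seed + run_id
seeds = [base_seed + run_id for run_id in range(num_runs)]

# each run trains one model per fold, so the total number of
# training jobs is num_runs * n_folds
total_trainings = num_runs * n_folds

print(seeds[:3], seeds[-1])
print(total_trainings)
```

With the defaults (`num_runs=30`, `n_folds=5`) that is 150 training jobs, which is why `n_jobs=-1` (use all CPUs) is the default.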