Metrics

`hgp_lib.metrics.core.GenerationMetrics` `dataclass`

Metrics captured at a single generation for one population.

Stores per-rule training scores, complexities, the best rule found in this generation, and optionally a validation score. In hierarchical GP, child population metrics are nested via `child_population_generation_metrics`.
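
Because child metrics have the same type as their parent, a hierarchical run forms a tree that can be walked recursively. The sketch below illustrates this with a hypothetical stand-in class, `MetricsNode`, that copies only the field names used here. It is not the library's class, and `walk` is an illustrative helper, not part of `hgp_lib`.

```python
from dataclasses import dataclass, field
from typing import Iterator, Sequence


@dataclass
class MetricsNode:
    """Hypothetical stand-in exposing only the fields used in this sketch."""

    best_idx: int
    train_scores: Sequence[float]
    child_population_generation_metrics: Sequence["MetricsNode"] = field(
        default_factory=list
    )

    @property
    def best_train_score(self) -> float:
        return self.train_scores[self.best_idx]


def walk(node: MetricsNode, depth: int = 0) -> Iterator[tuple[int, float]]:
    """Yield (depth, best training score) for a node and its children, pre-order."""
    yield depth, node.best_train_score
    for child in node.child_population_generation_metrics:
        yield from walk(child, depth + 1)


# Two child populations nested under one parent population.
leaf_a = MetricsNode(best_idx=0, train_scores=[0.6, 0.4])
leaf_b = MetricsNode(best_idx=1, train_scores=[0.3, 0.7])
root = MetricsNode(
    best_idx=1,
    train_scores=[0.7, 0.9, 0.5],
    child_population_generation_metrics=[leaf_a, leaf_b],
)

flattened = list(walk(root))  # [(0, 0.9), (1, 0.6), (1, 0.7)]
```

For a flat (non-hierarchical) run the child list is empty, so the walk yields only the root entry.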

Parameters:

Name	Type	Description	Default
`best_idx`	`int`	Index of the best-scoring rule in `train_scores`.	required
`best_rule`	`Rule`	Copy of the best rule from this generation.	required
`complexities`	`Sequence[int]`	Number of nodes in each rule (same order as `train_scores`).	required
`train_scores`	`Sequence[float]`	Fitness scores for every rule in the population.	required
`child_population_generation_metrics`	`Sequence[GenerationMetrics]`	Metrics from child populations in hierarchical GP. Empty list for flat (non-hierarchical) runs.	required
`val_score`	`float \| None`	Validation score of the global best rule at this generation, or `None` if validation was not performed. Default: `None`.	`None`

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=1,
...     best_rule=Literal(value=1),
...     train_scores=[0.7, 0.9, 0.5],
...     complexities=[1, 3, 2],
...     child_population_generation_metrics=[],
... )
>>> m.best_train_score
0.9
>>> m.best_rule_complexity
3
>>> m.population_size
3

Source code in hgp_lib\metrics\core.py

@dataclass()
class GenerationMetrics:
    """
    Metrics captured at a single generation for one population.

    Stores per-rule training scores, complexities, the best rule found in this
    generation, and optionally a validation score. In hierarchical GP, child
    population metrics are nested via ``child_population_generation_metrics``.

    Args:
        best_idx (int):
            Index of the best-scoring rule in ``train_scores``.
        best_rule (Rule):
            Copy of the best rule from this generation.
        complexities (Sequence[int]):
            Number of nodes in each rule (same order as ``train_scores``).
        train_scores (Sequence[float]):
            Fitness scores for every rule in the population.
        child_population_generation_metrics (Sequence[GenerationMetrics]):
            Metrics from child populations in hierarchical GP. Empty list for
            flat (non-hierarchical) runs.
        val_score (float | None):
            Validation score of the global best rule at this generation, or ``None``
            if validation was not performed. Default: `None`.

    Examples:
        >>> from hgp_lib.metrics import GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> m = GenerationMetrics.from_population(
        ...     best_idx=1,
        ...     best_rule=Literal(value=1),
        ...     train_scores=[0.7, 0.9, 0.5],
        ...     complexities=[1, 3, 2],
        ...     child_population_generation_metrics=[],
        ... )
        >>> m.best_train_score
        0.9
        >>> m.best_rule_complexity
        3
        >>> m.population_size
        3
    """

    best_idx: int
    best_rule: Rule

    complexities: Sequence[int]
    train_scores: Sequence[float]
    child_population_generation_metrics: Sequence["GenerationMetrics"]

    val_score: float | None = None

    @classmethod
    def from_population(
        cls,
        best_idx: int,
        best_rule: Rule,
        train_scores: Sequence[float],
        complexities: Sequence[int],
        child_population_generation_metrics: Sequence["GenerationMetrics"],
    ) -> "GenerationMetrics":
        """
        Construct a ``GenerationMetrics`` from population-level data.

        This is the preferred constructor used by ``BooleanGP._new_generation``.

        Args:
            best_idx (int):
                Index of the best rule in ``train_scores``.
            best_rule (Rule):
                The best rule (already copied).
            train_scores (Sequence[float]):
                Per-rule fitness scores.
            complexities (Sequence[int]):
                Per-rule node counts.
            child_population_generation_metrics (Sequence[GenerationMetrics]):
                Child metrics (empty list for flat GP).

        Returns:
            GenerationMetrics: A new instance with ``val_score=None``.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.val_score is None
            True
        """
        return cls(
            best_rule=best_rule,
            best_idx=best_idx,
            complexities=complexities,
            train_scores=train_scores,
            child_population_generation_metrics=child_population_generation_metrics,
        )

    @property
    def best_train_score(self) -> float:
        """
        Training score of the best rule in this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=2, best_rule=Literal(value=0),
            ...     train_scores=[0.1, 0.2, 0.9], complexities=[1, 1, 1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.best_train_score
            0.9
        """
        return self.train_scores[self.best_idx]

    @property
    def best_rule_complexity(self) -> int:
        """
        Node count of the best rule in this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[5],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.best_rule_complexity
            5
        """
        return self.complexities[self.best_idx]

    @property
    def population_size(self) -> int:
        """
        Number of rules in the population at this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.1, 0.2, 0.3], complexities=[1, 2, 3],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.population_size
            3
        """
        return len(self.train_scores)

`best_train_score` `property`

Training score of the best rule in this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=2, best_rule=Literal(value=0),
...     train_scores=[0.1, 0.2, 0.9], complexities=[1, 1, 1],
...     child_population_generation_metrics=[],
... )
>>> m.best_train_score
0.9

`best_rule_complexity` `property`

Node count of the best rule in this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[5],
...     child_population_generation_metrics=[],
... )
>>> m.best_rule_complexity
5

`population_size` `property`

Number of rules in the population at this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.1, 0.2, 0.3], complexities=[1, 2, 3],
...     child_population_generation_metrics=[],
... )
>>> m.population_size
3

`from_population(best_idx, best_rule, train_scores, complexities, child_population_generation_metrics)` `classmethod`

Construct a `GenerationMetrics` from population-level data.

This is the preferred constructor used by `BooleanGP._new_generation`.
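
Since `from_population` always leaves `val_score` unset, a validation score has to be attached afterwards. The doctests on this page do so with `dataclasses.replace`. The sketch below shows that pattern on a hypothetical stand-in dataclass, `Metrics`, which keeps only three fields and is not the library's class.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass
class Metrics:
    """Hypothetical stand-in with a reduced set of fields."""

    best_idx: int
    train_scores: Sequence[float]
    val_score: Optional[float] = None

    @classmethod
    def from_population(cls, best_idx: int, train_scores: Sequence[float]) -> "Metrics":
        # Mirrors the documented behaviour: val_score keeps its default of None.
        return cls(best_idx=best_idx, train_scores=train_scores)


m = Metrics.from_population(best_idx=0, train_scores=[0.8])
m_val = replace(m, val_score=0.75)  # new instance; m itself is unchanged
```

`replace` makes a shallow copy, so `m_val.train_scores` is the same list object as `m.train_scores`.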

Parameters:

Name	Type	Description	Default
`best_idx`	`int`	Index of the best rule in `train_scores`.	required
`best_rule`	`Rule`	The best rule (already copied).	required
`train_scores`	`Sequence[float]`	Per-rule fitness scores.	required
`complexities`	`Sequence[int]`	Per-rule node counts.	required
`child_population_generation_metrics`	`Sequence[GenerationMetrics]`	Child metrics (empty list for flat GP).	required

Returns:

Name	Type	Description
`GenerationMetrics`	`GenerationMetrics`	A new instance with `val_score=None`.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> m.val_score is None
True

Source code in hgp_lib\metrics\core.py

@classmethod
def from_population(
    cls,
    best_idx: int,
    best_rule: Rule,
    train_scores: Sequence[float],
    complexities: Sequence[int],
    child_population_generation_metrics: Sequence["GenerationMetrics"],
) -> "GenerationMetrics":
    """
    Construct a ``GenerationMetrics`` from population-level data.

    This is the preferred constructor used by ``BooleanGP._new_generation``.

    Args:
        best_idx (int):
            Index of the best rule in ``train_scores``.
        best_rule (Rule):
            The best rule (already copied).
        train_scores (Sequence[float]):
            Per-rule fitness scores.
        complexities (Sequence[int]):
            Per-rule node counts.
        child_population_generation_metrics (Sequence[GenerationMetrics]):
            Child metrics (empty list for flat GP).

    Returns:
        GenerationMetrics: A new instance with ``val_score=None``.

    Examples:
        >>> from hgp_lib.metrics import GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> m = GenerationMetrics.from_population(
        ...     best_idx=0, best_rule=Literal(value=0),
        ...     train_scores=[0.8], complexities=[1],
        ...     child_population_generation_metrics=[],
        ... )
        >>> m.val_score is None
        True
    """
    return cls(
        best_rule=best_rule,
        best_idx=best_idx,
        complexities=complexities,
        train_scores=train_scores,
        child_population_generation_metrics=child_population_generation_metrics,
    )

`hgp_lib.metrics.history.PopulationHistory` `dataclass`

Complete history of a population across all training generations.

Stores the global best rule, training and validation confusion matrix values, and a list of `GenerationMetrics`, one per epoch. Used as the return type of `GPTrainer.fit()` and as fold-level results inside `RunResult`.
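
The four stored counts are enough to derive the usual classification metrics. The following is a minimal sketch, assuming the standard definitions of precision, recall and F1. `precision_recall_f1` is a hypothetical helper, not part of `hgp_lib`, and the fitness function used during training may be a different metric. Returning `0.0` for an empty denominator is a convention chosen for this example.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the PopulationHistory example on this page: tp=5, fp=1, fn=2.
precision, recall, f1 = precision_recall_f1(tp=5, fp=1, fn=2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The same helper can be applied to `val_tp`, `val_fp` and `val_fn` once they have been checked for `None`.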

Parameters:

Name	Type	Description	Default
`global_best_rule`	`Rule`	The best rule found across all generations (by validation score when available, otherwise by training score).	required
`tp`	`int`	True positives of the global best rule on training data.	required
`fp`	`int`	False positives of the global best rule on training data.	required
`fn`	`int`	False negatives of the global best rule on training data.	required
`tn`	`int`	True negatives of the global best rule on training data.	required
`val_tp`	`int \| None`	True positives on validation data, or `None`. Default: `None`.	`None`
`val_fp`	`int \| None`	False positives on validation data, or `None`. Default: `None`.	`None`
`val_fn`	`int \| None`	False negatives on validation data, or `None`. Default: `None`.	`None`
`val_tn`	`int \| None`	True negatives on validation data, or `None`. Default: `None`.	`None`
`generations`	`List[GenerationMetrics]`	Per-epoch metrics. Default: empty list.	`list()`

Examples:

>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=5, fp=1, fn=2, tn=7,
... )
>>> len(ph.generations)
0
>>> ph.best_val_score is None
True

Source code in hgp_lib\metrics\history.py

@dataclass
class PopulationHistory:
    """
    Complete history of a population across all training generations.

    Stores the global best rule, training and validation confusion matrix values,
    and a list of ``GenerationMetrics`` — one per epoch. Used as the return type
    of ``GPTrainer.fit()`` and as fold-level results inside ``RunResult``.

    Args:
        global_best_rule (Rule):
            The best rule found across all generations (by validation score when
            available, otherwise by training score).
        tp (int): True positives of the global best rule on training data.
        fp (int): False positives of the global best rule on training data.
        fn (int): False negatives of the global best rule on training data.
        tn (int): True negatives of the global best rule on training data.
        val_tp (int | None): True positives on validation data, or ``None``. Default: `None`.
        val_fp (int | None): False positives on validation data, or ``None``. Default: `None`.
        val_fn (int | None): False negatives on validation data, or ``None``. Default: `None`.
        val_tn (int | None): True negatives on validation data, or ``None``. Default: `None`.
        generations (List[GenerationMetrics]):
            Per-epoch metrics. Default: empty list.

    Examples:
        >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> ph = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=5, fp=1, fn=2, tn=7,
        ... )
        >>> len(ph.generations)
        0
        >>> ph.best_val_score is None
        True
    """

    global_best_rule: Rule
    tp: int
    fp: int
    fn: int
    tn: int
    val_tp: int | None = None
    val_fp: int | None = None
    val_fn: int | None = None
    val_tn: int | None = None
    generations: List[GenerationMetrics] = field(default_factory=list)

    def __len__(self) -> int:
        """Number of recorded generations."""
        return len(self.generations)

    @cached_property
    def best_val_score(self) -> float | None:
        """
        Maximum validation score across all generations, or ``None`` if no
        generation has a validation score.

        Examples:
            >>> from dataclasses import replace
            >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g1 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g2 = replace(g1, val_score=0.6)
            >>> g3 = replace(g1, val_score=0.9)
            >>> ph = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g1, g2, g3],
            ... )
            >>> ph.best_val_score
            0.9
        """
        val_scores = [x.val_score for x in self.generations if x.val_score is not None]
        if len(val_scores) == 0:
            return None
        return max(val_scores)

    @cached_property
    def best_train_score(self) -> float | None:
        """
        Maximum training score across all generations, or ``None`` if there are
        no generations.

        Examples:
            >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g1 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.6], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g2 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.9], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> ph = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g1, g2],
            ... )
            >>> ph.best_train_score
            0.9
        """
        if len(self.generations) == 0:
            return None
        return max([g.best_train_score for g in self.generations])

`best_val_score` `cached` `property`

Maximum validation score across all generations, or `None` if no generation has a validation score.
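
One consequence of `functools.cached_property` is that the value is computed on first access and then stored on the instance, so later changes to the underlying list are not reflected. The sketch below demonstrates this with a hypothetical stand-in class, `History`, that holds bare validation scores instead of `GenerationMetrics` objects.

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import List, Optional


@dataclass
class History:
    """Hypothetical stand-in: one optional validation score per generation."""

    val_scores: List[Optional[float]] = field(default_factory=list)

    @cached_property
    def best_val_score(self) -> Optional[float]:
        present = [s for s in self.val_scores if s is not None]
        return max(present) if present else None


h = History(val_scores=[None, 0.6])
first = h.best_val_score    # computed and cached: 0.6
h.val_scores.append(0.9)    # mutation after the first access
stale = h.best_val_score    # still 0.6, served from the cache
del h.best_val_score        # discard the cached value
fresh = h.best_val_score    # recomputed: 0.9
```

In practice this means the property is best read only after the history is complete.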

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g1 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g2 = replace(g1, val_score=0.6)
>>> g3 = replace(g1, val_score=0.9)
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g1, g2, g3],
... )
>>> ph.best_val_score
0.9

`best_train_score` `cached` `property`

Maximum training score across all generations, or `None` if there are no generations.

Examples:

>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g1 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.6], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g2 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.9], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g1, g2],
... )
>>> ph.best_train_score
0.9

`hgp_lib.metrics.results.RunResult` `dataclass`

Result of one complete benchmark run with k-fold cross-validation.

Contains per-fold training histories, the test-set evaluation of the best fold's rule, and the confusion matrix on the held-out test set.

Parameters:

Name	Type	Description	Default
`run_id`	`int`	Zero-based index of this run.	required
`seed`	`int`	Random seed used for the stratified split and k-fold.	required
`best_fold_idx`	`int`	Index of the fold with the highest validation score.	required
`folds`	`List[PopulationHistory]`	Training history for each fold.	required
`test_score`	`float`	Score of the best rule on the held-out test set.	required
`test_tp`	`int`	True positives on the test set.	required
`test_fp`	`int`	False positives on the test set.	required
`test_fn`	`int`	False negatives on the test set.	required
`test_tn`	`int`	True negatives on the test set.	required
`feature_names`	`Dict[int, str]`	Mapping from feature index to column name (from the binarizer fitted on the best fold).	required

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=0, tn=6,
... )
>>> run = RunResult(
...     run_id=0, seed=42, best_fold_idx=0, folds=[fold],
...     test_score=0.85, test_tp=4, test_fp=1, test_fn=1, test_tn=4,
...     feature_names={0: "age", 1: "income"},
... )
>>> run.best_rule
0
>>> run.test_confusion_matrix
'[TP: 4, FP: 1, FN: 1, TN: 4]'

Source code in hgp_lib\metrics\results.py

@dataclass
class RunResult:
    """
    Result of one complete benchmark run with k-fold cross-validation.

    Contains per-fold training histories, the test-set evaluation of the best
    fold's rule, and the confusion matrix on the held-out test set.

    Args:
        run_id (int): Zero-based index of this run.
        seed (int): Random seed used for the stratified split and k-fold.
        best_fold_idx (int): Index of the fold with the highest validation score.
        folds (List[PopulationHistory]): Training history for each fold.
        test_score (float): Score of the best rule on the held-out test set.
        test_tp (int): True positives on the test set.
        test_fp (int): False positives on the test set.
        test_fn (int): False negatives on the test set.
        test_tn (int): True negatives on the test set.
        feature_names (Dict[int, str]): Mapping from feature index to column name
            (from the binarizer fitted on the best fold).

    Examples:
        >>> from hgp_lib.metrics import RunResult, PopulationHistory
        >>> from hgp_lib.rules import Literal
        >>> fold = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=0, tn=6,
        ... )
        >>> run = RunResult(
        ...     run_id=0, seed=42, best_fold_idx=0, folds=[fold],
        ...     test_score=0.85, test_tp=4, test_fp=1, test_fn=1, test_tn=4,
        ...     feature_names={0: "age", 1: "income"},
        ... )
        >>> run.best_rule
        0
        >>> run.test_confusion_matrix
        '[TP: 4, FP: 1, FN: 1, TN: 4]'
    """

    run_id: int
    seed: int
    best_fold_idx: int
    folds: List[PopulationHistory]
    test_score: float
    test_tp: int
    test_fp: int
    test_fn: int
    test_tn: int
    feature_names: Dict[int, str]

    @cached_property
    def best_fold(self) -> PopulationHistory:
        """
        The ``PopulationHistory`` of the fold with the highest validation score.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> f0 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=1, fp=0, fn=0, tn=1,
            ... )
            >>> f1 = PopulationHistory(
            ...     global_best_rule=Literal(value=1), tp=2, fp=0, fn=0, tn=2,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=1, folds=[f0, f1],
            ...     test_score=0.9, test_tp=1, test_fp=0, test_fn=0, test_tn=1,
            ...     feature_names={},
            ... )
            >>> run.best_fold is f1
            True
        """
        return self.folds[self.best_fold_idx]

    @cached_property
    def best_rule(self) -> Rule:
        """
        The global best rule from the best fold.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=5), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> str(run.best_rule)
            '5'
        """
        return self.best_fold.global_best_rule

    @cached_property
    def fold_val_scores(self) -> List[float]:
        """
        Best validation score from each fold (folds without validation are excluded).

        Examples:
            >>> from dataclasses import replace
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g_val = replace(g, val_score=0.7)
            >>> f0 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_val],
            ... )
            >>> f1 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[f0, f1],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.fold_val_scores
            [0.7]
        """
        return [
            fold.best_val_score
            for fold in self.folds
            if fold.best_val_score is not None
        ]

    @cached_property
    def fold_train_scores(self) -> List[float]:
        """
        Best training score from each fold (folds without generations are excluded).

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.fold_train_scores
            [0.8]
        """
        return [
            fold.best_train_score
            for fold in self.folds
            if fold.best_train_score is not None
        ]

    @cached_property
    def mean_val_score(self) -> float:
        """
        Mean of the best validation scores across all folds. Returns ``0.0`` if no
        fold has a validation score.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.mean_val_score
            0.0
        """
        scores = self.fold_val_scores
        if len(scores) == 0:
            return 0.0
        return float(np.mean(scores))

    @cached_property
    def mean_train_score(self) -> float:
        """
        Mean of the best training scores across all folds. Returns ``0.0`` if no
        fold has training generations.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.85], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.mean_train_score
            0.85
        """
        scores = self.fold_train_scores
        if len(scores) == 0:
            return 0.0
        return float(np.mean(scores))

    @cached_property
    def train_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the best fold's training data.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=2, tn=4,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.train_confusion_matrix
            '[TP: 3, FP: 1, FN: 2, TN: 4]'
        """
        best_fold = self.best_fold
        return f"[TP: {best_fold.tp}, FP: {best_fold.fp}, FN: {best_fold.fn}, TN: {best_fold.tn}]"

    @cached_property
    def val_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the best fold's validation data.
        Returns ``"[]"`` if no validation data was used.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.val_confusion_matrix
            '[]'
        """
        best_fold = self.best_fold
        if best_fold.val_tp is None:
            return "[]"
        return f"[TP: {best_fold.val_tp}, FP: {best_fold.val_fp}, FN: {best_fold.val_fn}, TN: {best_fold.val_tn}]"

    @cached_property
    def test_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the held-out test set.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=5, test_fp=2, test_fn=1, test_tn=7,
            ...     feature_names={},
            ... )
            >>> run.test_confusion_matrix
            '[TP: 5, FP: 2, FN: 1, TN: 7]'
        """
        return f"[TP: {self.test_tp}, FP: {self.test_fp}, FN: {self.test_fn}, TN: {self.test_tn}]"

`best_fold` `cached` `property`

The `PopulationHistory` of the fold with the highest validation score.
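
This page does not show how `best_fold_idx` is computed. As a hedged illustration only, one plausible rule is an argmax over the per-fold best validation scores that skips folds without validation. `pick_best_fold` is a hypothetical helper, and its tie-breaking (earliest fold wins) is an assumption of this sketch.

```python
from typing import List, Optional


def pick_best_fold(fold_val_scores: List[Optional[float]]) -> int:
    """Index of the highest non-None score; ties go to the earliest fold."""
    best_idx: Optional[int] = None
    best_score: Optional[float] = None
    for idx, score in enumerate(fold_val_scores):
        if score is None:
            continue  # this fold was trained without validation
        if best_score is None or score > best_score:
            best_idx, best_score = idx, score
    if best_idx is None:
        raise ValueError("no fold has a validation score")
    return best_idx


best_fold_idx = pick_best_fold([0.71, None, 0.84, 0.84])  # 2
```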

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> f0 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=1, fp=0, fn=0, tn=1,
... )
>>> f1 = PopulationHistory(
...     global_best_rule=Literal(value=1), tp=2, fp=0, fn=0, tn=2,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=1, folds=[f0, f1],
...     test_score=0.9, test_tp=1, test_fp=0, test_fn=0, test_tn=1,
...     feature_names={},
... )
>>> run.best_fold is f1
True

`best_rule` `cached` `property`

The global best rule from the best fold.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=5), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> str(run.best_rule)
'5'

`fold_val_scores` `cached` `property`

Best validation score from each fold (folds without validation are excluded).

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_val = replace(g, val_score=0.7)
>>> f0 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_val],
... )
>>> f1 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[f0, f1],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.fold_val_scores
[0.7]
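
The exclusion rule can also be illustrated without hgp_lib. The following is a minimal sketch, assuming each fold exposes its best validation score as a float or `None`; `FoldStub` and `collect_val_scores` are hypothetical names used only for illustration and are not part of the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FoldStub:
    """Hypothetical stand-in for a fold; not the library's PopulationHistory."""

    best_val_score: Optional[float]  # None means the fold had no validation


def collect_val_scores(folds: List[FoldStub]) -> List[float]:
    """Keep one score per validated fold and skip folds without validation."""
    return [f.best_val_score for f in folds if f.best_val_score is not None]


folds = [FoldStub(0.7), FoldStub(None), FoldStub(0.6)]
scores = collect_val_scores(folds)
print(scores)
```

Because unvalidated folds are dropped rather than padded, the resulting list can be shorter than `folds`, so positions in it do not necessarily match fold indices.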

`fold_train_scores` `cached` `property`

Best training score from each fold (folds without generations are excluded).

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.fold_train_scores
[0.8]

`mean_val_score` `cached` `property`

Mean of the best validation scores across all folds. Returns 0.0 if no fold has a validation score.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.mean_val_score
0.0
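
The 0.0 fallback described above might be sketched as follows. This is an illustrative helper under the assumption that the per-fold scores are already collected in a list; `mean_or_zero` is a hypothetical name, not a library function.

```python
from statistics import fmean
from typing import Sequence


def mean_or_zero(scores: Sequence[float]) -> float:
    """Average the scores, or return 0.0 when the sequence is empty."""
    return fmean(scores) if scores else 0.0


empty_mean = mean_or_zero([])          # no fold had a validation score
some_mean = mean_or_zero([0.5, 1.0])   # two validated folds
print(empty_mean, some_mean)
```

One consequence of this convention is that a result of 0.0 does not distinguish "no validation was performed" from "validation was performed and every fold scored 0.0".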

`mean_train_score` `cached` `property`

Mean of the best training scores across all folds. Returns 0.0 if no fold has training generations.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.85], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.mean_train_score
0.85

`train_confusion_matrix` `cached` `property`

Formatted confusion matrix string for the best fold's training data.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=2, tn=4,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.train_confusion_matrix
'[TP: 3, FP: 1, FN: 2, TN: 4]'

`val_confusion_matrix` `cached` `property`

Formatted confusion matrix string for the best fold's validation data. Returns "[]" if no validation data was used.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.val_confusion_matrix
'[]'
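
The string format and the `"[]"` fallback can be sketched as a standalone function. This is a simplified illustration, assuming that missing validation data is signalled by a true-positive count of `None`; `format_confusion` is a hypothetical helper, not part of hgp_lib.

```python
from typing import Optional


def format_confusion(
    tp: Optional[int],
    fp: Optional[int],
    fn: Optional[int],
    tn: Optional[int],
) -> str:
    """Format the four counts, or return "[]" when tp is None."""
    if tp is None:
        return "[]"
    return f"[TP: {tp}, FP: {fp}, FN: {fn}, TN: {tn}]"


with_counts = format_confusion(3, 1, 2, 4)
without_counts = format_confusion(None, None, None, None)
print(with_counts)
print(without_counts)
```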

`test_confusion_matrix` `cached` `property`

Formatted confusion matrix string for the held-out test set.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=5, test_fp=2, test_fn=1, test_tn=7,
...     feature_names={},
... )
>>> run.test_confusion_matrix
'[TP: 5, FP: 2, FN: 1, TN: 7]'

`hgp_lib.metrics.results.ExperimentResult` `dataclass`

Aggregated results across multiple benchmark runs.

Parameters:

Name	Type	Description	Default
`runs`	`List[RunResult]`	Results from each independent run.	required

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_low = replace(g, val_score=0.5)
>>> g_high = replace(g, val_score=0.9)
>>> fold_low = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_low],
... )
>>> fold_high = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_high],
... )
>>> r1 = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold_low],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> r2 = RunResult(
...     run_id=1, seed=1, best_fold_idx=0, folds=[fold_high],
...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> exp = ExperimentResult(runs=[r1, r2])
>>> exp.test_scores
[0.8, 0.9]
>>> exp.best_run.run_id
1

Source code in hgp_lib\metrics\results.py

@dataclass
class ExperimentResult:
    """
    Aggregated results across multiple benchmark runs.

    Args:
        runs (List[RunResult]): Results from each independent run.

    Examples:
        >>> from dataclasses import replace
        >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> g = GenerationMetrics.from_population(
        ...     best_idx=0, best_rule=Literal(value=0),
        ...     train_scores=[0.8], complexities=[1],
        ...     child_population_generation_metrics=[],
        ... )
        >>> g_low = replace(g, val_score=0.5)
        >>> g_high = replace(g, val_score=0.9)
        >>> fold_low = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
        ...     generations=[g_low],
        ... )
        >>> fold_high = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
        ...     generations=[g_high],
        ... )
        >>> r1 = RunResult(
        ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold_low],
        ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
        ...     feature_names={},
        ... )
        >>> r2 = RunResult(
        ...     run_id=1, seed=1, best_fold_idx=0, folds=[fold_high],
        ...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
        ...     feature_names={},
        ... )
        >>> exp = ExperimentResult(runs=[r1, r2])
        >>> exp.test_scores
        [0.8, 0.9]
        >>> exp.best_run.run_id
        1
    """

    runs: List[RunResult]

    @cached_property
    def best_run(self) -> RunResult:
        """
        The run with the highest mean validation score across folds. If no run
        has a mean validation score above 0.0 (the value ``mean_val_score``
        returns when no fold was validated), runs are ranked by mean training
        score instead.

        Returns:
            RunResult: The best-performing run.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g_low = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.5], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g_high = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.9], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> f_low = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_low],
            ... )
            >>> f_high = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_high],
            ... )
            >>> r1 = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[f_low],
            ...     test_score=0.7, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> r2 = RunResult(
            ...     run_id=1, seed=1, best_fold_idx=0, folds=[f_high],
            ...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> ExperimentResult(runs=[r1, r2]).best_run.run_id
            1
        """
        # Use validation scores when available, otherwise fall back to training
        has_val = any(run.mean_val_score > 0.0 for run in self.runs)

        best_run = None
        best_mean = -float("inf")

        for run in self.runs:
            score = run.mean_val_score if has_val else run.mean_train_score
            if score > best_mean:
                best_mean = score
                best_run = run

        # NOTE: best_run is still None here if ``runs`` is empty.
        return best_run

    @cached_property
    def best_rule(self) -> Rule:
        """
        The best rule from the best fold of the best run.

        Returns:
            Rule: The overall best rule across the entire experiment.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=7), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> str(ExperimentResult(runs=[run]).best_rule)
            '7'
        """
        best_run = self.best_run
        return best_run.folds[best_run.best_fold_idx].global_best_rule

    @cached_property
    def test_scores(self) -> list[float]:
        """
        Test scores from all runs.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> r = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.75, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> ExperimentResult(runs=[r]).test_scores
            [0.75]
        """
        return [run.test_score for run in self.runs]

`best_run` `cached` `property`

The run with the highest mean validation score across folds. If no run has a mean validation score above 0.0 (the value `mean_val_score` returns when no fold was validated), runs are ranked by mean training score instead.

Returns:

Name	Type	Description
`RunResult`	`RunResult`	The best-performing run.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g_low = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.5], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_high = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.9], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> f_low = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_low],
... )
>>> f_high = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_high],
... )
>>> r1 = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[f_low],
...     test_score=0.7, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> r2 = RunResult(
...     run_id=1, seed=1, best_fold_idx=0, folds=[f_high],
...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> ExperimentResult(runs=[r1, r2]).best_run.run_id
1
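
The selection rule, including the fallback to training scores, can be sketched independently of the library. This is a minimal illustration, assuming each run carries precomputed mean scores; `RunStub` and `pick_best` are hypothetical names, and the sketch uses `max` where the source uses an explicit loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunStub:
    """Hypothetical stand-in for RunResult with only the fields used here."""

    run_id: int
    mean_val_score: float
    mean_train_score: float


def pick_best(runs: List[RunStub]) -> Optional[RunStub]:
    """Rank by validation mean if any run has one above 0.0, else by training mean."""
    has_val = any(r.mean_val_score > 0.0 for r in runs)
    if has_val:
        return max(runs, key=lambda r: r.mean_val_score, default=None)
    return max(runs, key=lambda r: r.mean_train_score, default=None)


validated = [RunStub(0, 0.5, 0.9), RunStub(1, 0.9, 0.6)]
unvalidated = [RunStub(0, 0.0, 0.9), RunStub(1, 0.0, 0.6)]

best_validated = pick_best(validated)      # ranked by validation mean
best_unvalidated = pick_best(unvalidated)  # ranked by training mean
print(best_validated.run_id, best_unvalidated.run_id)
```

In the first list the training scores are ignored even though run 0 trains better, because at least one run has a positive validation mean. An empty list yields `None` in this sketch.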

`best_rule` `cached` `property`

The best rule from the best fold of the best run.

Returns:

Name	Type	Description
`Rule`	`Rule`	The overall best rule across the entire experiment.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=7), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> str(ExperimentResult(runs=[run]).best_rule)
'7'

`test_scores` `cached` `property`

Test scores from all runs.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> r = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.75, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> ExperimentResult(runs=[r]).test_scores
[0.75]
