Metrics

hgp_lib.metrics.core.GenerationMetrics dataclass

Metrics captured at a single generation for one population.

Stores per-rule training scores, complexities, the best rule found in this generation, and optionally a validation score. In hierarchical GP, child population metrics are nested via child_population_generation_metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| best_idx | int | Index of the best-scoring rule in train_scores. | required |
| best_rule | Rule | Copy of the best rule from this generation. | required |
| complexities | Sequence[int] | Number of nodes in each rule (same order as train_scores). | required |
| train_scores | Sequence[float] | Fitness scores for every rule in the population. | required |
| child_population_generation_metrics | Sequence[GenerationMetrics] | Metrics from child populations in hierarchical GP. Empty list for flat (non-hierarchical) runs. | required |
| val_score | float \| None | Validation score of the global best rule at this generation, or None if validation was not performed. Default: None. | None |

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=1,
...     best_rule=Literal(value=1),
...     train_scores=[0.7, 0.9, 0.5],
...     complexities=[1, 3, 2],
...     child_population_generation_metrics=[],
... )
>>> m.best_train_score
0.9
>>> m.best_rule_complexity
3
>>> m.population_size
3
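
The nesting through child_population_generation_metrics can be walked recursively. The following is a minimal sketch, not library code: `GenMetricsSketch` is a hypothetical stand-in that mirrors only a few of the documented fields (so the snippet runs without hgp_lib), and `count_rules` is an illustrative helper name.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class GenMetricsSketch:
    """Hypothetical stand-in mirroring a subset of the GenerationMetrics fields."""
    best_idx: int
    train_scores: Sequence[float]
    complexities: Sequence[int]
    child_population_generation_metrics: List["GenMetricsSketch"] = field(default_factory=list)


def count_rules(m: GenMetricsSketch) -> int:
    """Number of rules in this population plus all nested child populations."""
    own = len(m.train_scores)
    return own + sum(count_rules(c) for c in m.child_population_generation_metrics)


# Two child populations nested under one parent (hierarchical run).
child_a = GenMetricsSketch(best_idx=0, train_scores=[0.4, 0.3], complexities=[1, 2])
child_b = GenMetricsSketch(best_idx=1, train_scores=[0.2, 0.6, 0.5], complexities=[1, 1, 3])
parent = GenMetricsSketch(
    best_idx=1,
    train_scores=[0.7, 0.9, 0.5],
    complexities=[1, 3, 2],
    child_population_generation_metrics=[child_a, child_b],
)
total_rules = count_rules(parent)  # 3 own rules + 2 + 3 from the children
```

For a flat run the child list is empty, so the same helper simply returns the population size.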
Source code in hgp_lib\metrics\core.py
@dataclass()
class GenerationMetrics:
    """
    Metrics captured at a single generation for one population.

    Stores per-rule training scores, complexities, the best rule found in this
    generation, and optionally a validation score. In hierarchical GP, child
    population metrics are nested via ``child_population_generation_metrics``.

    Args:
        best_idx (int):
            Index of the best-scoring rule in ``train_scores``.
        best_rule (Rule):
            Copy of the best rule from this generation.
        complexities (Sequence[int]):
            Number of nodes in each rule (same order as ``train_scores``).
        train_scores (Sequence[float]):
            Fitness scores for every rule in the population.
        child_population_generation_metrics (Sequence[GenerationMetrics]):
            Metrics from child populations in hierarchical GP. Empty list for
            flat (non-hierarchical) runs.
        val_score (float | None):
            Validation score of the global best rule at this generation, or ``None``
            if validation was not performed. Default: `None`.

    Examples:
        >>> from hgp_lib.metrics import GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> m = GenerationMetrics.from_population(
        ...     best_idx=1,
        ...     best_rule=Literal(value=1),
        ...     train_scores=[0.7, 0.9, 0.5],
        ...     complexities=[1, 3, 2],
        ...     child_population_generation_metrics=[],
        ... )
        >>> m.best_train_score
        0.9
        >>> m.best_rule_complexity
        3
        >>> m.population_size
        3
    """

    best_idx: int
    best_rule: Rule

    complexities: Sequence[int]
    train_scores: Sequence[float]
    child_population_generation_metrics: Sequence["GenerationMetrics"]

    val_score: float | None = None

    @classmethod
    def from_population(
        cls,
        best_idx: int,
        best_rule: Rule,
        train_scores: Sequence[float],
        complexities: Sequence[int],
        child_population_generation_metrics: Sequence["GenerationMetrics"],
    ) -> "GenerationMetrics":
        """
        Construct a ``GenerationMetrics`` from population-level data.

        This is the preferred constructor used by ``BooleanGP._new_generation``.

        Args:
            best_idx (int):
                Index of the best rule in ``train_scores``.
            best_rule (Rule):
                The best rule (already copied).
            train_scores (Sequence[float]):
                Per-rule fitness scores.
            complexities (Sequence[int]):
                Per-rule node counts.
            child_population_generation_metrics (Sequence[GenerationMetrics]):
                Child metrics (empty list for flat GP).

        Returns:
            GenerationMetrics: A new instance with ``val_score=None``.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.val_score is None
            True
        """
        return cls(
            best_rule=best_rule,
            best_idx=best_idx,
            complexities=complexities,
            train_scores=train_scores,
            child_population_generation_metrics=child_population_generation_metrics,
        )

    @property
    def best_train_score(self) -> float:
        """
        Training score of the best rule in this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=2, best_rule=Literal(value=0),
            ...     train_scores=[0.1, 0.2, 0.9], complexities=[1, 1, 1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.best_train_score
            0.9
        """
        return self.train_scores[self.best_idx]

    @property
    def best_rule_complexity(self) -> int:
        """
        Node count of the best rule in this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[5],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.best_rule_complexity
            5
        """
        return self.complexities[self.best_idx]

    @property
    def population_size(self) -> int:
        """
        Number of rules in the population at this generation.

        Examples:
            >>> from hgp_lib.metrics import GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> m = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.1, 0.2, 0.3], complexities=[1, 2, 3],
            ...     child_population_generation_metrics=[],
            ... )
            >>> m.population_size
            3
        """
        return len(self.train_scores)

best_train_score property

Training score of the best rule in this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=2, best_rule=Literal(value=0),
...     train_scores=[0.1, 0.2, 0.9], complexities=[1, 1, 1],
...     child_population_generation_metrics=[],
... )
>>> m.best_train_score
0.9
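
Note that the property is an index lookup (`train_scores[best_idx]`), not a fresh maximum, so its value is only as good as the `best_idx` that was stored. A minimal sketch of how a consistent index could be derived, assuming higher scores are better; `argmax_index` is a hypothetical helper, not part of hgp_lib:

```python
from typing import Sequence


def argmax_index(scores: Sequence[float]) -> int:
    """Index of the highest score; ties resolve to the first occurrence."""
    return max(range(len(scores)), key=scores.__getitem__)


train_scores = [0.1, 0.2, 0.9]
complexities = [1, 1, 1]

best_idx = argmax_index(train_scores)
best_train_score = train_scores[best_idx]      # the same lookup the property performs
best_rule_complexity = complexities[best_idx]  # same index into complexities
```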

best_rule_complexity property

Node count of the best rule in this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[5],
...     child_population_generation_metrics=[],
... )
>>> m.best_rule_complexity
5

population_size property

Number of rules in the population at this generation.

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.1, 0.2, 0.3], complexities=[1, 2, 3],
...     child_population_generation_metrics=[],
... )
>>> m.population_size
3

from_population(best_idx, best_rule, train_scores, complexities, child_population_generation_metrics) classmethod

Construct a GenerationMetrics from population-level data.

This is the preferred constructor used by BooleanGP._new_generation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| best_idx | int | Index of the best rule in train_scores. | required |
| best_rule | Rule | The best rule (already copied). | required |
| train_scores | Sequence[float] | Per-rule fitness scores. | required |
| complexities | Sequence[int] | Per-rule node counts. | required |
| child_population_generation_metrics | Sequence[GenerationMetrics] | Child metrics (empty list for flat GP). | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| GenerationMetrics | GenerationMetrics | A new instance with val_score=None. |

Examples:

>>> from hgp_lib.metrics import GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> m = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> m.val_score is None
True
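
Because the classmethod always leaves val_score as None, a validation score has to be attached afterwards. One way to do that without mutating the original object is `dataclasses.replace`. This is a sketch under assumptions: `GenMetricsSketch` is a hypothetical stand-in with only the fields needed here, and how the library itself sets val_score is not shown on this page.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass
class GenMetricsSketch:
    """Hypothetical stand-in; only the fields needed for this illustration."""
    best_idx: int
    train_scores: Sequence[float]
    val_score: Optional[float] = None


without_val = GenMetricsSketch(best_idx=0, train_scores=[0.8])
# replace() builds a new instance with one field changed; the original is untouched.
with_val = replace(without_val, val_score=0.75)
```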
Source code in hgp_lib\metrics\core.py
@classmethod
def from_population(
    cls,
    best_idx: int,
    best_rule: Rule,
    train_scores: Sequence[float],
    complexities: Sequence[int],
    child_population_generation_metrics: Sequence["GenerationMetrics"],
) -> "GenerationMetrics":
    """
    Construct a ``GenerationMetrics`` from population-level data.

    This is the preferred constructor used by ``BooleanGP._new_generation``.

    Args:
        best_idx (int):
            Index of the best rule in ``train_scores``.
        best_rule (Rule):
            The best rule (already copied).
        train_scores (Sequence[float]):
            Per-rule fitness scores.
        complexities (Sequence[int]):
            Per-rule node counts.
        child_population_generation_metrics (Sequence[GenerationMetrics]):
            Child metrics (empty list for flat GP).

    Returns:
        GenerationMetrics: A new instance with ``val_score=None``.

    Examples:
        >>> from hgp_lib.metrics import GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> m = GenerationMetrics.from_population(
        ...     best_idx=0, best_rule=Literal(value=0),
        ...     train_scores=[0.8], complexities=[1],
        ...     child_population_generation_metrics=[],
        ... )
        >>> m.val_score is None
        True
    """
    return cls(
        best_rule=best_rule,
        best_idx=best_idx,
        complexities=complexities,
        train_scores=train_scores,
        child_population_generation_metrics=child_population_generation_metrics,
    )

hgp_lib.metrics.history.PopulationHistory dataclass

Complete history of a population across all training generations.

Stores the global best rule, training and validation confusion matrix values, and a list of GenerationMetrics — one per epoch. Used as the return type of GPTrainer.fit() and as fold-level results inside RunResult.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| global_best_rule | Rule | The best rule found across all generations (by validation score when available, otherwise by training score). | required |
| tp | int | True positives of the global best rule on training data. | required |
| fp | int | False positives of the global best rule on training data. | required |
| fn | int | False negatives of the global best rule on training data. | required |
| tn | int | True negatives of the global best rule on training data. | required |
| val_tp | int \| None | True positives on validation data, or None. Default: None. | None |
| val_fp | int \| None | False positives on validation data, or None. Default: None. | None |
| val_fn | int \| None | False negatives on validation data, or None. Default: None. | None |
| val_tn | int \| None | True negatives on validation data, or None. Default: None. | None |
| generations | List[GenerationMetrics] | Per-epoch metrics. Default: empty list. | list() |

Examples:

>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=5, fp=1, fn=2, tn=7,
... )
>>> len(ph.generations)
0
>>> ph.best_val_score is None
True
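
The four confusion-matrix counts are enough to derive the usual classification metrics. The sketch below uses the counts from the example above (tp=5, fp=1, fn=2, tn=7). The helper names and the 0.0 fallback for empty denominators are choices made for this illustration; which score hgp_lib actually optimizes is not stated on this page.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct (0.0 if none predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all samples classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of the counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


tp, fp, fn, tn = 5, 1, 2, 7
train_precision = precision(tp, fp)      # 5 / 6
train_recall = recall(tp, fn)            # 5 / 7
train_accuracy = accuracy(tp, fp, fn, tn)  # 12 / 15
train_f1 = f1(tp, fp, fn)                # 10 / 13
```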
Source code in hgp_lib\metrics\history.py
@dataclass
class PopulationHistory:
    """
    Complete history of a population across all training generations.

    Stores the global best rule, training and validation confusion matrix values,
    and a list of ``GenerationMetrics`` — one per epoch. Used as the return type
    of ``GPTrainer.fit()`` and as fold-level results inside ``RunResult``.

    Args:
        global_best_rule (Rule):
            The best rule found across all generations (by validation score when
            available, otherwise by training score).
        tp (int): True positives of the global best rule on training data.
        fp (int): False positives of the global best rule on training data.
        fn (int): False negatives of the global best rule on training data.
        tn (int): True negatives of the global best rule on training data.
        val_tp (int | None): True positives on validation data, or ``None``. Default: `None`.
        val_fp (int | None): False positives on validation data, or ``None``. Default: `None`.
        val_fn (int | None): False negatives on validation data, or ``None``. Default: `None`.
        val_tn (int | None): True negatives on validation data, or ``None``. Default: `None`.
        generations (List[GenerationMetrics]):
            Per-epoch metrics. Default: empty list.

    Examples:
        >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> ph = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=5, fp=1, fn=2, tn=7,
        ... )
        >>> len(ph.generations)
        0
        >>> ph.best_val_score is None
        True
    """

    global_best_rule: Rule
    tp: int
    fp: int
    fn: int
    tn: int
    val_tp: int | None = None
    val_fp: int | None = None
    val_fn: int | None = None
    val_tn: int | None = None
    generations: List[GenerationMetrics] = field(default_factory=list)

    def __len__(self) -> int:
        # Plain method (not a property) so that len(history) works.
        return len(self.generations)

    @cached_property
    def best_val_score(self):
        """
        Maximum validation score across all generations, or ``None`` if no
        generation has a validation score.

        Examples:
            >>> from dataclasses import replace
            >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g1 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g2 = replace(g1, val_score=0.6)
            >>> g3 = replace(g1, val_score=0.9)
            >>> ph = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g1, g2, g3],
            ... )
            >>> ph.best_val_score
            0.9
        """
        val_scores = [x.val_score for x in self.generations if x.val_score is not None]
        if len(val_scores) == 0:
            return None
        return max(val_scores)

    @cached_property
    def best_train_score(self):
        """
        Maximum training score across all generations, or ``None`` if there are
        no generations.

        Examples:
            >>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g1 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.6], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g2 = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.9], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> ph = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g1, g2],
            ... )
            >>> ph.best_train_score
            0.9
        """
        if len(self.generations) == 0:
            return None
        return max([g.best_train_score for g in self.generations])

best_val_score cached property

Maximum validation score across all generations, or None if no generation has a validation score.

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g1 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g2 = replace(g1, val_score=0.6)
>>> g3 = replace(g1, val_score=0.9)
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g1, g2, g3],
... )
>>> ph.best_val_score
0.9
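
The "maximum of the non-None scores, else None" logic can be expressed compactly with the `default` argument of the built-in `max`. A minimal sketch on plain lists; `best_score` is a hypothetical helper name, not a library function:

```python
from typing import Iterable, Optional


def best_score(scores: Iterable[Optional[float]]) -> Optional[float]:
    """Maximum of the non-None scores, or None when there are none."""
    return max((s for s in scores if s is not None), default=None)


mixed = best_score([None, 0.6, 0.9])  # generations with and without validation
empty = best_score([None, None])      # no generation was validated
```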

best_train_score cached property

Maximum training score across all generations, or None if there are no generations.

Examples:

>>> from hgp_lib.metrics import PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g1 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.6], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g2 = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.9], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> ph = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g1, g2],
... )
>>> ph.best_train_score
0.9
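
Since this is a cached property, the value is computed on first access and then stored on the instance; generations appended later are not reflected until the cached entry is removed. The sketch below demonstrates that standard `functools.cached_property` behavior on a hypothetical stand-in class (`HistorySketch`), where each generation is reduced to its best training score:

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import List, Optional


@dataclass
class HistorySketch:
    """Hypothetical stand-in: one best training score per generation."""
    generation_best_scores: List[float] = field(default_factory=list)

    @cached_property
    def best_train_score(self) -> Optional[float]:
        return max(self.generation_best_scores, default=None)


h = HistorySketch(generation_best_scores=[0.6])
first = h.best_train_score            # computed now and cached on the instance
h.generation_best_scores.append(0.9)  # history grows after the first access
stale = h.best_train_score            # still the cached value
del h.best_train_score                # drop the cached entry
fresh = h.best_train_score            # recomputed from the current list
```

In practice this means the cached properties are safest to read once training has finished and the history is no longer modified.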

hgp_lib.metrics.results.RunResult dataclass

Result of one complete benchmark run with k-fold cross-validation.

Contains per-fold training histories, the test-set evaluation of the best fold's rule, and the confusion matrix on the held-out test set.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| run_id | int | Zero-based index of this run. | required |
| seed | int | Random seed used for the stratified split and k-fold. | required |
| best_fold_idx | int | Index of the fold with the highest validation score. | required |
| folds | List[PopulationHistory] | Training history for each fold. | required |
| test_score | float | Score of the best rule on the held-out test set. | required |
| test_tp | int | True positives on the test set. | required |
| test_fp | int | False positives on the test set. | required |
| test_fn | int | False negatives on the test set. | required |
| test_tn | int | True negatives on the test set. | required |
| feature_names | Dict[int, str] | Mapping from feature index to column name (from the binarizer fitted on the best fold). | required |

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=0, tn=6,
... )
>>> run = RunResult(
...     run_id=0, seed=42, best_fold_idx=0, folds=[fold],
...     test_score=0.85, test_tp=4, test_fp=1, test_fn=1, test_tn=4,
...     feature_names={0: "age", 1: "income"},
... )
>>> run.best_rule
0
>>> run.test_confusion_matrix
'[TP: 4, FP: 1, FN: 1, TN: 4]'
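
best_fold_idx is documented as the index of the fold with the highest validation score. A minimal sketch of such a selection on plain per-fold scores follows; `pick_best_fold` is a hypothetical helper, and the library's own selection code (including its tie-breaking) is not shown on this page.

```python
from typing import List, Optional


def pick_best_fold(fold_val_scores: List[Optional[float]]) -> int:
    """Index of the fold with the highest validation score.

    Folds without a validation score (None) are never selected.
    Raises ValueError if no fold has a validation score.
    """
    candidates = [(s, i) for i, s in enumerate(fold_val_scores) if s is not None]
    if not candidates:
        raise ValueError("no fold has a validation score")
    # Compare on the score only; ties resolve to the earliest fold.
    _, best_i = max(candidates, key=lambda pair: pair[0])
    return best_i


best_fold_idx = pick_best_fold([0.71, None, 0.83, 0.79])
```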
Source code in hgp_lib\metrics\results.py
@dataclass
class RunResult:
    """
    Result of one complete benchmark run with k-fold cross-validation.

    Contains per-fold training histories, the test-set evaluation of the best
    fold's rule, and the confusion matrix on the held-out test set.

    Args:
        run_id (int): Zero-based index of this run.
        seed (int): Random seed used for the stratified split and k-fold.
        best_fold_idx (int): Index of the fold with the highest validation score.
        folds (List[PopulationHistory]): Training history for each fold.
        test_score (float): Score of the best rule on the held-out test set.
        test_tp (int): True positives on the test set.
        test_fp (int): False positives on the test set.
        test_fn (int): False negatives on the test set.
        test_tn (int): True negatives on the test set.
        feature_names (Dict[int, str]): Mapping from feature index to column name
            (from the binarizer fitted on the best fold).

    Examples:
        >>> from hgp_lib.metrics import RunResult, PopulationHistory
        >>> from hgp_lib.rules import Literal
        >>> fold = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=0, tn=6,
        ... )
        >>> run = RunResult(
        ...     run_id=0, seed=42, best_fold_idx=0, folds=[fold],
        ...     test_score=0.85, test_tp=4, test_fp=1, test_fn=1, test_tn=4,
        ...     feature_names={0: "age", 1: "income"},
        ... )
        >>> run.best_rule
        0
        >>> run.test_confusion_matrix
        '[TP: 4, FP: 1, FN: 1, TN: 4]'
    """

    run_id: int
    seed: int
    best_fold_idx: int
    folds: List[PopulationHistory]
    test_score: float
    test_tp: int
    test_fp: int
    test_fn: int
    test_tn: int
    feature_names: Dict[int, str]

    @cached_property
    def best_fold(self) -> PopulationHistory:
        """
        The ``PopulationHistory`` of the fold with the highest validation score.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> f0 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=1, fp=0, fn=0, tn=1,
            ... )
            >>> f1 = PopulationHistory(
            ...     global_best_rule=Literal(value=1), tp=2, fp=0, fn=0, tn=2,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=1, folds=[f0, f1],
            ...     test_score=0.9, test_tp=1, test_fp=0, test_fn=0, test_tn=1,
            ...     feature_names={},
            ... )
            >>> run.best_fold is f1
            True
        """
        return self.folds[self.best_fold_idx]

    @cached_property
    def best_rule(self) -> Rule:
        """
        The global best rule from the best fold.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=5), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> str(run.best_rule)
            '5'
        """
        return self.best_fold.global_best_rule

    @cached_property
    def fold_val_scores(self) -> List[float]:
        """
        Best validation score from each fold (folds without validation are excluded).

        Examples:
            >>> from dataclasses import replace
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g_val = replace(g, val_score=0.7)
            >>> f0 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_val],
            ... )
            >>> f1 = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[f0, f1],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.fold_val_scores
            [0.7]
        """
        return [
            fold.best_val_score
            for fold in self.folds
            if fold.best_val_score is not None
        ]

    @cached_property
    def fold_train_scores(self) -> List[float]:
        """
        Best training score from each fold (folds without generations are excluded).

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.8], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.fold_train_scores
            [0.8]
        """
        return [
            fold.best_train_score
            for fold in self.folds
            if fold.best_train_score is not None
        ]

    @cached_property
    def mean_val_score(self) -> float:
        """
        Mean of the best validation scores across all folds. Returns ``0.0`` if no
        fold has a validation score.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.mean_val_score
            0.0
        """
        scores = self.fold_val_scores
        if len(scores) == 0:
            return 0.0
        return float(np.mean(scores))

    @cached_property
    def mean_train_score(self) -> float:
        """
        Mean of the best training scores across all folds. Returns ``0.0`` if no
        fold has training generations.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.85], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g],
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.mean_train_score
            0.85
        """
        scores = self.fold_train_scores
        if len(scores) == 0:
            return 0.0
        return float(np.mean(scores))

    @cached_property
    def train_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the best fold's training data.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=2, tn=4,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.train_confusion_matrix
            '[TP: 3, FP: 1, FN: 2, TN: 4]'
        """
        best_fold = self.best_fold
        return f"[TP: {best_fold.tp}, FP: {best_fold.fp}, FN: {best_fold.fn}, TN: {best_fold.tn}]"

    @cached_property
    def val_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the best fold's validation data.
        Returns ``"[]"`` if no validation data was used.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> run.val_confusion_matrix
            '[]'
        """
        best_fold = self.best_fold
        if best_fold.val_tp is None:
            return "[]"
        return f"[TP: {best_fold.val_tp}, FP: {best_fold.val_fp}, FN: {best_fold.val_fn}, TN: {best_fold.val_tn}]"

    @cached_property
    def test_confusion_matrix(self) -> str:
        """
        Formatted confusion matrix string for the held-out test set.

        Examples:
            >>> from hgp_lib.metrics import RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=5, test_fp=2, test_fn=1, test_tn=7,
            ...     feature_names={},
            ... )
            >>> run.test_confusion_matrix
            '[TP: 5, FP: 2, FN: 1, TN: 7]'
        """
        return f"[TP: {self.test_tp}, FP: {self.test_fp}, FN: {self.test_fn}, TN: {self.test_tn}]"

best_fold cached property

The PopulationHistory of the fold with the highest validation score.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> f0 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=1, fp=0, fn=0, tn=1,
... )
>>> f1 = PopulationHistory(
...     global_best_rule=Literal(value=1), tp=2, fp=0, fn=0, tn=2,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=1, folds=[f0, f1],
...     test_score=0.9, test_tp=1, test_fp=0, test_fn=0, test_tn=1,
...     feature_names={},
... )
>>> run.best_fold is f1
True

best_rule cached property

The global best rule from the best fold.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=5), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> str(run.best_rule)
'5'

fold_val_scores cached property

Best validation score from each fold (folds without validation are excluded).

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_val = replace(g, val_score=0.7)
>>> f0 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_val],
... )
>>> f1 = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[f0, f1],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.fold_val_scores
[0.7]

fold_train_scores cached property

Best training score from each fold (folds without generations are excluded).

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.fold_train_scores
[0.8]

mean_val_score cached property

Mean of the best validation scores across all folds. Returns 0.0 if no fold has a validation score.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.mean_val_score
0.0
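
The empty-input fallback described above can be sketched in isolation. This is a minimal illustration, not the library's code: `mean_or_zero` is a hypothetical helper, and `statistics.fmean` stands in for the `np.mean` call used in the source so the snippet needs only the standard library.

```python
from statistics import fmean
from typing import Sequence


def mean_or_zero(scores: Sequence[float]) -> float:
    """Hypothetical helper: mean of the scores, or 0.0 when there are none."""
    if len(scores) == 0:
        # No fold contributed a score, so report 0.0 instead of raising.
        return 0.0
    return float(fmean(scores))


print(mean_or_zero([0.5, 1.0]))  # 0.75
print(mean_or_zero([]))          # 0.0
```

Under this convention a result of 0.0 is ambiguous: it can mean "no scores were recorded" or "every recorded score was 0.0". Callers that need to tell the two apart can check whether `fold_val_scores` is empty.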

mean_train_score cached property

Mean of the best training scores across all folds. Returns 0.0 if no fold has training generations.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.85], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g],
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.mean_train_score
0.85

train_confusion_matrix cached property

Formatted confusion matrix string for the best fold's training data.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=3, fp=1, fn=2, tn=4,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.train_confusion_matrix
'[TP: 3, FP: 1, FN: 2, TN: 4]'

val_confusion_matrix cached property

Formatted confusion matrix string for the best fold's validation data. Returns "[]" if no validation data was used.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> run.val_confusion_matrix
'[]'
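
The formatting rule shared by the confusion-matrix properties can be sketched as a standalone function. `format_confusion_matrix` is a hypothetical name used only for illustration; it assumes, as in the source shown on this page, that a missing validation set is signalled by `None` in the true-positive count.

```python
from typing import Optional


def format_confusion_matrix(
    tp: Optional[int],
    fp: Optional[int],
    fn: Optional[int],
    tn: Optional[int],
) -> str:
    """Hypothetical helper mirroring the string layout shown above."""
    if tp is None:
        # None (not 0) marks "no data for this split".
        return "[]"
    return f"[TP: {tp}, FP: {fp}, FN: {fn}, TN: {tn}]"


print(format_confusion_matrix(None, None, None, None))  # []
print(format_confusion_matrix(3, 1, 2, 4))              # [TP: 3, FP: 1, FN: 2, TN: 4]
```

Because the check is `is None` rather than a truthiness test, a split with all-zero counts still renders as a full matrix instead of `[]`.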

test_confusion_matrix cached property

Formatted confusion matrix string for the held-out test set.

Examples:

>>> from hgp_lib.metrics import RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=5, test_fp=2, test_fn=1, test_tn=7,
...     feature_names={},
... )
>>> run.test_confusion_matrix
'[TP: 5, FP: 2, FN: 1, TN: 7]'

hgp_lib.metrics.results.ExperimentResult dataclass

Aggregated results across multiple benchmark runs.

Parameters:

Name Type Description Default
runs List[RunResult]

Results from each independent run.

required

Examples:

>>> from dataclasses import replace
>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.8], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_low = replace(g, val_score=0.5)
>>> g_high = replace(g, val_score=0.9)
>>> fold_low = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_low],
... )
>>> fold_high = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_high],
... )
>>> r1 = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold_low],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> r2 = RunResult(
...     run_id=1, seed=1, best_fold_idx=0, folds=[fold_high],
...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> exp = ExperimentResult(runs=[r1, r2])
>>> exp.test_scores
[0.8, 0.9]
>>> exp.best_run.run_id
1
Source code in hgp_lib\metrics\results.py
@dataclass
class ExperimentResult:
    """
    Aggregated results across multiple benchmark runs.

    Args:
        runs (List[RunResult]): Results from each independent run.

    Examples:
        >>> from dataclasses import replace
        >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
        >>> from hgp_lib.rules import Literal
        >>> g = GenerationMetrics.from_population(
        ...     best_idx=0, best_rule=Literal(value=0),
        ...     train_scores=[0.8], complexities=[1],
        ...     child_population_generation_metrics=[],
        ... )
        >>> g_low = replace(g, val_score=0.5)
        >>> g_high = replace(g, val_score=0.9)
        >>> fold_low = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
        ...     generations=[g_low],
        ... )
        >>> fold_high = PopulationHistory(
        ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
        ...     generations=[g_high],
        ... )
        >>> r1 = RunResult(
        ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold_low],
        ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
        ...     feature_names={},
        ... )
        >>> r2 = RunResult(
        ...     run_id=1, seed=1, best_fold_idx=0, folds=[fold_high],
        ...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
        ...     feature_names={},
        ... )
        >>> exp = ExperimentResult(runs=[r1, r2])
        >>> exp.test_scores
        [0.8, 0.9]
        >>> exp.best_run.run_id
        1
    """

    runs: List[RunResult]

    @cached_property
    def best_run(self) -> RunResult:
        """
        The run with the highest mean validation score across folds. When no run
        has a positive mean validation score, the comparison falls back to mean
        training score. Ties go to the earliest run.

        Returns:
            RunResult: The best-performing run.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
            >>> from hgp_lib.rules import Literal
            >>> g_low = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.5], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> g_high = GenerationMetrics.from_population(
            ...     best_idx=0, best_rule=Literal(value=0),
            ...     train_scores=[0.9], complexities=[1],
            ...     child_population_generation_metrics=[],
            ... )
            >>> f_low = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_low],
            ... )
            >>> f_high = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ...     generations=[g_high],
            ... )
            >>> r1 = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[f_low],
            ...     test_score=0.7, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> r2 = RunResult(
            ...     run_id=1, seed=1, best_fold_idx=0, folds=[f_high],
            ...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> ExperimentResult(runs=[r1, r2]).best_run.run_id
            1
        """
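        # Use validation scores when available, otherwise fall back to training.
        # mean_val_score is 0.0 for a run without validation scores, so a positive
        # mean in any run is taken as evidence that validation was performed.
        has_val = any(run.mean_val_score > 0.0 for run in self.runs)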

        best_run = None
        best_mean = -float("inf")

        for run in self.runs:
            score = run.mean_val_score if has_val else run.mean_train_score
            if score > best_mean:
                best_mean = score
                best_run = run

        return best_run

    @cached_property
    def best_rule(self) -> Rule:
        """
        The best rule from the best fold of the best run.

        Returns:
            Rule: The overall best rule across the entire experiment.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=7), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> run = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> str(ExperimentResult(runs=[run]).best_rule)
            '7'
        """
        best_run = self.best_run
        return best_run.folds[best_run.best_fold_idx].global_best_rule

    @cached_property
    def test_scores(self) -> list[float]:
        """
        Test scores from all runs.

        Examples:
            >>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
            >>> from hgp_lib.rules import Literal
            >>> fold = PopulationHistory(
            ...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
            ... )
            >>> r = RunResult(
            ...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
            ...     test_score=0.75, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
            ...     feature_names={},
            ... )
            >>> ExperimentResult(runs=[r]).test_scores
            [0.75]
        """
        return [run.test_score for run in self.runs]

best_run cached property

The run with the highest mean validation score across folds. When no run has a positive mean validation score, the comparison falls back to mean training score. Ties go to the earliest run.

Returns:

Name Type Description
RunResult RunResult

The best-performing run.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory, GenerationMetrics
>>> from hgp_lib.rules import Literal
>>> g_low = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.5], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> g_high = GenerationMetrics.from_population(
...     best_idx=0, best_rule=Literal(value=0),
...     train_scores=[0.9], complexities=[1],
...     child_population_generation_metrics=[],
... )
>>> f_low = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_low],
... )
>>> f_high = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
...     generations=[g_high],
... )
>>> r1 = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[f_low],
...     test_score=0.7, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> r2 = RunResult(
...     run_id=1, seed=1, best_fold_idx=0, folds=[f_high],
...     test_score=0.9, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> ExperimentResult(runs=[r1, r2]).best_run.run_id
1
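
The selection logic, including the fallback to training scores, can be sketched without the library. `RunSummary` and `pick_best_run` are hypothetical stand-ins for `RunResult` and the `best_run` property; only the two mean scores and the run id are modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunSummary:
    """Hypothetical stand-in for RunResult, reduced to the fields compared."""
    run_id: int
    mean_val_score: float
    mean_train_score: float


def pick_best_run(runs: Sequence[RunSummary]) -> Optional[RunSummary]:
    # Validation counts as "available" if any run has a positive mean.
    has_val = any(r.mean_val_score > 0.0 for r in runs)
    best_run, best_mean = None, -float("inf")
    for r in runs:
        score = r.mean_val_score if has_val else r.mean_train_score
        if score > best_mean:  # strict comparison: the first of tied runs wins
            best_mean, best_run = score, r
    return best_run


with_val = [RunSummary(0, 0.5, 0.99), RunSummary(1, 0.9, 0.60)]
without_val = [RunSummary(0, 0.0, 0.5), RunSummary(1, 0.0, 0.9)]
print(pick_best_run(with_val).run_id)     # 1 (chosen by validation score)
print(pick_best_run(without_val).run_id)  # 1 (chosen by training score)
```

Two edge cases follow from this logic: an empty list of runs yields `None`, and validation scores that are all exactly 0.0 are indistinguishable from missing validation, so the training score decides.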

best_rule cached property

The best rule from the best fold of the best run.

Returns:

Name Type Description
Rule Rule

The overall best rule across the entire experiment.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=7), tp=0, fp=0, fn=0, tn=0,
... )
>>> run = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.8, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> str(ExperimentResult(runs=[run]).best_rule)
'7'

test_scores cached property

Test scores from all runs.

Examples:

>>> from hgp_lib.metrics import ExperimentResult, RunResult, PopulationHistory
>>> from hgp_lib.rules import Literal
>>> fold = PopulationHistory(
...     global_best_rule=Literal(value=0), tp=0, fp=0, fn=0, tn=0,
... )
>>> r = RunResult(
...     run_id=0, seed=0, best_fold_idx=0, folds=[fold],
...     test_score=0.75, test_tp=0, test_fp=0, test_fn=0, test_tn=0,
...     feature_names={},
... )
>>> ExperimentResult(runs=[r]).test_scores
[0.75]