breton_cretenet package

Submodules

breton_cretenet.algorithm module

breton_cretenet.algorithm.decision_tree_regressor_algorithm(X_train, y_train, X_train_labels, max_depth=2, random_state=0, verbose=1)

Fit a decision tree regression model to the training data.

Parameters

X_train : numpy.ndarray

Training input data of shape (n_samples, n_features).

y_train : numpy.ndarray

Target values of shape (n_samples,).

X_train_labels : list

List of strings representing the feature names.

max_depth : int, optional (default=2)

The maximum depth of the decision tree.

random_state : int, optional (default=0)

Seed used by the random number generator.

verbose : int, optional

Verbosity level of information, by default 1.

Returns

sklearn.tree.DecisionTreeRegressor

A fitted decision tree regression model.

breton_cretenet.algorithm.lasso_regression_feature_selection(X_train, y_train, X_train_labels, X_test, verbose=1)

Apply Lasso regression feature selection to the training data.

Parameters

X_train : numpy.ndarray

Training input data of shape (n_samples, n_features).

y_train : numpy.ndarray

Target values of shape (n_samples,).

X_train_labels : list

List of strings representing the feature names.

X_test : numpy.ndarray

Test input data of shape (n_samples, n_features).

verbose : int, optional

Verbosity level of information, by default 1.

Returns

tuple

A tuple containing the selected training input data of shape (n_samples, n_selected_features) and a list of strings representing the names of the selected features. If the number of training samples is less than or equal to 50, the function returns the original input data and feature names unchanged.

breton_cretenet.algorithm.linear_regression_algorithm(X_train, y_train, X_train_labels, verbose=1)

Fit a linear regression model to the training data.

Parameters

X_train : numpy.ndarray

Training input data of shape (n_samples, n_features).

y_train : numpy.ndarray

Target values of shape (n_samples,).

X_train_labels : list

List of strings representing the feature names.

verbose : int, optional

Verbosity level of information, by default 1.

Returns

sklearn.linear_model.LinearRegression

A fitted linear regression model.
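
For orientation, here is a minimal sketch of what fitting a linear regression amounts to, written with NumPy's least-squares solver rather than the scikit-learn estimator this function returns. The helper name `fit_least_squares` and the toy data are illustrative assumptions and are not part of the package.

```python
import numpy as np


def fit_least_squares(X_train, y_train):
    """Hypothetical helper: ordinary least squares with an intercept."""
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.column_stack([X_train, np.ones(len(X_train))])
    solution, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return solution[:-1], solution[-1]  # (coefficients, intercept)


# Toy, noise-free data generated from y = 2*x0 - x1 + 3 (illustration only).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_train = 2.0 * X_train[:, 0] - X_train[:, 1] + 3.0

coefficients, intercept = fit_least_squares(X_train, y_train)
print(coefficients, intercept)
```

On this noise-free data the recovered coefficients should match the generating values up to floating-point error.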

breton_cretenet.algorithm.predict_from_regressor(model, X, X_labels)

Predict the target values for new input data using a given regression model.

Parameters

model : sklearn estimator

A fitted regression model.

X : numpy.ndarray

Input data of shape (n_samples, n_features).

X_labels : list

List of strings representing the feature names.

Returns

numpy.ndarray

Predicted target values of shape (n_samples,).

breton_cretenet.algorithm.score(y_true, y_predict)

Calculate the mean absolute error (MAE) between true and predicted values.

Parameters

y_true : np.ndarray

Correct target values.

y_predict : np.ndarray

Estimated target values.

Returns

float

Mean absolute error between y_true and y_predict.

Examples

>>> import numpy as np
>>> from breton_cretenet.algorithm import score
>>> y_true = np.array([3, -0.5, 2, 7])
>>> y_predict = np.array([2.5, 0.0, 2, 8])
>>> score(y_true, y_predict)
0.5
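
The metric itself is the average of the absolute differences between the two arrays. The dependency-free sketch below spells that out; the helper name `mean_absolute_error` is illustrative and not necessarily how the package computes the value.

```python
def mean_absolute_error(y_true, y_predict):
    """Hypothetical helper: average of the absolute differences."""
    if len(y_true) != len(y_predict):
        raise ValueError("y_true and y_predict must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_predict)) / len(y_true)


# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so the mean is 2.0 / 4.
mae = mean_absolute_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(mae)  # 0.5
```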

breton_cretenet.data_preparator module

breton_cretenet.data_preparator.detect_column_names_from_file(file)

Given a file, this function reads the file line by line and detects the column names in it.

Parameters

  • file: a file object, representing the file to be read.

Returns

  • column_names: a list of strings, containing the column names detected in the file.

breton_cretenet.data_preparator.get_data_column_names(input)

Returns the names of columns in a data file.

Parameters

input : str

A URL or local path to a data file.

Returns

list

A list of column names.

Raises

ValueError

If the file does not exist.

Notes

This function detects the type of data file by checking its extension and assumes that the file is either a CSV or a text file with space-separated values. It then reads the file and returns the column names.
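
As a rough illustration of reading column names from a comma-separated or space-separated file, consider the sketch below. It assumes that the names sit on the first non-empty line, which this documentation does not confirm; the helper name `read_header` is hypothetical.

```python
import io


def read_header(file, delimiter=None):
    """Hypothetical helper: split the first non-empty line into column names.

    With delimiter=None the line is split on any run of whitespace.
    """
    for line in file:
        stripped = line.strip()
        if stripped:
            return [name.strip() for name in stripped.split(delimiter)]
    return []


# In-memory stand-ins for a CSV file and a space-separated text file.
csv_file = io.StringIO("age,height,weight\n31,1.80,75\n")
txt_file = io.StringIO("\nage height weight\n31 1.80 75\n")

csv_names = read_header(csv_file, delimiter=",")
txt_names = read_header(txt_file)
print(csv_names, txt_names)
```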

breton_cretenet.data_preparator.load_data(input, verbose=1)

Load data from a CSV or fixed-width file into a NumPy array.

Parameters

input : str

The path to the input file. Can be a local file path or a URL.

verbose : int, optional

Verbosity level of information, by default 1.

Returns

numpy.ndarray

A 2-dimensional NumPy array containing the loaded data.

Raises

ValueError

If the input file format is not recognized.

Notes

This function reads data from CSV or fixed-width files using the pandas library, and converts it to a NumPy array. The file format is determined based on the file extension: “.csv” files are assumed to be comma-separated, while “.data” files are assumed to be fixed-width. The header of the file is ignored.
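
The extension-based dispatch described in these notes could look roughly like the following sketch. The helper name `detect_format` and its return values are hypothetical; the actual function goes on to read the file with pandas, which is omitted here.

```python
import os
from urllib.parse import urlparse


def detect_format(input_path):
    """Hypothetical helper: map a local path or URL to a file format."""
    # urlparse drops any query string from a URL and keeps a plain
    # relative or absolute local path in .path, so both cases work.
    extension = os.path.splitext(urlparse(input_path).path)[1].lower()
    if extension == ".csv":
        return "comma-separated"
    if extension == ".data":
        return "fixed-width"
    raise ValueError(f"Unrecognized file format: {input_path!r}")


print(detect_format("data/wine.csv"))
print(detect_format("https://example.org/files/housing.data"))
```

Any other extension raises ValueError, mirroring the Raises section above.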

breton_cretenet.data_preparator.prepare(dataset, random_state=None, stratify=None)

Creates a training and a test set from the features X and labels y in dataset.

Parameters

dataset : numpy.ndarray

Dataset of shape (n_samples, n_features), with the labels in the last column and the features in the other columns.

random_state : int, optional

Seed for the train/test split. If omitted, the seed is not fixed.

stratify : list, optional

If not None, the dataset is split in a stratified fashion, using this as the class labels.

Returns

numpy.ndarray

X_train, an array containing the features of the training set.

numpy.ndarray

X_test, an array containing the features of the test set.

numpy.ndarray

y_train, an array containing the labels of the training set.

numpy.ndarray

y_test, an array containing the labels of the test set.
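
The sketch below illustrates the kind of split described here: rows are shuffled, then features and labels are separated so that each label stays attached to its row. It is a simplified NumPy stand-in; the helper name `split_dataset` and the `test_fraction` parameter are assumptions, and stratification is not covered.

```python
import numpy as np


def split_dataset(dataset, test_fraction=0.2, random_state=None):
    """Hypothetical helper: shuffle rows, then split features and labels."""
    rng = np.random.default_rng(random_state)
    shuffled = dataset[rng.permutation(len(dataset))]
    n_test = max(1, int(round(test_fraction * len(dataset))))
    train, test = shuffled[n_test:], shuffled[:n_test]
    # Labels are assumed to sit in the last column, features in the others.
    return train[:, :-1], test[:, :-1], train[:, -1], test[:, -1]


# Each label equals the sum of its features, so alignment is easy to check.
features = np.arange(40, dtype=float).reshape(10, 4)
dataset = np.column_stack([features, features.sum(axis=1)])

X_train, X_test, y_train, y_test = split_dataset(dataset, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Passing the same `random_state` twice yields the same split, while leaving it as None draws a fresh shuffle on each call.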

breton_cretenet.data_preprocessor module

breton_cretenet.data_preprocessor.preprocess(X_train, X_test, method='standardize', verbose=1)

Scales the features of the training and test sets with the selected preprocessing method.

Parameters

X_train : numpy.ndarray

Array containing the features of the training set.

X_test : numpy.ndarray

Array containing the features of the test set.

method : str, optional

Preprocessing method to apply, by default “standardize”. Standardization is also applied when the given method is not recognized.

verbose : int, optional

Verbosity level of information, by default 1.

Returns

numpy.ndarray

An array containing the preprocessed features of the training set.

numpy.ndarray

An array containing the preprocessed features of the test set.
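
To make the default “standardize” option concrete, here is a minimal NumPy sketch. It assumes that the scaling statistics are computed on the training set only and then reused for the test set, which is common practice but not confirmed by this documentation; the helper name `standardize` is hypothetical.

```python
import numpy as np


def standardize(X_train, X_test):
    """Hypothetical helper: zero-mean, unit-variance scaling per feature."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    # The test set is scaled with the training statistics, not its own.
    return (X_train - mean) / std, (X_test - mean) / std


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

X_train_scaled, X_test_scaled = standardize(X_train, X_test)
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```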

breton_cretenet.data_preprocessor.preprocess_polynomialfeatures(data, data_column_names, degree=2, verbose=1)

Applies polynomial feature expansion to a NumPy array of data, and returns the resulting array and the names of the columns in the expanded feature matrix.

Parameters:

data : numpy.ndarray

The input array of data to be expanded.

data_column_names : list

A list of the names of the columns in the input data array.

degree : int, optional (default=2)

The degree of the polynomial features to generate.

verbose : int, optional

Verbosity level of information, by default 1.

Returns:

tuple(numpy.ndarray, list)

A tuple containing two elements:
  1. A Numpy array representing the expanded feature matrix.

  2. A list of the names of the columns in the expanded feature matrix.
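
As an illustration of polynomial feature expansion, the sketch below builds every monomial of the input columns up to the requested degree and names each new column after the inputs it multiplies. The helper name `expand_polynomial`, the space-separated naming scheme, and the absence of a constant bias column are all assumptions; the package's own output may differ.

```python
from itertools import combinations_with_replacement

import numpy as np


def expand_polynomial(data, column_names, degree=2):
    """Hypothetical helper: all monomials of the columns up to `degree`."""
    columns, names = [], []
    for d in range(1, degree + 1):
        # Each combination lists the column indices multiplied together.
        for combo in combinations_with_replacement(range(data.shape[1]), d):
            columns.append(np.prod(data[:, list(combo)], axis=1))
            names.append(" ".join(column_names[i] for i in combo))
    return np.column_stack(columns), names


data = np.array([[2.0, 3.0]])
expanded, expanded_names = expand_polynomial(data, ["a", "b"], degree=2)
print(expanded_names)
print(expanded)
```

For two input columns and degree 2 this produces five columns: a, b, a squared, a times b, and b squared.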

breton_cretenet.main module

breton_cretenet.main.get_args(args=None)

Parse command-line arguments.

Parameters

args : list, optional

List of command-line arguments to parse. The default is None.

Returns

args : argparse.Namespace

Parsed arguments.
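
The optional `args` parameter follows a common argparse pattern: when it is None the parser reads the real command line, and when it is a list the parser uses that list instead, which makes the function easy to call from tests. The sketch below shows the pattern only; the function name `parse_arguments` and the option names `--degree` and `--max-depth` are hypothetical and are not taken from the package's actual command-line interface.

```python
import argparse


def parse_arguments(args=None):
    """Hypothetical parser; the option names are illustrative only."""
    parser = argparse.ArgumentParser(description="Illustrative parser")
    parser.add_argument("--degree", type=int, default=1)
    parser.add_argument("--max-depth", type=int, default=2)
    # With args=None argparse reads sys.argv[1:]; a list overrides that.
    return parser.parse_args(args)


parsed = parse_arguments(["--degree", "3"])
print(parsed.degree, parsed.max_depth)  # 3 2
```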

breton_cretenet.main.main(args_test=None)

Executes a machine learning workflow on a given dataset: feature engineering, feature selection, model training, and model evaluation.

Parameters

args_test : list, optional

List of arguments for testing the package. Default is None.

Returns

None

Notes

This function performs the following steps:
  1. Loads the dataset and concatenates multiple datasets if applicable.

  2. Splits the dataset into training and testing sets.

  3. Performs polynomial feature engineering on the dataset.

  4. Scales the dataset.

  5. Optionally performs feature selection on the dataset.

  6. Trains the model(s) on the training set and evaluates the performance on the testing set.

  7. Concatenates the results and outputs them in a formatted table.

breton_cretenet.test module

breton_cretenet.test.rand_data()

Returns a random dataset for the tests.

Parameters:

None

Returns:

numpy.ndarray

An array of shape (10, 6) with random features and labels.

breton_cretenet.test.test_decision_tree_regressor_algorithm()

Test function to ensure that the decision_tree_regressor_algorithm function returns an instance of the DecisionTreeRegressor class.

Parameters:

None

Returns:

None

breton_cretenet.test.test_get_data_column_names(input_data, expected_output)

Test the get_data_column_names function from the data_preparator module.

Parameters

input_data : str

The input data to pass to get_data_column_names. This can be a URL or a local file path.

expected_output : list or None

The expected output of get_data_column_names when called with input_data. If input_data is an invalid file path, this should be None.

Raises

ValueError

If get_data_column_names is called with an invalid file path and expected_output is None.

Returns

None

breton_cretenet.test.test_lasso_regression_feature_selection()

Test the lasso_regression_feature_selection function.

Parameters

None

Returns

None

breton_cretenet.test.test_linear_regression_algorithm()

Test function to ensure that the linear_regression_algorithm function returns an instance of the LinearRegression class.

Parameters:

None

Returns:

None

breton_cretenet.test.test_load_data(input, expected_shape)

Test the load_data function of the data_preparator module using parameterized inputs.

Parameters:

input : str

The URL of the data file to load.

expected_shape : tuple of int

The expected shape of the NumPy array returned by the load_data function.

Raises:

ValueError

If the load_data function is called with an invalid URL.

Returns:

None

breton_cretenet.test.test_main(dataset)

Test function for the main method in the codebase.

Parameters

dataset : str

Name of the dataset to use for testing.

Returns

None

This function does not return anything.

Raises

AssertionError

If the test fails.

breton_cretenet.test.test_main_pull_request(dataset, random_state, degree, preprocessing, feature_selection, algorithm, max_depth)

Test function for the main method in the codebase.

Parameters

dataset : str

Name of the dataset to use for testing.

random_state : int

Seed value for the random number generator.

degree : int

Degree of the polynomial features to generate.

preprocessing : str

Type of preprocessing to apply to the data.

feature_selection : bool

Whether or not to perform feature selection.

algorithm : str

Type of algorithm to use for testing.

max_depth : int

Maximum depth of the decision tree for testing.

Returns

None

This function does not return anything.

Raises

AssertionError

If the test fails.

breton_cretenet.test.test_predict_from_regressor()

Test function to ensure that the predict_from_regressor function returns an array of predictions with the same length as the input array.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_is_random_if_no_seed()

Test function to ensure that the preparator returns random splits.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_with_seed()

Test function to ensure that the preparator gives fixed splits if the seed is set.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_xy_alignement()

Test function to ensure that the preparator keeps the features and the labels grouped correctly after the shuffling.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_inexistant_method()

Test function to ensure that standardization is applied when a nonexistent method is selected.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_minmax()

Test function to ensure that the MinMax method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_polynomial()

Test function to ensure that the Polynomial Features method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_robust()

Test function to ensure that the robust scaler method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_standard()

Test function to ensure that the standard method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

Module contents