breton_cretenet package

Submodules

breton_cretenet.algorithm module

breton_cretenet.algorithm.decision_tree_regressor_algorithm(X_train, y_train, X_train_labels, max_depth=2, random_state=0, verbose=1)[source]

Fit a decision tree regression model to the training data.

Parameters

X_train (numpy.ndarray): Training input data of shape (n_samples, n_features).
y_train (numpy.ndarray): Target values of shape (n_samples,).
X_train_labels (list): List of strings representing the feature names.
max_depth (int, optional, default=2): The maximum depth of the decision tree.
random_state (int, optional, default=0): Seed used by the random number generator.
verbose (int, optional): Verbosity level of information, by default 1.

Returns

sklearn.tree.DecisionTreeRegressor: A fitted decision tree regression model.
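
As a minimal sketch of what fitting such a model looks like, the snippet below calls scikit-learn's DecisionTreeRegressor directly with the documented defaults. The synthetic data is an assumption made for illustration, and how the package uses X_train_labels and verbose is not shown because it is not documented here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): the target depends on the first feature only.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(100, 3))
y_train = np.sin(4.0 * X_train[:, 0])

# max_depth and random_state mirror the documented defaults.
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X_train, y_train)

# The fitted tree never grows deeper than max_depth,
# so it has at most 2**max_depth leaves.
print("depth:", model.get_depth())
print("leaves:", model.get_n_leaves())
```

A small max_depth keeps the model coarse: with depth 2 the prediction can take at most four distinct values.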

breton_cretenet.algorithm.lasso_regression_feature_selection(X_train, y_train, X_train_labels, X_test, verbose=1)[source]

Apply Lasso regression feature selection to the training data.

Parameters

X_train (numpy.ndarray): Training input data of shape (n_samples, n_features).
y_train (numpy.ndarray): Target values of shape (n_samples,).
X_train_labels (list): List of strings representing the feature names.
X_test (numpy.ndarray): Test input data of shape (n_samples, n_features).
verbose (int, optional): Verbosity level of information, by default 1.

Returns

tuple: A tuple containing the selected training input data of shape (n_samples, n_selected_features) and a list of strings representing the names of the selected features. If the number of training samples is less than or equal to 50, the function returns the original input data and feature names unchanged.
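
The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the helper name select_with_lasso, the alpha value, and the "keep non-zero coefficients" criterion are hypothetical, and the handling of X_test is left out because the return description does not cover it.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_with_lasso(X_train, y_train, labels, alpha=0.1, min_samples=50):
    """Hypothetical helper: keep features whose Lasso coefficient is non-zero."""
    # Documented behaviour: with 50 samples or fewer, nothing is filtered.
    if X_train.shape[0] <= min_samples:
        return X_train, list(labels)
    lasso = Lasso(alpha=alpha).fit(X_train, y_train)
    keep = np.flatnonzero(lasso.coef_ != 0.0)
    return X_train[:, keep], [labels[i] for i in keep]

# Synthetic data (assumption): only features "a" and "c" drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2]

X_selected, selected_names = select_with_lasso(X, y, ["a", "b", "c", "d"])
print(selected_names, X_selected.shape)

# With 50 samples the input comes back unchanged.
X_small, small_names = select_with_lasso(X[:50], y[:50], ["a", "b", "c", "d"])
print(small_names, X_small.shape)
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, which is what makes Lasso usable as a feature filter.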

breton_cretenet.algorithm.linear_regression_algorithm(X_train, y_train, X_train_labels, verbose=1)[source]

Fit a linear regression model to the training data.

Parameters

X_train (numpy.ndarray): Training input data of shape (n_samples, n_features).
y_train (numpy.ndarray): Target values of shape (n_samples,).
X_train_labels (list): List of strings representing the feature names.
verbose (int, optional): Verbosity level of information, by default 1.

Returns

sklearn.linear_model.LinearRegression: A fitted linear regression model.
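
For orientation, here is a short sketch of fitting scikit-learn's LinearRegression and pairing the learned coefficients with feature names. Pairing coefficients with labels is an assumption about why X_train_labels is accepted; the data and label names are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data (assumption): y = 1.5*x0 - 0.5*x1 + 2.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 1.5 * X_train[:, 0] - 0.5 * X_train[:, 1] + 2.0
labels = ["x0", "x1"]  # hypothetical feature names

model = LinearRegression().fit(X_train, y_train)

# Report one coefficient per named feature, then the intercept.
for name, coef in zip(labels, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```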

breton_cretenet.algorithm.predict_from_regressor(model, X, X_labels)[source]

Predict the target values for new input data using a given regression model.

Parameters

model (sklearn estimator): A fitted regression model.
X (numpy.ndarray): Input data of shape (n_samples, n_features).
X_labels (list): List of strings representing the feature names.

Returns

numpy.ndarray: Predicted target values of shape (n_samples,).

breton_cretenet.algorithm.score(y_true, y_predict)[source]

Calculate the mean absolute error (MAE) between true and predicted values.

Parameters

y_true (np.ndarray): Correct target values.
y_predict (np.ndarray): Estimated target values.

Returns

float: Mean absolute error between y_true and y_predict.

Examples

>>> import numpy as np
>>> from breton_cretenet.algorithm import score
>>> y_true = np.array([3, -0.5, 2, 7])
>>> y_predict = np.array([2.5, 0.0, 2, 8])
>>> score(y_true, y_predict)
0.5
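
The mean absolute error is the average of the absolute differences |y_true[i] - y_predict[i]|. A dependency-light sketch of that computation is given below; the function name mean_absolute_error is chosen for the example and is not the package's code.

```python
import numpy as np

def mean_absolute_error(y_true, y_predict):
    """Average of the element-wise absolute differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    return float(np.mean(np.abs(y_true - y_predict)))

# Same values as the example above: (0.5 + 0.5 + 0 + 1) / 4
mae = mean_absolute_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(mae)  # 0.5
```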

breton_cretenet.data_preparator module

breton_cretenet.data_preparator.detect_column_names_from_file(file)[source]

Given a file, this function reads the file line by line and detects the column names in it.

Parameters

file (file object): The file to be read.

Returns

list: A list of strings containing the column names detected in the file.

breton_cretenet.data_preparator.get_data_column_names(input)[source]

Returns the names of columns in a data file.

Parameters

input (str): A URL or local path to a data file.

Returns

list: A list of column names.

Raises

ValueError: If the file does not exist.

Notes

This function detects the type of data file by checking its extension and assumes that the file is either a CSV or a text file with space-separated values. It then reads the file and returns the column names.

breton_cretenet.data_preparator.load_data(input, verbose=1)[source]

Load data from a CSV or fixed-width file into a NumPy array.

Parameters

input (str): The path to the input file. Can be a local file path or a URL.
verbose (int, optional): Verbosity level of information, by default 1.

Returns

numpy.ndarray: A 2-dimensional NumPy array containing the loaded data.

Raises

ValueError: If the input file format is not recognized.

Notes

This function reads data from CSV or fixed-width files using the pandas library, and converts it to a NumPy array. The file format is determined based on the file extension: “.csv” files are assumed to be comma-separated, while “.data” files are assumed to be fixed-width. The header of the file is ignored.

breton_cretenet.data_preparator.prepare(dataset, random_state=None, stratify=None)[source]

Creates a training and a test set from the features X and labels y in dataset.

Parameters

dataset (numpy.ndarray): Dataset of shape (n_samples, n_features), with the labels in the last column and the features in the other columns.
random_state (int, optional): Seed used for the train/test split. If no argument is given, the seed is not fixed.
stratify (list, optional): If not None, the dataset is split in a stratified fashion, using this as the class labels.

Returns

numpy.ndarray: X_train, an array containing the features of the training set.
numpy.ndarray: X_test, an array containing the features of the test set.
numpy.ndarray: y_train, an array containing the labels of the training set.
numpy.ndarray: y_test, an array containing the labels of the test set.
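
The random_state and stratify parameters match those of scikit-learn's train_test_split, so a plausible sketch of the split is shown below. Using train_test_split and its default test size is an assumption; the split ratio is not documented here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset (assumption): 10 samples, 3 features, label in the last column.
# Row i is [4i, 4i+1, 4i+2, 4i+3], so each label equals its last feature + 1.
dataset = np.arange(40, dtype=float).reshape(10, 4)
X, y = dataset[:, :-1], dataset[:, -1]

# A fixed seed makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=None
)

print(X_train.shape, X_test.shape)
# Features and labels stay aligned after shuffling.
print(np.array_equal(y_train, X_train[:, -1] + 1))
```

Calling the split twice with the same random_state returns identical partitions, while omitting it gives a different shuffle on each call.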

breton_cretenet.data_preprocessor module

breton_cretenet.data_preprocessor.preprocess(X_train, X_test, method='standardize', verbose=1)[source]

Preprocesses (scales) the features of the training and test sets with the selected method.

Parameters

X_train (numpy.ndarray): Array containing the features of the training set.
X_test (numpy.ndarray): Array containing the features of the test set.
method (str, optional): Preprocessing method to apply. Defaults to “standardize”, which is also used when the given method is not recognized.
verbose (int, optional): Verbosity level of information, by default 1.

Returns

numpy.ndarray: An array containing the preprocessed features of the training set.
numpy.ndarray: An array containing the preprocessed features of the test set.
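
The sketch below illustrates the general idea of scaling with statistics taken from the training set only and then applying them to the test set. It is a hand-written NumPy stand-in, not the package's code: the helper name scale and the method names "minmax" and "robust" are assumptions (only "standardize" is confirmed by the signature), and the fallback to standardization follows the behaviour described for unrecognized methods.

```python
import numpy as np

def scale(X_train, X_test, method="standardize"):
    """Hypothetical helper: scale both sets using training-set statistics."""
    if method == "minmax":
        offset = X_train.min(axis=0)
        spread = X_train.max(axis=0) - offset
    elif method == "robust":
        offset = np.median(X_train, axis=0)
        q75, q25 = np.percentile(X_train, [75, 25], axis=0)
        spread = q75 - q25
    else:  # "standardize" and any unrecognized name
        offset = X_train.mean(axis=0)
        spread = X_train.std(axis=0)
    # Guard against constant columns to avoid dividing by zero.
    spread = np.where(spread == 0, 1.0, spread)
    return (X_train - offset) / spread, (X_test - offset) / spread

rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=3.0, size=(30, 2))
test = rng.normal(loc=5.0, scale=3.0, size=(5, 2))

train_std, test_std = scale(train, test)
train_mm, test_mm = scale(train, test, method="minmax")
print(train_std.mean(axis=0), train_std.std(axis=0))
print(train_mm.min(axis=0), train_mm.max(axis=0))
```

Computing the statistics on the training set alone keeps information about the test set from leaking into the model.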

breton_cretenet.data_preprocessor.preprocess_polynomialfeatures(data, data_column_names, degree=2, verbose=1)[source]

Applies polynomial feature expansion to a NumPy array of data, and returns the resulting array and the names of the columns in the expanded feature matrix.

Parameters:

data (numpy.ndarray): The input array of data to be expanded.
data_column_names (list): A list of the names of the columns in the input data array.
degree (int, optional, default=2): The degree of the polynomial features to generate.
verbose (int, optional): Verbosity level of information, by default 1.

Returns:

tuple(numpy.ndarray, list)

A tuple containing two elements:

A NumPy array representing the expanded feature matrix.
A list of the names of the columns in the expanded feature matrix.
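
To make "polynomial feature expansion" concrete, the following hand-written sketch builds every product of input columns up to the requested degree and names each new column. It is an illustration only: the helper name expand_polynomial, the "*" naming scheme, and the omission of a constant bias column are assumptions and may differ from what the package produces.

```python
from itertools import combinations_with_replacement

import numpy as np

def expand_polynomial(data, names, degree=2):
    """Hypothetical helper: products of columns up to `degree`, no bias column."""
    columns, labels = [], []
    for d in range(1, degree + 1):
        # Each combination picks d column indices, repeats allowed.
        for combo in combinations_with_replacement(range(data.shape[1]), d):
            columns.append(np.prod(data[:, list(combo)], axis=1))
            labels.append("*".join(names[i] for i in combo))
    return np.column_stack(columns), labels

data = np.array([[2.0, 3.0]])
expanded, expanded_names = expand_polynomial(data, ["a", "b"], degree=2)
print(expanded_names)  # ['a', 'b', 'a*a', 'a*b', 'b*b']
print(expanded)        # [[2. 3. 4. 6. 9.]]
```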

breton_cretenet.main module

breton_cretenet.main.get_args(args=None)[source]

Parse command-line arguments.

Parameters

args (list, optional): List of command-line arguments to parse. The default is None.

Returns

args (argparse.Namespace): Parsed arguments.
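
Accepting an optional list of arguments is a common argparse pattern: when the list is None, argparse reads sys.argv, and when a list is passed, the parser can be exercised from tests. The sketch below shows the pattern; the flag names and defaults are hypothetical and are not taken from the package's actual command-line interface.

```python
import argparse

def parse(args=None):
    """Hypothetical parser: flag names and defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch.")
    parser.add_argument("--degree", type=int, default=2)
    parser.add_argument("--max-depth", type=int, default=2)
    parser.add_argument("--random-state", type=int, default=None)
    # args=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(args)

namespace = parse(["--degree", "3"])
print(namespace.degree, namespace.max_depth, namespace.random_state)  # 3 2 None
```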

breton_cretenet.main.main(args_test=None)[source]

Executes a machine learning workflow on a given dataset: feature engineering, feature selection, model training, and model evaluation.

Parameters

args_test (list, optional): List of arguments for testing the package. Default is None.

Returns

None

Notes

This function performs the following steps:

1. Loads the dataset and concatenates multiple datasets if applicable.
2. Splits the dataset into training and testing sets.
3. Performs polynomial feature engineering on the dataset.
4. Scales the dataset.
5. Performs feature selection on the dataset (optional).
6. Trains the model(s) on the training set and evaluates the performance on the testing set.
7. Concatenates the results and outputs them in a formatted table.
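
The steps listed in these notes can be sketched end to end with scikit-learn directly. This is a rough outline under assumptions rather than the package's main function: the data is synthetic, the optional feature selection step is skipped, and the table layout is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# 1. Synthetic stand-in for a loaded dataset: label in the last column.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
labels = features[:, 0] ** 2 + features[:, 1]
dataset = np.column_stack([features, labels])

# 2. Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    dataset[:, :-1], dataset[:, -1], random_state=0
)

# 3. Polynomial feature engineering (fit on the training set only).
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)

# 4. Scaling, again with training-set statistics.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 5. Feature selection is optional and skipped in this sketch.

# 6. Train each model and evaluate it on the testing set.
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=2, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = mean_absolute_error(y_test, model.predict(X_test))

# 7. Output the results as a small table.
print(f"{'model':<10}{'MAE':>10}")
for name, mae in results.items():
    print(f"{name:<10}{mae:>10.4f}")
```

Because the synthetic target is an exact degree-2 polynomial of the inputs, the linear model fits it almost perfectly once the polynomial features are added, while the depth-2 tree does not.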

breton_cretenet.test module

breton_cretenet.test.rand_data()[source]

A function that returns a random dataset for the tests.

Parameters:

None

Returns:

numpy.ndarray: An array of size (10, 6) with random features and labels

breton_cretenet.test.test_decision_tree_regressor_algorithm()[source]

Test function to ensure that the decision_tree_regressor_algorithm function returns an instance of the DecisionTreeRegressor class.

Parameters:

None

Returns:

None

breton_cretenet.test.test_get_data_column_names(input_data, expected_output)[source]

Test the get_data_column_names function from the data_preparator module.

Parameters

input_data (str): The input data to pass to get_data_column_names. This can be a URL or a local file path.
expected_output (list or None): The expected output of get_data_column_names when called with input_data. If input_data is an invalid file path, this should be None.

Raises

ValueError: If get_data_column_names is called with an invalid file path and expected_output is None.

Returns

None

breton_cretenet.test.test_lasso_regression_feature_selection()[source]

Test the lasso_regression_feature_selection function.

Parameters

None

Returns

None

breton_cretenet.test.test_linear_regression_algorithm()[source]

Test function to ensure that the linear_regression_algorithm function returns an instance of the LinearRegression class.

Parameters:

None

Returns:

None

breton_cretenet.test.test_load_data(input, expected_shape)[source]

Test the load_data function of the data_preparator module using parameterized inputs.

Parameters:

input (str): The URL of the data file to load.
expected_shape (tuple of int): The expected shape of the NumPy array returned by the load_data function.

Raises:

ValueError: If the load_data function is called with an invalid URL.

Returns:

None

breton_cretenet.test.test_main(dataset)[source]

Test function for the main method in the codebase.

Parameters

dataset (str): Name of the dataset to use for testing.

Returns

None: This function does not return anything.

Raises

AssertionError: If the test fails.

breton_cretenet.test.test_main_pull_request(dataset, random_state, degree, preprocessing, feature_selection, algorithm, max_depth)[source]

Test function for the main method in the codebase.

Parameters

dataset (str): Name of the dataset to use for testing.
random_state (int): Seed value for the random number generator.
degree (int): Degree of the polynomial features to generate.
preprocessing (str): Type of preprocessing to apply to the data.
feature_selection (bool): Whether or not to perform feature selection.
algorithm (str): Type of algorithm to use for testing.
max_depth (int): Maximum depth of the decision tree for testing.

Returns

None: This function does not return anything.

Raises

AssertionError: If the test fails.

breton_cretenet.test.test_predict_from_regressor()[source]

Test function to ensure that the predict_from_regressor function returns an array of predictions with the same length as the input array.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_is_random_if_no_seed()[source]

Test function to ensure that the preparator returns random splits.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_with_seed()[source]

Test function to ensure that the preparator gives fixed splits if the seed is set.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preparator_xy_alignement()[source]

Test function to ensure that the preparator keeps the features and the labels grouped correctly after the shuffling.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_inexistant_method()[source]

Test function to ensure that standardization is applied when the selected method does not exist.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_minmax()[source]

Test function to ensure that the MinMax method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_polynomial()[source]

Test function to ensure that the Polynomial Features method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_robust()[source]

Test function to ensure that the robust scaler method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

breton_cretenet.test.test_preprocessor_standard()[source]

Test function to ensure that the standard method of the preprocessor is correctly implemented.

Parameters:

None

Returns:

None

Module contents