breton_cretenet package
Submodules
breton_cretenet.algorithm module
- breton_cretenet.algorithm.decision_tree_regressor_algorithm(X_train, y_train, X_train_labels, max_depth=2, random_state=0, verbose=1)[source]
Fit a decision tree regression model to the training data.
Parameters
- X_train : numpy.ndarray
Training input data of shape (n_samples, n_features).
- y_train : numpy.ndarray
Target values of shape (n_samples,).
- X_train_labels : list
List of strings representing the feature names.
- max_depth : int, optional (default=2)
The maximum depth of the decision tree.
- random_state : int, optional (default=0)
Seed used by the random number generator.
- verbose : int, optional
Verbosity level of information, by default 1.
Returns
- sklearn.tree.DecisionTreeRegressor
A fitted decision tree regression model.
- breton_cretenet.algorithm.lasso_regression_feature_selection(X_train, y_train, X_train_labels, X_test, verbose=1)[source]
Apply Lasso regression feature selection to the training data.
Parameters
- X_train : numpy.ndarray
Training input data of shape (n_samples, n_features).
- y_train : numpy.ndarray
Target values of shape (n_samples,).
- X_train_labels : list
List of strings representing the feature names.
- X_test : numpy.ndarray
Test input data of shape (n_samples, n_features).
- verbose : int, optional
Verbosity level of information, by default 1.
Returns
- tuple
A tuple containing the selected training input data of shape (n_samples, n_selected_features) and a list of strings representing the names of the selected features. If the number of training samples is less than or equal to 50, the function returns the original input data and feature names unchanged.
- breton_cretenet.algorithm.linear_regression_algorithm(X_train, y_train, X_train_labels, verbose=1)[source]
Fit a linear regression model to the training data.
Parameters
- X_train : numpy.ndarray
Training input data of shape (n_samples, n_features).
- y_train : numpy.ndarray
Target values of shape (n_samples,).
- X_train_labels : list
List of strings representing the feature names.
- verbose : int, optional
Verbosity level of information, by default 1.
Returns
- sklearn.linear_model.LinearRegression
A fitted linear regression model.
- breton_cretenet.algorithm.predict_from_regressor(model, X, X_labels)[source]
Predict the target values for new input data using a given regression model.
Parameters
- model : sklearn estimator
A fitted regression model.
- X : numpy.ndarray
Input data of shape (n_samples, n_features).
- X_labels : list
List of strings representing the feature names.
Returns
- numpy.ndarray
Predicted target values of shape (n_samples,).
- breton_cretenet.algorithm.score(y_true, y_predict)[source]
Calculate the mean absolute error (MAE) between true and predicted values.
Parameters
- y_true : np.ndarray
Correct target values.
- y_predict : np.ndarray
Estimated target values.
Returns
- float
Mean absolute error between y_true and y_predict.
Examples
>>> y_true = np.array([3, -0.5, 2, 7])
>>> y_predict = np.array([2.5, 0.0, 2, 8])
>>> score(y_true, y_predict)
0.5
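The metric itself is simple enough to write out by hand. The following is a minimal plain-Python sketch of the mean absolute error, not the package's implementation; the name `mean_absolute_error` is chosen here for illustration, and lists stand in for NumPy arrays.

```python
def mean_absolute_error(y_true, y_predict):
    """Return the mean of |y_true[i] - y_predict[i]| over all samples."""
    if len(y_true) != len(y_predict):
        raise ValueError("y_true and y_predict must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(t - p) for t, p in zip(y_true, y_predict))
    return total / len(y_true)


# Same values as the documented example: (0.5 + 0.5 + 0 + 1) / 4
mae = mean_absolute_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(mae)  # 0.5
```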
breton_cretenet.data_preparator module
- breton_cretenet.data_preparator.detect_column_names_from_file(file)[source]
Given a file, this function reads the file line by line and detects the column names in it.
Parameters
- file : file object
The file to be read.
Returns
- column_names : list
A list of strings containing the column names detected in the file.
- breton_cretenet.data_preparator.get_data_column_names(input)[source]
Returns the names of columns in a data file.
Parameters
- input : str
A URL or local path to a data file.
Returns
- list
A list of column names.
Raises
- ValueError
If the file does not exist.
Notes
This function detects the type of data file by checking its extension and assumes that the file is either a CSV or a text file with space-separated values. It then reads the file and returns the column names.
- breton_cretenet.data_preparator.load_data(input, verbose=1)[source]
Load data from a CSV or fixed-width file into a NumPy array.
Parameters
- input : str
The path to the input file. Can be a local file path or a URL.
- verbose : int, optional
Verbosity level of information, by default 1.
Returns
- numpy.ndarray
A 2-dimensional NumPy array containing the loaded data.
Raises
- ValueError
If the input file format is not recognized.
Notes
This function reads data from CSV or fixed-width files using the pandas library, and converts it to a NumPy array. The file format is determined based on the file extension: “.csv” files are assumed to be comma-separated, while “.data” files are assumed to be fixed-width. The header of the file is ignored.
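The extension-based dispatch described in these notes can be sketched without pandas. This is an illustration under stated assumptions, not the package's code: `load_rows` and its `skip_header` parameter are hypothetical names, whitespace splitting stands in for true fixed-width parsing, and "the header is ignored" is read here as "the first row is skipped".

```python
import csv
import os


def load_rows(path, skip_header=True):
    """Read numeric rows from a '.csv' or '.data' file into a list of lists."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in (".csv", ".data"):
        # Mirrors the documented behavior: unknown formats raise ValueError.
        raise ValueError(f"Unrecognized file format: {extension!r}")
    with open(path, newline="") as handle:
        if extension == ".csv":
            rows = [row for row in csv.reader(handle) if row]
        else:
            # Assumption: splitting on whitespace approximates fixed-width columns.
            rows = [line.split() for line in handle if line.strip()]
    if skip_header:
        # Assumption: the first row holds column names and is dropped.
        rows = rows[1:]
    return [[float(value) for value in row] for row in rows]
```

Checking the extension before opening the file means an unsupported format is reported even when the path does not exist.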
- breton_cretenet.data_preparator.prepare(dataset, random_state=None, stratify=None)[source]
Creates a training and a test set from the features X and labels y in dataset.
Parameters
- dataset : numpy.ndarray
Dataset of shape (n_samples, n_features), with the labels in the last column and the features in the other columns.
- random_state : int, optional
Seed used for the train-test split. If no argument is given, the seed is not fixed.
- stratify : list, optional
If not None, the dataset is split in a stratified fashion, using this as the class labels.
Returns
- numpy.ndarray
X_train, an array containing the features of the training set.
- numpy.ndarray
X_test, an array containing the features of the test set.
- numpy.ndarray
y_train, an array containing the labels of the training set.
- numpy.ndarray
y_test, an array containing the labels of the test set.
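The split described above can be illustrated with the standard library alone. This is a rough sketch, not the package's implementation: `split_dataset` and `test_fraction` (set to 0.25 here) are hypothetical, the label is assumed to be the last column, and stratification is left out.

```python
import random


def split_dataset(dataset, test_fraction=0.25, random_state=None):
    """Shuffle rows, then return X_train, X_test, y_train, y_test."""
    rows = list(dataset)  # shallow copy so the caller's list is not reordered
    # random.Random(None) seeds from system entropy, so the split is not fixed.
    random.Random(random_state).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    test_rows, train_rows = rows[:n_test], rows[n_test:]
    # Whole rows are shuffled, so features and labels stay aligned.
    X_train = [row[:-1] for row in train_rows]
    X_test = [row[:-1] for row in test_rows]
    y_train = [row[-1] for row in train_rows]
    y_test = [row[-1] for row in test_rows]
    return X_train, X_test, y_train, y_test


dataset = [[i, 2 * i, 10 * i] for i in range(8)]
X_train, X_test, y_train, y_test = split_dataset(dataset, random_state=0)
```

Calling the function twice with the same `random_state` yields the same split, which is the property the seeded tests in this package rely on.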
breton_cretenet.data_preprocessor module
- breton_cretenet.data_preprocessor.preprocess(X_train, X_test, method='standardize', verbose=1)[source]
Applies the selected preprocessing method to the features of the training set and the test set.
Parameters
- X_train : numpy.ndarray
Array containing the features of the training set.
- X_test : numpy.ndarray
Array containing the features of the test set.
- method : string, optional
The preprocessing method to apply. If none is selected, “standardize” is used by default.
- verbose : int, optional
Verbosity level of information, by default 1.
Returns
- numpy.ndarray
An array containing the preprocessed features of the training set.
- numpy.ndarray
An array containing the preprocessed features of the test set.
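As an illustration of the “standardize” method, the sketch below scales each column to zero mean and unit variance. It assumes, as is common practice but not confirmed by this page, that the statistics are computed on the training set only and then reused for the test set. The function name `standardize` is hypothetical and lists of rows stand in for NumPy arrays.

```python
import statistics


def standardize(X_train, X_test):
    """Scale each column using the mean and standard deviation of X_train."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(column) for column in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # so that the division below never hits zero.
    stds = [statistics.pstdev(column) or 1.0 for column in columns]

    def scale(rows):
        return [
            [(value - mean) / std for value, mean, std in zip(row, means, stds)]
            for row in rows
        ]

    return scale(X_train), scale(X_test)


train_scaled, test_scaled = standardize([[1.0, 10.0], [3.0, 10.0]], [[2.0, 12.0]])
```

Reusing the training statistics keeps information about the test set out of the fitted transformation.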
- breton_cretenet.data_preprocessor.preprocess_polynomialfeatures(data, data_column_names, degree=2, verbose=1)[source]
Applies polynomial feature expansion to a NumPy array of data, and returns the resulting array and the names of the columns in the expanded feature matrix.
Parameters:
- data : numpy.ndarray
The input array of data to be expanded.
- data_column_names : list
A list of the names of the columns in the input data array.
- degree : int, optional (default=2)
The degree of the polynomial features to generate.
- verbose : int, optional
Verbosity level of information, by default 1.
Returns:
- tuple(numpy.ndarray, list)
A tuple containing two elements: a NumPy array representing the expanded feature matrix, and a list of the names of the columns in the expanded feature matrix.
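To make the expansion concrete, here is a small plain-Python sketch that builds every monomial up to a given degree together with a matching column name. It is an illustration only: `polynomial_expand` is a hypothetical name, the naming scheme ("a^2", "a b") is an assumption, and no constant bias column is produced, which may differ from what the package returns.

```python
from itertools import combinations_with_replacement
from math import prod


def _term_name(combo, column_names):
    """Build a name such as 'a^2' or 'a b' from a tuple of column indices."""
    parts = []
    for index in sorted(set(combo)):
        power = combo.count(index)
        name = column_names[index]
        parts.append(name if power == 1 else f"{name}^{power}")
    return " ".join(parts)


def polynomial_expand(data, column_names, degree=2):
    """Return (expanded_rows, expanded_names) for all monomials up to degree."""
    n_columns = len(column_names)
    combos = [
        combo
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_columns), d)
    ]
    names = [_term_name(combo, column_names) for combo in combos]
    expanded = [[prod(row[i] for i in combo) for combo in combos] for row in data]
    return expanded, names


expanded, names = polynomial_expand([[2, 3]], ["a", "b"], degree=2)
print(names)     # ['a', 'b', 'a^2', 'a b', 'b^2']
print(expanded)  # [[2, 3, 4, 6, 9]]
```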
breton_cretenet.main module
- breton_cretenet.main.get_args(args=None)[source]
Parse command-line arguments.
Parameters
- args : list, optional
List of command-line arguments to parse. The default is None.
Returns
- args : argparse.Namespace
Parsed arguments.
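The optional `args` list is what makes a parser like this testable: `argparse` reads `sys.argv[1:]` when `parse_args` receives None, and parses the given list otherwise. The sketch below shows that pattern. The function name `parse_args_sketch` and the flags `--dataset`, `--random-state`, and `--degree` are hypothetical and are not taken from the package's actual command-line interface.

```python
import argparse


def parse_args_sketch(args=None):
    """Parse a list of arguments, or sys.argv[1:] when args is None."""
    parser = argparse.ArgumentParser(description="Illustrative parser only.")
    # All flags below are hypothetical placeholders.
    parser.add_argument("--dataset", default="example")
    parser.add_argument("--random-state", type=int, default=None)
    parser.add_argument("--degree", type=int, default=1)
    return parser.parse_args(args)


namespace = parse_args_sketch(["--degree", "3", "--random-state", "0"])
print(namespace.degree, namespace.random_state)  # 3 0
```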
- breton_cretenet.main.main(args_test=None)[source]
Executes a machine learning workflow on a given dataset: feature engineering, feature selection, model training, and model evaluation.
Parameters
- args_test : list, optional
List of arguments for testing the package. Default is None.
Returns
None
Notes
This function performs the following steps:
1. Loads the dataset and concatenates multiple datasets if applicable.
2. Splits the dataset into training and testing sets.
3. Performs polynomial feature engineering on the dataset.
4. Scales the dataset.
5. Performs feature selection on the dataset (optional).
6. Trains the model(s) on the training set and evaluates the performance on the testing set.
7. Concatenates the results and outputs them in a formatted table.
breton_cretenet.test module
- breton_cretenet.test.rand_data()[source]
A function that returns a random dataset for the tests.
Parameters:
None
Returns:
- numpy.ndarray
An array of shape (10, 6) with random features and labels.
- breton_cretenet.test.test_decision_tree_regressor_algorithm()[source]
Test function to ensure that the decision_tree_regressor_algorithm function returns an instance of the DecisionTreeRegressor class.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_get_data_column_names(input_data, expected_output)[source]
Test the get_data_column_names function from the data_preparator module.
Parameters
- input_data : str
The input data to pass to get_data_column_names. This can be a URL or a local file path.
- expected_output : list or None
The expected output of get_data_column_names when called with input_data. If input_data is an invalid file path, this should be None.
Raises
- ValueError
If get_data_column_names is called with an invalid file path and expected_output is None.
Returns
None
- breton_cretenet.test.test_lasso_regression_feature_selection()[source]
Test the lasso_regression_feature_selection function.
Parameters
None
Returns
None
- breton_cretenet.test.test_linear_regression_algorithm()[source]
Test function to ensure that the linear_regression_algorithm function returns an instance of the LinearRegression class.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_load_data(input, expected_shape)[source]
Test the load_data function of the data_preparator module using parameterized inputs.
Parameters:
- input : str
The URL of the data file to load.
- expected_shape : tuple of int
The expected shape of the NumPy array returned by the load_data function.
Raises:
- ValueError
If the load_data function is called with an invalid URL.
Returns:
None
- breton_cretenet.test.test_main(dataset)[source]
Test function for the main method in the codebase.
Parameters
- dataset : str
Name of the dataset to use for testing.
Returns
- None
This function does not return anything.
Raises
- AssertionError
If the test fails.
- breton_cretenet.test.test_main_pull_request(dataset, random_state, degree, preprocessing, feature_selection, algorithm, max_depth)[source]
Test function for the main method in the codebase.
Parameters
- dataset : str
Name of the dataset to use for testing.
- random_state : int
Seed value for the random number generator.
- degree : int
Degree of the polynomial features to generate.
- preprocessing : str
Type of preprocessing to apply to the data.
- feature_selection : bool
Whether or not to perform feature selection.
- algorithm : str
Type of algorithm to use for testing.
- max_depth : int
Maximum depth of the decision tree for testing.
Returns
- None
This function does not return anything.
Raises
- AssertionError
If the test fails.
- breton_cretenet.test.test_predict_from_regressor()[source]
Test function to ensure that the predict_from_regressor function returns an array of predictions with the same length as the input array.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preparator_is_random_if_no_seed()[source]
Test function to ensure that the preparator returns random splits.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preparator_with_seed()[source]
Test function to ensure that the preparator gives fixed splits if the seed is set.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preparator_xy_alignement()[source]
Test function to ensure that the preparator keeps the features and the labels grouped correctly after the shuffling.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preprocessor_inexistant_method()[source]
Test function to ensure that standardization is applied when a nonexistent method is selected.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preprocessor_minmax()[source]
Test function to ensure that the MinMax method of the preprocessor is correctly implemented.
Parameters:
None
Returns:
None
- breton_cretenet.test.test_preprocessor_polynomial()[source]
Test function to ensure that the Polynomial Features method of the preprocessor is correctly implemented.
Parameters:
None
Returns:
None