predict-statistics
trainModels()
```ts
function trainModels(
   fullData: Object[],
   targetName: string,
   options: object): number;
```
Defined in: statistics/predict-statistics.js:103
Trains an XGBoost model on preprocessed data and evaluates its performance.
XGBoost (eXtreme Gradient Boosting) builds decision trees sequentially, with each new tree correcting the errors made by the ensemble of previous trees. It minimizes a loss function by gradient descent: each boosting round fits a new tree to the residuals (errors) of the prior trees and adds its shrunken prediction to the ensemble. The algorithm applies regularization to prevent overfitting and handles missing values effectively through its sparsity-aware split-finding approach.
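The boosting loop itself is easy to sketch. The snippet below is an illustrative JavaScript example, not this module's implementation: it fits one-feature decision stumps to the residuals of the running prediction and shrinks each tree's contribution by the learning rate (eta).

```js
// Illustrative only -- minimal gradient boosting for squared loss,
// where each round's residuals are the negative gradient.
function fitStump(xs, residuals) {
  // Greedy search over midpoints between sorted x values; keep the split
  // whose two leaf means minimize squared error on the residuals.
  const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
  const order = xs.map((_, i) => i).sort((a, b) => xs[a] - xs[b]);
  let best = null;
  for (let k = 1; k < order.length; k++) {
    const threshold = (xs[order[k - 1]] + xs[order[k]]) / 2;
    const left = [], right = [];
    for (let i = 0; i < xs.length; i++) {
      (xs[i] < threshold ? left : right).push(residuals[i]);
    }
    if (!left.length || !right.length) continue;
    const mL = mean(left), mR = mean(right);
    const sse = residuals.reduce(
      (s, r, i) => s + (r - (xs[i] < threshold ? mL : mR)) ** 2, 0);
    if (!best || sse < best.sse) best = { threshold, mL, mR, sse };
  }
  if (!best) return () => mean(residuals); // no usable split: predict the mean
  return x => (x < best.threshold ? best.mL : best.mR);
}

function boost(xs, ys, { eta = 0.1, rounds = 100 } = {}) {
  const base = ys.reduce((s, v) => s + v, 0) / ys.length; // initial prediction
  const trees = [];
  let preds = ys.map(() => base);
  for (let r = 0; r < rounds; r++) {
    const residuals = ys.map((y, i) => y - preds[i]); // errors of the ensemble
    const stump = fitStump(xs, residuals);            // new tree fits residuals
    trees.push(stump);
    preds = preds.map((p, i) => p + eta * stump(xs[i])); // shrunken update
  }
  return x => trees.reduce((p, t) => p + eta * t(x), base);
}

const model = boost([1, 2, 3, 4, 5], [1.2, 1.9, 3.2, 3.8, 5.1]);
console.log(model(3).toFixed(2)); // close to 3.2
```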
Parameters
Parameter | Type | Description |
---|---|---|
`fullData` | `Object[]` | Preprocessed training data as an array of objects with numeric values only |
`targetName` | `string` | Name of the target variable column to predict |
`options` | `object` | Configuration options for model training |
`options.xgbParams` | `object` | XGBoost hyperparameters such as `learning_rate`, `max_depth`, etc. |
**General parameters** | | |
`options.xgbParams.verbosity` | `number` | [default=1] Controls the verbosity of XGBoost's output: 0 = silent (no messages), 1 = warnings only, 2 = info messages, 3 = debug messages |
**Tree booster parameters (control tree structure)** | | |
`options.xgbParams.max_depth` | `number` | [default=6] Maximum depth of each tree; controls model complexity. Higher values create more complex trees that may overfit. Reduced from 8 to 6 to limit tree complexity and prevent overfitting. |
`options.xgbParams.eta` | `number` | [default=0.3, alias: `learning_rate`] Step-size shrinkage; controls how much weight each new tree receives per boosting round. Smaller values make the model more robust by shrinking feature weights. Set to 0.1 for more conservative boosting, requiring more trees but improving generalization. |
`options.xgbParams.objective` | `string` | Specifies the learning task and objective. `'reg:squarederror'` is regression with squared loss (minimizes MSE); other options include classification objectives, ranking, and other regression metrics. |
`options.xgbParams.nthread` | `number` | Number of parallel threads used for training. Set to 4 to utilize multi-core processing without overwhelming the system. |
`options.xgbParams.subsample` | `number` | [default=1] Fraction of training instances used per tree. Values < 1 randomly sample the training data for each tree. Set to 0.9 to reduce overfitting by introducing randomness while still using most of the data. |
`options.xgbParams.colsample_bytree` | `number` | [default=1] Fraction of features used per tree; controls feature sampling for each tree, similar to Random Forest. Set to 0.9 to reduce overfitting and create diverse trees. |
`options.xgbParams.min_child_weight` | `number` | [default=1] Minimum sum of instance weight in a child; controls the minimum number of instances needed in a leaf node. Set to 3 to prevent the model from creating overly specific rules based on few samples. |
`options.xgbParams.gamma` | `number` | [default=0, alias: `min_split_loss`] Minimum loss reduction required to make a split. Set to 0.1 to make splitting more conservative and reduce overfitting. |
**Regularization parameters** | | |
`options.xgbParams.alpha` | `number` | [default=0, alias: `reg_alpha`] L1 regularization on weights; encourages sparsity by penalizing non-zero weights (feature selection). Set to 0 since gamma is being used for regularization. |
`options.xgbParams.lambda` | `number` | [default=1, alias: `reg_lambda`] L2 regularization on weights; penalizes large weights to prevent overfitting (similar to Ridge regression). The default of 1 provides moderate regularization. |
**Learning control parameters** | | |
`options.xgbParams.early_stopping_rounds` | `number` | Stops adding trees when the validation metric does not improve for the specified number of rounds. Set to 20 to prevent overfitting by stopping once the model stops improving. |
`options.xgbParams.seed` | `number` | [default=0] Random number seed for reproducibility. Set to 42 to ensure consistent results across training runs. |
`options.xgbParams.num_boost_round` | `number` | Number of boosting rounds (trees to build). Set to 1000 to compensate for the lower learning rate (eta), allowing the model to converge slowly but more accurately. |
— | `number` | Proportion of data to use for testing (default: 0.2) |
— | `string[]` | Specific feature columns to use for training |
Returns
number
R² value (coefficient of determination) indicating model accuracy
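As a reminder of what this metric measures, R² compares the model's squared error against a mean-only baseline. A hypothetical helper (not part of this module) could compute it like so:

```js
// Hypothetical helper: coefficient of determination R² = 1 - SS_res / SS_tot.
function rSquared(actual, predicted) {
  const mean = actual.reduce((s, v) => s + v, 0) / actual.length;
  const ssTot = actual.reduce((s, v) => s + (v - mean) ** 2, 0);
  const ssRes = actual.reduce((s, v, i) => s + (v - predicted[i]) ** 2, 0);
  return 1 - ssRes / ssTot; // 1 = perfect fit, 0 = no better than the mean
}
```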
Example
```js
let data = [
  {
    "feature1": 1,
    "feature2": 2,
    "target": 3
  }
];
let options = {
  xgbParams: {
    verbosity: 0,
    max_depth: 7,
    eta: 0.07,
    objective: 'reg:squarederror',
    nthread: 4,
  }
};
let accuracy = await trainModels(data, 'target', options);
console.log(accuracy); // R² on the held-out test split
```
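For heavier tuning, the hyperparameters described in the table above can be assembled into one configuration. This is a sketch combining those values; whether every key (e.g. `num_boost_round`) is read from `xgbParams` depends on the module's implementation.

```js
let tunedOptions = {
  xgbParams: {
    verbosity: 0,               // silent
    max_depth: 6,               // cap tree complexity
    eta: 0.1,                   // conservative learning rate
    objective: 'reg:squarederror',
    nthread: 4,
    subsample: 0.9,             // sample 90% of rows per tree
    colsample_bytree: 0.9,      // sample 90% of features per tree
    min_child_weight: 3,        // avoid leaves based on few samples
    gamma: 0.1,                 // conservative splitting
    alpha: 0,                   // L1 off; gamma handles regularization
    lambda: 1,                  // moderate L2
    early_stopping_rounds: 20,  // stop when validation stalls
    seed: 42,                   // reproducible runs
    num_boost_round: 1000       // many rounds to offset the low eta
  }
};
let r2 = await trainModels(data, 'target', tunedOptions);
```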
predictFuture()
```ts
function predictFuture(futureData: Object[], options: object): Promise<Object[]>;
```
Defined in: statistics/predict-statistics.js:243
Predicts energy output for future data using the trained XGBoost model.
Parameters
Parameter | Type | Description |
---|---|---|
`futureData` | `Object[]` | Array of weather data objects for future dates |
`options` | `object` | ‐ |
Returns
Promise<Object[]>
Promise resolving to array of data objects with predictions
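A sketch of the intended flow, assuming a model has already been trained with trainModels and that `futureData` carries the same feature columns as the training data (the field names here are hypothetical):

```js
// Hypothetical field names; use the same feature columns as in training.
let futureData = [
  { "feature1": 1.5, "feature2": 2.5 }  // no target column for future dates
];
let predictions = await predictFuture(futureData, {});
console.log(predictions); // input objects augmented with predicted values
```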
saveModel()
```ts
function saveModel(modelPath: string): Promise<void>;
```
Defined in: statistics/predict-statistics.js:280
Saves the trained XGBoost model to the specified file path.
Parameters
Parameter | Type | Description |
---|---|---|
`modelPath` | `string` | Path where the model should be saved |
Returns
Promise<void>
Promise that resolves when the model is saved
loadModel()
```ts
function loadModel(modelPath: string): Promise<void>;
```
Defined in: statistics/predict-statistics.js:289
Loads a trained XGBoost model from the specified file path.
Parameters
Parameter | Type | Description |
---|---|---|
`modelPath` | `string` | Path to the saved model file |
Returns
Promise<void>
Promise that resolves when the model is loaded
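A round-trip sketch of the two functions; the model path is a placeholder:

```js
// Persist the trained model, then restore it later (or in another process).
await saveModel('./models/energy-model.json');  // placeholder path
await loadModel('./models/energy-model.json');
// The loaded model can then serve predictFuture() calls as before.
```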
calculateRollingStats()
```ts
function calculateRollingStats(
   data: any[],
   field: string,
   window: number): any[];
```
Defined in: statistics/predict-statistics.js:303
Calculates rolling statistics for a given field across an array of data objects.
Parameters
Parameter | Type | Default value | Description |
---|---|---|---|
`data` | `any[]` | – | Array of data objects |
`field` | `string` | – | Field name to calculate rolling stats for |
`window` | `number` | – | Rolling window size |
Returns
any[]
Array with added rolling statistics
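Which statistics are added is implementation-specific; as an illustration, a trailing rolling mean could be computed like this (the output key is hypothetical):

```js
// Sketch of a trailing rolling mean; the real function may add more stats
// (e.g. rolling min/max/std) under its own field names.
function rollingMean(data, field, window) {
  return data.map((row, i) => {
    const start = Math.max(0, i - window + 1);
    const slice = data.slice(start, i + 1);
    const mean = slice.reduce((s, r) => s + r[field], 0) / slice.length;
    return { ...row, [`${field}_rolling_mean`]: mean };  // hypothetical key
  });
}
```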