diff --git "a/numpyml-display/final_dataset.jsonl" "b/numpyml-display/final_dataset.jsonl" new file mode 100644--- /dev/null +++ "b/numpyml-display/final_dataset.jsonl" @@ -0,0 +1,57 @@ +{"title": "BallTree", "class_annotation": "numpy_ml.utils.data_structures.BallTree(leaf_size=40, metric=None)", "comment": "\"BallTree\"\n\n**********\n\n\n\nclass numpy_ml.utils.data_structures.BallTree(leaf_size=40, metric=None)\n\n\n\n A ball tree data structure.\n\n\n\n -[ Notes ]-\n\n\n\n A ball tree is a binary tree in which every node defines a\n\n *D*-dimensional hypersphere (\"ball\") containing a subset of the\n\n points to be searched. Each internal node of the tree partitions\n\n the data points into two disjoint sets which are associated with\n\n different balls. While the balls themselves may intersect, each\n\n point is assigned to one or the other ball in the partition\n\n according to its distance from the ball's center. Each leaf node in\n\n the tree defines a ball and enumerates all data points inside that\n\n ball.\n\n\n\n Parameters:\n\n * **leaf_size** (*int*) -- The maximum number of datapoints at\n\n each leaf. Default is 40.\n\n\n\n * **metric** (Distance metric or None) -- The distance metric to\n\n use for computing nearest neighbors. If None, use the\n\n \"euclidean()\" metric. Default is None.\n\n\n\n -[ References ]-\n\n\n\n [1] Omohundro, S. M. (1989). \"Five balltree construction\n\n algorithms\". *ICSI Technical Report TR-89-063*.\n\n\n\n [2] Liu, T., Moore, A., & Gray A. (2006). \"New algorithms for\n\n efficient high-dimensional nonparametric classification\". *J.\n\n Mach. Learn. Res., 7*, 1135-1158.\n\n\n\n fit(X, y=None)\n\n\n\n Build a ball tree recursively using the O(M log N) *k*-d\n\n construction algorithm.\n\n\n\n -[ Notes ]-\n\n\n\n Recursively divides data into nodes defined by a centroid *C*\n\n and radius *r* such that each point below the node lies within\n\n the hyper-sphere defined by *C* and *r*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- An array of *N*\n\n examples each with *M* features.\n\n\n\n * **y** (\"ndarray\" of shape *(N, *)* or None) -- An array of\n\n target values / labels associated with the entries in *X*.\n\n Default is None.\n\n\n\n nearest_neighbors(k, x)\n\n\n\n Find the *k* nearest neighbors in the ball tree to a query\n\n vector *x* using the KNS1 algorithm.\n\n\n\n Parameters:\n\n * **k** (*int*) -- The number of closest points in *X* to\n\n return\n\n\n\n * **x** (\"ndarray\" of shape *(1, M)*) -- The query vector.\n\n\n\n Returns:\n\n **nearest** (list of \"PQNode\" s of length *k*) -- List of the\n\n *k* points in *X* to closest to the query vector. 
The \"key\"\n\n attribute of each \"PQNode\" contains the point itself, the\n\n \"val\" attribute contains its target, and the \"distance\"\n\n attribute contains its distance to the query vector.\n", "class_name": "numpy_ml.utils.data_structures.BallTree", "class_link": "numpy_ml/utils/data_structures.py#L197-L338", "test_file_path": "numpy_ml/tests/test_BallTree.py"} +{"title": "BatchNorm2D", "class_annotation": "numpy_ml.neural_nets.layers.BatchNorm2D(momentum=0.9, epsilon=1e-05, optimizer=None)", "comment": "\"BatchNorm2D\"\n\n*************\n\n\n\nclass numpy_ml.neural_nets.layers.BatchNorm2D(momentum=0.9, epsilon=1e-05, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A batch normalization layer for two-dimensional inputs with an\n\n additional channel dimension.\n\n\n\n -[ Notes ]-\n\n\n\n BatchNorm is an attempt address the problem of internal covariate\n\n shift (ICS) during training by normalizing layer inputs.\n\n\n\n ICS refers to the change in the distribution of layer inputs during\n\n training as a result of the changing parameters of the previous\n\n layer(s). ICS can make it difficult to train models with saturating\n\n nonlinearities, and in general can slow training by requiring a\n\n lower learning rate.\n\n\n\n Equations [train]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Equations [test]:\n\n\n\n Y = scaler * running_norm(X) + intercept\n\n running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)\n\n\n\n In contrast to \"LayerNorm2D\", the BatchNorm layer calculates the\n\n mean and var across the *batch* rather than the output features.\n\n This has two disadvantages:\n\n\n\n 1. It is highly affected by batch size: smaller mini-batch sizes\n\n increase the variance of the estimates for the global mean and\n\n variance.\n\n\n\n 2. It is difficult to apply in RNNs -- one must fit a separate\n\n BatchNorm layer for *each* time-step.\n\n\n\n Parameters:\n\n * **momentum** (*float*) -- The momentum term for the running\n\n mean/running std calculations. The closer this is to 1, the\n\n less weight will be given to the mean/std of the current batch\n\n (i.e., higher smoothing). Default is 0.9.\n\n\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during computation of \"norm(X)\" to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". 
Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n reset_running_stats()\n\n\n\n Reset the running mean and variance estimates to 0 and 1.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n -[ Notes ]-\n\n\n\n Equations [train]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Equations [test]:\n\n\n\n Y = scaler * running_norm(X) + intercept\n\n running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)\n\n\n\n In contrast to \"LayerNorm2D\", the BatchNorm layer calculates the\n\n mean and var across the *batch* rather than the output features.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- Input volume containing the *in_rows* x\n\n *in_cols*-dimensional features for a minibatch of *n_ex*\n\n examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to use the current\n\n intput to adjust the running mean and running_var\n\n computations. Setting this to False is the same as freezing\n\n the layer for the current input. Default is True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- Layer output for each of the *n_ex* examples.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The gradient of the loss wrt. the layer output\n\n *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- The gradient of the loss wrt. the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.BatchNorm2D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L969-L1215", "test_file_path": "numpy_ml/tests/test_BatchNorm2D.py"} +{"title": "RandomForest", "class_annotation": "numpy_ml.trees.RandomForest(n_trees, max_depth, n_feats, classifier=True, criterion='entropy')", "comment": "\"RandomForest\"\n\n**************\n\n\n\nclass numpy_ml.trees.RandomForest(n_trees, max_depth, n_feats, classifier=True, criterion='entropy')\n\n\n\n An ensemble (forest) of decision trees where each split is\n\n calculated using a random subset of the features in the input.\n\n\n\n Parameters:\n\n * **n_trees** (*int*) -- The number of individual decision trees\n\n to use within the ensemble.\n\n\n\n * **max_depth** (*int** or **None*) -- The depth at which to\n\n stop growing each decision tree. If None, grow each tree until\n\n the leaf nodes are pure.\n\n\n\n * **n_feats** (*int*) -- The number of features to sample on\n\n each split.\n\n\n\n * **classifier** (*bool*) -- Whether *Y* contains class labels\n\n or real-valued targets. 
Default is True.\n\n\n\n * **criterion** (*{'entropy'**, **'gini'**, **'mse'}*) -- The\n\n error criterion to use when calculating splits for each weak\n\n learner. When \"classifier = False\", valid entries are {'mse'}.\n\n When \"classifier = True\", valid entries are {'entropy',\n\n 'gini'}. Default is 'entropy'.\n\n\n\n fit(X, Y)\n\n\n\n Create *n_trees*-worth of bootstrapped samples from the training\n\n data and use each to fit a separate decision tree.\n\n\n\n predict(X)\n\n\n\n Predict the target value for each entry in *X*.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, M)*) -- The training data of\n\n *N* examples, each with *M* features.\n\n\n\n Returns:\n\n **y_pred** (\"ndarray\" of shape *(N,)*) -- Model predictions\n\n for each entry in *X*.\n", "class_name": "numpy_ml.trees.RandomForest", "class_link": "numpy_ml/trees/rf.py#L11-L99", "test_file_path": "numpy_ml/tests/test_RandomForest.py"} +{"title": "MLENGram", "class_annotation": "numpy_ml.ngram.MLENGram(N, unk=True, filter_stopwords=True, filter_punctuation=True)", "comment": "\"MLENGram\"\n\n**********\n\n\n\nclass numpy_ml.ngram.MLENGram(N, unk=True, filter_stopwords=True, filter_punctuation=True)\n\n\n\n A simple, unsmoothed N-gram model.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The maximum length (in words) of the context-\n\n window to use in the langauge model. Model will compute all\n\n n-grams from 1, ..., N.\n\n\n\n * **unk** (*bool*) -- Whether to include the \"\" (unknown)\n\n token in the LM. Default is True.\n\n\n\n * **filter_stopwords** (*bool*) -- Whether to remove stopwords\n\n before training. Default is True.\n\n\n\n * **filter_punctuation** (*bool*) -- Whether to remove\n\n punctuation before training. Default is True.\n\n\n\n log_prob(words, N)\n\n\n\n Compute the log probability of a sequence of words under the\n\n unsmoothed, maximum-likelihood *N*-gram language model.\n\n\n\n Parameters:\n\n * **words** (*list** of **strings*) -- A sequence of words\n\n\n\n * **N** (*int*) -- The gram-size of the language model to use\n\n when calculating the log probabilities of the sequence\n\n\n\n Returns:\n\n **total_prob** (*float*) -- The total log-probability of the\n\n sequence *words* under the *N*-gram language model\n\n\n\n completions(words, N)\n\n\n\n Return the distribution over proposed next words under the\n\n *N*-gram language model.\n\n\n\n Parameters:\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n initial sequence of words\n\n\n\n * **N** (*int*) -- The gram-size of the language model to use\n\n to generate completions\n\n\n\n Returns:\n\n **probs** (*list of (word, log_prob) tuples*) -- The list of\n\n possible next words and their log probabilities under the\n\n *N*-gram language model (unsorted)\n\n\n\n cross_entropy(words, N)\n\n\n\n Calculate the model cross-entropy on a sequence of words against\n\n the empirical distribution of words in a sample.\n\n\n\n -[ Notes ]-\n\n\n\n Model cross-entropy, *H*, is defined as\n\n\n\n H(W) = -\\frac{\\log p(W)}{n}\n\n\n\n where W = [w_1, \\ldots, w_k] is a sequence of words, and *n* is\n\n the number of *N*-grams in *W*.\n\n\n\n The model cross-entropy is proportional (not equal, since we use\n\n base *e*) to the average number of bits necessary to encode *W*\n\n under the model distribution.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to calculate\n\n cross-entropy on.\n\n\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n sequence of words to compute cross-entropy on.\n\n\n\n 
Returns:\n\n **H** (*float*) -- The model cross-entropy for the words in\n\n *words*.\n\n\n\n generate(N, seed_words=[''], n_sentences=5)\n\n\n\n Use the *N*-gram language model to generate sentences.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to generate\n\n from\n\n\n\n * **seed_words** (*list** of **strs*) -- A list of seed words\n\n to use to condition the initial sentence generation.\n\n Default is \"[\"\"]\".\n\n\n\n * **sentences** (*int*) -- The number of sentences to\n\n generate from the *N*-gram model. Default is 50.\n\n\n\n Returns:\n\n **sentences** (*str*) -- Samples from the *N*-gram model,\n\n joined by white spaces, with individual sentences separated\n\n by newlines.\n\n\n\n perplexity(words, N)\n\n\n\n Calculate the model perplexity on a sequence of words.\n\n\n\n -[ Notes ]-\n\n\n\n Perplexity, *PP*, is defined as\n\n\n\n PP(W) = \\left( \\frac{1}{p(W)} \\right)^{1 / n}\n\n\n\n or simply\n\n\n\n PP(W) &= \\exp(-\\log p(W) / n) \\\\ &= \\exp(H(W))\n\n\n\n where W = [w_1, \\ldots, w_k] is a sequence of words, *H(w)* is\n\n the cross-entropy of *W* under the current model, and *n* is the\n\n number of *N*-grams in *W*.\n\n\n\n Minimizing perplexity is equivalent to maximizing the\n\n probability of *words* under the *N*-gram model. It may also be\n\n interpreted as the average branching factor when predicting the\n\n next word under the language model.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to calculate\n\n perplexity with.\n\n\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n sequence of words to compute perplexity on.\n\n\n\n Returns:\n\n **perplexity** (*float*) -- The model perlexity for the words\n\n in *words*.\n\n\n\n train(corpus_fp, vocab=None, encoding=None)\n\n\n\n Compile the n-gram counts for the text(s) in *corpus_fp*.\n\n\n\n -[ Notes ]-\n\n\n\n After running *train*, the \"self.counts\" attribute will store\n\n dictionaries of the *N*, *N-1*, ..., 1-gram counts.\n\n\n\n Parameters:\n\n * **corpus_fp** (*str*) -- The path to a newline-separated\n\n text corpus file.\n\n\n\n * **vocab** (\"Vocabulary\" instance or None) -- If not None,\n\n only the words in *vocab* will be used to construct the\n\n language model; all out-of-vocabulary words will either be\n\n mappend to \"\" (if \"self.unk = True\") or removed (if\n\n \"self.unk = False\"). Default is None.\n\n\n\n * **encoding** (*str** or **None*) -- Specifies the text\n\n encoding for corpus. Common entries are 'utf-8',\n\n 'utf-8-sig', 'utf-16'. Default is None.\n", "class_name": "numpy_ml.ngram.MLENGram", "class_link": "numpy_ml/ngram/ngram.py#L313-L361", "test_file_path": "numpy_ml/tests/test_MLENGram.py"} +{"title": "BidirectionalLSTM", "class_annotation": "numpy_ml.neural_nets.modules.BidirectionalLSTM(n_out, act_fn=None, gate_fn=None, merge_mode='concat', init='glorot_uniform', optimizer=None)", "comment": "\"BidirectionalLSTM\"\n\n*******************\n\n\n\nclass numpy_ml.neural_nets.modules.BidirectionalLSTM(n_out, act_fn=None, gate_fn=None, merge_mode='concat', init='glorot_uniform', optimizer=None)\n\n\n\n A single bidirectional long short-term memory (LSTM) layer.\n\n\n\n Parameters:\n\n * **n_out** (*int*) -- The dimension of a single hidden state /\n\n output on a given timestep\n\n\n\n * **act_fn** (Activation object or None) -- The activation\n\n function for computing \"A[t]\". 
If not specified, use \"Tanh\" by\n\n default.\n\n\n\n * **gate_fn** (Activation object or None) -- The gate function\n\n for computing the update, forget, and output gates. If not\n\n specified, use \"Sigmoid\" by default.\n\n\n\n * **merge_mode** (*{\"sum\"**, **\"multiply\"**, **\"concat\"**,\n\n **\"average\"}*) -- Mode by which outputs of the forward and\n\n backward LSTMs will be combined. Default is 'concat'.\n\n\n\n * **optimizer** (str or Optimizer object or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the *update* method. If None, use the \"SGD\" optimizer\n\n with default parameters. Default is None.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is 'glorot_uniform'.\n\n\n\n forward(X)\n\n\n\n Run a forward pass across all timesteps in the input.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(n_ex, n_in, n_t)*) -- Input\n\n consisting of *n_ex* examples each of dimensionality *n_in*\n\n and extending for *n_t* timesteps.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_out, n_t)*) -- The value\n\n of the hidden state for each of the *n_ex* examples across\n\n each of the *n_t* timesteps.\n\n\n\n backward(dLdA)\n\n\n\n Run a backward pass across all timesteps in the input.\n\n\n\n Parameters:\n\n **dLdA** (\"ndarray\" of shape *(n_ex, n_out, n_t)*) -- The\n\n gradient of the loss with respect to the layer output for\n\n each of the *n_ex* examples across all *n_t* timesteps.\n\n\n\n Returns:\n\n **dLdX** (\"ndarray\" of shape *(n_ex, n_in, n_t)*) -- The\n\n value of the hidden state for each of the *n_ex* examples\n\n across each of the *n_t* timesteps.\n\n\n\n property derived_variables\n\n\n\n A dictionary of intermediate values computed during the\n\n forward/backward passes.\n\n\n\n property gradients\n\n\n\n A dictionary of the accumulated module parameter gradients.\n\n\n\n property parameters\n\n\n\n A dictionary of the module parameters.\n\n\n\n property hyperparameters\n\n\n\n A dictionary of the module hyperparameters.\n", "class_name": "numpy_ml.neural_nets.modules.BidirectionalLSTM", "class_link": "numpy_ml/neural_nets/modules/modules.py#L987-L1189", "test_file_path": "numpy_ml/tests/test_BidirectionalLSTM.py"} +{"title": "KNN", "class_annotation": "numpy_ml.nonparametric.KNN(k=5, leaf_size=40, classifier=True, metric=None, weights='uniform')", "comment": "\"KNN\"\n\n*****\n\n\n\nclass numpy_ml.nonparametric.KNN(k=5, leaf_size=40, classifier=True, metric=None, weights='uniform')\n\n\n\n A *k*-nearest neighbors (kNN) model relying on a ball tree for\n\n efficient computation.\n\n\n\n Parameters:\n\n * **k** (*int*) -- The number of neighbors to use during\n\n prediction. Default is 5.\n\n\n\n * **leaf_size** (*int*) -- The maximum number of datapoints at\n\n each leaf in the ball tree. Default is 40.\n\n\n\n * **classifier** (*bool*) -- Whether to treat the values in Y as\n\n class labels (classifier = True) or real-valued targets\n\n (classifier = False). Default is True.\n\n\n\n * **metric** (Distance metric or None) -- The distance metric to\n\n use for computing nearest neighbors. If None, use the\n\n \"euclidean()\" metric by default. Default is None.\n\n\n\n * **weights** (*{'uniform'**, **'distance'}*) -- How to weight\n\n the predictions from each neighbors. 
'uniform' assigns uniform\n\n weights to each neighbor, while 'distance' assigns weights\n\n proportional to the inverse of the distance from the query\n\n point. Default is 'uniform'.\n\n\n\n fit(X, y)\n\n\n\n Fit the model to the data and targets in *X* and *y*\n\n\n\n Parameters:\n\n * **X** (numpy array of shape *(N, M)*) -- An array of *N*\n\n examples to generate predictions on.\n\n\n\n * **y** (numpy array of shape *(N, *)*) -- Targets for the\n\n *N* rows in *X*.\n\n\n\n predict(X)\n\n\n\n Generate predictions for the targets associated with the rows in\n\n *X*.\n\n\n\n Parameters:\n\n **X** (numpy array of shape *(N', M')*) -- An array of *N'*\n\n examples to generate predictions on.\n\n\n\n Returns:\n\n **y** (numpy array of shape *(N', *)*) -- Predicted targets\n\n for the *N'* rows in *X*.\n", "class_name": "numpy_ml.nonparametric.KNN", "class_link": "numpy_ml/nonparametric/knn.py#L9-L101", "test_file_path": "numpy_ml/tests/test_KNN.py"} +{"title": "Sigmoid", "class_annotation": "numpy_ml.neural_nets.activations.Sigmoid", "comment": "\"Sigmoid\"\n\n*********\n\n\n\nclass numpy_ml.neural_nets.activations.Sigmoid\n\n\n\n A logistic sigmoid activation function.\n\n\n\n fn(z)\n\n\n\n Evaluate the logistic sigmoid, \\sigma, on the elements of input\n\n *z*.\n\n\n\n \\sigma(x_i) = \\frac{1}{1 + e^{-x_i}}\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the logistic sigmoid on the\n\n elements of *x*.\n\n\n\n \\frac{\\partial \\sigma}{\\partial x_i} = \\sigma(x_i) (1 -\n\n \\sigma(x_i))\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the logistic sigmoid on the\n\n elements of *x*.\n\n\n\n \\frac{\\partial^2 \\sigma}{\\partial x_i^2} = \\frac{\\partial\n\n \\sigma}{\\partial x_i} (1 - 2 \\sigma(x_i))\n", "class_name": "numpy_ml.neural_nets.activations.Sigmoid", "class_link": "numpy_ml/neural_nets/activations/activations.py#L30-L70", "test_file_path": "numpy_ml/tests/test_Sigmoid.py"} +{"title": "SkipConnectionIdentityModule", "class_annotation": "numpy_ml.neural_nets.modules.SkipConnectionIdentityModule(out_ch, kernel_shape1, kernel_shape2, stride1=1, stride2=1, act_fn=None, epsilon=1e-05, momentum=0.9, optimizer=None, init='glorot_uniform')", "comment": "\"SkipConnectionIdentityModule\"\n\n******************************\n\n\n\nclass numpy_ml.neural_nets.modules.SkipConnectionIdentityModule(out_ch, kernel_shape1, kernel_shape2, stride1=1, stride2=1, act_fn=None, epsilon=1e-05, momentum=0.9, optimizer=None, init='glorot_uniform')\n\n\n\n A ResNet-like \"identity\" shortcut module.\n\n\n\n -[ Notes ]-\n\n\n\n The identity module enforces *same* padding during each convolution\n\n to ensure module output has same dims as its input.\n\n\n\n X -> Conv2D -> Act_fn -> BatchNorm2D -> Conv2D -> BatchNorm2D -> + -> Act_fn\n\n \\______________________________________________________________/\n\n\n\n -[ References ]-\n\n\n\n [1] He et al. (2015). \"Deep residual learning for image\n\n recognition.\" https://arxiv.org/pdf/1512.03385.pdf\n\n\n\n Parameters:\n\n * **out_ch** (*int*) -- The number of filters/kernels to compute\n\n in the first convolutional layer.\n\n\n\n * **kernel_shape1** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the first convolutional layer.\n\n\n\n * **kernel_shape2** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the second convolutional layer.\n\n\n\n * **stride1** (*int*) -- The stride/hop of the convolution\n\n kernels in the first convolutional layer. 
Default is 1.\n\n\n\n * **stride2** (*int*) -- The stride/hop of the convolution\n\n kernels in the second convolutional layer. Default is 1.\n\n\n\n * **act_fn** (Activation object or None) -- The activation\n\n function for computing Y[t]. If None, use the identity f(x) =\n\n x by default. Default is None.\n\n\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during \"BatchNorm2D\" computation to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **momentum** (*float*) -- The momentum term for the running\n\n mean/running std calculations in the \"BatchNorm2D\" layers.\n\n The closer this is to 1, the less weight will be given to the\n\n mean/std of the current batch (i.e., higher smoothing).\n\n Default is 0.9.\n\n\n\n * **optimizer** (str or Optimizer object or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is 'glorot_uniform'.\n\n\n\n property parameters\n\n\n\n A dictionary of the module parameters.\n\n\n\n property hyperparameters\n\n\n\n A dictionary of the module hyperparameters.\n\n\n\n property derived_variables\n\n\n\n A dictionary of intermediate values computed during the\n\n forward/backward passes.\n\n\n\n property gradients\n\n\n\n A dictionary of the accumulated module parameter gradients.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the module output given input volume *X*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape (n_ex, in_rows, in_cols, in_ch))\n\n -- The input volume consisting of *n_ex* examples, each\n\n with dimension (*in_rows*, *in_cols*, *in_ch*).\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape (n_ex, out_rows, out_cols, out_ch))\n\n -- The module output volume.\n\n\n\n backward(dLdY, retain_grads=True)\n\n\n\n Compute the gradient of the loss with respect to the layer\n\n parameters.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape (*n_ex, out_rows, out_cols,\n\n out_ch*) or list of arrays) -- The gradient(s) of the loss\n\n with respect to the module output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. 
Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape (n_ex, in_rows, in_cols, in_ch))\n\n -- The gradient of the loss with respect to the module input\n\n volume.\n", "class_name": "numpy_ml.neural_nets.modules.SkipConnectionIdentityModule", "class_link": "numpy_ml/neural_nets/modules/modules.py#L360-L612", "test_file_path": "numpy_ml/tests/test_SkipConnectionIdentityModule.py"} +{"title": "FullyConnected", "class_annotation": "numpy_ml.neural_nets.layers.FullyConnected(n_out, act_fn=None, init='glorot_uniform', optimizer=None)", "comment": "\"FullyConnected\"\n\n****************\n\n\n\nclass numpy_ml.neural_nets.layers.FullyConnected(n_out, act_fn=None, init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A fully-connected (dense) layer.\n\n\n\n -[ Notes ]-\n\n\n\n A fully connected layer computes the function\n\n\n\n \\mathbf{Y} = f( \\mathbf{WX} + \\mathbf{b} )\n\n\n\n where *f* is the activation nonlinearity, **W** and **b** are\n\n parameters of the layer, and **X** is the minibatch of input\n\n examples.\n\n\n\n Parameters:\n\n * **n_out** (*int*) -- The dimensionality of the layer output\n\n\n\n * **act_fn** (str, Activation object, or None) -- The element-\n\n wise output nonlinearity used in computing *Y*. If None, use\n\n the identity function f(X) = X. Default is None.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer input,\n\n representing the *n_in*-dimensional features for a\n\n minibatch of *n_ex* examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_out)*) -- Layer output\n\n for each of the *n_ex* examples.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, n_out)* or list of\n\n arrays) -- The gradient(s) of the loss wrt. the layer\n\n output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. 
Default is\n\n True.\n\n\n\n Returns:\n\n **dLdX** (\"ndarray\" of shape *(n_ex, n_in)* or list of\n\n arrays) -- The gradient of the loss wrt. the layer input(s)\n\n *X*.\n", "class_name": "numpy_ml.neural_nets.layers.FullyConnected", "class_link": "numpy_ml/neural_nets/layers/layers.py#L2010-L2189", "test_file_path": "numpy_ml/tests/test_FullyConnected.py"} +{"title": "ReLU", "class_annotation": "numpy_ml.neural_nets.activations.ReLU", "comment": "\"ReLU\"\n\n******\n\n\n\nclass numpy_ml.neural_nets.activations.ReLU\n\n\n\n A rectified linear activation function.\n\n\n\n -[ Notes ]-\n\n\n\n \"ReLU units can be fragile during training and can \"die\". For\n\n example, a large gradient flowing through a ReLU neuron could cause\n\n the weights to update in such a way that the neuron will never\n\n activate on any datapoint again. If this happens, then the gradient\n\n flowing through the unit will forever be zero from that point on.\n\n That is, the ReLU units can irreversibly die during training since\n\n they can get knocked off the data manifold.\n\n\n\n For example, you may find that as much as 40% of your network can\n\n be \"dead\" (i.e. neurons that never activate across the entire\n\n training dataset) if the learning rate is set too high. With a\n\n proper setting of the learning rate this is less frequently an\n\n issue.\" [*]\n\n\n\n -[ References ]-\n\n\n\n [*] Karpathy, A. \"CS231n: Convolutional neural networks for visual\n\n recognition.\"\n\n\n\n Initialize the ActivationBase object\n\n\n\n fn(z)\n\n\n\n Evaulate the ReLU function on the elements of input *z*.\n\n\n\n \\text{ReLU}(z_i) &= z_i \\ \\ \\ \\ &&\\text{if }z_i > 0 \\\\\n\n &= 0 \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad(x)\n\n\n\n Evaulate the first derivative of the ReLU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{ReLU}}{\\partial x_i} &= 1 \\ \\ \\ \\\n\n &&\\text{if }x_i > 0 \\\\ &= 0 \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad2(x)\n\n\n\n Evaulate the second derivative of the ReLU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{ReLU}}{\\partial x_i^2} = 0\n", "class_name": "numpy_ml.neural_nets.activations.ReLU", "class_link": "numpy_ml/neural_nets/activations/activations.py#L73-L137", "test_file_path": "numpy_ml/tests/test_ReLU.py"} +{"title": "manhattan", "class_annotation": "numpy_ml.utils.distance_metrics.manhattan(x, y)", "comment": "\"manhattan\"\n\n***********\n\n\n\nnumpy_ml.utils.distance_metrics.manhattan(x, y)\n\n\n\n Compute the Manhattan (*L1*) distance between two real vectors\n\n\n\n -[ Notes ]-\n\n\n\n The Manhattan distance between two vectors **x** and **y** is\n\n\n\n d(\\mathbf{x}, \\mathbf{y}) = \\sum_i |x_i - y_i|\n\n\n\n Parameters:\n\n * **x** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n * **y** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n Returns:\n\n **d** (*float*) -- The L1 distance between **x** and **y**.\n", "class_name": "numpy_ml.utils.distance_metrics.manhattan", "class_link": "numpy_ml/utils/distance_metrics.py#L29-L51", "test_file_path": "numpy_ml/tests/test_manhattan.py"} +{"title": "Tanh", "class_annotation": "numpy_ml.neural_nets.activations.Tanh", "comment": "\"Tanh\"\n\n******\n\n\n\nclass numpy_ml.neural_nets.activations.Tanh\n\n\n\n A hyperbolic tangent activation function.\n\n\n\n fn(z)\n\n\n\n Compute the tanh function on the elements of input *z*.\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of 
the tanh function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\tanh}{\\partial x_i} = 1 - \\tanh(x)^2\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the tanh function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\tanh}{\\partial x_i^2} = -2 \\tanh(x)\n\n \\left(\\frac{\\partial \\tanh}{\\partial x_i}\\right)\n", "class_name": "numpy_ml.neural_nets.activations.Tanh", "class_link": "numpy_ml/neural_nets/activations/activations.py#L304-L339", "test_file_path": "numpy_ml/tests/test_Tanh.py"} +{"title": "Standardizer", "class_annotation": "numpy_ml.preprocessing.general.Standardizer(with_mean=True, with_std=True)", "comment": "\"Standardizer\"\n\n**************\n\n\n\nclass numpy_ml.preprocessing.general.Standardizer(with_mean=True, with_std=True)\n\n\n\n Feature-wise standardization for vector inputs.\n\n\n\n -[ Notes ]-\n\n\n\n Due to the sensitivity of empirical mean and standard deviation\n\n calculations to extreme values, *Standardizer* cannot guarantee\n\n balanced feature scales in the presence of outliers. In particular,\n\n note that because outliers for each feature can have different\n\n magnitudes, the spread of the transformed data on each feature can\n\n be very different.\n\n\n\n Similar to sklearn, *Standardizer* uses a biased estimator for the\n\n standard deviation: \"numpy.std(x, ddof=0)\".\n\n\n\n Parameters:\n\n * **with_mean** (*bool*) -- Whether to scale samples to have 0\n\n mean during transformation. Default is True.\n\n\n\n * **with_std** (*bool*) -- Whether to scale samples to have unit\n\n variance during transformation. Default is True.\n\n\n\n property hyperparameters\n\n\n\n property parameters\n\n\n\n fit(X)\n\n\n\n Store the feature-wise mean and standard deviation across the\n\n samples in *X* for future scaling.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, C)*) -- An array of N samples,\n\n each with dimensionality *C*\n\n\n\n transform(X)\n\n\n\n Standardize features by removing the mean and scaling to unit\n\n variance.\n\n\n\n For a sample *x*, the standardized score is calculated as:\n\n\n\n z = (x - u) / s\n\n\n\n where *u* is the mean of the training samples or zero if\n\n *with_mean* is False, and *s* is the standard deviation of the\n\n training samples or 1 if *with_std* is False.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, C)*) -- An array of N samples,\n\n each with dimensionality *C*.\n\n\n\n Returns:\n\n **Z** (\"ndarray\" of shape *(N, C)*) -- The feature-wise\n\n standardized version of *X*.\n\n\n\n inverse_transform(Z)\n\n\n\n Convert a collection of standardized features back into the\n\n original feature space.\n\n\n\n For a standardized sample *z*, the unstandardized score is\n\n calculated as:\n\n\n\n x = z s + u\n\n\n\n where *u* is the mean of the training samples or zero if\n\n *with_mean* is False, and *s* is the standard deviation of the\n\n training samples or 1 if *with_std* is False.\n\n\n\n Parameters:\n\n **Z** (\"ndarray\" of shape *(N, C)*) -- An array of *N*\n\n standardized samples, each with dimensionality *C*.\n\n\n\n Returns:\n\n **X** (\"ndarray\" of shape *(N, C)*) -- The unstandardixed\n\n samples from *Z*.\n", "class_name": "numpy_ml.preprocessing.general.Standardizer", "class_link": "numpy_ml/preprocessing/general.py#L141-L272", "test_file_path": "numpy_ml/tests/test_Standardizer.py"} +{"title": "DCT", "class_annotation": "numpy_ml.preprocessing.dsp.DCT(frame, orthonormal=True)", "comment": 
"\"DCT\"\n\n*****\n\n\n\nnumpy_ml.preprocessing.dsp.DCT(frame, orthonormal=True)\n\n\n\n A naive O(N^2) implementation of the 1D discrete cosine transform-\n\n II (DCT-II).\n\n\n\n -[ Notes ]-\n\n\n\n For a signal \\mathbf{x} = [x_1, \\ldots, x_N] consisting of *N*\n\n samples, the *k* th DCT coefficient, c_k, is\n\n\n\n c_k = 2 \\sum_{n=0}^{N-1} x_n \\cos(\\pi k (2 n + 1) / (2 N))\n\n\n\n where *k* ranges from 0, \\ldots, N-1.\n\n\n\n The DCT is highly similar to the DFT -- whereas in a DFT the basis\n\n functions are sinusoids, in a DCT they are restricted solely to\n\n cosines. A signal's DCT representation tends to have more of its\n\n energy concentrated in a smaller number of coefficients when\n\n compared to the DFT, and is thus commonly used for signal\n\n compression. [1]\n\n\n\n [1] Smoother signals can be accurately approximated using fewer DFT\n\n / DCT coefficients, resulting in a higher compression ratio.\n\n The DCT naturally yields a continuous extension at the signal\n\n boundaries due its use of even basis functions (cosine). This\n\n in turn produces a smoother extension in comparison to DFT or\n\n DCT approximations, resulting in a higher compression.\n\n\n\n Parameters:\n\n * **frame** (\"ndarray\" of shape *(N,)*) -- A signal frame\n\n consisting of N samples\n\n\n\n * **orthonormal** (*bool*) -- Scale to ensure the coefficient\n\n vector is orthonormal. Default is True.\n\n\n\n Returns:\n\n **dct** (\"ndarray\" of shape *(N,)*) -- The discrete cosine\n\n transform of the samples in *frame*.\n", "class_name": "numpy_ml.preprocessing.dsp.DCT", "class_link": "numpy_ml/preprocessing/dsp.py#L161-L209", "test_file_path": "numpy_ml/tests/test_DCT.py"} +{"title": "Multiply", "class_annotation": "numpy_ml.neural_nets.layers.Multiply(act_fn=None, optimizer=None)", "comment": "\"Multiply\"\n\n**********\n\n\n\nclass numpy_ml.neural_nets.layers.Multiply(act_fn=None, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A multiplication layer that returns the *elementwise* product of\n\n its inputs, passed through an optional nonlinearity.\n\n\n\n Parameters:\n\n * **act_fn** (str, Activation object, or None) -- The element-\n\n wise output nonlinearity used in computing the final output.\n\n If None, use the identity function f(x) = x. Default is None.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Unused\n\n\n\n * **parameters** (*dict*) -- Unused\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (list of length *n_inputs*) -- A list of tensors, all\n\n of the same shape.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. 
If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, *)*) -- The product over\n\n the *n_ex* examples.\n\n\n\n backward(dLdY, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, *)*) -- The gradient\n\n of the loss wrt. the layer output *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (list of length *n_inputs*) -- The gradient of the\n\n loss wrt. each input in *X*.\n", "class_name": "numpy_ml.neural_nets.layers.Multiply", "class_link": "numpy_ml/neural_nets/layers/layers.py#L745-L856", "test_file_path": "numpy_ml/tests/test_Multiply.py"} +{"title": "TFIDFEncoder", "class_annotation": "numpy_ml.preprocessing.nlp.TFIDFEncoder(vocab=None, lowercase=True, min_count=0, smooth_idf=True, max_tokens=None, input_type='files', filter_stopwords=True, filter_punctuation=True, tokenizer='words')", "comment": "\"TFIDFEncoder\"\n\n**************\n\n\n\nclass numpy_ml.preprocessing.nlp.TFIDFEncoder(vocab=None, lowercase=True, min_count=0, smooth_idf=True, max_tokens=None, input_type='files', filter_stopwords=True, filter_punctuation=True, tokenizer='words')\n\n\n\n An object for compiling and encoding the term-frequency inverse-\n\n document-frequency (TF-IDF) representation of the tokens in a text\n\n corpus.\n\n\n\n -[ Notes ]-\n\n\n\n TF-IDF is intended to reflect how important a word is to a document\n\n in a collection or corpus. For a word token *w* in a document *d*,\n\n and a corpus, D = \\{d_1, \\ldots, d_N\\}, we have:\n\n\n\n \\text{TF}(w, d) &= \\text{num. occurences of }w \\text{ in\n\n document }d \\\\ \\text{IDF}(w, D) &= \\log \\frac{|D|}{|\\{ d \\in\n\n D: t \\in d \\}|}\n\n\n\n Parameters:\n\n * **vocab** (\"Vocabulary\" object or list-like) -- An existing\n\n vocabulary to filter the tokens in the corpus against. Default\n\n is None.\n\n\n\n * **lowercase** (*bool*) -- Whether to convert each string to\n\n lowercase before tokenization. Default is True.\n\n\n\n * **min_count** (*int*) -- Minimum number of times a token must\n\n occur in order to be included in vocab. Default is 0.\n\n\n\n * **smooth_idf** (*bool*) -- Whether to add 1 to the denominator\n\n of the IDF calculation to avoid divide-by-zero errors. Default\n\n is True.\n\n\n\n * **max_tokens** (*int*) -- Only add the *max_tokens* most\n\n frequent tokens that occur more than *min_count* to the\n\n vocabulary. If None, add all tokens greater that occur more\n\n than than *min_count*. Default is None.\n\n\n\n * **input_type** (*{'files'**, **'strings'}*) -- If 'files', the\n\n sequence input to *fit* is expected to be a list of filepaths.\n\n If 'strings', the input is expected to be a list of lists,\n\n each sublist containing the raw strings for a single document\n\n in the corpus. Default is 'filename'.\n\n\n\n * **filter_stopwords** (*bool*) -- Whether to remove stopwords\n\n before encoding the words in the corpus. Default is True.\n\n\n\n * **filter_punctuation** (*bool*) -- Whether to remove\n\n punctuation before encoding the words in the corpus. Default\n\n is True.\n\n\n\n * **tokenizer** (*{'whitespace'**, **'words'**,\n\n **'characters'**, **'bytes'}*) -- Strategy to follow when\n\n mapping strings to tokens. 
The *'whitespace'* tokenizer splits\n\n strings at whitespace characters. The *'words'* tokenizer\n\n splits strings using a \"word\" regex. The *'characters'*\n\n tokenizer splits strings into individual characters. The\n\n *'bytes'* tokenizer splits strings into a collection of\n\n individual bytes.\n\n\n\n fit(corpus_seq, encoding='utf-8-sig')\n\n\n\n Compute term-frequencies and inverse document frequencies on a\n\n collection of documents.\n\n\n\n Parameters:\n\n * **corpus_seq** (*str** or **list** of **strs*) -- The\n\n filepath / list of filepaths / raw string contents of the\n\n document(s) to be encoded, in accordance with the\n\n *input_type* parameter passed to the \"__init__()\" method.\n\n Each document is expected to be a string of tokens\n\n separated by whitespace.\n\n\n\n * **encoding** (*str*) -- Specifies the text encoding for\n\n corpus if *input_type* is *files*. Common entries are\n\n either 'utf-8' (no header byte), or 'utf-8-sig' (header\n\n byte). Default is 'utf-8-sig'.\n\n\n\n Returns:\n\n *self*\n\n\n\n transform(ignore_special_chars=True)\n\n\n\n Generate the term-frequency inverse-document-frequency encoding\n\n of a text corpus.\n\n\n\n Parameters:\n\n **ignore_special_chars** (*bool*) -- Whether to drop columns\n\n corresponding to \"\", \"\", and \"\" tokens from\n\n the final tfidf encoding. Default is True.\n\n\n\n Returns:\n\n **tfidf** (numpy array of shape *(D, M [- 3])*) -- The\n\n encoded corpus, with each row corresponding to a single\n\n document, and each column corresponding to a token id. The\n\n mapping between column numbers and tokens is stored in the\n\n *idx2token* attribute IFF *ignore_special_chars* is False.\n\n Otherwise, the mappings are not accurate.\n", "class_name": "numpy_ml.preprocessing.nlp.TFIDFEncoder", "class_link": "numpy_ml/preprocessing/nlp.py#L583-L1005", "test_file_path": "numpy_ml/tests/test_TFIDFEncoder.py"} +{"title": "hamming", "class_annotation": "numpy_ml.utils.distance_metrics.hamming(x, y)", "comment": "\"hamming\"\n\n*********\n\n\n\nnumpy_ml.utils.distance_metrics.hamming(x, y)\n\n\n\n Compute the Hamming distance between two integer-valued vectors.\n\n\n\n -[ Notes ]-\n\n\n\n The Hamming distance between two vectors **x** and **y** is\n\n\n\n d(\\mathbf{x}, \\mathbf{y}) = \\frac{1}{N} \\sum_i \\mathbb{1}_{x_i\n\n \\neq y_i}\n\n\n\n Parameters:\n\n * **x** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between. Both vectors should be integer-\n\n valued.\n\n\n\n * **y** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between. 
Both vectors should be integer-\n\n valued.\n\n\n\n Returns:\n\n **d** (*float*) -- The Hamming distance between **x** and **y**.\n", "class_name": "numpy_ml.utils.distance_metrics.hamming", "class_link": "numpy_ml/utils/distance_metrics.py#L109-L132", "test_file_path": "numpy_ml/tests/test_hamming.py"} +{"title": "Conv2D", "class_annotation": "numpy_ml.neural_nets.layers.Conv2D(out_ch, kernel_shape, pad=0, stride=1, dilation=0, act_fn=None, optimizer=None, init='glorot_uniform')", "comment": "\"Conv2D\"\n\n********\n\n\n\nclass numpy_ml.neural_nets.layers.Conv2D(out_ch, kernel_shape, pad=0, stride=1, dilation=0, act_fn=None, optimizer=None, init='glorot_uniform')\n\n\n\n Bases: \"LayerBase\"\n\n\n\n Apply a two-dimensional convolution kernel over an input volume.\n\n\n\n -[ Notes ]-\n\n\n\n Equations:\n\n\n\n out = act_fn(pad(X) * W + b)\n\n n_rows_out = floor(1 + (n_rows_in + pad_left + pad_right - filter_rows) / stride)\n\n n_cols_out = floor(1 + (n_cols_in + pad_top + pad_bottom - filter_cols) / stride)\n\n\n\n where *'*'* denotes the cross-correlation operation with stride *s*\n\n and dilation *d*.\n\n\n\n Parameters:\n\n * **out_ch** (*int*) -- The number of filters/kernels to compute\n\n in the current layer\n\n\n\n * **kernel_shape** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the current layer\n\n\n\n * **act_fn** (str, Activation object, or None) -- The activation\n\n function for computing \"Y[t]\". If None, use the identity\n\n function f(X) = X by default. Default is None.\n\n\n\n * **pad** (*int**, **tuple**, or **'same'*) -- The number of\n\n rows/columns to zero-pad the input with. Default is 0.\n\n\n\n * **stride** (*int*) -- The stride/hop of the convolution\n\n kernels as they move over the input volume. Default is 1.\n\n\n\n * **dilation** (*int*) -- Number of pixels inserted between\n\n kernel elements. Effective kernel shape after dilation is:\n\n \"[kernel_rows * (d + 1) - d, kernel_cols * (d + 1) - d]\".\n\n Default is 0.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n A dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output given input volume *X*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The input volume consisting of *n_ex* examples,\n\n each with dimension (*in_rows*, *in_cols*, *in_ch*).\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. 
If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) -- The layer output.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Compute the gradient of the loss with respect to the layer\n\n parameters.\n\n\n\n -[ Notes ]-\n\n\n\n Relies on \"im2col()\" and \"col2im()\" to vectorize the gradient\n\n calculation.\n\n\n\n See the private method \"_backward_naive()\" for a more\n\n straightforward implementation.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)* or list of arrays) -- The gradient(s) of the loss\n\n with respect to the layer output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- The gradient of the loss with respect to the layer input\n\n volume.\n", "class_name": "numpy_ml.neural_nets.layers.Conv2D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L2895-L3178", "test_file_path": "numpy_ml/tests/test_Conv2D.py"} +{"title": "LayerNorm1D", "class_annotation": "numpy_ml.neural_nets.layers.LayerNorm1D(epsilon=1e-05, optimizer=None)", "comment": "\"LayerNorm1D\"\n\n*************\n\n\n\nclass numpy_ml.neural_nets.layers.LayerNorm1D(epsilon=1e-05, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A layer normalization layer for 1D inputs.\n\n\n\n -[ Notes ]-\n\n\n\n In contrast to \"BatchNorm1D\", the LayerNorm layer calculates the\n\n mean and variance across *features* rather than examples in the\n\n batch ensuring that the mean and variance estimates are independent\n\n of batch size and permitting straightforward application in RNNs.\n\n\n\n Equations [train & test]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Also in contrast to \"BatchNorm1D\", *scaler* and *intercept* are\n\n applied *elementwise* to \"norm(X)\".\n\n\n\n Parameters:\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during computation of \"norm(X)\" to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". 
Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer input,\n\n representing the *n_in*-dimensional features for a\n\n minibatch of *n_ex* examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer output for\n\n each of the *n_ex* examples.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, n_in)*) -- The\n\n gradient of the loss wrt. the layer output *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, n_in)*) -- The gradient of\n\n the loss wrt. the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.LayerNorm1D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L1634-L1803", "test_file_path": "numpy_ml/tests/test_LayerNorm1D.py"} +{"title": "SkipConnectionConvModule", "class_annotation": "numpy_ml.neural_nets.modules.SkipConnectionConvModule(out_ch1, out_ch2, kernel_shape1, kernel_shape2, kernel_shape_skip, pad1=0, pad2=0, stride1=1, stride2=1, act_fn=None, epsilon=1e-05, momentum=0.9, stride_skip=1, optimizer=None, init='glorot_uniform')", "comment": "\"SkipConnectionConvModule\"\n\n**************************\n\n\n\nclass numpy_ml.neural_nets.modules.SkipConnectionConvModule(out_ch1, out_ch2, kernel_shape1, kernel_shape2, kernel_shape_skip, pad1=0, pad2=0, stride1=1, stride2=1, act_fn=None, epsilon=1e-05, momentum=0.9, stride_skip=1, optimizer=None, init='glorot_uniform')\n\n\n\n A ResNet-like \"convolution\" shortcut module.\n\n\n\n -[ Notes ]-\n\n\n\n In contrast to \"SkipConnectionIdentityModule\", the additional\n\n *conv2d_skip* and *batchnorm_skip* layers in the shortcut path\n\n allow adjusting the dimensions of *X* to match the output of the\n\n main set of convolutions.\n\n\n\n X -> Conv2D -> Act_fn -> BatchNorm2D -> Conv2D -> BatchNorm2D -> + -> Act_fn\n\n \\_____________________ Conv2D -> Batchnorm2D __________________/\n\n\n\n -[ References ]-\n\n\n\n [1] He et al. (2015). 
\"Deep residual learning for image\n\n recognition.\" https://arxiv.org/pdf/1512.03385.pdf\n\n\n\n Parameters:\n\n * **out_ch1** (*int*) -- The number of filters/kernels to\n\n compute in the first convolutional layer.\n\n\n\n * **out_ch2** (*int*) -- The number of filters/kernels to\n\n compute in the second convolutional layer.\n\n\n\n * **kernel_shape1** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the first convolutional layer.\n\n\n\n * **kernel_shape2** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the second convolutional layer.\n\n\n\n * **kernel_shape_skip** (*2-tuple*) -- The dimension of a single\n\n 2D filter/kernel in the \"skip\" convolutional layer.\n\n\n\n * **stride1** (*int*) -- The stride/hop of the convolution\n\n kernels in the first convolutional layer. Default is 1.\n\n\n\n * **stride2** (*int*) -- The stride/hop of the convolution\n\n kernels in the second convolutional layer. Default is 1.\n\n\n\n * **stride_skip** (*int*) -- The stride/hop of the convolution\n\n kernels in the \"skip\" convolutional layer. Default is 1.\n\n\n\n * **pad1** (*int**, **tuple**, or **'same'*) -- The number of\n\n rows/columns of 0's to pad the input to the first\n\n convolutional layer with. Default is 0.\n\n\n\n * **pad2** (*int**, **tuple**, or **'same'*) -- The number of\n\n rows/columns of 0's to pad the input to the second\n\n convolutional layer with. Default is 0.\n\n\n\n * **act_fn** (Activation object or None) -- The activation\n\n function for computing \"Y[t]\". If None, use the identity f(x)\n\n = x by default. Default is None.\n\n\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during \"BatchNorm2D\" computation to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **momentum** (*float*) -- The momentum term for the running\n\n mean/running std calculations in the \"BatchNorm2D\" layers.\n\n The closer this is to 1, the less weight will be given to the\n\n mean/std of the current batch (i.e., higher smoothing).\n\n Default is 0.9.\n\n\n\n * **init** (*str*) -- The weight initialization strategy. Valid\n\n entries are {'glorot_normal', 'glorot_uniform', 'he_normal',\n\n 'he_uniform'}.\n\n\n\n * **optimizer** (str or Optimizer object) -- The optimization\n\n strategy to use when performing gradient updates within the\n\n \"update\" method. If None, use the \"SGD\" optimizer with\n\n default parameters. Default is None.\n\n\n\n property parameters\n\n\n\n A dictionary of the module parameters.\n\n\n\n property hyperparameters\n\n\n\n A dictionary of the module hyperparameters.\n\n\n\n property derived_variables\n\n\n\n A dictionary of intermediate values computed during the\n\n forward/backward passes.\n\n\n\n property gradients\n\n\n\n A dictionary of the accumulated module parameter gradients.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output given input volume *X*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The input volume consisting of *n_ex* examples,\n\n each with dimension (*in_rows*, *in_cols*, *in_ch*).\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. 
Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) -- The module output volume.\n\n\n\n backward(dLdY, retain_grads=True)\n\n\n\n Compute the gradient of the loss with respect to the module\n\n parameters.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) --\n\n\n\n * **arrays** (*or list of*) -- The gradient(s) of the loss\n\n with respect to the module output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- The gradient of the loss with respect to the module input\n\n volume.\n", "class_name": "numpy_ml.neural_nets.modules.SkipConnectionConvModule", "class_link": "numpy_ml/neural_nets/modules/modules.py#L615-L984", "test_file_path": "numpy_ml/tests/test_SkipConnectionConvModule.py"} +{"title": "WGAN_GPLoss", "class_annotation": "numpy_ml.neural_nets.losses.WGAN_GPLoss(lambda_=10)", "comment": "\"WGAN_GPLoss\"\n\n*************\n\n\n\nclass numpy_ml.neural_nets.losses.WGAN_GPLoss(lambda_=10)\n\n\n\n The loss function for a Wasserstein GAN [*] [\u2020] with gradient\n\n penalty.\n\n\n\n -[ Notes ]-\n\n\n\n Assuming an optimal critic, minimizing this quantity wrt. the\n\n generator parameters corresponds to minimizing the Wasserstein-1\n\n (earth-mover) distance between the fake and real data\n\n distributions.\n\n\n\n The formula for the WGAN-GP critic loss is\n\n\n\n \\text{WGANLoss} &= \\sum_{x \\in X_{real}} p(x) D(x)\n\n - \\sum_{x' \\in X_{fake}} p(x') D(x') \\\\ \\text{WGANLossGP} &=\n\n \\text{WGANLoss} + \\lambda (||\\nabla_{X_{interp}}\n\n D(X_{interp})||_2 - 1)^2\n\n\n\n where\n\n\n\n X_{fake} &= \\text{Generator}(\\mathbf{z}) \\\\ X_{interp} &=\n\n \\alpha X_{real} + (1 - \\alpha) X_{fake} \\\\\n\n\n\n and\n\n\n\n \\mathbf{z} &\\sim \\mathcal{N}(0, \\mathbb{1}) \\\\ \\alpha &\\sim\n\n \\text{Uniform}(0, 1)\n\n\n\n -[ References ]-\n\n\n\n [*] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., &\n\n Courville, A. (2017) \"Improved training of Wasserstein GANs\"\n\n *Advances in Neural Information Processing Systems, 31*:\n\n 5769-5779.\n\n\n\n [\u2020] Goodfellow, I. J, Abadie, P. A., Mirza, M., Xu, B., Farley, D.\n\n W., Ozair, S., Courville, A., & Bengio, Y. (2014) \"Generative\n\n adversarial nets\" *Advances in Neural Information Processing\n\n Systems, 27*: 2672-2680.\n\n\n\n Parameters:\n\n **lambda** (*float*) -- The gradient penalty coefficient.\n\n Default is 10.\n\n\n\n loss(Y_fake, module, Y_real=None, gradInterp=None)\n\n\n\n Computes the generator and critic loss using the WGAN-GP value\n\n function.\n\n\n\n Parameters:\n\n * **Y_fake** (\"ndarray\" of shape (n_ex,)) -- The output of\n\n the critic for *X_fake*.\n\n\n\n * **module** (*{'C'**, **'G'}*) -- Whether to calculate the\n\n loss for the critic ('C') or the generator ('G'). If\n\n calculating loss for the critic, *Y_real* and *gradInterp*\n\n must not be None.\n\n\n\n * **Y_real** (\"ndarray\" of shape *(n_ex,)* or None) -- The\n\n output of the critic for *X_real*. Default is None.\n\n\n\n * **gradInterp** (\"ndarray\" of shape *(n_ex, n_feats)* or\n\n None) -- The gradient of the critic output for *X_interp*\n\n wrt. *X_interp*. 
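As an illustration of the WGAN-GP value function documented for this loss, the following NumPy sketch evaluates the critic-side quantity with the empirical batch mean standing in for the sums weighted by p(x). It is a hedged sketch of the documented formula, not the library's loss() implementation, and the averaging and sign conventions are assumptions.

    import numpy as np

    def wgan_gp_critic_value(Y_real, Y_fake, grad_interp, lambda_=10.0):
        # Y_real, Y_fake: critic outputs on real / generated examples, shape (n_ex,)
        # grad_interp: gradient of the critic at X_interp, shape (n_ex, n_feats)
        wgan_loss = Y_real.mean() - Y_fake.mean()          # empirical E[D(real)] - E[D(fake)]
        grad_norm = np.linalg.norm(grad_interp, axis=1)    # ||grad||_2 per example
        penalty = lambda_ * ((grad_norm - 1.0) ** 2).mean()
        return wgan_loss + penalty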
Default is None.\n\n\n\n Returns:\n\n **loss** (*float*) -- Depending on the setting for *module*,\n\n either the critic or generator loss, averaged over examples\n\n in the minibatch.\n\n\n\n grad(Y_fake, module, Y_real=None, gradInterp=None)\n\n\n\n Computes the gradient of the generator or critic loss with\n\n regard to its inputs.\n\n\n\n Parameters:\n\n * **Y_fake** (\"ndarray\" of shape *(n_ex,)*) -- The output of\n\n the critic for *X_fake*.\n\n\n\n * **module** (*{'C'**, **'G'}*) -- Whether to calculate the\n\n gradient for the critic loss ('C') or the generator loss\n\n ('G'). If calculating grads for the critic, *Y_real* and\n\n *gradInterp* must not be None.\n\n\n\n * **Y_real** (\"ndarray\" of shape *(n_ex,)* or None) -- The\n\n output of the critic for *X_real*. Default is None.\n\n\n\n * **gradInterp** (\"ndarray\" of shape *(n_ex, n_feats)* or\n\n None) -- The gradient of the critic output on *X_interp*\n\n wrt. *X_interp*. Default is None.\n\n\n\n Returns:\n\n **grads** (*tuple*) -- If *module* == 'C', returns a 3-tuple\n\n containing the gradient of the critic loss with regard to\n\n (*Y_fake*, *Y_real*, *gradInterp*). If *module* == 'G',\n\n returns the gradient of the generator with regard to\n\n *Y_fake*.\n", "class_name": "numpy_ml.neural_nets.losses.WGAN_GPLoss", "class_link": "numpy_ml/neural_nets/losses/losses.py#L331-L511", "test_file_path": "numpy_ml/tests/test_WGAN_GPLoss.py"} +{"title": "chebyshev", "class_annotation": "numpy_ml.utils.distance_metrics.chebyshev(x, y)", "comment": "\"chebyshev\"\n\n***********\n\n\n\nnumpy_ml.utils.distance_metrics.chebyshev(x, y)\n\n\n\n Compute the Chebyshev (L_\\infty) distance between two real vectors\n\n\n\n -[ Notes ]-\n\n\n\n The Chebyshev distance between two vectors **x** and **y** is\n\n\n\n d(\\mathbf{x}, \\mathbf{y}) = \\max_i |x_i - y_i|\n\n\n\n Parameters:\n\n * **x** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n * **y** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n Returns:\n\n **d** (*float*) -- The Chebyshev distance between **x** and\n\n **y**.\n", "class_name": "numpy_ml.utils.distance_metrics.chebyshev", "class_link": "numpy_ml/utils/distance_metrics.py#L54-L76", "test_file_path": "numpy_ml/tests/test_chebyshev.py"} +{"title": "DiGraph", "class_annotation": "numpy_ml.utils.graphs.DiGraph(V, E)", "comment": "\"DiGraph\"\n\n*********\n\n\n\nclass numpy_ml.utils.graphs.DiGraph(V, E)\n\n\n\n Bases: \"Graph\"\n\n\n\n A generic directed graph object.\n\n\n\n Parameters:\n\n * **V** (*list*) -- A list of vertex IDs.\n\n\n\n * **E** (list of \"Edge\" objects) -- A list of directed edges\n\n connecting pairs of vertices in \"V\".\n\n\n\n reverse()\n\n\n\n Reverse the direction of all edges in the graph\n\n\n\n topological_ordering()\n\n\n\n Returns a (non-unique) topological sort / linearization of the\n\n nodes IFF the graph is acyclic, otherwise returns None.\n\n\n\n -[ Notes ]-\n\n\n\n A topological sort is an ordering on the nodes in *G* such that\n\n for every directed edge u \\rightarrow v in the graph, *u*\n\n appears before *v* in the ordering. The topological ordering is\n\n produced by ordering the nodes in *G* by their DFS \"last visit\n\n time,\" from greatest to smallest.\n\n\n\n This implementation follows a recursive, DFS-based approach [1]\n\n which may break if the graph is very large. For an iterative\n\n version, see Khan's algorithm [2] .\n\n\n\n -[ References ]-\n\n\n\n [1] Tarjan, R. 
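For concreteness, the Chebyshev formula above reduces to a one-line NumPy expression; this is a minimal standalone sketch, separate from the library's own chebyshev().

    import numpy as np

    def chebyshev(x, y):
        # L-infinity distance: the largest absolute coordinate-wise difference
        return np.max(np.abs(x - y))

    x, y = np.array([1.0, 3.0, -2.0]), np.array([4.0, 1.0, -2.0])
    print(chebyshev(x, y))  # 3.0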
(1976), Edge-disjoint spanning trees and depth-\n\n first search, *Acta Informatica, 6 (2)*: 171\u2013185.\n\n\n\n [2] Kahn, A. (1962), Topological sorting of large networks,\n\n *Communications of the ACM, 5 (11)*: 558\u2013562.\n\n\n\n Returns:\n\n **ordering** (*list or None*) -- A topoligical ordering of\n\n the vertex indices if the graph is a DAG, otherwise None.\n\n\n\n is_acyclic()\n\n\n\n Check whether the graph contains cycles\n", "class_name": "numpy_ml.utils.graphs.DiGraph", "class_link": "numpy_ml/utils/graphs.py#L173-L263", "test_file_path": "numpy_ml/tests/test_DiGraph.py"} +{"title": "Add", "class_annotation": "numpy_ml.neural_nets.layers.Add(act_fn=None, optimizer=None)", "comment": "\"Add\"\n\n*****\n\n\n\nclass numpy_ml.neural_nets.layers.Add(act_fn=None, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n An \"addition\" layer that returns the sum of its inputs, passed\n\n through an optional nonlinearity.\n\n\n\n Parameters:\n\n * **act_fn** (str, Activation object, or None) -- The element-\n\n wise output nonlinearity used in computing the final output.\n\n If None, use the identity function f(x) = x. Default is None.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Unused\n\n\n\n * **parameters** (*dict*) -- Unused\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (list of length *n_inputs*) -- A list of tensors, all\n\n of the same shape.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, *)*) -- The sum over the\n\n *n_ex* examples.\n\n\n\n backward(dLdY, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, *)*) -- The gradient\n\n of the loss wrt. the layer output *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (list of length *n_inputs*) -- The gradient of the\n\n loss wrt. 
each input in *X*.\n", "class_name": "numpy_ml.neural_nets.layers.Add", "class_link": "numpy_ml/neural_nets/layers/layers.py#L633-L742", "test_file_path": "numpy_ml/tests/test_Add.py"} +{"title": "DecisionTree", "class_annotation": "numpy_ml.trees.DecisionTree(classifier=True, max_depth=None, n_feats=None, criterion='entropy', seed=None)", "comment": "\"DecisionTree\"\n\n**************\n\n\n\nclass numpy_ml.trees.DecisionTree(classifier=True, max_depth=None, n_feats=None, criterion='entropy', seed=None)\n\n\n\n A decision tree model for regression and classification problems.\n\n\n\n Parameters:\n\n * **classifier** (*bool*) -- Whether to treat target values as\n\n categorical (classifier = True) or continuous (classifier =\n\n False). Default is True.\n\n\n\n * **max_depth** (*int** or **None*) -- The depth at which to\n\n stop growing the tree. If None, grow the tree until all leaves\n\n are pure. Default is None.\n\n\n\n * **n_feats** (*int*) -- Specifies the number of features to\n\n sample on each split. If None, use all features on each split.\n\n Default is None.\n\n\n\n * **criterion** (*{'mse'**, **'entropy'**, **'gini'}*) -- The\n\n error criterion to use when calculating splits. When\n\n *classifier* is False, valid entries are {'mse'}. When\n\n *classifier* is True, valid entries are {'entropy', 'gini'}.\n\n Default is 'entropy'.\n\n\n\n * **seed** (*int** or **None*) -- Seed for the random number\n\n generator. Default is None.\n\n\n\n fit(X, Y)\n\n\n\n Fit a binary decision tree to a dataset.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- The training data of\n\n *N* examples, each with *M* features\n\n\n\n * **Y** (\"ndarray\" of shape *(N,)*) -- An array of integer\n\n class labels for each example in *X* if self.classifier =\n\n True, otherwise the set of target values for each example\n\n in *X*.\n\n\n\n predict(X)\n\n\n\n Use the trained decision tree to classify or predict the\n\n examples in *X*.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, M)*) -- The training data of\n\n *N* examples, each with *M* features\n\n\n\n Returns:\n\n **preds** (\"ndarray\" of shape *(N,)*) -- The integer class\n\n labels predicted for each example in *X* if self.classifier =\n\n True, otherwise the predicted target values.\n\n\n\n predict_class_probs(X)\n\n\n\n Use the trained decision tree to return the class probabilities\n\n for the examples in *X*.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, M)*) -- The training data of\n\n *N* examples, each with *M* features\n\n\n\n Returns:\n\n **preds** (\"ndarray\" of shape *(N, n_classes)*) -- The class\n\n probabilities predicted for each example in *X*.\n", "class_name": "numpy_ml.trees.DecisionTree", "class_link": "numpy_ml/trees/dt.py#L21-L212", "test_file_path": "numpy_ml/tests/test_DecisionTree.py"} +{"title": "PolynomialKernel", "class_annotation": "numpy_ml.utils.kernels.PolynomialKernel(d=3, gamma=None, c0=1)", "comment": "\"PolynomialKernel\"\n\n******************\n\n\n\nclass numpy_ml.utils.kernels.PolynomialKernel(d=3, gamma=None, c0=1)\n\n\n\n The degree-*d* polynomial kernel.\n\n\n\n -[ Notes ]-\n\n\n\n For input vectors \\mathbf{x} and \\mathbf{y}, the polynomial kernel\n\n is:\n\n\n\n k(\\mathbf{x}, \\mathbf{y}) = (\\gamma \\mathbf{x}^\\top \\mathbf{y} +\n\n c_0)^d\n\n\n\n In contrast to the linear kernel, the polynomial kernel also\n\n computes similarities *across* dimensions of the **x** and **y**\n\n vectors, allowing it to account for interactions between features.\n\n As an 
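A short usage sketch for the DecisionTree API documented above (fit / predict on synthetic data); the synthetic labels and hyperparameter choices are illustrative only.

    import numpy as np
    from numpy_ml.trees import DecisionTree

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)                      # 100 examples, 5 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # simple synthetic labels

    dt = DecisionTree(classifier=True, max_depth=3, criterion="entropy", seed=0)
    dt.fit(X, y)
    preds = dt.predict(X)                      # integer class labels, shape (100,)
    print((preds == y).mean())                 # training accuracy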
instance of the dot product family of kernels, the polynomial\n\n kernel is invariant to a rotation of the coordinates about the\n\n origin, but *not* to translations.\n\n\n\n Parameters:\n\n * **d** (*int*) -- Degree of the polynomial kernel. Default is\n\n 3.\n\n\n\n * **gamma** (*float** or **None*) -- A scaling parameter for the\n\n dot product between *x* and *y*, determining the amount of\n\n smoothing/resonlution of the kernel. Larger values result in\n\n greater smoothing. If None, defaults to 1 / *C*. Sometimes\n\n referred to as the kernel bandwidth. Default is None.\n\n\n\n * **c0** (*float*) -- Parameter trading off the influence of\n\n higher-order versus lower-order terms in the polynomial. If\n\n *c0* = 0, the kernel is said to be homogenous. Default is 1.\n\n\n\n set_params(summary_dict)\n\n\n\n Set the model parameters and hyperparameters using the settings\n\n in *summary_dict*.\n\n\n\n Parameters:\n\n **summary_dict** (*dict*) -- A dictionary with keys\n\n 'parameters' and 'hyperparameters', structured as would be\n\n returned by the \"summary()\" method. If a particular\n\n (hyper)parameter is not included in this dict, the current\n\n value will be used.\n\n\n\n Returns:\n\n **new_kernel** (Kernel instance) -- A kernel with parameters\n\n and hyperparameters adjusted to those specified in\n\n *summary_dict*.\n\n\n\n summary()\n\n\n\n Return the dictionary of model parameters, hyperparameters, and\n\n ID\n", "class_name": "numpy_ml.utils.kernels.PolynomialKernel", "class_link": "numpy_ml/utils/kernels.py#L119-L181", "test_file_path": "numpy_ml/tests/test_PolynomialKernel.py"} +{"title": "VAELoss", "class_annotation": "numpy_ml.neural_nets.losses.VAELoss", "comment": "\"VAELoss\"\n\n*********\n\n\n\nclass numpy_ml.neural_nets.losses.VAELoss\n\n\n\n The variational lower bound for a variational autoencoder with\n\n Bernoulli units.\n\n\n\n -[ Notes ]-\n\n\n\n The VLB to the sum of the binary cross entropy between the true\n\n input and the predicted output (the \"reconstruction loss\") and the\n\n KL divergence between the learned variational distribution q and\n\n the prior, p, assumed to be a unit Gaussian.\n\n\n\n \\text{VAELoss} = \\text{cross_entropy}(\\mathbf{y},\n\n \\hat{\\mathbf{y}}) + \\mathbb{KL}[q \\ || \\ p]\n\n\n\n where \\mathbb{KL}[q \\ || \\ p] is the Kullback-Leibler divergence\n\n between the distributions q and p.\n\n\n\n -[ References ]-\n\n\n\n [1] Kingma, D. P. & Welling, M. (2014). \"Auto-encoding variational\n\n Bayes\". 
*arXiv preprint arXiv:1312.6114.*\n\n https://arxiv.org/pdf/1312.6114.pdf\n\n\n\n static loss(y, y_pred, t_mean, t_log_var)\n\n\n\n Variational lower bound for a Bernoulli VAE.\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape *(n_ex, N)*) -- The original\n\n images.\n\n\n\n * **y_pred** (\"ndarray\" of shape *(n_ex, N)*) -- The VAE\n\n reconstruction of the images.\n\n\n\n * **t_mean** (\"ndarray\" of shape *(n_ex, T)*) -- Mean of the\n\n variational distribution q(t \\mid x).\n\n\n\n * **t_log_var** (\"ndarray\" of shape *(n_ex, T)*) -- Log of\n\n the variance vector of the variational distribution q(t\n\n \\mid x).\n\n\n\n Returns:\n\n **loss** (*float*) -- The VLB, averaged across the batch.\n\n\n\n static grad(y, y_pred, t_mean, t_log_var)\n\n\n\n Compute the gradient of the VLB with regard to the network\n\n parameters.\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape *(n_ex, N)*) -- The original\n\n images.\n\n\n\n * **y_pred** (\"ndarray\" of shape *(n_ex, N)*) -- The VAE\n\n reconstruction of the images.\n\n\n\n * **t_mean** (\"ndarray\" of shape *(n_ex, T)*) -- Mean of the\n\n variational distribution q(t | x).\n\n\n\n * **t_log_var** (\"ndarray\" of shape *(n_ex, T)*) -- Log of\n\n the variance vector of the variational distribution q(t |\n\n x).\n\n\n\n Returns:\n\n * **dY_pred** (\"ndarray\" of shape *(n_ex, N)*) -- The\n\n gradient of the VLB with regard to *y_pred*.\n\n\n\n * **dLogVar** (\"ndarray\" of shape *(n_ex, T)*) -- The\n\n gradient of the VLB with regard to *t_log_var*.\n\n\n\n * **dMean** (\"ndarray\" of shape *(n_ex, T)*) -- The gradient\n\n of the VLB with regard to *t_mean*.\n", "class_name": "numpy_ml.neural_nets.losses.VAELoss", "class_link": "numpy_ml/neural_nets/losses/losses.py#L225-L328", "test_file_path": "numpy_ml/tests/test_VAELoss.py"} +{"title": "MultiHeadedAttentionModule", "class_annotation": "numpy_ml.neural_nets.modules.MultiHeadedAttentionModule(n_heads=8, dropout_p=0, init='glorot_uniform', optimizer=None)", "comment": "\"MultiHeadedAttentionModule\"\n\n****************************\n\n\n\nclass numpy_ml.neural_nets.modules.MultiHeadedAttentionModule(n_heads=8, dropout_p=0, init='glorot_uniform', optimizer=None)\n\n\n\n A mutli-headed attention module.\n\n\n\n -[ Notes ]-\n\n\n\n Multi-head attention allows a model to jointly attend to\n\n information from different representation subspaces at different\n\n positions. 
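The VLB described above can be sketched directly in NumPy, assuming a diagonal Gaussian q(t|x) parameterized by t_mean and t_log_var and Bernoulli outputs; the closed-form KL term against a unit Gaussian prior is standard, but the per-example reduction and batch averaging here are assumptions rather than the library's exact loss() code.

    import numpy as np

    def vae_loss(y, y_pred, t_mean, t_log_var, eps=1e-12):
        # Reconstruction term: binary cross-entropy, summed over pixels per example
        bce = -np.sum(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps), axis=1)
        # KL( N(t_mean, exp(t_log_var)) || N(0, I) ), in closed form
        kl = -0.5 * np.sum(1 + t_log_var - t_mean ** 2 - np.exp(t_log_var), axis=1)
        return np.mean(bce + kl)   # averaged across the batch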
With a single head, this information would get averaged\n\n away when the attention weights are combined with the value\n\n\n\n \\text{MultiHead}(\\mathbf{Q}, \\mathbf{K}, \\mathbf{V}) =\n\n [\\text{head}_1; ...; \\text{head}_h] \\mathbf{W}^{(O)}\n\n\n\n where\n\n\n\n \\text{head}_i = \\text{SDP_attention}( \\mathbf{Q W}_i^{(Q)},\n\n \\mathbf{K W}_i^{(K)}, \\mathbf{V W}_i^{(V)})\n\n\n\n and the projection weights are parameter matrices:\n\n\n\n \\mathbf{W}_i^{(Q)} &\\in \\mathbb{R}^{(\\text{kqv_dim} \\\n\n \\times \\ \\text{latent_dim})} \\\\ \\mathbf{W}_i^{(K)} &\\in\n\n \\mathbb{R}^{(\\text{kqv_dim} \\ \\times \\ \\text{latent_dim})} \\\\\n\n \\mathbf{W}_i^{(V)} &\\in \\mathbb{R}^{(\\text{kqv_dim} \\\n\n \\times \\ \\text{latent_dim})} \\\\ \\mathbf{W}^{(O)} &\\in\n\n \\mathbb{R}^{(\\text{n_heads} \\cdot \\text{latent_dim} \\ \\times \\\n\n \\text{kqv_dim})}\n\n\n\n Importantly, the current module explicitly assumes that\n\n\n\n \\text{kqv_dim} = \\text{dim(query)} = \\text{dim(keys)} =\n\n \\text{dim(values)}\n\n\n\n and that\n\n\n\n \\text{latent_dim} = \\text{kqv_dim / n_heads}\n\n\n\n **[MH Attention Head h]**:\n\n\n\n K --> W_h^(K) ------\\\n\n V --> W_h^(V) ------- > DP_Attention --> head_h\n\n Q --> W_h^(Q) ------/\n\n\n\n The full **[MultiHeadedAttentionModule]** then becomes\n\n\n\n -----------------\n\n K --> | [Attn Head 1] | --> head_1 --\\\n\n V --> | [Attn Head 2] | --> head_2 --\\\n\n Q --> | ... | ... --> Concat --> W^(O) --> MH_out\n\n | [Attn Head Z] | --> head_Z --/\n\n -----------------\n\n\n\n Due to the reduced dimension of each head, the total computational\n\n cost is similar to that of a single attention head with full (i.e.,\n\n kqv_dim) dimensionality.\n\n\n\n Parameters:\n\n * **n_heads** (*int*) -- The number of simultaneous attention\n\n heads to use. Note that the larger *n_heads*, the smaller the\n\n dimensionality of any single head, since \"latent_dim = kqv_dim\n\n / n_heads\". Default is 8.\n\n\n\n * **dropout_p** (*float in** [**0**, **1**)*) -- The dropout\n\n propbability during training, applied to the output of the\n\n softmax in each dot-product attention head. If 0, no dropout\n\n is applied. Default is 0.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is 'glorot_uniform'.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. 
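To make the head-splitting concrete, here is a minimal NumPy sketch of the multi-head computation above, using the conventional softmax(QK^T / sqrt(d)) V form of scaled dot-product attention with no dropout. The shapes, the per-head weight lists, and the attention orientation are assumptions of the sketch, not the module's exact implementation.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)   # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
        # Q, K, V: (n_positions, kqv_dim); W_q/W_k/W_v: lists of (kqv_dim, latent_dim)
        # matrices, one per head; W_o: (n_heads * latent_dim, kqv_dim)
        heads = []
        for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
            q, k, v = Q @ Wq_i, K @ Wk_i, V @ Wv_i              # project into the head's subspace
            scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # attention weights
            heads.append(scores @ v)                            # head_i
        return np.concatenate(heads, axis=-1) @ W_o             # concat heads, project back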
Default is None.\n\n\n\n forward(Q, K, V)\n\n\n\n backward(dLdy)\n\n\n\n property derived_variables\n\n\n\n A dictionary of intermediate values computed during the\n\n forward/backward passes.\n\n\n\n property gradients\n\n\n\n A dictionary of the accumulated module parameter gradients.\n\n\n\n property parameters\n\n\n\n A dictionary of the module parameters.\n\n\n\n property hyperparameters\n\n\n\n A dictionary of the module hyperparameters.\n", "class_name": "numpy_ml.neural_nets.modules.MultiHeadedAttentionModule", "class_link": "numpy_ml/neural_nets/modules/modules.py#L1192-L1427", "test_file_path": "numpy_ml/tests/test_MultiHeadedAttentionModule.py"} +{"title": "HuffmanEncoder", "class_annotation": "numpy_ml.preprocessing.nlp.HuffmanEncoder", "comment": "\"HuffmanEncoder\"\n\n****************\n\n\n\nclass numpy_ml.preprocessing.nlp.HuffmanEncoder\n\n\n\n fit(text)\n\n\n\n Build a Huffman tree for the tokens in *text* and compute each\n\n token's binary encoding.\n\n\n\n -[ Notes ]-\n\n\n\n In a Huffman code, tokens that occur more frequently are\n\n (generally) represented using fewer bits. Huffman codes produce\n\n the minimum expected codeword length among all methods for\n\n encoding tokens individually.\n\n\n\n Huffman codes correspond to paths through a binary tree, with 1\n\n corresponding to \"move right\" and 0 corresponding to \"move\n\n left\". In contrast to standard binary trees, the Huffman tree is\n\n constructed from the bottom up. Construction begins by\n\n initializing a min-heap priority queue consisting of each token\n\n in the corpus, with priority corresponding to the token\n\n frequency. At each step, the two most infrequent tokens in the\n\n corpus are removed and become the children of a parent\n\n pseudotoken whose \"frequency\" is the sum of the frequencies of\n\n its children. 
This new parent pseudotoken is added to the\n\n priority queue and the process is repeated recursively until no\n\n tokens remain.\n\n\n\n Parameters:\n\n **text** (list of strs or \"Vocabulary\" instance) -- The\n\n tokenized text or a pretrained \"Vocabulary\" object to use for\n\n building the Huffman code.\n\n\n\n transform(text)\n\n\n\n Transform the words in *text* into their Huffman-code\n\n representations.\n\n\n\n Parameters:\n\n **text** (list of *N* strings) -- The list of words to encode\n\n\n\n Returns:\n\n **codes** (list of *N* binary strings) -- The encoded words\n\n in *text*\n\n\n\n inverse_transform(codes)\n\n\n\n Transform an encoded sequence of bit-strings back into words.\n\n\n\n Parameters:\n\n **codes** (list of *N* binary strings) -- A list of encoded\n\n bit-strings, represented as strings.\n\n\n\n Returns:\n\n **text** (list of *N* strings) -- The decoded text.\n\n\n\n property tokens\n\n\n\n A list the unique tokens in *text*\n\n\n\n property codes\n\n\n\n A list with the Huffman code for each unique token in *text*\n", "class_name": "numpy_ml.preprocessing.nlp.HuffmanEncoder", "class_link": "numpy_ml/preprocessing/nlp.py#L431-L565", "test_file_path": "numpy_ml/tests/test_HuffmanEncoder.py"} +{"title": "GPRegression", "class_annotation": "numpy_ml.nonparametric.GPRegression(kernel='RBFKernel', alpha=1e-10)", "comment": "\"GPRegression\"\n\n**************\n\n\n\nclass numpy_ml.nonparametric.GPRegression(kernel='RBFKernel', alpha=1e-10)\n\n\n\n A Gaussian Process (GP) regression model.\n\n\n\n y \\mid X, f &\\sim \\mathcal{N}( [f(x_1), \\ldots, f(x_n)],\n\n \\alpha I ) \\\\ f \\mid X &\\sim \\text{GP}(0, K)\n\n\n\n for data D = \\{(x_1, y_1), \\ldots, (x_n, y_n) \\} and a covariance\n\n matrix K_{ij} = \\text{kernel}(x_i, x_j) for all i, j \\in \\{1,\n\n \\ldots, n \\}.\n\n\n\n Parameters:\n\n * **kernel** (*str*) -- The kernel to use in fitting the GP\n\n prior. Default is 'RBFKernel'.\n\n\n\n * **alpha** (*float*) -- An isotropic noise term for the\n\n diagonal in the GP covariance, *K*. Larger values correspond\n\n to the expectation of greater noise in the observed data\n\n points. Default is 1e-10.\n\n\n\n fit(X, y)\n\n\n\n Fit the GP prior to the training data.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- A training dataset\n\n of *N* examples, each with dimensionality *M*.\n\n\n\n * **y** (\"ndarray\" of shape *(N, O)*) -- A collection of\n\n real-valued training targets for the examples in *X*, each\n\n with dimension *O*.\n\n\n\n predict(X, conf_interval=0.95, return_cov=False)\n\n\n\n Return the MAP estimate for y^*, corresponding the mean/mode of\n\n the posterior predictive distribution, p(y^* \\mid x^*, X, y).\n\n\n\n -[ Notes ]-\n\n\n\n Under the GP regression model, the posterior predictive\n\n distribution is\n\n\n\n y^* \\mid x^*, X, y \\sim \\mathcal{N}(\\mu^*, \\text{cov}^*)\n\n\n\n where\n\n\n\n \\mu^* &= K^* (K + \\alpha I)^{-1} y \\\\ \\text{cov}^* &=\n\n K^{**} - K^{*'} (K + \\alpha I)^{-1} K^*\n\n\n\n and\n\n\n\n K &= \\text{kernel}(X, X) \\\\ K^* &= \\text{kernel}(X, X^*)\n\n \\\\ K^{**} &= \\text{kernel}(X^*, X^*)\n\n\n\n NB. This implementation uses the inefficient but general purpose\n\n *np.linalg.inv* routine to invert (K + \\alpha I). 
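A brief usage sketch of the HuffmanEncoder API documented above; the corpus is made up, and the exact bit-strings produced depend on frequency ties, so the behavior noted in the comments is only indicative.

    from numpy_ml.preprocessing.nlp import HuffmanEncoder

    corpus = "the cat sat on the mat the end".split()

    enc = HuffmanEncoder()
    enc.fit(corpus)                            # build the Huffman tree from token frequencies
    codes = enc.transform(["the", "cat"])      # shorter code for the frequent "the"
    print(codes)
    print(enc.inverse_transform(codes))        # ['the', 'cat']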
A more\n\n efficient way is to rely on the fact that *K* (and hence also K\n\n + \\alpha I) is symmetric positive (semi-)definite and take the\n\n inner product of the inverse of its (lower) Cholesky\n\n decompositions:\n\n\n\n Q^{-1} = \\text{cholesky}(Q)^{-1 \\top} \\text{cholesky}(Q)^{-1}\n\n\n\n For more details on a production-grade implementation, see\n\n Algorithm 2.1 in Rasmussen & Williams (2006).\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape (N, M)) -- The collection of\n\n datapoints to generate predictions on\n\n\n\n * **conf_interval** (*float in** (**0**, **1**)*) -- The\n\n percentage confidence bound to return for each prediction.\n\n If the scipy package is not available, this value is always\n\n set to 0.95. Default is 0.95.\n\n\n\n * **return_cov** (*bool*) -- If True, also return the\n\n covariance (*cov**) of the posterior predictive\n\n distribution for the points in *X*. Default is False.\n\n\n\n Returns:\n\n * **y_pred** (\"ndarray\" of shape *(N, O)*) -- The predicted\n\n values for each point in *X*, each with dimensionality *O*.\n\n\n\n * **conf** (\"ndarray\" of shape *(N, O)*) -- The %\n\n conf_interval confidence bound for each *y_pred*. The conf\n\n % confidence interval for the *i*'th prediction is \"[y[i] -\n\n conf[i], y[i] + conf[i]]\".\n\n\n\n * **cov** (\"ndarray\" of shape *(N, N)*) -- The covariance\n\n (*cov**) of the posterior predictive distribution for *X*.\n\n Only returned if *return_cov* is True.\n\n\n\n marginal_log_likelihood(kernel_params=None)\n\n\n\n Compute the log of the marginal likelihood (i.e., the log model\n\n evidence), p(y \\mid X, \\text{kernel_params}).\n\n\n\n -[ Notes ]-\n\n\n\n Under the GP regression model, the marginal likelihood is\n\n normally distributed:\n\n\n\n y | X, \\theta \\sim \\mathcal{N}(0, K + \\alpha I)\n\n\n\n Hence,\n\n\n\n \\log p(y \\mid X, \\theta) = -0.5 \\log \\det(K + \\alpha I) -\n\n 0.5 y^\\top (K + \\alpha I)^{-1} y + \\frac{n}{2} \\log 2 \\pi\n\n\n\n where K = \\text{kernel}(X, X), \\theta is the set of kernel\n\n parameters, and *n* is the number of dimensions in *K*.\n\n\n\n Parameters:\n\n **kernel_params** (*dict*) -- Parameters for the kernel\n\n function. If None, calculate the marginal likelihood under\n\n the kernel parameters defined at model initialization.\n\n Default is None.\n\n\n\n Returns:\n\n **marginal_log_likelihood** (*float*) -- The log likelihood\n\n of the training targets given the kernel parameterized by\n\n *kernel_params* and the training inputs, marginalized over\n\n all functions *f*.\n\n\n\n sample(X, n_samples=1, dist='posterior_predictive')\n\n\n\n Sample functions from the GP prior or posterior predictive\n\n distribution.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- The collection of\n\n datapoints to generate predictions on. Only used if *dist*\n\n = 'posterior_predictive'.\n\n\n\n * **n_samples** (*int*) -- The number of samples to generate.\n\n Default is 1.\n\n\n\n * **dist** (*{\"posterior_predictive\"**, **\"prior\"}*) -- The\n\n distribution to draw samples from. 
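The note above about replacing np.linalg.inv with a Cholesky factorization can be sketched as follows; gp_posterior_mean is a hypothetical helper (not part of the library) and assumes SciPy is available and that K_star = kernel(X, X_test) has shape (N, N_test).

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def gp_posterior_mean(K, K_star, y, alpha=1e-10):
        # mu* = K*^T (K + alpha I)^{-1} y, without forming an explicit inverse
        n = K.shape[0]
        c, low = cho_factor(K + alpha * np.eye(n))   # K + alpha*I is symmetric positive definite
        v = cho_solve((c, low), y)                   # solves (K + alpha I) v = y
        return K_star.T @ v                          # shape (N_test, O)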
Default is\n\n \"posterior_predictive\".\n\n\n\n Returns:\n\n **samples** (\"ndarray\" of shape *(n_samples, O, N)*) -- The\n\n generated samples for the points in *X*.\n", "class_name": "numpy_ml.nonparametric.GPRegression", "class_link": "numpy_ml/nonparametric/gp.py#L18-L258", "test_file_path": "numpy_ml/tests/test_GPRegression.py"} +{"title": "CrossEntropy", "class_annotation": "numpy_ml.neural_nets.losses.CrossEntropy", "comment": "\"CrossEntropy\"\n\n**************\n\n\n\nclass numpy_ml.neural_nets.losses.CrossEntropy\n\n\n\n A cross-entropy loss.\n\n\n\n -[ Notes ]-\n\n\n\n For a one-hot target **y** and predicted class probabilities\n\n \\hat{\\mathbf{y}}, the cross entropy is\n\n\n\n \\mathcal{L}(\\mathbf{y}, \\hat{\\mathbf{y}}) = \\sum_i y_i \\log\n\n \\hat{y}_i\n\n\n\n static loss(y, y_pred)\n\n\n\n Compute the cross-entropy (log) loss.\n\n\n\n -[ Notes ]-\n\n\n\n This method returns the sum (not the average!) of the losses for\n\n each sample.\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape (n, m)) -- Class labels (one-hot\n\n with *m* possible classes) for each of *n* examples.\n\n\n\n * **y_pred** (\"ndarray\" of shape (n, m)) -- Probabilities of\n\n each of *m* classes for the *n* examples in the batch.\n\n\n\n Returns:\n\n **loss** (*float*) -- The sum of the cross-entropy across\n\n classes and examples.\n\n\n\n static grad(y, y_pred)\n\n\n\n Compute the gradient of the cross entropy loss with regard to\n\n the softmax input, *z*.\n\n\n\n -[ Notes ]-\n\n\n\n The gradient for this method goes through both the cross-entropy\n\n loss AND the softmax non-linearity to return \\frac{\\partial\n\n \\mathcal{L}}{\\partial \\mathbf{z}} (rather than \\frac{\\partial\n\n \\mathcal{L}}{\\partial \\text{softmax}(\\mathbf{z})}).\n\n\n\n In particular, let:\n\n\n\n \\mathcal{L}(\\mathbf{z}) =\n\n \\text{cross_entropy}(\\text{softmax}(\\mathbf{z})).\n\n\n\n The current method computes:\n\n\n\n \\frac{\\partial \\mathcal{L}}{\\partial \\mathbf{z}} &=\n\n \\text{softmax}(\\mathbf{z}) - \\mathbf{y} \\\\ &=\n\n \\hat{\\mathbf{y}} - \\mathbf{y}\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape *(n, m)*) -- A one-hot encoding\n\n of the true class labels. 
Each row constitues a training\n\n example, and each column is a different class.\n\n\n\n * **y_pred** (\"ndarray\" of shape *(n, m)*) -- The network\n\n predictions for the probability of each of *m* class labels\n\n on each of *n* examples in a batch.\n\n\n\n Returns:\n\n **grad** (\"ndarray\" of shape (n, m)) -- The gradient of the\n\n cross-entropy loss with respect to the *input* to the softmax\n\n function.\n", "class_name": "numpy_ml.neural_nets.losses.CrossEntropy", "class_link": "numpy_ml/neural_nets/losses/losses.py#L110-L222", "test_file_path": "numpy_ml/tests/test_CrossEntropy.py"} +{"title": "RNNCell", "class_annotation": "numpy_ml.neural_nets.layers.RNNCell(n_out, act_fn='Tanh', init='glorot_uniform', optimizer=None)", "comment": "\"RNNCell\"\n\n*********\n\n\n\nclass numpy_ml.neural_nets.layers.RNNCell(n_out, act_fn='Tanh', init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A single step of a vanilla (Elman) RNN.\n\n\n\n -[ Notes ]-\n\n\n\n At timestep *t*, the vanilla RNN cell computes\n\n\n\n \\mathbf{Z}^{(t)} &= \\mathbf{W}_{ax} \\mathbf{X}^{(t)} +\n\n \\mathbf{b}_{ax} + \\mathbf{W}_{aa} \\mathbf{A}^{(t-1)} +\n\n \\mathbf{b}_{aa} \\\\ \\mathbf{A}^{(t)} &= f(\\mathbf{Z}^{(t)})\n\n\n\n where\n\n\n\n * \\mathbf{X}^{(t)} is the input at time *t*\n\n\n\n * \\mathbf{A}^{(t)} is the hidden state at timestep *t*\n\n\n\n * *f* is the layer activation function\n\n\n\n * \\mathbf{W}_{ax} and \\mathbf{b}_{ax} are the weights and bias for\n\n the input to hidden layer\n\n\n\n * \\mathbf{W}_{aa} and \\mathbf{b}_{aa} are the weights and biases\n\n for the hidden to hidden layer\n\n\n\n Parameters:\n\n * **n_out** (*int*) -- The dimension of a single hidden state /\n\n output on a given timestep\n\n\n\n * **act_fn** (str, Activation object, or None) -- The activation\n\n function for computing \"A[t]\". Default is *'Tanh'*.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(Xt)\n\n\n\n Compute the network output for a single timestep.\n\n\n\n Parameters:\n\n **Xt** (\"ndarray\" of shape *(n_ex, n_in)*) -- Input at\n\n timestep *t* consisting of *n_ex* examples each of\n\n dimensionality *n_in*.\n\n\n\n Returns:\n\n **At** (\"ndarray\" of shape *(n_ex, n_out)*) -- The value of\n\n the hidden state at timestep *t* for each of the *n_ex*\n\n examples.\n\n\n\n backward(dLdAt)\n\n\n\n Backprop for a single timestep.\n\n\n\n Parameters:\n\n **dLdAt** (\"ndarray\" of shape *(n_ex, n_out)*) -- The\n\n gradient of the loss wrt. the layer outputs (ie., hidden\n\n states) at timestep *t*.\n\n\n\n Returns:\n\n **dLdXt** (\"ndarray\" of shape *(n_ex, n_in)*) -- The gradient\n\n of the loss wrt. 
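The softmax(z) - y identity documented for CrossEntropy.grad is easy to verify numerically; the sketch below uses the conventional negative log-likelihood form of the loss and a finite-difference check on one logit. It is an independent illustration, not the library's code.

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)    # stabilize before exponentiating
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def grad_wrt_logits(y, z):
        # d cross_entropy(softmax(z)) / dz = softmax(z) - y
        return softmax(z) - y

    rng = np.random.RandomState(0)
    z = rng.randn(1, 4)
    y = np.eye(4)[[2]]                          # one-hot label for class 2
    loss = lambda z: -np.sum(y * np.log(softmax(z)))
    eps, e0 = 1e-6, np.eye(4)[[0]]
    num = (loss(z + eps * e0) - loss(z - eps * e0)) / (2 * eps)   # finite difference on z[0, 0]
    print(np.allclose(num, grad_wrt_logits(y, z)[0, 0], atol=1e-5))  # True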
the layer inputs at timestep *t*.\n\n\n\n flush_gradients()\n\n\n\n Erase all the layer's derived variables and gradients.\n", "class_name": "numpy_ml.neural_nets.layers.RNNCell", "class_link": "numpy_ml/neural_nets/layers/layers.py#L3576-L3779", "test_file_path": "numpy_ml/tests/test_RNNCell.py"} +{"title": "minkowski", "class_annotation": "numpy_ml.utils.distance_metrics.minkowski(x, y, p)", "comment": "\"minkowski\"\n\n***********\n\n\n\nnumpy_ml.utils.distance_metrics.minkowski(x, y, p)\n\n\n\n Compute the Minkowski-*p* distance between two real vectors.\n\n\n\n -[ Notes ]-\n\n\n\n The Minkowski-*p* distance between two vectors **x** and **y** is\n\n\n\n d(\\mathbf{x}, \\mathbf{y}) = \\left( \\sum_i |x_i - y_i|^p\n\n \\right)^{1/p}\n\n\n\n Parameters:\n\n * **x** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n * **y** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n * **p** (*float > 1*) -- The parameter of the distance function.\n\n When *p = 1*, this is the *L1* distance, and when *p=2*, this\n\n is the *L2* distance. For *p < 1*, Minkowski-*p* does not\n\n satisfy the triangle inequality and hence is not a valid\n\n distance metric.\n\n\n\n Returns:\n\n **d** (*float*) -- The Minkowski-*p* distance between **x** and\n\n **y**.\n", "class_name": "numpy_ml.utils.distance_metrics.minkowski", "class_link": "numpy_ml/utils/distance_metrics.py#L79-L106", "test_file_path": "numpy_ml/tests/test_minkowski.py"} +{"title": "Embedding", "class_annotation": "numpy_ml.neural_nets.layers.Embedding(n_out, vocab_size, pool=None, init='glorot_uniform', optimizer=None)", "comment": "\"Embedding\"\n\n***********\n\n\n\nclass numpy_ml.neural_nets.layers.Embedding(n_out, vocab_size, pool=None, init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n An embedding layer.\n\n\n\n -[ Notes ]-\n\n\n\n Equations:\n\n\n\n Y = W[x]\n\n\n\n NB. This layer must be the first in a neural network as the\n\n gradients do not get passed back through to the inputs.\n\n\n\n Parameters:\n\n * **n_out** (*int*) -- The dimensionality of the embeddings\n\n\n\n * **vocab_size** (*int*) -- The total number of items in the\n\n vocabulary. All integer indices are expected to range between\n\n 0 and *vocab_size - 1*.\n\n\n\n * **pool** (*{'sum'**, **'mean'**, **None}*) -- If not None,\n\n apply this function to the collection of *n_in* encodings in\n\n each example to produce a single, pooled embedding. Default is\n\n None.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". 
Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n lookup(ids)\n\n\n\n Return the embeddings associated with the IDs in *ids*.\n\n\n\n Parameters:\n\n **word_ids** (\"ndarray\" of shape (*M*,)) -- An array of *M*\n\n IDs to retrieve embeddings for.\n\n\n\n Returns:\n\n **embeddings** (\"ndarray\" of shape (*M*, *n_out*)) -- The\n\n embedding vectors for each of the *M* IDs.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n -[ Notes ]-\n\n\n\n Equations:\n\n Y = W[x]\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, n_in)* or list of length\n\n *n_ex*) -- Layer input, representing a minibatch of *n_ex*\n\n examples. If \"self.pool\" is None, each example must consist\n\n of exactly *n_in* integer token IDs. Otherwise, *X* can be\n\n a ragged array, with each example consisting of a variable\n\n number of token IDs.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through with regard to this input.\n\n Default is True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_in, n_out)*) --\n\n Embeddings for each coordinate of each of the *n_ex* examples\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to embedding weights.\n\n\n\n -[ Notes ]-\n\n\n\n Because the items in *X* are interpreted as indices, we cannot\n\n compute the gradient of the layer output wrt. *X*.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, n_in, n_out)* or list\n\n of arrays) -- The gradient(s) of the loss wrt. the layer\n\n output(s)\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. 
Default is\n\n True.\n", "class_name": "numpy_ml.neural_nets.layers.Embedding", "class_link": "numpy_ml/neural_nets/layers/layers.py#L1811-L2007", "test_file_path": "numpy_ml/tests/test_Embedding.py"} +{"title": "LayerNorm2D", "class_annotation": "numpy_ml.neural_nets.layers.LayerNorm2D(epsilon=1e-05, optimizer=None)", "comment": "\"LayerNorm2D\"\n\n*************\n\n\n\nclass numpy_ml.neural_nets.layers.LayerNorm2D(epsilon=1e-05, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A layer normalization layer for 2D inputs with an additional\n\n channel dimension.\n\n\n\n -[ Notes ]-\n\n\n\n In contrast to \"BatchNorm2D\", the LayerNorm layer calculates the\n\n mean and variance across *features* rather than examples in the\n\n batch ensuring that the mean and variance estimates are independent\n\n of batch size and permitting straightforward application in RNNs.\n\n\n\n Equations [train & test]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Also in contrast to \"BatchNorm2D\", *scaler* and *intercept* are\n\n applied *elementwise* to \"norm(X)\".\n\n\n\n Parameters:\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during computation of \"norm(X)\" to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n -[ Notes ]-\n\n\n\n Equations [train & test]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- Input volume containing the *in_rows* by\n\n *in_cols*-dimensional features for a minibatch of *n_ex*\n\n examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- Layer output for each of the *n_ex* examples.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The gradient of the loss wrt. the layer output\n\n *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. 
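The train/test-identical normalization above has a compact NumPy form; which axes count as "features" for a 2D input with channels, and the broadcast shapes of scaler and intercept, are assumptions of this sketch.

    import numpy as np

    def layer_norm_2d(X, scaler, intercept, epsilon=1e-5):
        # X: (n_ex, in_rows, in_cols, in_ch); statistics per example, over the feature axes
        mean = X.mean(axis=(1, 2, 3), keepdims=True)
        var = X.var(axis=(1, 2, 3), keepdims=True)
        norm = (X - mean) / np.sqrt(var + epsilon)
        return scaler * norm + intercept         # scaler/intercept applied elementwise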
Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- The gradient of the loss wrt. the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.LayerNorm2D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L1444-L1631", "test_file_path": "numpy_ml/tests/test_LayerNorm2D.py"} +{"title": "SELU", "class_annotation": "numpy_ml.neural_nets.activations.SELU", "comment": "\"SELU\"\n\n******\n\n\n\nclass numpy_ml.neural_nets.activations.SELU\n\n\n\n A scaled exponential linear unit (SELU).\n\n\n\n -[ Notes ]-\n\n\n\n SELU units, when used in conjunction with proper weight\n\n initialization and regularization techniques, encourage neuron\n\n activations to converge to zero-mean and unit variance without\n\n explicit use of e.g., batchnorm.\n\n\n\n For SELU units, the \\alpha and \\text{scale} values are constants\n\n chosen so that the mean and variance of the inputs are preserved\n\n between consecutive layers. As such the authors propose weights be\n\n initialized using Lecun-Normal initialization: w_{ij} \\sim\n\n \\mathcal{N}(0, 1 / \\text{fan_in}), and to use the dropout variant\n\n \\alpha-dropout during regularization. [*]\n\n\n\n See the reference for more information (especially the appendix ;-)\n\n ).\n\n\n\n -[ References ]-\n\n\n\n [*] Klambauer, G., Unterthiner, T., & Hochreiter, S. (2017). \"Self-\n\n normalizing neural networks.\" *Advances in Neural Information\n\n Processing Systems, 30.*\n\n\n\n Initialize the ActivationBase object\n\n\n\n fn(z)\n\n\n\n Evaluate the SELU activation on the elements of input *z*.\n\n\n\n \\text{SELU}(z_i) = \\text{scale} \\times \\text{ELU}(z_i,\n\n \\alpha)\n\n\n\n which is simply\n\n\n\n \\text{SELU}(z_i) &= \\text{scale} \\times z_i \\ \\ \\ \\\n\n &&\\text{if }z_i > 0 \\\\ &= \\text{scale} \\times \\alpha\n\n (e^{z_i} - 1) \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the SELU activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{SELU}}{\\partial x_i} &=\n\n \\text{scale} \\ \\ \\ \\ &&\\text{if } x_i > 0 \\\\ &=\n\n \\text{scale} \\times \\alpha e^{x_i} \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the SELU activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{SELU}}{\\partial x_i^2} &= 0 \\ \\ \\\n\n \\ &&\\text{if } x_i > 0 \\\\ &= \\text{scale} \\times \\alpha\n\n e^{x_i} \\ \\ \\ \\ &&\\text{otherwise}\n", "class_name": "numpy_ml.neural_nets.activations.SELU", "class_link": "numpy_ml/neural_nets/activations/activations.py#L532-L612", "test_file_path": "numpy_ml/tests/test_SELU.py"} +{"title": "Softmax", "class_annotation": "numpy_ml.neural_nets.layers.Softmax(dim=-1, optimizer=None)", "comment": "\"Softmax\"\n\n*********\n\n\n\nclass numpy_ml.neural_nets.layers.Softmax(dim=-1, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A softmax nonlinearity layer.\n\n\n\n -[ Notes ]-\n\n\n\n This is implemented as a layer rather than an activation primarily\n\n because it requires retaining the layer input in order to compute\n\n the softmax gradients properly. 
In other words, in contrast to\n\n other simple activations, the softmax function and its gradient are\n\n not computed elementwise, and thus are more easily expressed as a\n\n layer.\n\n\n\n The softmax function computes:\n\n\n\n y_i = \\frac{e^{x_i}}{\\sum_j e^{x_j}}\n\n\n\n where x_i is the *i* th element of input example **x**.\n\n\n\n Parameters:\n\n * **dim** (*int*) -- The dimension in *X* along which the\n\n softmax will be computed. Default is -1.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None. Unused for\n\n this layer.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer input,\n\n representing the *n_in*-dimensional features for a\n\n minibatch of *n_ex* examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_out)*) -- Layer output\n\n for each of the *n_ex* examples.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, n_out)* or list of\n\n arrays) -- The gradient(s) of the loss wrt. the layer\n\n output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dLdX** (\"ndarray\" of shape *(n_ex, n_in)*) -- The gradient\n\n of the loss wrt. 
the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.Softmax", "class_link": "numpy_ml/neural_nets/layers/layers.py#L2192-L2349", "test_file_path": "numpy_ml/tests/test_Softmax.py"} +{"title": "euclidean", "class_annotation": "numpy_ml.utils.distance_metrics.euclidean(x, y)", "comment": "\"euclidean\"\n\n***********\n\n\n\nnumpy_ml.utils.distance_metrics.euclidean(x, y)\n\n\n\n Compute the Euclidean (*L2*) distance between two real vectors\n\n\n\n -[ Notes ]-\n\n\n\n The Euclidean distance between two vectors **x** and **y** is\n\n\n\n d(\\mathbf{x}, \\mathbf{y}) = \\sqrt{ \\sum_i (x_i - y_i)^2 }\n\n\n\n Parameters:\n\n * **x** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n * **y** (\"ndarray\" s of shape *(N,)*) -- The two vectors to\n\n compute the distance between\n\n\n\n Returns:\n\n **d** (*float*) -- The L2 distance between **x** and **y**.\n", "class_name": "numpy_ml.utils.distance_metrics.euclidean", "class_link": "numpy_ml/utils/distance_metrics.py#L4-L26", "test_file_path": "numpy_ml/tests/test_euclidean.py"} +{"title": "LinearKernel", "class_annotation": "numpy_ml.utils.kernels.LinearKernel(c0=0)", "comment": "\"LinearKernel\"\n\n**************\n\n\n\nclass numpy_ml.utils.kernels.LinearKernel(c0=0)\n\n\n\n The linear (i.e., dot-product) kernel.\n\n\n\n -[ Notes ]-\n\n\n\n For input vectors \\mathbf{x} and \\mathbf{y}, the linear kernel is:\n\n\n\n k(\\mathbf{x}, \\mathbf{y}) = \\mathbf{x}^\\top \\mathbf{y} + c_0\n\n\n\n Parameters:\n\n **c0** (*float*) -- An \"inhomogeneity\" parameter. When *c0* = 0,\n\n the kernel is said to be homogenous. Default is 1.\n\n\n\n set_params(summary_dict)\n\n\n\n Set the model parameters and hyperparameters using the settings\n\n in *summary_dict*.\n\n\n\n Parameters:\n\n **summary_dict** (*dict*) -- A dictionary with keys\n\n 'parameters' and 'hyperparameters', structured as would be\n\n returned by the \"summary()\" method. If a particular\n\n (hyper)parameter is not included in this dict, the current\n\n value will be used.\n\n\n\n Returns:\n\n **new_kernel** (Kernel instance) -- A kernel with parameters\n\n and hyperparameters adjusted to those specified in\n\n *summary_dict*.\n\n\n\n summary()\n\n\n\n Return the dictionary of model parameters, hyperparameters, and\n\n ID\n", "class_name": "numpy_ml.utils.kernels.LinearKernel", "class_link": "numpy_ml/utils/kernels.py#L72-L116", "test_file_path": "numpy_ml/tests/test_LinearKernel.py"} +{"title": "GELU", "class_annotation": "numpy_ml.neural_nets.activations.GELU(approximate=True)", "comment": "\"GELU\"\n\n******\n\n\n\nclass numpy_ml.neural_nets.activations.GELU(approximate=True)\n\n\n\n A Gaussian error linear unit (GELU). [*]\n\n\n\n -[ Notes ]-\n\n\n\n A ReLU alternative. GELU weights inputs by their value, rather than\n\n gates inputs by their sign, as in vanilla ReLUs.\n\n\n\n -[ References ]-\n\n\n\n [*] Hendrycks, D., & Gimpel, K. (2016). \"Bridging nonlinearities\n\n and stochastic regularizers with Gaussian error linear units.\"\n\n *CoRR*.\n\n\n\n Parameters:\n\n **approximate** (*bool*) -- Whether to use a faster but less\n\n precise approximation to the Gauss error function when\n\n calculating the unit activation and gradient. 
Default is True.\n\n\n\n fn(z)\n\n\n\n Compute the GELU function on the elements of input *z*.\n\n\n\n \\text{GELU}(z_i) = z_i P(Z \\leq z_i) = z_i \\Phi(z_i) =\n\n z_i \\cdot \\frac{1}{2}(1 + \\text{erf}(x/\\sqrt{2}))\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the GELU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{GELU}}{\\partial x_i} = \\frac{1}{2}\n\n + \\frac{1}{2}\\left(\\text{erf}(\\frac{x}{\\sqrt{2}}) +\n\n \\frac{x + \\text{erf}'(\\frac{x}{\\sqrt{2}})}{\\sqrt{2}}\\right)\n\n\n\n where \\text{erf}'(x) = \\frac{2}{\\sqrt{\\pi}} \\cdot \\exp\\{-x^2\\}.\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the GELU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{GELU}}{\\partial x_i^2} =\n\n \\frac{1}{2\\sqrt{2}} \\left\\[\n\n \\text{erf}'(\\frac{x}{\\sqrt{2}}) + \\frac{1}{\\sqrt{2}}\n\n \\text{erf}''(\\frac{x}{\\sqrt{2}}) \\right]\n\n\n\n where \\text{erf}'(x) = \\frac{2}{\\sqrt{\\pi}} \\cdot \\exp\\{-x^2\\}\n\n and \\text{erf}''(x) = \\frac{-4x}{\\sqrt{\\pi}} \\cdot \\exp\\{-x^2\\}.\n", "class_name": "numpy_ml.neural_nets.activations.GELU", "class_link": "numpy_ml/neural_nets/activations/activations.py#L210-L301", "test_file_path": "numpy_ml/tests/test_GELU.py"} +{"title": "LeakyReLU", "class_annotation": "numpy_ml.neural_nets.activations.LeakyReLU(alpha=0.3)", "comment": "\"LeakyReLU\"\n\n***********\n\n\n\nclass numpy_ml.neural_nets.activations.LeakyReLU(alpha=0.3)\n\n\n\n 'Leaky' version of a rectified linear unit (ReLU).\n\n\n\n -[ Notes ]-\n\n\n\n Leaky ReLUs [*] are designed to address the vanishing gradient\n\n problem in ReLUs by allowing a small non-zero gradient when *x* is\n\n negative.\n\n\n\n Parameters:\n\n **alpha** (*float*) -- Activation slope when x < 0. Default is\n\n 0.3.\n\n\n\n -[ References ]-\n\n\n\n [*] Mass, L. M., Hannun, A. Y, & Ng, A. Y. (2013). 
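The exact GELU above is a one-liner once an error function is available; the tanh variant shown alongside is the commonly cited fast approximation and is offered only as an assumption about what approximate=True trades precision for, not as the library's code.

    import numpy as np
    from scipy.special import erf   # assumes SciPy for the exact error function

    def gelu_exact(z):
        # GELU(z) = z * Phi(z) = z * 0.5 * (1 + erf(z / sqrt(2)))
        return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))

    def gelu_tanh_approx(z):
        # widely used tanh approximation from Hendrycks & Gimpel (2016)
        return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))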
\"Rectifier\n\n nonlinearities improve neural network acoustic models.\"\n\n *Proceedings of the 30th International Conference of Machine\n\n Learning, 30*.\n\n\n\n Initialize the ActivationBase object\n\n\n\n fn(z)\n\n\n\n Evaluate the leaky ReLU function on the elements of input *z*.\n\n\n\n \\text{LeakyReLU}(z_i) &= z_i \\ \\ \\ \\ &&\\text{if } z_i >\n\n 0 \\\\ &= \\alpha z_i \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the leaky ReLU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{LeakyReLU}}{\\partial x_i} &= 1 \\ \\\n\n \\ \\ &&\\text{if }x_i > 0 \\\\ &= \\alpha \\ \\ \\ \\\n\n &&\\text{otherwise}\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the leaky ReLU function on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{LeakyReLU}}{\\partial x_i^2} = 0\n", "class_name": "numpy_ml.neural_nets.activations.LeakyReLU", "class_link": "numpy_ml/neural_nets/activations/activations.py#L140-L207", "test_file_path": "numpy_ml/tests/test_LeakyReLU.py"} +{"title": "Conv1D", "class_annotation": "numpy_ml.neural_nets.layers.Conv1D(out_ch, kernel_width, pad=0, stride=1, dilation=0, act_fn=None, init='glorot_uniform', optimizer=None)", "comment": "\"Conv1D\"\n\n********\n\n\n\nclass numpy_ml.neural_nets.layers.Conv1D(out_ch, kernel_width, pad=0, stride=1, dilation=0, act_fn=None, init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n Apply a one-dimensional convolution kernel over an input volume.\n\n\n\n -[ Notes ]-\n\n\n\n Equations:\n\n\n\n out = act_fn(pad(X) * W + b)\n\n out_dim = floor(1 + (n_rows_in + pad_left + pad_right - kernel_width) / stride)\n\n\n\n where '***' denotes the cross-correlation operation with stride *s*\n\n and dilation *d*.\n\n\n\n Parameters:\n\n * **out_ch** (*int*) -- The number of filters/kernels to compute\n\n in the current layer\n\n\n\n * **kernel_width** (*int*) -- The width of a single 1D\n\n filter/kernel in the current layer\n\n\n\n * **act_fn** (str, Activation object, or None) -- The activation\n\n function for computing \"Y[t]\". If None, use the identity\n\n function f(x) = x by default. Default is None.\n\n\n\n * **pad** (*int**, **tuple**, or **{'same'**, **'causal'}*) --\n\n The number of rows/columns to zero-pad the input with. If\n\n *'same'*, calculate padding to ensure the output length\n\n matches in the input length. If *'causal'* compute padding\n\n such that the output both has the same length as the input AND\n\n \"output[t]\" does not depend on \"input[t + 1:]\". Default is 0.\n\n\n\n * **stride** (*int*) -- The stride/hop of the convolution\n\n kernels as they move over the input volume. Default is 1.\n\n\n\n * **dilation** (*int*) -- Number of pixels inserted between\n\n kernel elements. Effective kernel shape after dilation is:\n\n \"[kernel_rows * (d + 1) - d, kernel_cols * (d + 1) - d]\".\n\n Default is 0.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". 
Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output given input volume *X*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, l_in, in_ch)*) -- The\n\n input volume consisting of *n_ex* examples, each of length\n\n *l_in* and with *in_ch* input channels\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, l_out, out_ch)*) -- The\n\n layer output.\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Compute the gradient of the loss with respect to the layer\n\n parameters.\n\n\n\n -[ Notes ]-\n\n\n\n Relies on \"im2col()\" and \"col2im()\" to vectorize the gradient\n\n calculation. See the private method \"_backward_naive()\" for a\n\n more straightforward implementation.\n\n\n\n Parameters:\n\n * **dLdy** (\"ndarray\" of shape *(n_ex, l_out, out_ch)* or\n\n list of arrays) -- The gradient(s) of the loss with respect\n\n to the layer output(s).\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. 
Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, l_in, in_ch)*) -- The\n\n gradient of the loss with respect to the layer input volume.\n", "class_name": "numpy_ml.neural_nets.layers.Conv1D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L2603-L2892", "test_file_path": "numpy_ml/tests/test_Conv1D.py"} +{"title": "SquaredError", "class_annotation": "numpy_ml.neural_nets.losses.SquaredError", "comment": "\"SquaredError\"\n\n**************\n\n\n\nclass numpy_ml.neural_nets.losses.SquaredError\n\n\n\n A squared-error / *L2* loss.\n\n\n\n -[ Notes ]-\n\n\n\n For real-valued target **y** and predictions \\hat{\\mathbf{y}}, the\n\n squared error is\n\n\n\n \\mathcal{L}(\\mathbf{y}, \\hat{\\mathbf{y}}) = 0.5\n\n ||\\hat{\\mathbf{y}} - \\mathbf{y}||_2^2\n\n\n\n static loss(y, y_pred)\n\n\n\n Compute the squared error between *y* and *y_pred*.\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape (n, m)) -- Ground truth values\n\n for each of *n* examples\n\n\n\n * **y_pred** (\"ndarray\" of shape (n, m)) -- Predictions for\n\n the *n* examples in the batch.\n\n\n\n Returns:\n\n **loss** (*float*) -- The sum of the squared error across\n\n dimensions and examples.\n\n\n\n static grad(y, y_pred, z, act_fn)\n\n\n\n Gradient of the squared error loss with respect to the pre-\n\n nonlinearity input, *z*.\n\n\n\n -[ Notes ]-\n\n\n\n The current method computes the gradient \\frac{\\partial\n\n \\mathcal{L}}{\\partial \\mathbf{z}}, where\n\n\n\n \\mathcal{L}(\\mathbf{z}) &=\n\n \\text{squared_error}(\\mathbf{y}, g(\\mathbf{z})) \\\\\n\n g(\\mathbf{z}) &= \\text{act_fn}(\\mathbf{z})\n\n\n\n The gradient with respect to \\mathbf{z} is then\n\n\n\n \\frac{\\partial \\mathcal{L}}{\\partial \\mathbf{z}} =\n\n (g(\\mathbf{z}) - \\mathbf{y}) \\left( \\frac{\\partial\n\n g}{\\partial \\mathbf{z}} \\right)\n\n\n\n Parameters:\n\n * **y** (\"ndarray\" of shape (n, m)) -- Ground truth values\n\n for each of *n* examples.\n\n\n\n * **y_pred** (\"ndarray\" of shape (n, m)) -- Predictions for\n\n the *n* examples in the batch.\n\n\n\n * **act_fn** (Activation object) -- The activation function\n\n for the output layer of the network.\n\n\n\n Returns:\n\n **grad** (\"ndarray\" of shape (n, m)) -- The gradient of the\n\n squared error loss with respect to *z*.\n", "class_name": "numpy_ml.neural_nets.losses.SquaredError", "class_link": "numpy_ml/neural_nets/losses/losses.py#L26-L107", "test_file_path": "numpy_ml/tests/test_SquaredError.py"} +{"title": "DotProductAttention", "class_annotation": "numpy_ml.neural_nets.layers.DotProductAttention(scale=True, dropout_p=0, init='glorot_uniform', optimizer=None)", "comment": "\"DotProductAttention\"\n\n*********************\n\n\n\nclass numpy_ml.neural_nets.layers.DotProductAttention(scale=True, dropout_p=0, init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A single \"attention head\" layer using a dot-product for the scoring\n\n function.\n\n\n\n -[ Notes ]-\n\n\n\n The equations for a dot product attention layer are:\n\n\n\n \\mathbf{Z} &= \\mathbf{K Q}^\\\\top \\ \\ \\ \\ &&\\text{if scale =\n\n False} \\\\ &= \\mathbf{K Q}^\\top / \\sqrt{d_k} \\ \\ \\ \\\n\n &&\\text{if scale = True} \\\\ \\mathbf{Y} &=\n\n \\text{dropout}(\\text{softmax}(\\mathbf{Z})) \\mathbf{V}\n\n\n\n Parameters:\n\n * **scale** (*bool*) -- Whether to scale the the key-query dot\n\n product by the square root of the key/query vector\n\n dimensionality before applying the Softmax. 
This is useful,\n\n since the scale of dot product will otherwise increase as\n\n query / key dimensions grow. Default is True.\n\n\n\n * **dropout_p** (*float in** [**0**, **1**)*) -- The dropout\n\n propbability during training, applied to the output of the\n\n softmax. If 0, no dropout is applied. Default is 0.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n Unused.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None. Unused.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Unused\n\n\n\n * **parameters** (*dict*) -- Unused\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n freeze()\n\n\n\n Freeze the layer parameters at their current values so they can\n\n no longer be updated.\n\n\n\n unfreeze()\n\n\n\n Unfreeze the layer parameters so they can be updated.\n\n\n\n forward(Q, K, V, retain_derived=True)\n\n\n\n Compute the attention-weighted output of a collection of keys,\n\n values, and queries.\n\n\n\n -[ Notes ]-\n\n\n\n In the most abstract (ie., hand-wave-y) sense:\n\n\n\n * Query vectors ask questions\n\n\n\n * Key vectors advertise their relevancy to questions\n\n\n\n * Value vectors give possible answers to questions\n\n\n\n * The dot product between Key and Query vectors provides\n\n scores for each of the the *n_ex* different Value vectors\n\n\n\n For a single query and *n* key-value pairs, dot-product\n\n attention (with scaling) is:\n\n\n\n w0 = dropout(softmax( (query @ key[0]) / sqrt(d_k) ))\n\n w1 = dropout(softmax( (query @ key[1]) / sqrt(d_k) ))\n\n ...\n\n wn = dropout(softmax( (query @ key[n]) / sqrt(d_k) ))\n\n\n\n y = np.array([w0, ..., wn]) @ values\n\n (1 \u00d7 n_ex) (n_ex \u00d7 d_v)\n\n\n\n In words, keys and queries are combined via dot-product to\n\n produce a score, which is then passed through a softmax to\n\n produce a weight on each value vector in Values. We elementwise\n\n multiply each value vector by its weight, and then take the\n\n elementwise sum of each weighted value vector to get the 1\n\n \\times d_v output for the current example.\n\n\n\n In vectorized form,\n\n\n\n \\mathbf{Y} = \\text{dropout}(\n\n \\text{softmax}(\\mathbf{KQ}^\\top / \\sqrt{d_k}) ) \\mathbf{V}\n\n\n\n Parameters:\n\n * **Q** (\"ndarray\" of shape *(n_ex, *, d_k)*) -- A set of\n\n *n_ex* query vectors packed into a single matrix. Optional\n\n middle dimensions can be used to specify, e.g., the number\n\n of parallel attention heads.\n\n\n\n * **K** (\"ndarray\" of shape *(n_ex, *, d_k)*) -- A set of\n\n *n_ex* key vectors packed into a single matrix. Optional\n\n middle dimensions can be used to specify, e.g., the number\n\n of parallel attention heads.\n\n\n\n * **V** (\"ndarray\" of shape *(n_ex, *, d_v)*) -- A set of\n\n *n_ex* value vectors packed into a single matrix. 
Optional\n\n middle dimensions can be used to specify, e.g., the number\n\n of parallel attention heads.\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, *, d_v)*) -- The attention-\n\n weighted output values\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, *, d_v)*) -- The\n\n gradient of the loss wrt. the layer output *Y*\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n * **dQ** (\"ndarray\" of shape *(n_ex, *, d_k)* or list of\n\n arrays) -- The gradient of the loss wrt. the layer query\n\n matrix/matrices *Q*.\n\n\n\n * **dK** (\"ndarray\" of shape *(n_ex, *, d_k)* or list of\n\n arrays) -- The gradient of the loss wrt. the layer key\n\n matrix/matrices *K*.\n\n\n\n * **dV** (\"ndarray\" of shape *(n_ex, *, d_v)* or list of\n\n arrays) -- The gradient of the loss wrt. the layer value\n\n matrix/matrices *V*.\n", "class_name": "numpy_ml.neural_nets.layers.DotProductAttention", "class_link": "numpy_ml/neural_nets/layers/layers.py#L139-L362", "test_file_path": "numpy_ml/tests/test_DotProductAttention.py"} +{"title": "Pool2D", "class_annotation": "numpy_ml.neural_nets.layers.Pool2D(kernel_shape, stride=1, pad=0, mode='max', optimizer=None)", "comment": "\"Pool2D\"\n\n********\n\n\n\nclass numpy_ml.neural_nets.layers.Pool2D(kernel_shape, stride=1, pad=0, mode='max', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A single two-dimensional pooling layer.\n\n\n\n Parameters:\n\n * **kernel_shape** (*2-tuple*) -- The dimension of a single 2D\n\n filter/kernel in the current layer\n\n\n\n * **stride** (*int*) -- The stride/hop of the convolution\n\n kernels as they move over the input volume. Default is 1.\n\n\n\n * **pad** (*int**, **tuple**, or **'same'*) -- The number of\n\n rows/columns of 0's to pad the input. Default is 0.\n\n\n\n * **mode** (*{\"max\"**, **\"average\"}*) -- The pooling function to\n\n apply.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output given input volume *X*.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The input volume consisting of *n_ex* examples,\n\n each with dimension (*in_rows*,`in_cols`, *in_ch*)\n\n\n\n * **retain_derived** (*bool*) -- Whether to retain the\n\n variables calculated during the forward pass for use later\n\n during backprop. If False, this suggests the layer will not\n\n be expected to backprop through wrt. this input. 
Default is\n\n True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) -- The layer output.\n\n\n\n backward(dLdY, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The gradient of the loss wrt. the layer output\n\n *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, in_rows, in_cols, in_ch)*)\n\n -- The gradient of the loss wrt. the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.Pool2D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L3181-L3350", "test_file_path": "numpy_ml/tests/test_Pool2D.py"} +{"title": "GaussianNBClassifier", "class_annotation": "numpy_ml.linear_models.GaussianNBClassifier(eps=1e-06)", "comment": "\"GaussianNBClassifier\"\n\n**********************\n\n\n\nclass numpy_ml.linear_models.GaussianNBClassifier(eps=1e-06)\n\n\n\n A naive Bayes classifier for real-valued data.\n\n\n\n -[ Notes ]-\n\n\n\n The naive Bayes model assumes the features of each training example\n\n \\mathbf{x} are mutually independent given the example label *y*:\n\n\n\n P(\\mathbf{x}_i \\mid y_i) = \\prod_{j=1}^M P(x_{i,j} \\mid y_i)\n\n\n\n where M is the rank of the i^{th} example \\mathbf{x}_i and y_i is\n\n the label associated with the i^{th} example.\n\n\n\n Combining the conditional independence assumption with a simple\n\n application of Bayes' theorem gives the naive Bayes classification\n\n rule:\n\n\n\n \\hat{y} &= \\arg \\max_y P(y \\mid \\mathbf{x}) \\\\ &= \\arg\n\n \\max_y P(y) P(\\mathbf{x} \\mid y) \\\\ &= \\arg \\max_y\n\n P(y) \\prod_{j=1}^M P(x_j \\mid y)\n\n\n\n In the final expression, the prior class probability P(y) can be\n\n specified in advance or estimated empirically from the training\n\n data.\n\n\n\n In the Gaussian version of the naive Bayes model, the feature\n\n likelihood is assumed to be normally distributed for each class:\n\n\n\n \\mathbf{x}_i \\mid y_i = c, \\theta \\sim \\mathcal{N}(\\mu_c,\n\n \\Sigma_c)\n\n\n\n where \\theta is the set of model parameters: \\{\\mu_1, \\Sigma_1,\n\n \\ldots, \\mu_K, \\Sigma_K\\}, K is the total number of unique classes\n\n present in the data, and the parameters for the Gaussian associated\n\n with class c, \\mu_c and \\Sigma_c (where 1 \\leq c \\leq K), are\n\n estimated via MLE from the set of training examples with label c.\n\n\n\n Parameters:\n\n **eps** (*float*) -- A value added to the variance to prevent\n\n numerical error. 
Default is 1e-6.\n\n\n\n Variables:\n\n * **parameters** (*dict*) -- Dictionary of model parameters:\n\n \"mean\", the *(K, M)* array of feature means under each class,\n\n \"sigma\", the *(K, M)* array of feature variances under each\n\n class, and \"prior\", the *(K,)* array of empirical prior\n\n probabilities for each class label.\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of model\n\n hyperparameters\n\n\n\n * **labels** (\"ndarray\" of shape *(K,)*) -- An array containing\n\n the unique class labels for the training examples.\n\n\n\n fit(X, y)\n\n\n\n Fit the model parameters via maximum likelihood.\n\n\n\n -[ Notes ]-\n\n\n\n The model parameters are stored in the \"parameters\" attribute.\n\n The following keys are present:\n\n\n\n \"mean\": \"ndarray\" of shape *(K, M)*\n\n Feature means for each of the *K* label classes\n\n\n\n \"sigma\": \"ndarray\" of shape *(K, M)*\n\n Feature variances for each of the *K* label classes\n\n\n\n \"prior\": \"ndarray\" of shape *(K,)*\n\n Prior probability of each of the *K* label classes,\n\n estimated empirically from the training data\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- A dataset consisting\n\n of *N* examples, each of dimension *M*\n\n\n\n * **y** (\"ndarray\" of shape *(N,)*) -- The class label for\n\n each of the *N* examples in *X*\n\n\n\n Returns:\n\n **self** (\"GaussianNBClassifier\" instance)\n\n\n\n predict(X)\n\n\n\n Use the trained classifier to predict the class label for each\n\n example in **X**.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, M)*) -- A dataset of *N*\n\n examples, each of dimension *M*\n\n\n\n Returns:\n\n **labels** (\"ndarray\" of shape *(N)*) -- The predicted class\n\n labels for each example in *X*\n", "class_name": "numpy_ml.linear_models.GaussianNBClassifier", "class_link": "numpy_ml/linear_models/naive_bayes.py#L5-L214", "test_file_path": "numpy_ml/tests/test_GaussianNBClassifier.py"} +{"title": "RBFKernel", "class_annotation": "numpy_ml.utils.kernels.RBFKernel(sigma=None)", "comment": "\"RBFKernel\"\n\n***********\n\n\n\nclass numpy_ml.utils.kernels.RBFKernel(sigma=None)\n\n\n\n Radial basis function (RBF) / squared exponential kernel.\n\n\n\n -[ Notes ]-\n\n\n\n For input vectors \\mathbf{x} and \\mathbf{y}, the radial basis\n\n function kernel is:\n\n\n\n k(\\mathbf{x}, \\mathbf{y}) = \\exp \\left\\{ -0.5 \\left\\lVert\n\n \\frac{\\mathbf{x} - \\mathbf{y}}{\\sigma} \\right\\rVert_2^2\n\n \\right\\}\n\n\n\n The RBF kernel decreases with distance and ranges between zero (in\n\n the limit) to one (when **x** = **y**). Notably, the implied\n\n feature space of the kernel has an infinite number of dimensions.\n\n\n\n Parameters:\n\n **sigma** (float or array of shape *(C,)* or None) -- A scaling\n\n parameter for the vectors **x** and **y**, producing an\n\n isotropic kernel if a float, or an anistropic kernel if an array\n\n of length *C*. Larger values result in higher resolution /\n\n greater smoothing. If None, defaults to \\sqrt(C / 2). Sometimes\n\n referred to as the kernel 'bandwidth'. Default is None.\n\n\n\n set_params(summary_dict)\n\n\n\n Set the model parameters and hyperparameters using the settings\n\n in *summary_dict*.\n\n\n\n Parameters:\n\n **summary_dict** (*dict*) -- A dictionary with keys\n\n 'parameters' and 'hyperparameters', structured as would be\n\n returned by the \"summary()\" method. 
If a particular\n\n (hyper)parameter is not included in this dict, the current\n\n value will be used.\n\n\n\n Returns:\n\n **new_kernel** (Kernel instance) -- A kernel with parameters\n\n and hyperparameters adjusted to those specified in\n\n *summary_dict*.\n\n\n\n summary()\n\n\n\n Return the dictionary of model parameters, hyperparameters, and\n\n ID\n", "class_name": "numpy_ml.utils.kernels.RBFKernel", "class_link": "numpy_ml/utils/kernels.py#L184-L238", "test_file_path": "numpy_ml/tests/test_RBFKernel.py"} +{"title": "LSTMCell", "class_annotation": "numpy_ml.neural_nets.layers.LSTMCell(n_out, act_fn='Tanh', gate_fn='Sigmoid', init='glorot_uniform', optimizer=None)", "comment": "\"LSTMCell\"\n\n**********\n\n\n\nclass numpy_ml.neural_nets.layers.LSTMCell(n_out, act_fn='Tanh', gate_fn='Sigmoid', init='glorot_uniform', optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A single step of a long short-term memory (LSTM) RNN.\n\n\n\n -[ Notes ]-\n\n\n\n Notation:\n\n\n\n * \"Z[t]\" is the input to each of the gates at timestep *t*\n\n\n\n * \"A[t]\" is the value of the hidden state at timestep *t*\n\n\n\n * \"Cc[t]\" is the value of the *candidate* cell/memory state at\n\n timestep *t*\n\n\n\n * \"C[t]\" is the value of the *final* cell/memory state at timestep\n\n *t*\n\n\n\n * \"Gf[t]\" is the output of the forget gate at timestep *t*\n\n\n\n * \"Gu[t]\" is the output of the update gate at timestep *t*\n\n\n\n * \"Go[t]\" is the output of the output gate at timestep *t*\n\n\n\n Equations:\n\n\n\n Z[t] = stack([A[t-1], X[t]])\n\n Gf[t] = gate_fn(Wf @ Z[t] + bf)\n\n Gu[t] = gate_fn(Wu @ Z[t] + bu)\n\n Go[t] = gate_fn(Wo @ Z[t] + bo)\n\n Cc[t] = act_fn(Wc @ Z[t] + bc)\n\n C[t] = Gf[t] * C[t-1] + Gu[t] * Cc[t]\n\n A[t] = Go[t] * act_fn(C[t])\n\n\n\n where *@* indicates dot/matrix product, and '*' indicates\n\n elementwise multiplication.\n\n\n\n Parameters:\n\n * **n_out** (*int*) -- The dimension of a single hidden state /\n\n output on a given timestep.\n\n\n\n * **act_fn** (str, Activation object, or None) -- The activation\n\n function for computing \"A[t]\". Default is *'Tanh'*.\n\n\n\n * **gate_fn** (str, Activation object, or None) -- The gate\n\n function for computing the update, forget, and output gates.\n\n Default is *'Sigmoid'*.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is *'glorot_uniform'*.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n forward(Xt)\n\n\n\n Compute the layer output for a single timestep.\n\n\n\n Parameters:\n\n **Xt** (\"ndarray\" of shape *(n_ex, n_in)*) -- Input at\n\n timestep t consisting of *n_ex* examples each of\n\n dimensionality *n_in*.\n\n\n\n Returns:\n\n * **At** (\"ndarray\" of shape *(n_ex, n_out)*) -- The value of\n\n the hidden state at timestep *t* for each of the *n_ex*\n\n examples.\n\n\n\n * **Ct** (\"ndarray\" of shape *(n_ex, n_out)*) -- The value of\n\n the cell/memory state at timestep *t* for each of the\n\n *n_ex* examples.\n\n\n\n backward(dLdAt)\n\n\n\n Backprop for a single timestep.\n\n\n\n Parameters:\n\n **dLdAt** (\"ndarray\" of shape *(n_ex, n_out)*) -- The\n\n gradient of the loss wrt. 
the layer outputs (i.e., hidden\n\n         states) at timestep *t*.\n\n\n\n      Returns:\n\n         **dLdXt** (\"ndarray\" of shape *(n_ex, n_in)*) -- The gradient\n\n         of the loss wrt. the layer inputs at timestep *t*.\n\n\n\n   flush_gradients()\n\n\n\n      Erase all the layer's derived variables and gradients.\n", "class_name": "numpy_ml.neural_nets.layers.LSTMCell", "class_link": "numpy_ml/neural_nets/layers/layers.py#L3782-L4085", "test_file_path": "numpy_ml/tests/test_LSTMCell.py"} +{"title": "DFT", "class_annotation": "numpy_ml.preprocessing.dsp.DFT(frame, positive_only=True)", "comment": "\"DFT\"\n\n*****\n\n\n\nnumpy_ml.preprocessing.dsp.DFT(frame, positive_only=True)\n\n\n\n   A naive O(N^2) implementation of the 1D discrete Fourier transform\n\n   (DFT).\n\n\n\n   -[ Notes ]-\n\n\n\n   The Fourier transform decomposes a signal into a linear combination\n\n   of sinusoids (i.e., basis elements in the space of continuous\n\n   periodic functions). For a sequence \\mathbf{x} = [x_1, \\ldots,\n\n   x_N] of N evenly spaced samples, the *k* th DFT coefficient is\n\n   given by:\n\n\n\n      c_k = \\sum_{n=0}^{N-1} x_n \\exp(-2 \\pi i k n / N)\n\n\n\n   where *i* is the imaginary unit, *k* is an index ranging from *0,\n\n   ..., N-1*, and c_k is the complex coefficient representing the\n\n   phase (imaginary part) and amplitude (real part) of the *k* th\n\n   sinusoid in the DFT spectrum. The frequency of the *k* th sinusoid\n\n   is (k 2 \\pi / N) radians per sample.\n\n\n\n   When applied to a real-valued input, the negative frequency terms\n\n   are the complex conjugates of the positive-frequency terms and the\n\n   overall spectrum is symmetric (excluding the first index, which\n\n   contains the zero-frequency / intercept term).\n\n\n\n   Parameters:\n\n      * **frame** (\"ndarray\" of shape *(N,)*) -- A signal frame\n\n        consisting of N samples\n\n\n\n      * **positive_only** (*bool*) -- Whether to only return the\n\n        coefficients for the positive frequency terms. Default is\n\n        True.\n\n\n\n   Returns:\n\n      **spectrum** (\"ndarray\" of shape *(N,)* or *(N // 2 + 1,)* if\n\n      *positive_only*) -- The coefficients of the frequency spectrum for\n\n      *frame*, including imaginary components.\n", "class_name": "numpy_ml.preprocessing.dsp.DFT", "class_link": "numpy_ml/preprocessing/dsp.py#L224-L273", "test_file_path": "numpy_ml/tests/test_DFT.py"} +{"title": "SoftPlus", "class_annotation": "numpy_ml.neural_nets.activations.SoftPlus", "comment": "\"SoftPlus\"\n\n**********\n\n\n\nclass numpy_ml.neural_nets.activations.SoftPlus\n\n\n\n   A softplus activation function.\n\n\n\n   -[ Notes ]-\n\n\n\n   In contrast to \"ReLU\", the softplus activation is differentiable\n\n   everywhere (including 0). 
It is, however, less computationally\n\n efficient to compute.\n\n\n\n The derivative of the softplus activation is the logistic sigmoid.\n\n\n\n fn(z)\n\n\n\n Evaluate the softplus activation on the elements of input *z*.\n\n\n\n \\text{SoftPlus}(z_i) = \\log(1 + e^{z_i})\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the softplus activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{SoftPlus}}{\\partial x_i} =\n\n \\frac{e^{x_i}}{1 + e^{x_i}}\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the softplus activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{SoftPlus}}{\\partial x_i^2} =\n\n \\frac{e^{x_i}}{(1 + e^{x_i})^2}\n", "class_name": "numpy_ml.neural_nets.activations.SoftPlus", "class_link": "numpy_ml/neural_nets/activations/activations.py#L669-L721", "test_file_path": "numpy_ml/tests/test_SoftPlus.py"} +{"title": "GeneralizedLinearModel", "class_annotation": "numpy_ml.linear_models.GeneralizedLinearModel(link, fit_intercept=True, tol=1e-05, max_iter=100)", "comment": "\"GeneralizedLinearModel\"\n\n************************\n\n\n\nclass numpy_ml.linear_models.GeneralizedLinearModel(link, fit_intercept=True, tol=1e-05, max_iter=100)\n\n\n\n A generalized linear model with maximum likelihood fit via\n\n iteratively reweighted least squares (IRLS).\n\n\n\n -[ Notes ]-\n\n\n\n The generalized linear model (GLM) [7] [8] assumes that each\n\n target/dependent variable y_i in target vector \\mathbf{y} = (y_1,\n\n \\ldots, y_n), has been drawn independently from a pre-specified\n\n distribution in the exponential family [11] with unknown mean\n\n \\mu_i. The GLM models a (one-to-one, continuous, differentiable)\n\n function, *g*, of this mean value as a linear combination of the\n\n model parameters \\mathbf{b} and observed covariates, \\mathbf{x}_i:\n\n\n\n g(\\mathbb{E}[y_i \\mid \\mathbf{x}_i]) = g(\\mu_i) =\n\n \\mathbf{b}^\\top \\mathbf{x}_i\n\n\n\n where *g* is known as the \"link function\" associated with the GLM.\n\n The choice of link function is informed by the instance of the\n\n exponential family the target is drawn from. Common examples:\n\n\n\n +---------------------------+----------------------+--------------------------------+\n\n | Distribution | Link | Formula |\n\n |===========================|======================|================================|\n\n | Normal | Identity | g(x) = x |\n\n +---------------------------+----------------------+--------------------------------+\n\n | Bernoulli | Logit | g(x) = \\log(x) - \\log(1 - x) |\n\n +---------------------------+----------------------+--------------------------------+\n\n | Binomial | Logit | g(x) = \\log(x) - \\log(n - x) |\n\n +---------------------------+----------------------+--------------------------------+\n\n | Poisson | Log | g(x) = \\log(x) |\n\n +---------------------------+----------------------+--------------------------------+\n\n\n\n An iteratively re-weighted least squares (IRLS) algorithm [9] can\n\n be employed to find the maximum likelihood estimate for the model\n\n parameters \\beta in any instance of the generalized linear model.\n\n IRLS is equivalent to Fisher scoring [10], which itself is a slight\n\n modification of classic Newton-Raphson for finding the zeros of the\n\n first derivative of the model log-likelihood.\n\n\n\n -[ References ]-\n\n\n\n [7] Nelder, J., & Wedderburn, R. (1972). 
Generalized linear models.\n\n *Journal of the Royal Statistical Society, Series A (General),\n\n 135(3)*: 370\u2013384.\n\n\n\n [8] https://en.wikipedia.org/wiki/Generalized_linear_model\n\n\n\n [9] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squ\n\n ares\n\n\n\n [10] https://en.wikipedia.org/wiki/Scoring_algorithm\n\n\n\n [11] https://en.wikipedia.org/wiki/Exponential_family\n\n\n\n Parameters:\n\n * **link** (*{'identity'**, **'logit'**, **'log'}*) -- The link\n\n function to use during modeling.\n\n\n\n * **fit_intercept** (*bool*) -- Whether to fit an intercept term\n\n in addition to the model coefficients. Default is True.\n\n\n\n * **tol** (*float*) -- The minimum difference between successive\n\n iterations of IRLS Default is 1e-5.\n\n\n\n * **max_iter** (*int*) -- The maximum number of iteratively\n\n reweighted least squares iterations to run during fitting.\n\n Default is 100.\n\n\n\n Variables:\n\n **beta** (\"ndarray\" of shape *(M, 1)* or None) -- Fitted model\n\n coefficients.\n\n\n\n fit(X, y)\n\n\n\n Find the maximum likelihood GLM coefficients via IRLS.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- A dataset consisting\n\n of *N* examples, each of dimension *M*.\n\n\n\n * **y** (\"ndarray\" of shape *(N,)*) -- The targets for each\n\n of the *N* examples in *X*.\n\n\n\n Returns:\n\n **self** (\"GeneralizedLinearModel\" instance)\n\n\n\n predict(X)\n\n\n\n Use the trained model to generate predictions for the\n\n distribution means, \\mu, associated with the collection of data\n\n points in **X**.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(Z, M)*) -- A dataset consisting\n\n of *Z* new examples, each of dimension *M*.\n\n\n\n Returns:\n\n **mu_pred** (\"ndarray\" of shape *(Z,)*) -- The model\n\n predictions for the expected value of the target associated\n\n with each item in *X*.\n", "class_name": "numpy_ml.linear_models.GeneralizedLinearModel", "class_link": "numpy_ml/linear_models/glm.py#L48-L212", "test_file_path": "numpy_ml/tests/test_GeneralizedLinearModel.py"} +{"title": "AdditiveNGram", "class_annotation": "numpy_ml.ngram.AdditiveNGram(N, K=1, unk=True, filter_stopwords=True, filter_punctuation=True)", "comment": "\"AdditiveNGram\"\n\n***************\n\n\n\nclass numpy_ml.ngram.AdditiveNGram(N, K=1, unk=True, filter_stopwords=True, filter_punctuation=True)\n\n\n\n An N-Gram model with smoothed probabilities calculated via additive\n\n / Lidstone smoothing.\n\n\n\n -[ Notes ]-\n\n\n\n The resulting estimates correspond to the expected value of the\n\n posterior, *p(ngram_prob | counts)*, when using a symmetric\n\n Dirichlet prior on counts with parameter *K*.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The maximum length (in words) of the context-\n\n window to use in the langauge model. Model will compute all\n\n n-grams from 1, ..., N\n\n\n\n * **K** (*float*) -- The pseudocount to add to each observation.\n\n Larger values allocate more probability toward unseen events.\n\n When *K* = 1, the model is known as Laplace smoothing. When\n\n *K* = 0.5, the model is known as expected likelihood\n\n estimation (ELE) or the Jeffreys-Perks law. Default is 1.\n\n\n\n * **unk** (*bool*) -- Whether to include the \"\" (unknown)\n\n token in the LM. Default is True.\n\n\n\n * **filter_stopwords** (*bool*) -- Whether to remove stopwords\n\n before training. Default is True.\n\n\n\n * **filter_punctuation** (*bool*) -- Whether to remove\n\n punctuation before training. 
Default is True.\n\n\n\n log_prob(words, N)\n\n\n\n Compute the smoothed log probability of a sequence of words\n\n under the *N*-gram language model with additive smoothing.\n\n\n\n -[ Notes ]-\n\n\n\n For a bigram, additive smoothing amounts to:\n\n\n\n P(w_i \\mid w_{i-1}) = \\frac{A + K}{B + KV}\n\n\n\n where\n\n\n\n A &= \\text{Count}(w_{i-1}, w_i) \\\\ B &= \\sum_j\n\n \\text{Count}(w_{i-1}, w_j) \\\\ V &= |\\{ w_j \\ : \\\n\n \\text{Count}(w_{i-1}, w_j) > 0 \\}|\n\n\n\n This is equivalent to pretending we've seen every possible\n\n *N*-gram sequence at least *K* times.\n\n\n\n Additive smoothing can be problematic, as it:\n\n * Treats each predicted word in the same way\n\n\n\n * Can assign too much probability mass to unseen *N*-grams\n\n\n\n Parameters:\n\n * **words** (*list** of **strings*) -- A sequence of words.\n\n\n\n * **N** (*int*) -- The gram-size of the language model to use\n\n when calculating the log probabilities of the sequence.\n\n\n\n Returns:\n\n **total_prob** (*float*) -- The total log-probability of the\n\n sequence *words* under the *N*-gram language model.\n\n\n\n completions(words, N)\n\n\n\n Return the distribution over proposed next words under the\n\n *N*-gram language model.\n\n\n\n Parameters:\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n initial sequence of words\n\n\n\n * **N** (*int*) -- The gram-size of the language model to use\n\n to generate completions\n\n\n\n Returns:\n\n **probs** (*list of (word, log_prob) tuples*) -- The list of\n\n possible next words and their log probabilities under the\n\n *N*-gram language model (unsorted)\n\n\n\n cross_entropy(words, N)\n\n\n\n Calculate the model cross-entropy on a sequence of words against\n\n the empirical distribution of words in a sample.\n\n\n\n -[ Notes ]-\n\n\n\n Model cross-entropy, *H*, is defined as\n\n\n\n H(W) = -\\frac{\\log p(W)}{n}\n\n\n\n where W = [w_1, \\ldots, w_k] is a sequence of words, and *n* is\n\n the number of *N*-grams in *W*.\n\n\n\n The model cross-entropy is proportional (not equal, since we use\n\n base *e*) to the average number of bits necessary to encode *W*\n\n under the model distribution.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to calculate\n\n cross-entropy on.\n\n\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n sequence of words to compute cross-entropy on.\n\n\n\n Returns:\n\n **H** (*float*) -- The model cross-entropy for the words in\n\n *words*.\n\n\n\n generate(N, seed_words=[''], n_sentences=5)\n\n\n\n Use the *N*-gram language model to generate sentences.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to generate\n\n from\n\n\n\n * **seed_words** (*list** of **strs*) -- A list of seed words\n\n to use to condition the initial sentence generation.\n\n Default is \"[\"\"]\".\n\n\n\n * **sentences** (*int*) -- The number of sentences to\n\n generate from the *N*-gram model. 
Default is 50.\n\n\n\n Returns:\n\n **sentences** (*str*) -- Samples from the *N*-gram model,\n\n joined by white spaces, with individual sentences separated\n\n by newlines.\n\n\n\n perplexity(words, N)\n\n\n\n Calculate the model perplexity on a sequence of words.\n\n\n\n -[ Notes ]-\n\n\n\n Perplexity, *PP*, is defined as\n\n\n\n PP(W) = \\left( \\frac{1}{p(W)} \\right)^{1 / n}\n\n\n\n or simply\n\n\n\n PP(W) &= \\exp(-\\log p(W) / n) \\\\ &= \\exp(H(W))\n\n\n\n where W = [w_1, \\ldots, w_k] is a sequence of words, *H(w)* is\n\n the cross-entropy of *W* under the current model, and *n* is the\n\n number of *N*-grams in *W*.\n\n\n\n Minimizing perplexity is equivalent to maximizing the\n\n probability of *words* under the *N*-gram model. It may also be\n\n interpreted as the average branching factor when predicting the\n\n next word under the language model.\n\n\n\n Parameters:\n\n * **N** (*int*) -- The gram-size of the model to calculate\n\n perplexity with.\n\n\n\n * **words** (*list** or **tuple** of **strings*) -- The\n\n sequence of words to compute perplexity on.\n\n\n\n Returns:\n\n **perplexity** (*float*) -- The model perlexity for the words\n\n in *words*.\n\n\n\n train(corpus_fp, vocab=None, encoding=None)\n\n\n\n Compile the n-gram counts for the text(s) in *corpus_fp*.\n\n\n\n -[ Notes ]-\n\n\n\n After running *train*, the \"self.counts\" attribute will store\n\n dictionaries of the *N*, *N-1*, ..., 1-gram counts.\n\n\n\n Parameters:\n\n * **corpus_fp** (*str*) -- The path to a newline-separated\n\n text corpus file.\n\n\n\n * **vocab** (\"Vocabulary\" instance or None) -- If not None,\n\n only the words in *vocab* will be used to construct the\n\n language model; all out-of-vocabulary words will either be\n\n mappend to \"\" (if \"self.unk = True\") or removed (if\n\n \"self.unk = False\"). Default is None.\n\n\n\n * **encoding** (*str** or **None*) -- Specifies the text\n\n encoding for corpus. Common entries are 'utf-8',\n\n 'utf-8-sig', 'utf-16'. Default is None.\n", "class_name": "numpy_ml.ngram.AdditiveNGram", "class_link": "numpy_ml/ngram/ngram.py#L364-L456", "test_file_path": "numpy_ml/tests/test_AdditiveNGram.py"} +{"title": "GradientBoostedDecisionTree", "class_annotation": "numpy_ml.trees.GradientBoostedDecisionTree(n_iter, max_depth=None, classifier=True, learning_rate=1, loss='crossentropy', step_size='constant')", "comment": "\"GradientBoostedDecisionTree\"\n\n*****************************\n\n\n\nclass numpy_ml.trees.GradientBoostedDecisionTree(n_iter, max_depth=None, classifier=True, learning_rate=1, loss='crossentropy', step_size='constant')\n\n\n\n A gradient boosted ensemble of decision trees.\n\n\n\n -[ Notes ]-\n\n\n\n Gradient boosted machines (GBMs) fit an ensemble of *m* weak\n\n learners such that:\n\n\n\n f_m(X) = b(X) + \\eta w_1 g_1 + \\ldots + \\eta w_m g_m\n\n\n\n where *b* is a fixed initial estimate for the targets, \\eta is a\n\n learning rate parameter, and w_{\\cdot} and g_{\\cdot} denote the\n\n weights and learner predictions for subsequent fits.\n\n\n\n We fit each *w* and *g* iteratively using a greedy strategy so that\n\n at each iteration *i*,\n\n\n\n w_i, g_i = \\arg \\min_{w_i, g_i} L(Y, f_{i-1}(X) + w_i g_i)\n\n\n\n On each iteration we fit a new weak learner to predict the negative\n\n gradient of the loss with respect to the previous prediction,\n\n f_{i-1}(X). 
We then use the element-wise product of the predictions\n\n of this weak learner, g_i, with a weight, w_i, to compute the\n\n amount to adjust the predictions of our model at the previous\n\n iteration, f_{i-1}(X):\n\n\n\n f_i(X) := f_{i-1}(X) + w_i g_i\n\n\n\n Parameters:\n\n * **n_iter** (*int*) -- The number of iterations / weak\n\n estimators to use when fitting each dimension / class of *Y*.\n\n\n\n * **max_depth** (*int*) -- The maximum depth of each decision\n\n tree weak estimator. Default is None.\n\n\n\n * **classifier** (*bool*) -- Whether *Y* contains class labels\n\n or real-valued targets. Default is True.\n\n\n\n * **learning_rate** (*float*) -- Value in [0, 1] controlling the\n\n amount each weak estimator contributes to the overall model\n\n prediction. Sometimes known as the *shrinkage parameter* in\n\n the GBM literature. Default is 1.\n\n\n\n * **loss** (*{'crossentropy'**, **'mse'}*) -- The loss to\n\n optimize for the GBM. Default is 'crossentropy'.\n\n\n\n * **step_size** (*{\"constant\"**, **\"adaptive\"}*) -- How to\n\n choose the weight for each weak learner. If \"constant\", use a\n\n fixed weight of 1 for each learner. If \"adaptive\", use a step\n\n size computed via line-search on the current iteration's loss.\n\n Default is 'constant'.\n\n\n\n fit(X, Y)\n\n\n\n Fit the gradient boosted decision trees on a dataset.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape (N, M)) -- The training data of\n\n *N* examples, each with *M* features\n\n\n\n * **Y** (\"ndarray\" of shape (N,)) -- An array of integer\n\n class labels for each example in *X* if \"self.classifier =\n\n True\", otherwise the set of target values for each example\n\n in *X*.\n\n\n\n predict(X)\n\n\n\n Use the trained model to classify or predict the examples in\n\n *X*.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(N, M)*) -- The training data of\n\n *N* examples, each with *M* features\n\n\n\n Returns:\n\n **preds** (\"ndarray\" of shape *(N,)*) -- The integer class\n\n labels predicted for each example in *X* if \"self.classifier\n\n = True\", otherwise the predicted target values.\n", "class_name": "numpy_ml.trees.GradientBoostedDecisionTree", "class_link": "numpy_ml/trees/gbdt.py#L18-L181", "test_file_path": "numpy_ml/tests/test_GradientBoostedDecisionTree.py"} +{"title": "BatchNorm1D", "class_annotation": "numpy_ml.neural_nets.layers.BatchNorm1D(momentum=0.9, epsilon=1e-05, optimizer=None)", "comment": "\"BatchNorm1D\"\n\n*************\n\n\n\nclass numpy_ml.neural_nets.layers.BatchNorm1D(momentum=0.9, epsilon=1e-05, optimizer=None)\n\n\n\n Bases: \"LayerBase\"\n\n\n\n A batch normalization layer for 1D inputs.\n\n\n\n -[ Notes ]-\n\n\n\n BatchNorm is an attempt address the problem of internal covariate\n\n shift (ICS) during training by normalizing layer inputs.\n\n\n\n ICS refers to the change in the distribution of layer inputs during\n\n training as a result of the changing parameters of the previous\n\n layer(s). 
ICS can make it difficult to train models with saturating\n\n nonlinearities, and in general can slow training by requiring a\n\n lower learning rate.\n\n\n\n Equations [train]:\n\n\n\n Y = scaler * norm(X) + intercept\n\n norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)\n\n\n\n Equations [test]:\n\n\n\n Y = scaler * running_norm(X) + intercept\n\n running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)\n\n\n\n In contrast to \"LayerNorm1D\", the BatchNorm layer calculates the\n\n mean and var across the *batch* rather than the output features.\n\n This has two disadvantages:\n\n\n\n 1. It is highly affected by batch size: smaller mini-batch sizes\n\n increase the variance of the estimates for the global mean and\n\n variance.\n\n\n\n 2. It is difficult to apply in RNNs -- one must fit a separate\n\n BatchNorm layer for *each* time-step.\n\n\n\n Parameters:\n\n * **momentum** (*float*) -- The momentum term for the running\n\n mean/running std calculations. The closer this is to 1, the\n\n less weight will be given to the mean/std of the current batch\n\n (i.e., higher smoothing). Default is 0.9.\n\n\n\n * **epsilon** (*float*) -- A small smoothing constant to use\n\n during computation of \"norm(X)\" to avoid divide-by-zero\n\n errors. Default is 1e-5.\n\n\n\n * **optimizer** (str, Optimizer object, or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. Default is None.\n\n\n\n Variables:\n\n * **X** (*list*) -- Running list of inputs to the \"forward\"\n\n method since the last call to \"update\". Only updated if the\n\n *retain_derived* argument was set to True.\n\n\n\n * **gradients** (*dict*) -- Dictionary of loss gradients with\n\n regard to the layer parameters\n\n\n\n * **parameters** (*dict*) -- Dictionary of layer parameters\n\n\n\n * **hyperparameters** (*dict*) -- Dictionary of layer\n\n hyperparameters\n\n\n\n * **derived_variables** (*dict*) -- Dictionary of any\n\n intermediate values computed during forward/backward\n\n propagation.\n\n\n\n property hyperparameters\n\n\n\n Return a dictionary containing the layer hyperparameters.\n\n\n\n reset_running_stats()\n\n\n\n Reset the running mean and variance estimates to 0 and 1.\n\n\n\n forward(X, retain_derived=True)\n\n\n\n Compute the layer output on a single minibatch.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer input,\n\n representing the *n_in*-dimensional features for a\n\n minibatch of *n_ex* examples.\n\n\n\n * **retain_derived** (*bool*) -- Whether to use the current\n\n intput to adjust the running mean and running_var\n\n computations. Setting this to True is the same as freezing\n\n the layer for the current input. Default is True.\n\n\n\n Returns:\n\n **Y** (\"ndarray\" of shape *(n_ex, n_in)*) -- Layer output for\n\n each of the *n_ex* examples\n\n\n\n backward(dLdy, retain_grads=True)\n\n\n\n Backprop from layer outputs to inputs.\n\n\n\n Parameters:\n\n * **dLdY** (\"ndarray\" of shape *(n_ex, n_in)*) -- The\n\n gradient of the loss wrt. the layer output *Y*.\n\n\n\n * **retain_grads** (*bool*) -- Whether to include the\n\n intermediate parameter gradients computed during the\n\n backward pass in the final parameter update. Default is\n\n True.\n\n\n\n Returns:\n\n **dX** (\"ndarray\" of shape *(n_ex, n_in)*) -- The gradient of\n\n the loss wrt. 
the layer input *X*.\n", "class_name": "numpy_ml.neural_nets.layers.BatchNorm1D", "class_link": "numpy_ml/neural_nets/layers/layers.py#L1218-L1441", "test_file_path": "numpy_ml/tests/test_BatchNorm1D.py"} +{"title": "LinearRegression", "class_annotation": "numpy_ml.linear_models.LinearRegression(fit_intercept=True)", "comment": "\"LinearRegression\"\n\n******************\n\n\n\nclass numpy_ml.linear_models.LinearRegression(fit_intercept=True)\n\n\n\n A weighted linear least-squares regression model.\n\n\n\n -[ Notes ]-\n\n\n\n In weighted linear least-squares regression [1], a real-valued\n\n target vector, **y**, is modeled as a linear combination of\n\n covariates, **X**, and model coefficients, \\beta:\n\n\n\n y_i = \\beta^\\top \\mathbf{x}_i + \\epsilon_i\n\n\n\n In this equation \\epsilon_i \\sim \\mathcal{N}(0, \\sigma^2_i) is the\n\n error term associated with example i, and \\sigma^2_i is the\n\n variance of the corresponding example.\n\n\n\n Under this model, the maximum-likelihood estimate for the\n\n regression coefficients, \\beta, is:\n\n\n\n \\hat{\\beta} = \\Sigma^{-1} \\mathbf{X}^\\top \\mathbf{Wy}\n\n\n\n where \\Sigma^{-1} = (\\mathbf{X}^\\top \\mathbf{WX})^{-1} and **W** is\n\n a diagonal matrix of weights, with each entry inversely\n\n proportional to the variance of the corresponding measurement. When\n\n **W** is the identity matrix the examples are weighted equally and\n\n the model reduces to standard linear least squares [2].\n\n\n\n -[ References ]-\n\n\n\n [1] https://en.wikipedia.org/wiki/Weighted_least_squares\n\n\n\n [2] https://en.wikipedia.org/wiki/General_linear_model\n\n\n\n Parameters:\n\n **fit_intercept** (*bool*) -- Whether to fit an intercept term\n\n in addition to the model coefficients. Default is True.\n\n\n\n Variables:\n\n * **beta** (\"ndarray\" of shape *(M, K)* or None) -- Fitted model\n\n coefficients.\n\n\n\n * **sigma_inv** (\"ndarray\" of shape *(N, N)* or None) -- Inverse\n\n of the data covariance matrix.\n\n\n\n update(X, y, weights=None)\n\n\n\n Incrementally update the linear least-squares coefficients for a\n\n set of new examples.\n\n\n\n -[ Notes ]-\n\n\n\n The recursive least-squares algorithm [3] [4] is used to\n\n efficiently update the regression parameters as new examples\n\n become available. For a single new example (\\mathbf{x}_{t+1},\n\n \\mathbf{y}_{t+1}), the parameter updates are\n\n\n\n \\beta_{t+1} = \\left( \\mathbf{X}_{1:t}^\\top\n\n \\mathbf{X}_{1:t} +\n\n \\mathbf{x}_{t+1}\\mathbf{x}_{t+1}^\\top \\right)^{-1}\n\n \\mathbf{X}_{1:t}^\\top \\mathbf{Y}_{1:t} +\n\n \\mathbf{x}_{t+1}^\\top \\mathbf{y}_{t+1}\n\n\n\n where \\beta_{t+1} are the updated regression coefficients,\n\n \\mathbf{X}_{1:t} and \\mathbf{Y}_{1:t} are the set of examples\n\n observed from timestep 1 to *t*.\n\n\n\n In the single-example case, the RLS algorithm uses the Sherman-\n\n Morrison formula [5] to avoid re-inverting the covariance matrix\n\n on each new update. In the multi-example case (i.e., where\n\n \\mathbf{X}_{t+1} and \\mathbf{y}_{t+1} are matrices of *N*\n\n examples each), we use the generalized Woodbury matrix identity\n\n [6] to update the inverse covariance. This comes at a\n\n performance cost, but is still more performant than doing\n\n multiple single-example updates if *N* is large.\n\n\n\n -[ References ]-\n\n\n\n [3] Gauss, C. F. (1821) *Theoria combinationis observationum\n\n erroribus minimis obnoxiae*, Werke, 4. 
Gottinge\n\n\n\n [4] https://en.wikipedia.org/wiki/Recursive_least_squares_filter\n\n\n\n [5] https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_form\n\n ula\n\n\n\n [6] https://en.wikipedia.org/wiki/Woodbury_matrix_identity\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- A dataset consisting\n\n of *N* examples, each of dimension *M*\n\n\n\n * **y** (\"ndarray\" of shape *(N, K)*) -- The targets for each\n\n of the *N* examples in *X*, where each target has dimension\n\n *K*\n\n\n\n * **weights** (\"ndarray\" of shape *(N,)* or None) -- Weights\n\n associated with the examples in *X*. Examples with larger\n\n weights exert greater influence on model fit. When *y* is\n\n a vector (i.e., *K = 1*), weights should be set to the\n\n reciporical of the variance for each measurement (i.e., w_i\n\n = 1/\\sigma^2_i). When *K > 1*, it is assumed that all\n\n columns of *y* share the same weight w_i. If None, examples\n\n are weighted equally, resulting in the standard linear\n\n least squares update. Default is None.\n\n\n\n Returns:\n\n **self** (\"LinearRegression\" instance)\n\n\n\n fit(X, y, weights=None)\n\n\n\n Fit regression coefficients via maximum likelihood.\n\n\n\n Parameters:\n\n * **X** (\"ndarray\" of shape *(N, M)*) -- A dataset consisting\n\n of *N* examples, each of dimension *M*.\n\n\n\n * **y** (\"ndarray\" of shape *(N, K)*) -- The targets for each\n\n of the *N* examples in *X*, where each target has dimension\n\n *K*.\n\n\n\n * **weights** (\"ndarray\" of shape *(N,)* or None) -- Weights\n\n associated with the examples in *X*. Examples with larger\n\n weights exert greater influence on model fit. When *y* is\n\n a vector (i.e., *K = 1*), weights should be set to the\n\n reciporical of the variance for each measurement (i.e., w_i\n\n = 1/\\sigma^2_i). When *K > 1*, it is assumed that all\n\n columns of *y* share the same weight w_i. If None, examples\n\n are weighted equally, resulting in the standard linear\n\n least squares update. Default is None.\n\n\n\n Returns:\n\n **self** (\"LinearRegression\" instance)\n\n\n\n predict(X)\n\n\n\n Use the trained model to generate predictions on a new\n\n collection of data points.\n\n\n\n Parameters:\n\n **X** (\"ndarray\" of shape *(Z, M)*) -- A dataset consisting\n\n of *Z* new examples, each of dimension *M*.\n\n\n\n Returns:\n\n **y_pred** (\"ndarray\" of shape *(Z, K)*) -- The model\n\n predictions for the items in *X*.\n", "class_name": "numpy_ml.linear_models.LinearRegression", "class_link": "numpy_ml/linear_models/linear_regression.py#L6-L236", "test_file_path": "numpy_ml/tests/test_LinearRegression.py"} +{"title": "ELU", "class_annotation": "numpy_ml.neural_nets.activations.ELU(alpha=1.0)", "comment": "\"ELU\"\n\n*****\n\n\n\nclass numpy_ml.neural_nets.activations.ELU(alpha=1.0)\n\n\n\n An exponential linear unit (ELU).\n\n\n\n -[ Notes ]-\n\n\n\n ELUs are intended to address the fact that ReLUs are strictly\n\n nonnegative and thus have an average activation > 0, increasing the\n\n chances of internal covariate shift and slowing down learning. ELU\n\n units address this by (1) allowing negative values when x < 0,\n\n which (2) are bounded by a value -\\alpha. Similar to \"LeakyReLU\",\n\n the negative activation values help to push the average unit\n\n activation towards 0. 
Unlike \"LeakyReLU\", however, the boundedness\n\n of the negative activation allows for greater robustness in the\n\n face of large negative values, allowing the function to avoid\n\n conveying the *degree* of \"absence\" (negative activation) in the\n\n input. [*]\n\n\n\n Parameters:\n\n **alpha** (*float*) -- Slope of negative segment. Default is 1.\n\n\n\n -[ References ]-\n\n\n\n [*] Clevert, D. A., Unterthiner, T., Hochreiter, S. (2016). \"Fast\n\n and accurate deep network learning by exponential linear units\n\n (ELUs)\". *4th International Conference on Learning\n\n Representations*.\n\n\n\n fn(z)\n\n\n\n Evaluate the ELU activation on the elements of input *z*.\n\n\n\n \\text{ELU}(z_i) &= z_i \\ \\ \\ \\ &&\\text{if }z_i > 0 \\\\\n\n &= \\alpha (e^{z_i} - 1) \\ \\ \\ \\ &&\\text{otherwise}\n\n\n\n grad(x)\n\n\n\n Evaluate the first derivative of the ELU activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial \\text{ELU}}{\\partial x_i} &= 1 \\ \\ \\ \\\n\n &&\\text{if } x_i > 0 \\\\ &= \\alpha e^{x_i} \\ \\ \\ \\\n\n &&\\text{otherwise}\n\n\n\n grad2(x)\n\n\n\n Evaluate the second derivative of the ELU activation on the\n\n elements of input *x*.\n\n\n\n \\frac{\\partial^2 \\text{ELU}}{\\partial x_i^2} &= 0 \\ \\ \\\n\n \\ &&\\text{if } x_i > 0 \\\\ &= \\alpha e^{x_i} \\ \\ \\ \\\n\n &&\\text{otherwise}\n", "class_name": "numpy_ml.neural_nets.activations.ELU", "class_link": "numpy_ml/neural_nets/activations/activations.py#L412-L488", "test_file_path": "numpy_ml/tests/test_ELU.py"} +{"title": "WavenetResidualModule", "class_annotation": "numpy_ml.neural_nets.modules.WavenetResidualModule(ch_residual, ch_dilation, dilation, kernel_width, optimizer=None, init='glorot_uniform')", "comment": "\"WavenetResidualModule\"\n\n***********************\n\n\n\nclass numpy_ml.neural_nets.modules.WavenetResidualModule(ch_residual, ch_dilation, dilation, kernel_width, optimizer=None, init='glorot_uniform')\n\n\n\n A WaveNet-like residual block with causal dilated convolutions.\n\n\n\n *Skip path in* >-------------------------------------------> + ---> *Skip path out*\n\n Causal |--> Tanh --| |\n\n *Main |--> Dilated Conv1D -| * --> 1x1 Conv1D --|\n\n path >--| |--> Sigm --| |\n\n in* |-------------------------------------------------> + ---> *Main path out*\n\n *Residual path*\n\n\n\n On the final block, the output of the skip path is further\n\n processed to produce the network predictions.\n\n\n\n -[ References ]-\n\n\n\n [1] van den Oord et al. (2016). \"Wavenet: a generative model for\n\n raw audio\". https://arxiv.org/pdf/1609.03499.pdf\n\n\n\n Parameters:\n\n * **ch_residual** (*int*) -- The number of output channels for\n\n the 1x1 \"Conv1D\" layer in the main path.\n\n\n\n * **ch_dilation** (*int*) -- The number of output channels for\n\n the causal dilated \"Conv1D\" layer in the main path.\n\n\n\n * **dilation** (*int*) -- The dilation rate for the causal\n\n dilated \"Conv1D\" layer in the main path.\n\n\n\n * **kernel_width** (*int*) -- The width of the causal dilated\n\n \"Conv1D\" kernel in the main path.\n\n\n\n * **init** (*{'glorot_normal'**, **'glorot_uniform'**,\n\n **'he_normal'**, **'he_uniform'}*) -- The weight\n\n initialization strategy. Default is 'glorot_uniform'.\n\n\n\n * **optimizer** (str or Optimizer object or None) -- The\n\n optimization strategy to use when performing gradient updates\n\n within the \"update()\" method. If None, use the \"SGD\"\n\n optimizer with default parameters. 
Default is None.\n\n\n\n property parameters\n\n\n\n A dictionary of the module parameters.\n\n\n\n property hyperparameters\n\n\n\n A dictionary of the module hyperparameters\n\n\n\n property derived_variables\n\n\n\n A dictionary of intermediate values computed during the\n\n forward/backward passes.\n\n\n\n property gradients\n\n\n\n A dictionary of the module parameter gradients.\n\n\n\n forward(X_main, X_skip=None)\n\n\n\n Compute the module output on a single minibatch.\n\n\n\n Parameters:\n\n * **X_main** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*) -- The input volume consisting of *n_ex* examples,\n\n each with dimension (*in_rows*, *in_cols*, *in_ch*).\n\n\n\n * **X_skip** (\"ndarray\" of shape *(n_ex, in_rows, in_cols,\n\n in_ch)*, or None) -- The output of the preceding skip-\n\n connection if this is not the first module in the network.\n\n\n\n Returns:\n\n * **Y_main** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) -- The output of the main pathway.\n\n\n\n * **Y_skip** (\"ndarray\" of shape *(n_ex, out_rows, out_cols,\n\n out_ch)*) -- The output of the skip-connection pathway.\n\n\n\n backward(dY_skip, dY_main=None)\n", "class_name": "numpy_ml.neural_nets.modules.WavenetResidualModule", "class_link": "numpy_ml/neural_nets/modules/modules.py#L119-L357", "test_file_path": "numpy_ml/tests/test_WavenetResidualModule.py"}