Ibotta

Ibotta

Ibotta, Inc. is an American mobile technology company headquartered in Denver, Colorado. Founded in 2011, the company offers cash back rewards on various purchases through its Ibotta Performance Network and direct to consumer app. Ibotta partners with CPG (consumer packaged goods) brands and network publishers to provide these rewards. As of 2024, the company operates solely in the United States. The company's rewards-as-a-service offering, the Ibotta Performance Network, went live in 2022. In August 2019, Ibotta received a $1 billion valuation after its Series D funding, and in 2023, the company surpassed $1.5 billion cash rewards paid to over 50 million consumers since the company's founding. Ibotta became a publicly traded company in April 2024 with a listing on the New York Stock Exchange. As of September 2025, Ibotta is trading at approximately $27.13 per share, marking a 69% decline from its initial public offering price of $88 per share on April 18, 2024. == History == === Founding through early 2019 === Ibotta was founded by current CEO Bryan Leach. The company was incorporated in 2011 and the app launched to both the App Store and Google Play stores in 2012. Early investors included entrepreneur and computer scientist Jim Clark and Tom “TJ” Jermoluk, Chairman of @Home Network. In 2015, Ibotta expanded beyond item level grocery, adding the ability to get cash back on in-store retail purchases. In 2016, in-app mobile commerce began, allowing users to navigate from the Ibotta app to its partners' apps to earn cash back on purchases. In 2016 with a Series C investment, Ibotta had raised over $73 million in funding. In March of that year, Ibotta partnered with Anheuser-Busch to offer cash back for adults who purchased its products. In May, the company partnered with LiveRamp so that companies could use their CRM data to create segmented, personalized campaigns. At the time, the company had around 200 full- and part-time employees and moved from offices in Lower Downtown Denver (LoDo) to a 40,000-square-foot office in the central Denver business district. A year later, the company had to expand to a second floor as it added almost another 100 employees. In 2017, Ibotta added cash back for Uber to its app as well as cash back rewards for online and mobile purchases. In 2018, Ibotta was listed on the Inc. 5,000 list as one of the fastest growing private companies in the U.S. A year later, in January 2019, the Ibotta app had been downloaded more than 30 million times with users receiving a reported $500 million in cash back rewards. That year, Ibotta was the largest mobile company in Colorado with six million monthly active users. === August 2019 to present === In August 2019, Ibotta was valued at $1 billion, following a Series D round of funding. The round was led by Koch Disruptive Technologies, a subsidiary of Koch Industries. 2019 was also the year the company introduced Pay with Ibotta, which allowed users to complete purchases at key retailers on the Ibotta app and earn instant cash back in the process. With that new service, users were able to enter their purchase total and use a QR code to checkout and receive immediate cash back. In 2020, the company partnered with Trees for the Future to plant up to 1 million trees as part of an Earth Month campaign to raise awareness about the waste of unused paper coupons. In response to the COVID-19 pandemic, Ibotta partnered with CPG brands in their “Here to Help” campaign and together committed over $10 million in cash back to American consumers. The company added the ability to earn cash back from online grocery pick-up and delivery orders. Later that year, Ibotta started its free Thanksgiving program, providing users with 100% cash back on select groceries needed for a Thanksgiving meal. By 2022, the company had provided approximately 10 million Thanksgiving meals. In 2021, Ibotta acquired the company OctoShop (originally InStok), a shopping browser extension company. The OctoShop app enables users to compare prices across stores and set restock and price-drop alerts. In April 2022, the Ibotta Performance Network (IPN) was launched. The IPN allows brands to deliver digital offers to consumers through third party publishers. Retailers including Walmart, Dollar General and Family Dollar, food delivery services including Instacart, and convenience stores including Shell are all part of the Ibotta Performance Network. This pay-per-sales or success-based performance network reaches over 200 million consumers. On April 18, 2024, Ibotta had its initial public offering (IPO), trading on the New York Stock Exchange (NYSE) under the ticker symbol IBTA. It was the largest technology IPO in Colorado history. In October 2025, Ibotta announced a partnership with technology and analytics company Circana, integrating Circana's Household Lift measurement into Ibotta campaigns to give CPG brands an increased understanding of the impact of their promotional campaigns. On November 3, 2025, Ibotta launched LiveLift, a tool for companies to measure the return on investment of digital promotions, in order to optimize performance marketing goals. === Athletic partnerships === Ibotta became the official jersey patch partner of the New Orleans Pelicans, a professional men's basketball team in the National Basketball Association (NBA), for the 2020–2021 and 2023–2024 seasons. Ibotta became the official jersey patch partner of the 2023 NBA champion Denver Nuggets baskeetball team beginning in the 2023–2024 season. In March 2023, F1 driver Logan Sargeant, the first U.S. racer to compete in F1 since 2015, partnered with Ibotta. The Ibotta logo was displayed on Sargeant's racing helmet throughout his F1 career. In June 2023, UConn Huskies women's basketball player Paige Bueckers entered into a "name, image, and likeness" (NIL) promotional agreement with Ibotta. According to a press release by Ibotta, the company has agreements with The Brandr Group, which finds NIL opportunities for women college athletes, and the Pearpop social media marketing platform to promote Ibotta. == Legal issues == In April 2025, shareholders filed a class action lawsuit—Fortune v. Ibotta, Inc., in the U.S. District Court for the District of Colorado (Case No. 25-cv-01213)—alleging that the registration statement in connection with Ibotta’s April 2024 initial public offering omitted material information. The complaint claims that, although Ibotta disclosed detailed terms for its contract with Walmart Inc., it failed to warn investors that its agreement with The Kroger Co., its second-largest client, was terminable at will and thus could be canceled without warning, creating a misleading impression of stability.

Color balance

In photography and image processing, color balance is the global adjustment of the intensities of the colors (typically red, green, and blue primary colors). An important goal of this adjustment is to render specific colors – particularly neutral colors like white or grey – correctly. Hence, the general method is sometimes called gray balance, neutral balance, or white balance. Color balance changes the overall mixture of colors in an image and is used for color correction. Generalized versions of color balance are used to correct colors other than neutrals or to deliberately change them for effect. White balance is one of the most common kinds of balancing, and is when colors are adjusted to make a white object (such as a piece of paper or a wall) appear white and not a shade of any other colour. Image data acquired by sensors – either film or electronic image sensors – must be transformed from the acquired values to new values that are appropriate for color reproduction or display. Several aspects of the acquisition and display process make such color correction essential – including that the acquisition sensors do not match the sensors in the human eye, that the properties of the display medium must be accounted for, and that the ambient viewing conditions of the acquisition differ from the display viewing conditions. The color balance operations in popular image editing applications usually operate directly on the red, green, and blue channel pixel values, without respect to any color sensing or reproduction model. In film photography, color balance is typically achieved by using color correction filters over the lights or on the camera lens. == Generalized color balance == Sometimes the adjustment to keep neutrals neutral is called white balance, and the phrase color balance refers to the adjustment that in addition makes other colors in a displayed image appear to have the same general appearance as the colors in an original scene. It is particularly important that neutral (gray, neutral, white) colors in a scene appear neutral in the reproduction. === Psychological color balance === Humans relate to flesh tones more critically than other colors. Trees, grass and sky can all be off without concern, but if human flesh tones are 'off' then the human subject can look sick or dead. To address this critical color balance issue, the tri-color primaries themselves are formulated to not balance as a true neutral color. The purpose of this color primary imbalance is to more faithfully reproduce the flesh tones through the entire brightness range. == Illuminant estimation and adaptation == Most digital cameras have means to select color correction based on the type of scene lighting, using either manual lighting selection, automatic white balance, or custom white balance. The algorithms for these processes perform generalized chromatic adaptation. Many methods exist for color balancing. Setting a button on a camera is a way for the user to indicate to the processor the nature of the scene lighting. Another option on some cameras is a button which one may press when the camera is pointed at a gray card or other neutral colored object. This captures an image of the ambient light, which enables a digital camera to set the correct color balance for that light. There is a large literature on how one might estimate the ambient lighting from the camera data and then use this information to transform the image data. A variety of algorithms have been proposed, and the quality of these has been debated. A few examples and examination of the references therein will lead the reader to many others. Examples are Retinex, an artificial neural network or a Bayesian method. == Chromatic colors == Color balancing an image affects not only the neutrals, but other colors as well. An image that is not color balanced is said to have a color cast, as everything in the image appears to have been shifted towards one color. Color balancing may be thought in terms of removing this color cast. Color balance is also related to color constancy. Algorithms and techniques used to attain color constancy are frequently used for color balancing, as well. Color constancy is, in turn, related to chromatic adaptation. Conceptually, color balancing consists of two steps: first, determining the illuminant under which an image was captured; and second, scaling the components (e.g., R, G, and B) of the image or otherwise transforming the components so they conform to the viewing illuminant. Viggiano found that white balancing in the camera's native RGB color model tended to produce less color inconstancy (i.e., less distortion of the colors) than in monitor RGB for over 4000 hypothetical sets of camera sensitivities. This difference typically amounted to a factor of more than two in favor of camera RGB. This means that it is advantageous to get color balance right at the time an image is captured, rather than edit later on a monitor. If one must color balance later, balancing the raw image data will tend to produce less distortion of chromatic colors than balancing in monitor RGB. == Mathematics of color balance == Color balancing is sometimes performed on a three-component image (e.g., RGB) using a 3x3 matrix. This type of transformation is appropriate if the image was captured using the wrong white balance setting on a digital camera, or through a color filter. Changing the color balance of an image can improve classifier results on a trained ML model. === Scaling monitor R, G, and B === In principle, one wants to scale all relative luminances in an image so that objects which are believed to be neutral appear so. If, say, a surface with R = 240 {\displaystyle R=240} was believed to be a white object, and if 255 is the count which corresponds to white, one could multiply all red values by 255/240. Doing analogously for green and blue would result, at least in theory, in a color balanced image. In this type of transformation the 3x3 matrix is a diagonal matrix. [ R G B ] = [ 255 / R w ′ 0 0 0 255 / G w ′ 0 0 0 255 / B w ′ ] [ R ′ G ′ B ′ ] {\displaystyle \left[{\begin{array}{c}R\\G\\B\end{array}}\right]=\left[{\begin{array}{ccc}255/R'_{w}&0&0\\0&255/G'_{w}&0\\0&0&255/B'_{w}\end{array}}\right]\left[{\begin{array}{c}R'\\G'\\B'\end{array}}\right]} where R {\displaystyle R} , G {\displaystyle G} , and B {\displaystyle B} are the color balanced red, green, and blue components of a pixel in the image; R ′ {\displaystyle R'} , G ′ {\displaystyle G'} , and B ′ {\displaystyle B'} are the red, green, and blue components of the image before color balancing, and R w ′ {\displaystyle R'_{w}} , G w ′ {\displaystyle G'_{w}} , and B w ′ {\displaystyle B'_{w}} are the red, green, and blue components of a pixel which is believed to be a white surface in the image before color balancing. This is a simple scaling of the red, green, and blue channels, and is why color balance tools in Photoshop have a white eyedropper tool. It has been demonstrated that performing the white balancing in the phosphor set assumed by sRGB tends to produce large errors in chromatic colors, even though it can render the neutral surfaces perfectly neutral. === Scaling X, Y, Z === If the image may be transformed into CIE XYZ tristimulus values, the color balancing may be performed there. This has been termed a "wrong von Kries" transformation. Although it has been demonstrated to offer usually poorer results than balancing in monitor RGB, it is mentioned here as a bridge to other things. Mathematically, one computes: [ X Y Z ] = [ X w / X w ′ 0 0 0 Y w / Y w ′ 0 0 0 Z w / Z w ′ ] [ X ′ Y ′ Z ′ ] {\displaystyle \left[{\begin{array}{c}X\\Y\\Z\end{array}}\right]=\left[{\begin{array}{ccc}X_{w}/X'_{w}&0&0\\0&Y_{w}/Y'_{w}&0\\0&0&Z_{w}/Z'_{w}\end{array}}\right]\left[{\begin{array}{c}X'\\Y'\\Z'\end{array}}\right]} where X {\displaystyle X} , Y {\displaystyle Y} , and Z {\displaystyle Z} are the color-balanced tristimulus values; X w {\displaystyle X_{w}} , Y w {\displaystyle Y_{w}} , and Z w {\displaystyle Z_{w}} are the tristimulus values of the viewing illuminant (the white point to which the image is being transformed to conform to); X w ′ {\displaystyle X'_{w}} , Y w ′ {\displaystyle Y'_{w}} , and Z w ′ {\displaystyle Z'_{w}} are the tristimulus values of an object believed to be white in the un-color-balanced image, and X ′ {\displaystyle X'} , Y ′ {\displaystyle Y'} , and Z ′ {\displaystyle Z'} are the tristimulus values of a pixel in the un-color-balanced image. If the tristimulus values of the monitor primaries are in a matrix P {\displaystyle \mathbf {P} } so that: [ X Y Z ] = P [ L R L G L B ] {\displaystyle \left[{\begin{array}{c}X\\Y\\Z\end{array}}\right]=\mathbf {P} \left[{\begin{array}{c}L_{R}\\L_{G}\\L_{B}\end{array}}\right]} where L R {\displaystyle L_{R}} , L G {\displaystyle L_{G}} , and L B {\displaystyle L_{B}} are the un-gamma corrected monitor RGB, one may use: [ L R L G L B ] = P − 1 [ X w / X w ′ 0 0

K-nearest neighbors algorithm

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method. It was first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. In classification, a new example is assigned a label based on the labels of its k nearest training examples; in regression, the prediction is computed from the values of those neighbors. Most often, it is used for classification, as a k-NN classifier, the output of which is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. The k-NN algorithm can also be generalized for regression. In k-NN regression, also known as nearest neighbor smoothing, the output is the property value for the object. This value is the average of the values of k nearest neighbors. If k = 1, then the output is simply assigned to the value of that single nearest neighbor, also known as nearest neighbor interpolation. For both classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than distant ones. For example, a common weighting scheme consists of giving each neighbor a weight of 1/d, where d is the distance to the neighbor. The input consists of the k closest training examples in a data set. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. A peculiarity (sometimes even a disadvantage) of the k-NN algorithm is its sensitivity to the local structure of the data. In k-NN classification the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance, if the features represent different physical units or come in vastly different scales, then feature-wise normalizing of the training data can greatly improve its accuracy. == Statistical setting == Suppose we have pairs ( X 1 , Y 1 ) , ( X 2 , Y 2 ) , … , ( X n , Y n ) {\displaystyle (X_{1},Y_{1}),(X_{2},Y_{2}),\dots ,(X_{n},Y_{n})} taking values in R d × { 1 , 2 } {\displaystyle \mathbb {R} ^{d}\times \{1,2\}} , where Y is the class label of X, so that X | Y = r ∼ P r {\displaystyle X|Y=r\sim P_{r}} for r = 1 , 2 {\displaystyle r=1,2} (and probability distributions P r {\displaystyle P_{r}} ). Given some norm ‖ ⋅ ‖ {\displaystyle \|\cdot \|} on R d {\displaystyle \mathbb {R} ^{d}} and a point x ∈ R d {\displaystyle x\in \mathbb {R} ^{d}} , let ( X ( 1 ) , Y ( 1 ) ) , … , ( X ( n ) , Y ( n ) ) {\displaystyle (X_{(1)},Y_{(1)}),\dots ,(X_{(n)},Y_{(n)})} be a reordering of the training data such that ‖ X ( 1 ) − x ‖ ≤ ⋯ ≤ ‖ X ( n ) − x ‖ {\displaystyle \|X_{(1)}-x\|\leq \dots \leq \|X_{(n)}-x\|} . == Algorithm == The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point. A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance). In the context of gene expression microarray data, for example, k-NN has been employed with correlation coefficients, such as Pearson and Spearman, as a metric. Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as large margin nearest neighbor or neighborhood components analysis. A drawback of the basic "majority voting" classification occurs when the class distribution is skewed. That is, examples of a more frequent class tend to dominate the prediction of the new example, because they tend to be common among the k nearest neighbors due to their large number. One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its k nearest neighbors. The class (or value, in regression problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of the distance from that point to the test point. Another way to overcome skew is by abstraction in data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data. k-NN can then be applied to the SOM. == Parameter selection == The best choice of k depends upon the data; generally, larger values of k reduces effect of the noise on the classification, but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization). The special case where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is called the nearest neighbor algorithm. The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes. In binary (two class) classification problems, it is helpful to choose k to be an odd number as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via bootstrap method. == The 1-nearest neighbor classifier == The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier that assigns a point x to the class of its closest neighbour in the feature space, that is C n 1 n n ( x ) = Y ( 1 ) {\displaystyle C_{n}^{1nn}(x)=Y_{(1)}} . As the size of training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data). == The weighted nearest neighbour classifier == The k-nearest neighbour classifier can be viewed as assigning the k nearest neighbours a weight 1 / k {\displaystyle 1/k} and all others 0 weight. This can be generalised to weighted nearest neighbour classifiers. That is, where the ith nearest neighbour is assigned a weight w n i {\displaystyle w_{ni}} , with ∑ i = 1 n w n i = 1 {\textstyle \sum _{i=1}^{n}w_{ni}=1} . An analogous result on the strong consistency of weighted nearest neighbour classifiers also holds. Let C n w n n {\displaystyle C_{n}^{wnn}} denote the weighted nearest classifier with weights { w n i } i = 1 n {\displaystyle \{w_{ni}\}_{i=1}^{n}} . Subject to regularity conditions, which in asymptotic theory are conditional variables which require assumptions to differentiate among parameters with some criteria. On the class distributions the excess risk has the following asymptotic expansion R R ( C n w n n ) − R R ( C Bayes ) = ( B 1 s n 2 + B 2 t n 2 ) { 1 + o ( 1 ) } , {\displaystyle {\mathcal {R}}_{\mathcal {R}}(C_{n}^{wnn})-{\mathcal {R}}_{\mathcal {R}}(C^{\text{Bayes}})=\left(B_{1}s_{n}^{2}+B_{2}t_{n}^{2}\right)\{1+o(1)\},} for constants B 1 {\displaystyle B_{1}} and B 2 {\displaystyle B_{2}} where s n 2 = ∑ i = 1 n w n i 2 {\displaystyle s_{n}^{2}=\sum _{i=1}^{n}w_{ni}^{2}} and t n = n − 2 / d ∑ i = 1 n w n i { i 1 + 2 / d − ( i − 1 ) 1 + 2 / d } {\displaystyle t_{n}=n^{-2/d}\sum _{i=1}^{n}w_{ni}\left\{i^{1+2/d}-(i-1)^{1+2/d}\right\}} . The optimal weighting scheme { w n i ∗ } i = 1 n {\displaystyle \{w_{ni}^{}\}_{i=1}^{n}} , that balances the two terms in the display above, is given as follows: set k ∗ = ⌊ B n 4 d + 4 ⌋ {\displaystyle k^{}=\lfloor Bn^{\frac {4}{d+4}}\rfloor } , w n i ∗ = 1 k ∗ [ 1 + d 2 − d 2 k ∗ 2 / d { i 1 + 2 / d − ( i − 1 ) 1 + 2 / d } ] {\displaystyle w_{ni}^{}={\frac {1}{k^{}}}\left[1+{\frac {d}{2}}-{\frac {d}{2{k^{}}^{2/d}}}\{i^{1+2/d}-(i-1)^{1+2/d}\}\right]} for i = 1 , 2 , … , k ∗ {\displaystyle i=1,2,\dots ,k^{}} and w n i ∗ = 0 {\displaystyle w_{ni}^{}=0} for i = k ∗ + 1 , … , n {\displaystyle i=k^{}+1,\dots ,n} . With optimal weights the dominant term in the asymptotic expansion of the excess risk is O ( n − 4 d + 4 ) {\displaystyle {\mathcal {O}}(n^{-{\frac {4}{d+4}}})}

Sharpness aware minimization

Sharpness Aware Minimization (SAM) is an optimization algorithm used in machine learning that aims to improve model generalization. The method seeks to find model parameters that are located in regions of the loss landscape with uniformly low loss values, rather than parameters that only achieve a minimal loss value at a single point. This approach is described as finding "flat" minima instead of "sharp" ones. The rationale is that models trained this way are less sensitive to variations between training and test data, which can lead to better performance on unseen data. The algorithm was introduced in a 2020 paper by a team of researchers including Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. == Underlying Principle == SAM modifies the standard training objective by minimizing a "sharpness-aware" loss. This is formulated as a minimax problem where the inner objective seeks to find the highest loss value in the immediate neighborhood of the current model weights, and the outer objective minimizes this value: min w max ‖ ϵ ‖ p ≤ ρ L train ( w + ϵ ) + λ ‖ w ‖ 2 2 {\displaystyle \min _{w}\max _{\|\epsilon \|_{p}\leq \rho }L_{\text{train}}(w+\epsilon )+\lambda \|w\|_{2}^{2}} In this formulation: w {\displaystyle w} represents the model's parameters (weights). L train {\displaystyle L_{\text{train}}} is the loss calculated on the training data. ϵ {\displaystyle \epsilon } is a perturbation applied to the weights. ρ {\displaystyle \rho } is a hyperparameter that defines the radius of the neighborhood (an L p {\displaystyle L_{p}} ball) to search for the highest loss. An optional L2 regularization term, scaled by λ {\displaystyle \lambda } , can be included. A direct solution to the inner maximization problem is computationally expensive. SAM approximates it by taking a single gradient ascent step to find the perturbation ϵ {\displaystyle \epsilon } . This is calculated as: ϵ ( w ) = ρ ∇ L train ( w ) ‖ ∇ L train ( w ) ‖ 2 {\displaystyle \epsilon (w)=\rho {\frac {\nabla L_{\text{train}}(w)}{\|\nabla L_{\text{train}}(w)\|_{2}}}} The optimization process for each training step involves two stages. First, an "ascent step" computes a perturbed set of weights, w adv = w + ϵ ( w ) {\displaystyle w_{\text{adv}}=w+\epsilon (w)} , by moving towards the direction of the highest local loss. Second, a "descent step" updates the original weights w {\displaystyle w} using the gradient calculated at these perturbed weights, ∇ L train ( w adv ) {\displaystyle \nabla L_{\text{train}}(w_{\text{adv}})} . This update is typically performed using a standard optimizer like SGD or Adam. == Application and Performance == SAM has been applied in various machine learning contexts, primarily in computer vision. Research has shown it can improve generalization performance in models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on image datasets including ImageNet, CIFAR-10, and CIFAR-100. The algorithm has also been found to be effective in training models with noisy labels, where it performs comparably to methods designed specifically for this problem. Some studies indicate that SAM and its variants can improve out-of-distribution (OOD) generalization, which is a model's ability to perform well on data from distributions not seen during training. Other areas where it has been applied include gradual domain adaptation and mitigating overfitting in scenarios with repeated exposure to training examples. == Limitations == A primary limitation of SAM is its computational cost. By requiring two gradient computations (one for the ascent and one for the descent) per optimization step, it approximately doubles the training time compared to standard optimizers. The theoretical convergence properties of SAM are still under investigation. Some research suggests that with a constant step size, SAM may not converge to a stationary point. The accuracy of the single gradient step approximation for finding the worst-case perturbation may also decrease during the training process. The effectiveness of SAM can also be domain-dependent. While it has shown benefits for computer vision tasks, its impact on other areas, such as GPT-style language models where each training example is seen only once, has been reported as limited in some studies. Furthermore, while SAM seeks flat minima, some research suggests that not all flat minima necessarily lead to good generalization. The algorithm also introduces the neighborhood size ρ {\displaystyle \rho } as a new hyperparameter, which requires tuning. == Research, Variants, and Enhancements == Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required. Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden. To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM) or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM). Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing. Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.

Generalized canonical correlation

In statistics, the generalized canonical correlation analysis (gCCA), is a way of making sense of cross-correlation matrices between the sets of random variables when there are more than two sets. While a conventional CCA generalizes principal component analysis (PCA) to two sets of random variables, a gCCA generalizes PCA to more than two sets of random variables. The canonical variables represent those common factors that can be found by a large PCA of all of the transformed random variables after each set underwent its own PCA. == Applications == The Helmert-Wolf blocking (HWB) method of estimating linear regression parameters can find an optimal solution only if all cross-correlations between the data blocks are zero. They can always be made to vanish by introducing a new regression parameter for each common factor. The gCCA method can be used for finding those harmful common factors that create cross-correlation between the blocks. However, no optimal HWB solution exists if the random variables do not contain enough information on all of the new regression parameters.

Database index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. An index is a copy of selected columns of data, from a table, that is designed to enable very efficient search. An index normally includes a "key" or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of partial index, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. == Usage == === Support for fast lookup === Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases. Suppose a database contains N data items and one must be retrieved based on the value of one of the fields. A simple implementation retrieves and examines each item according to the test. If there is only one matching item, this can stop when it finds that single item, but if there are multiple matches, it must test everything. This means that the number of operations in the average case is O(N) or linear time. Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance. An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose. There are complex design trade-offs involving lookup performance, index size, and index-update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O(1)) performance. === Policing the database constraints === Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN KEY. An index may be declared as UNIQUE, which creates an implicit constraint on the underlying table. Database systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of using an already-existing index to police this constraint. Many database systems require that both referencing and referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts, updates and deletes to the tables participating in the constraint. Some database systems support an EXCLUSION constraint that ensures that, for a newly inserted or updated record, a certain predicate holds for no other record. This can be used to implement a UNIQUE constraint (with equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint. == Index architecture and indexing methods == === Non-clustered === The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines). In a non-clustered index, The physical order of the rows is not the same as the index order. The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses. There can be more than one non-clustered index on a database table. === Clustered === Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). === Cluster === When multiple databases and multiple tables are joined, it is called a cluster (not to be confused with clustered index described previously). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. The cluster configuration defines the data layout in the tables that are parts of the cluster. A cluster can be keyed with a B-tree index or a hash table. The data block where the table record is stored is defined by the value of the cluster key. == Column order == The order that the index definition defines the columns in is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, in a phone book organized by city first, then by last name, and then by first name, in a particular city, one can easily extract the list of all phone numbers. However, it would be very tedious to find all the phone numbers for a particular last name. One would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index. In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all the three fields, search time is minimal—but if we provide the values for city and first_name only, the search uses only the city field to retrieve all matched records. Then a sequential lookup checks the matching with first_name. So, to improve the performance, one must ensure that the index is created on the order of search columns. == Applications and limitations == Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the index data structure (typically a B-tree) until the Smith entry has been found; this is much less computationally expensive than a full table scan. Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@wikipedia.org';. This query would yield an email address for every customer whose email address ends with "@wikipedia.org", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@wikipedia.org');. This puts the wild-card at the right-most part of the query (now gro.aidepikiw@%), which the index on reverse(email_address) can satisfy. When the wildcard characters are used on both sides of the search word as %wikipedia.org%, the index available on this field is not used. Rather only a sequential search is performed, which takes ⁠ O ( N ) {\displaystyle

Iris flower data set

The Iris flower data set or Fisher's Iris data set is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus". The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish each species. Fisher's paper was published in the Annals of Eugenics (today the Annals of Human Genetics). == Use of the data set == Originally used as an example data set on which Fisher's linear discriminant analysis was applied, it became a typical test case for many statistical classification techniques in machine learning such as support vector machines. The use of this data set in cluster analysis however is not common, since the data set only contains two clusters with rather obvious separation. One of the clusters contains Iris setosa, while the other cluster contains both Iris virginica and Iris versicolor and is not separable without the species information Fisher used. This makes the data set a good example to explain the difference between supervised and unsupervised techniques in data mining: Fisher's linear discriminant model can only be obtained when the object species are known: class labels and clusters are not necessarily the same. Nevertheless, all three species of Iris are separable in the projection on the nonlinear and branching principal component. The data set is approximated by the closest tree with some penalty for the excessive number of nodes, bending and stretching. Then the so-called "metro map" is constructed. The data points are projected into the closest node. For each node the pie diagram of the projected points is prepared. The area of the pie is proportional to the number of the projected points. It is clear from the diagram (left) that the absolute majority of the samples of the different Iris species belong to the different nodes. Only a small fraction of Iris-virginica is mixed with Iris-versicolor (the mixed blue-green nodes in the diagram). Therefore, the three species of Iris (Iris setosa, Iris virginica and Iris versicolor) are separable by the unsupervising procedures of nonlinear principal component analysis. To discriminate them, it is sufficient just to select the corresponding nodes on the principal tree. == Data set == The data set contains a set of 150 records under five attributes: sepal length, sepal width, petal length, petal width and species. The iris data set is widely used as a beginner's data set for machine learning purposes. The data set is included in R base and Python in the machine learning library scikit-learn, so that users can access it without having to find a source for it. Several versions of the data set have been published. === R code illustrating usage === The example R code shown below reproduce the scatterplot displayed at the top of this article: === Python code illustrating usage === This code gives: