Yandex Announces CatBoost, a New Open-Source Machine Learning Library
Gradient boosting is a machine learning technique that can learn from a wide range of data inputs. It builds a predictive model in stages: each new model, typically a small decision tree, is trained to correct the errors left by the models before it, so the combined prediction becomes progressively more accurate. CatBoost was developed to support a wide variety of data formats out of the box. It is particularly powerful on data sets that contain categorical attributes, that is, variables with a defined set of possible values such as user IDs, where it can deliver higher accuracy than other machine learning algorithms. It is well equipped to handle the complexity of many business problems, such as detecting fraud, predicting customer engagement and ranking recommended items. CatBoost can also be applied across a range of industries, to tasks such as improving weather forecasts, optimizing industrial processes and even making particle physics research more efficient.
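To make the staged idea concrete, here is a minimal sketch in Python. It assumes squared-error regression on a single numeric feature and uses one-split decision "stumps" as the small trees. The function names `fit_stump` and `gradient_boost` and the parameters `n_rounds` and `learning_rate` are hypothetical. This is the textbook technique for illustration only, not CatBoost's own implementation or API.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression "stump" to the residuals by least squares."""
    best = None
    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:  # all x identical: fall back to a constant prediction
        constant = mean(residuals)
        return lambda x: constant
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Train an additive model: a base value plus a sum of shrunken stumps."""
    base = mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is just the residual,
        # so each new stump is fitted to what the current ensemble still gets wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Usage on a toy step function (hypothetical data).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))  # prints: 1.01 4.99
```

Each round shrinks the remaining error by a factor of `1 - learning_rate` on this toy data, which is the "progressively more accurate" behaviour described above. Production libraries such as CatBoost add many refinements on top of this basic loop, including deeper trees, regularization and dedicated handling of categorical features.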
CatBoost delivers highly accurate results even in situations where there is relatively little data.
While deep learning frameworks typically require training on massive amounts of data and work best with sensory data such as images, audio or text, CatBoost works well with relatively small data sets across many domains, including sensory, transactional and historical data. It can even use features produced by deep learning models as inputs.
CatBoost is the successor to MatrixNet, the machine learning algorithm that is widely used within Yandex for numerous ranking tasks, weather forecasting and recommendations. Over the coming months, CatBoost will be rolled out across many Yandex products and services. Users of our Yandex.Weather service, for example, will soon see even more precise minute-by-minute hyperlocal forecasts to help them plan for sudden changes in the weather.
In addition to its upcoming use in Yandex products and services, CatBoost is also used in the LHCb experiment at CERN, the European Organisation for Nuclear Research. "The state-of-the-art algorithm developed using Yandex's CatBoost has been deployed in LHCb to improve the performance of our particle identification subsystems," said XXX. "CatBoost will improve how efficiently we can identify charged particles, providing greater accuracy in the selection of our data."