Intel® oneAPI Data Analytics Library


Keywords
machine, learning, scikit-learn, data, science, analytics, ai-inference, ai-machine-learning, ai-training, big-data, data-analysis, gpu, intel, machine-learning, machine-learning-algorithms, oneapi, python, swrepo
License
Apache-2.0
Install
pip install daal4py==2024.3.0


Intel(R) Extension for Scikit-learn*

Speed up your scikit-learn applications for Intel(R) CPUs and GPUs across single- and multi-node configurations




Overview

Intel(R) Extension for Scikit-learn is a free software AI accelerator designed to deliver 10-100x acceleration to your existing scikit-learn code. The acceleration is achieved through vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel(R) platforms at launch time.

With Intel(R) Extension for Scikit-learn, you can:

  • Speed up training and inference by up to 100x with equivalent mathematical accuracy
  • Benefit from performance improvements across different Intel(R) hardware configurations
  • Integrate the extension into your existing scikit-learn applications without code modifications
  • Continue to use the open-source scikit-learn API
  • Enable and disable the extension with a couple of lines of code or at the command line
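
The following is a quick sanity check of the equivalent-accuracy claim. It is a minimal sketch that assumes scikit-learn-intelex is installed. It fits the DBSCAN example from this page twice: once with the extension enabled and once with stock scikit-learn. It then compares the resulting cluster labels. The helper name `fit_labels` is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearnex import patch_sklearn, unpatch_sklearn

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

def fit_labels(X):
    # Hypothetical helper: importing inside the function picks up whichever
    # DBSCAN class (patched or stock) sklearn.cluster currently exposes.
    from sklearn.cluster import DBSCAN
    return DBSCAN(eps=3, min_samples=2).fit(X).labels_

patch_sklearn()              # route supported estimators to the extension
patched_labels = fit_labels(X)

unpatch_sklearn()            # restore the stock scikit-learn estimators
stock_labels = fit_labels(X)

print(patched_labels)
print(np.array_equal(patched_labels, stock_labels))
```

Identical labels on such a small input only illustrate the idea. They are not a general accuracy guarantee.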

Intel(R) Extension for Scikit-learn is also a part of Intel(R) AI Tools.

Acceleration

Benchmarks code

Intel(R) Optimizations

  • Enable Intel(R) CPU optimizations

    import numpy as np
    from sklearnex import patch_sklearn
    patch_sklearn()  # patch before importing scikit-learn estimators
    
    from sklearn.cluster import DBSCAN  # now resolves to the optimized version
    
    X = np.array([[1., 2.], [2., 2.], [2., 3.],
                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
  • Enable Intel(R) GPU optimizations

    import numpy as np
    import dpctl  # Data Parallel Control: SYCL device support used for GPU offload
    from sklearnex import patch_sklearn, config_context
    patch_sklearn()  # patch before importing scikit-learn estimators
    
    from sklearn.cluster import DBSCAN
    
    X = np.array([[1., 2.], [2., 2.], [2., 3.],
                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
    # Run the computation on GPU device 0
    with config_context(target_offload="gpu:0"):
        clustering = DBSCAN(eps=3, min_samples=2).fit(X)

👀 Check out available notebooks for more examples.

Installation

To install Intel(R) Extension for Scikit-learn, run:

pip install scikit-learn-intelex

See all installation instructions in the Installation Guide.

Integration

The software acceleration is achieved through patching, which replaces the stock scikit-learn algorithms with the optimized versions provided by the extension.

Patching only affects supported algorithms and their parameters. You can still use unsupported ones in your code; the package simply falls back to the stock scikit-learn version.

TIP: Enable verbose mode to see which implementation of the algorithm is currently used.
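
As a minimal sketch, verbose mode can be enabled through Python's standard `logging` module. This assumes the extension logs to a logger named `"sklearnex"`; setting the `SKLEARNEX_VERBOSE=INFO` environment variable before starting Python is assumed to be equivalent. The exact wording of the log messages depends on the version, so none are shown here.

```python
import logging
import numpy as np
from sklearnex import patch_sklearn

# Assumed logger name: at INFO level the extension reports, per call,
# whether oneDAL or stock scikit-learn handled the computation.
logging.getLogger("sklearnex").setLevel(logging.INFO)

patch_sklearn()
from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
```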

To patch scikit-learn, you can:

  • Use the following command-line flag:
    python -m sklearnex my_application.py
    
  • Add the following lines to the script:
    from sklearnex import patch_sklearn
    patch_sklearn()
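
Disabling the extension is the reverse operation. The sketch below assumes that `patch_sklearn()` swaps the estimator classes exposed by `sklearn.cluster` in memory and that `unpatch_sklearn()` puts the originals back. It tracks `DBSCAN` only as an example.

```python
import sklearn.cluster
from sklearnex import patch_sklearn, unpatch_sklearn

stock_dbscan = sklearn.cluster.DBSCAN      # remember the stock class

patch_sklearn()                            # enable the extension
patched_dbscan = sklearn.cluster.DBSCAN    # now the extension's class

unpatch_sklearn()                          # restore stock scikit-learn
restored_dbscan = sklearn.cluster.DBSCAN

print(patched_dbscan is stock_dbscan)
print(restored_dbscan is stock_dbscan)
```

Re-import estimators after toggling the patch. A name imported earlier keeps pointing at the class that was active at import time.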
    

👀 Read about other ways to patch scikit-learn.

Documentation

daal4py and oneDAL

The acceleration is achieved through the use of the Intel(R) oneAPI Data Analytics Library (oneDAL). Learn more in the oneDAL and daal4py documentation.
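
The daal4py package listed at the top of this page also exposes oneDAL algorithms directly, without the scikit-learn API. The sketch below is an illustration of that lower-level style, assuming daal4py's `linear_regression_training` / `linear_regression_prediction` interface. It fits a linear regression on synthetic, exactly linear data and predicts back on the same inputs.

```python
import numpy as np
import daal4py as d4p

# Synthetic, exactly linear data: y = X @ w + 3.0, as 2D float64 arrays.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
w = np.array([[1.5], [-2.0], [0.5]])
y = X @ w + 3.0                               # shape (100, 1)

# Train a oneDAL linear regression model that includes an intercept term.
train_result = d4p.linear_regression_training(interceptFlag=True).compute(X, y)

# Predict on the training inputs with the fitted model.
pred = d4p.linear_regression_prediction().compute(X, train_result.model).prediction

print(np.abs(pred - y).max())                 # should be close to zero
```

For most users the patching approach above is the simpler entry point. The direct daal4py interface is mainly useful when working with oneDAL algorithms outside scikit-learn.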

Samples & Examples

How to Contribute

We welcome community contributions; check our Contributing Guidelines to learn more.


* The Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.