QBoost with EQC Models Package for Dirac-3
Device: Dirac-3
Introduction
The QBoost algorithm is a classification method that leverages open quantum systems (such as QCi's Dirac-3 machine) to obtain superior speed and power consumption over classical approaches. QBoost is an adaptation of the classical boosting algorithm. Boosting is a powerful technique in machine learning that combines the output of several weak classifiers to produce a strong classifier. In classical boosting algorithms, such as AdaBoost, the weights of the weak classifiers are adjusted iteratively based on their performance, with the goal of minimizing the overall classification error.
Methodology
The idea is based on the concept of boosting. Let us assume that we have a collection of "weak" classifiers where . The goal is to construct a "strong" classifier as a linear superposition of these weak classifiers, that is,
where is a vector of input features and , and . The goal is to find an optimal set of s, weights associated with the weak classifiers.
Let us have a training set of size . We can determine optimal weights by minimizing,
where the regularization term penalizes non-zero weights; is the regularization coefficient.
where
and
subject to,
Note that the above algorithm assumes that the total number of weak classifiers, that is , is less than the number of available qudits on Dirac-3.
Data
We used the IRIS dataset to build a binary classifier using QBoost. The reader can refer to
https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
and
https://en.wikipedia.org/wiki/Iris_flower_data_set
for more information on the IRIS dataset.
QBoost Implementation in QCI's EQC Models Package
The QCi's EQC Models package includes an implementation of the QBoost algorithm.
Setting up the environment
The user should ensure that QCI_TOKEN and QCI_API_URL are set as the algorithm needs access to QCi's Dirac-3 device.
Import libraries
We start by importing some libraries. This includes importing the QBoostClassifier instance from the EQC Models ML package.
In [ ]:
- import sys
- from collections import Counter
- import numpy as np
- import pandas as pd
- from sklearn import datasets
- from sklearn.preprocessing import MinMaxScaler
- from sklearn.model_selection import train_test_split
- from sklearn.metrics import (
- confusion_matrix,
- precision_score,
- recall_score,
- accuracy_score,
- )
- from eqc_models.ml.classifierqboost import QBoostClassifier
Setting parameters
The only parameter we need to set here is the train test split ratio. We have,
In [4]:
- TEST_SIZE = 0.2
Read Dataset
We can now load the IRIS data from scikit-learn dataset library. We have,
In [5]:
- iris = datasets.load_iris()
- X = iris.data
- y = iris.target
Pre-Processing of the data
We now scale the features using a min/max scaler, and the classes to either 1 or -1 labels. The data is then split into training and testing sets.
In [10]:
- scaler = MinMaxScaler()
- X = scaler.fit_transform(X)
- for i in range(len(y)):
- if y[i] == 0:
- y[i] = -1
- elif y[i] == 2:
- y[i] = 1
- X_train, X_test, y_train, y_test = train_test_split(
- X,
- y,
- test_size=TEST_SIZE,
- random_state=42,
- )
We can echo some statistics on the training and testing sets,
In [11]:
- print("Training data counts:")
- print(Counter(y_train))
- print("Testing data counts:")
- print(Counter(y_test))
Out [ ]:
Training data counts: Counter({1: 80, -1: 40}) Testing data counts: Counter({1: 20, -1: 10})
Train the model
We now instantiate the QBoost classifier object and train a model,
In [13]:
- obj = QBoostClassifier(
- relaxation_schedule=2,
- num_samples=1,
- lambda_coef=0.0,
- weak_cls_type="lg",
- )
- obj.fit(X_train, y_train)
Out [ ]:
Runtime of _build_weak_classifiers: 0.03 seconds! Built 10 weak classifiers! Runtime of get_hamiltonian: 0.03 seconds! 2024-07-30 11:49:28 - Dirac allocation balance = 0 s (unmetered) 2024-07-30 11:49:28 - Job submitted: job_id='66a935b827d9cb8ae30094e6' 2024-07-30 11:49:28 - RUNNING 2024-07-30 11:49:39 - COMPLETED 2024-07-30 11:49:41 - Dirac allocation balance = 0 s (unmetered) {'job_info': {'job_id': '66a935b827d9cb8ae30094e6', 'job_submission': {'problem_config': {'normalized_qudit_hamiltonian_optimization': {'polynomial_file_id': '66a935b811ecab62cc23eaa4'}}, 'device_config': {'dirac-3': {'num_samples': 1, 'relaxation_schedule': 2, 'solution_precision': 1, 'sum_constraint': 1}}}, 'job_status': {'submitted_at_rfc3339nano': '2024-07-30T18:49:28.822Z', 'queued_at_rfc3339nano': '2024-07-30T18:49:28.822Z', 'running_at_rfc3339nano': '2024-07-30T18:49:28.879Z', 'completed_at_rfc3339nano': '2024-07-30T18:49:37.142Z'}, 'job_result': {'file_id': '66a935c111ecab62cc23eaa6', 'device_usage_s': 8}}, 'status': 'COMPLETED', 'results': {'counts': [1], 'energies': [-120.0000305], 'solutions': [[1.1e-06, 0, 0.1428512, 0.1428512, 0, 0.1428512, 0.1428512, 0.1428512, 0.14287, 0.14287]], 'distilled_energies': [-120], 'distilled_solutions': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]}}
Out [13]:
<eqc_models.ml.classifierqboost.QBoostClassifier at 0x1394addb0>
Results
We can now use the model to get predictions on the training and testing datasets,
In [14]:
- y_train_prd = obj.predict(X_train)
- y_test_prd = obj.predict(X_test)
and use the predictions to calculate precision, recall, and accuracy.
In [16]:
- from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score
- import matplotlib.pyplot as plt
- import seaborn as sn
- print(
- "Train precision:",
- precision_score(y_train, y_train_prd, labels=[-1, 1], pos_label=1),
- )
- print(
- "Train recall:",
- recall_score(y_train, y_train_prd, labels=[-1, 1], pos_label=1),
- )
- print(
- "Train accuracy:",
- accuracy_score(y_train, y_train_prd),
- )
- sn.set(font_scale=1.4)
- train_conf_mat = confusion_matrix(y_train, y_train_prd, labels=[-1, 1])
- sn.heatmap(train_conf_mat, annot=True, annot_kws={"size": 16})
- plt.show()
- print(
- "Test precision:",
- precision_score(y_test, y_test_prd, labels=[-1, 1], pos_label=1),
- )
- print(
- "Test recall:",
- recall_score(y_test, y_test_prd, labels=[-1, 1], pos_label=1),
- )
- print(
- "Test accuracy:",
- accuracy_score(y_test, y_test_prd),
- )
- test_conf_mat = confusion_matrix(y_test, y_test_prd, labels=[-1, 1])
- sn.heatmap(test_conf_mat, annot=True, annot_kws={"size": 16})
- plt.show()
Out [ ]:
Train precision: 1.0 Train recall: 1.0 Train accuracy: 1.0
Out [ ]:
<Figure size 640x480 with 2 Axes>
Out [ ]:
Test precision: 1.0 Test recall: 1.0 Test accuracy: 1.0
Out [ ]:
<Figure size 640x480 with 2 Axes>