# Adding a New AI Model to MaixCAM MaixPy
## Update history

| Date | Version | Author | Update content |
| --- | --- | --- | --- |
| 2024-11-01 | 1.0.0 | neucrack | Added migration documentation |
## Introduction
Besides the built-in AI algorithms and models, MaixPy is highly extensible, allowing you to add your own algorithms and models.
Due to the prevalence of visual applications, this guide will be divided into sections for visual applications and other applications.
## Adding Visual AI Models and Algorithms in Python
For visual applications, the usual task is image recognition, specifically:
- Input: Image
- Output: Any data, such as classification, probability, image, coordinates, etc.
In MaixPy, let's use the common `YOLO11` detection algorithm as an example:
```python
from maix import nn, image

detector = nn.YOLO11(model="/root/models/yolo11n.mud", dual_buff=True)
img = image.Image(detector.input_width(), detector.input_height(), detector.input_format())
objs = detector.detect(img, conf_th=0.5, iou_th=0.45)
for obj in objs:
    img.draw_rect(obj.x, obj.y, obj.w, obj.h, color=image.COLOR_RED)
    msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
    img.draw_string(obj.x, obj.y, msg, color=image.COLOR_RED)
```
In this code, we first construct the `YOLO11` object to load the model, then pass an image to the `detect` method for recognition. The steps are:
- `nn.YOLO11()`: initializes the object, loads the model into memory, and parses it.
- `detector.detect()`:
  - Preprocesses the image, usually standardizing it, e.g. `(value - mean) * scale`, to adjust pixel values to a suitable range such as [0, 1]; this must match the preprocessing used during model training (a short numpy sketch of this normalization follows the list).
  - Runs the model: the preprocessed data is sent to the NPU, which computes the network and produces the output, typically floating-point data.
  - Postprocesses the output, transforming the model's raw output into the final result.
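As a concrete illustration of the preprocessing step, here is a minimal numpy sketch of the `(value - mean) * scale` normalization. The `mean` and `scale` values below are hypothetical placeholders; in practice they come from the model's `mud` file and must match the values used during training.

```python
import numpy as np

# Hypothetical preprocessing values; real ones come from the model's mud file
mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
scale = np.array([1 / 255.0] * 3, dtype=np.float32)

# raw: H x W x 3 uint8 RGB pixels (an all-zero image here, just for illustration)
raw = np.zeros((224, 224, 3), dtype=np.uint8)

# (value - mean) * scale, applied per channel -> float32 data in [0, 1]
preprocessed = (raw.astype(np.float32) - mean) * scale
print(preprocessed.dtype, preprocessed.shape)
```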
To add a new model and algorithm, implement a class similar to `YOLO11`. Pseudocode example:
```python
class My_Model:
    def __init__(self, model: str):
        # Parse the model, potentially custom parsing from a MUD file
        pass

    def recognize(self, img: image.Image):
        # Preprocess image
        # Run model
        # Postprocess output
        # Return result
        pass
```
Using the `nn.NN` class, we can parse and run models; see the API documentation for details.

With `nn.NN`, we can parse our custom `mud` model description file, retrieve preprocessing values such as `mean` and `scale`, and run the model with `nn.NN.forward_image()`. This method combines the preprocessing and inference steps, reducing memory-copy overhead for faster execution. For more complex preprocessing, implement the preprocessing yourself and then run the model with `forward()` to get the output.
Here's an example of implementing a classification model without using the built-in `nn.Classifier`:
```python
from maix import nn, image, tensor
import os
import numpy as np

def parse_str_values(value: str) -> list[float]:
    return [float(v) for v in value.split(",")]

def load_labels(model_path, path_or_labels: str):
    # The mud file may give either a label file path or a comma-separated label list
    path = os.path.join(os.path.dirname(model_path), path_or_labels)
    labels0 = open(path, encoding="utf-8").readlines() if os.path.exists(path) else path_or_labels.split(",")
    return [label.strip() for label in labels0]

class My_Classifier:
    def __init__(self, model: str):
        # Load and parse the model, then read preprocessing info and labels from the mud file
        self.model = nn.NN(model, dual_buff=False)
        self.extra_info = self.model.extra_info()
        self.mean = parse_str_values(self.extra_info["mean"])
        self.scale = parse_str_values(self.extra_info["scale"])
        self.labels = load_labels(model, self.extra_info["labels"])

    def classify(self, img: image.Image):
        # Preprocess the image and run the model in one call
        outs = self.model.forward_image(img, self.mean, self.scale, copy_result=False)
        for k in outs.keys():
            # Postprocess: softmax, then pick the class with the highest probability
            out = nn.F.softmax(outs[k], replace=True)
            out = tensor.tensor_to_numpy_float32(out, copy=False).flatten()
            max_idx = out.argmax()
            return self.labels[max_idx], out[max_idx]

classifier = My_Classifier("/root/models/mobilenetv2.mud")
file_path = "/root/cat_224.jpg"
img = image.load(file_path, image.Format.FMT_RGB888)
label, score = classifier.classify(img)
print("max score:", label, score)
```
This code:

- Loads the model and retrieves the `mean` and `scale` parameters from the `mud` file.
- Recognizes an image by directly calling `forward_image` to get the model output.
- Applies `softmax` as a postprocessing step and, as an example, prints the class with the highest probability.
More complex models may have elaborate postprocessing; YOLO, for example, requires custom processing on the CPU for part of the model's output.
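To give a feel for this kind of CPU-side postprocessing, below is a simplified, hypothetical sketch of two typical detection steps implemented with numpy: filtering candidate boxes by confidence and applying non-maximum suppression. It is not MaixPy's actual YOLO11 postprocessing (which also handles output decoding and per-class logic in C++), just an illustration of what may happen after the NPU output is obtained.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_th: float = 0.45) -> list[int]:
    """Simplified non-maximum suppression; boxes are [x1, y1, x2, y2]."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection between the best box and the remaining ones
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_th]  # drop boxes overlapping the kept one too much
    return keep

# Hypothetical decoded candidates: [x1, y1, x2, y2] with one score each
boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.3], dtype=np.float32)

mask = scores > 0.5                    # 1) confidence threshold
idx = nms(boxes[mask], scores[mask])   # 2) non-maximum suppression
print("kept boxes:", boxes[mask][idx])
```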
## Adding AI Models and Algorithms for Other Data Types
For other data types, like audio or motion sensor data:
- Input: Any data, like audio, IMU, or pressure data.
- Output: Any data, like classifications, probabilities, or control values.
For non-image inputs, use `forward()` to process raw `float32` data. To prepare data for `forward()`, convert your `numpy` arrays into a `tensor.Tensors` object:
```python
from maix import nn, tensor
import numpy as np

model = nn.NN("/root/models/your_model.mud", dual_buff=False)  # example path, load your own model

input_tensors = tensor.Tensors()
for layer in model.inputs_info():
    data = np.zeros(layer.shape, dtype=np.float32)  # fill with your real input data
    t = tensor.tensor_from_numpy_float32(data)
    # Let Tensors copy and manage the tensor data (compare with the no-copy variant below)
    input_tensors.add_tensor(layer.name, t, True, True)
outputs = model.forward(input_tensors, copy_result=False, dual_buff_wait=True)
```
This enables you to send raw data to the model.
Alternatively, to reduce memory copy and speed up execution, use:
```python
from maix import nn, tensor
import numpy as np

model = nn.NN("/root/models/your_model.mud", dual_buff=False)  # example path, load your own model

input_tensors = tensor.Tensors()
input_tensors_li = []
for layer in model.inputs_info():
    data = np.zeros(layer.shape, dtype=np.float32)  # fill with your real input data
    t = tensor.tensor_from_numpy_float32(data, copy=False)
    # No copy here, so keep a Python reference to each tensor alive until forward() is done
    input_tensors.add_tensor(layer.name, t, False, False)
    input_tensors_li.append(t)
outputs = model.forward(input_tensors, copy_result=False, dual_buff_wait=True)
del input_tensors_li
```
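Whichever variant you use, `forward()` returns the model output as a `tensor.Tensors` object (the same type returned by `forward_image()` in the classifier example above), so each output layer can be converted back to numpy for your own postprocessing. A minimal sketch, continuing from the snippet above:

```python
# Convert every output layer to a numpy array for custom CPU postprocessing
for name in outputs.keys():
    out = tensor.tensor_to_numpy_float32(outputs[name], copy=False)
    print(name, out.shape, out.dtype)
```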
## Adding AI Models and Algorithms in C++
Writing Python code allows rapid model validation, but complex preprocessing or postprocessing can slow down performance. In such cases, consider C++ for efficiency.
Refer to the YOLO11 source code for guidance.
Additionally, C++ code can be used from both C++ and MaixPy. By adding a comment like `@maixpy maix.nn.YOLO11` to your C++ class, it can be used in MaixPy via `maix.nn.YOLO11`, providing seamless integration.