System Boot and Login
Note: The system must be powered via the 12V DC input. USB power alone may not deliver enough current, which can make the system unstable.
Pre-Boot Preparation
Required:
12V DC power supply
HDMI cable and display
Type-C cable (optional: for serial debugging/login)
RJ45 Ethernet cable (optional: for network connection and SSH login)
Before first use, check the item below that applies to your hardware version:
Power switch on enclosure is ON (for enclosed versions)
Jumper cap remains shorted at Button marking (for bare board versions)
Normal Boot
Connect a display to HDMI1 and supply 12V DC power. After approximately 20 seconds, the LightDM login screen appears, indicating that the system has booted successfully.
Connect a mouse and keyboard to the onboard USB-A port. Log in with the default credentials (username: root, password: root) to access the Debian desktop system.
Advanced: Serial & SSH Login
The onboard Type-C USB port serves as the default debug UART (115200 8n1).
Use a serial tool to view kernel logs or log in directly via terminal.
For SSH access, expand network connectivity via USB Ethernet/WiFi dongles.
Note: The system has only the root superuser account, and SSH password login is disabled by default for security. For temporary access, refer to here.
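As a rough sketch of the serial route, the helper below looks for the first USB serial device on a Linux host. The device names (/dev/ttyUSB*, /dev/ttyACM*) and the choice of picocom as the serial tool are assumptions for illustration; only the 115200 8n1 setting comes from the text above.

```shell
# Print the first USB serial device found in a directory (default: /dev).
# Assumption: the debug UART enumerates as ttyUSB* or ttyACM* on a Linux host.
find_serial_port() {
    dev_dir="${1:-/dev}"
    for port in "$dev_dir"/ttyUSB* "$dev_dir"/ttyACM*; do
        if [ -e "$port" ]; then
            echo "$port"
            return 0
        fi
    done
    echo "no USB serial device found in $dev_dir" >&2
    return 1
}

# Example use (picocom defaults to 8 data bits, no parity, 1 stop bit):
#   picocom -b 115200 "$(find_serial_port)"
```

Any other serial terminal works the same way as long as it is set to 115200 baud, 8n1.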
Interactive Image Segmentation & Inpainting
A Qt-based GUI for real-time segmentation (point or box selection) and inpainting.
Open Source Official GitHub Repo: SAM-ONNX-AX650-CPP
Download prebuilt binaries or compile from source.
[Figure: example of removing a player from a photo]
[Figure: live demo screenshots; panels: RAW, SAM, Inpaint]
Interactive Text-to-Image Search (CLIP)
A Qt-based GUI that uses OpenAI’s CLIP (Contrastive Language–Image Pre-training) for zero-shot image retrieval from text input (Chinese and English are supported).
Open Source Official GitHub Repo: CLIP-ONNX-AX650-CPP
Install Qt:
apt update
apt install cmake qt6-base-dev
Download prebuilt files (executable, models, test images/text):
Extract CLIP.zip to /root/Desktop/:
root@m4nhat-7190c7:~/Desktop/CLIP# tree -L 1
.
├── CLIPQT
├── cn_vocab.txt
├── coco_1000
├── libonnxruntime.so
├── libonnxruntime.so.1.16.0
├── onnx_models
├── run_en.sh
├── run_zh.sh
└── vocab.txt
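Before launching the GUI, it may help to confirm that the archive was extracted completely. The following is a minimal sketch that checks for the entries shown in the listing above; the function name check_clip_dir is hypothetical, and the default path assumes the archive was extracted to /root/Desktop/CLIP.

```shell
# Report any entry from the expected CLIP directory layout that is missing.
# Returns 0 when everything is present, 1 otherwise.
check_clip_dir() {
    dir="${1:-/root/Desktop/CLIP}"
    missing=0
    for item in CLIPQT cn_vocab.txt coco_1000 libonnxruntime.so \
                libonnxruntime.so.1.16.0 onnx_models run_en.sh run_zh.sh vocab.txt; do
        if [ ! -e "$dir/$item" ]; then
            echo "missing: $item"
            missing=1
        fi
    done
    return "$missing"
}

# Example use:
#   check_clip_dir /root/Desktop/CLIP && echo "CLIP directory looks complete"
```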
Run from a terminal on the desktop:
./run_zh.sh # For Chinese
./run_en.sh # For English
Important Demo Pre-requisites
HDMI0 (demo output) and HDMI1 (desktop) cannot operate simultaneously due to display driver limitations.
To run demos:
Terminate fb_vo process:
kill -9 $(pgrep fb_vo)
Connect display to HDMI0
Execute demo scripts via SSH/serial terminal
After the demo, restore the desktop:
bash /root/runVoHook.sh
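The stop, run, restore sequence above can be wrapped in a small helper so the desktop is restored even when the demo exits with an error. This is only a sketch: the names run and hdmi0_demo are hypothetical, pkill is used in place of kill -9 $(pgrep fb_vo) so that a missing fb_vo process is not treated as an error, and the helper defaults to a dry run that prints the commands instead of executing them. Moving the display cable to HDMI0 remains a manual step.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: hdmi0_demo <demo command and arguments>
hdmi0_demo() {
    # Stop the desktop framebuffer process; ignore "no such process".
    run pkill -9 fb_vo || true
    # Run the demo and remember its exit status.
    run "$@"
    status=$?
    # Restore the desktop regardless of how the demo ended.
    run bash /root/runVoHook.sh
    return "$status"
}

# Example use from an SSH or serial session:
#   DRY_RUN=0; hdmi0_demo bash /opt/bin/BoxDemo/run.sh
```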
32-Channel AI BOX (Person/Vehicle Detection)
BoxDemo showcases the complete pipeline from H.264/H.265 decoding → AI analysis → HDMI display.
Features:
Default: 32-channel display (6×6 layout)
Dual HDMI support (mirror/extended)
System power consumption <7W
NPU usage of about 3.6 TOPS (one third of capacity)
15-20 FPS (CPU-bound)
Configuration:
Edit /opt/bin/BoxDemo/box.conf:
streamxx: RTSP source URLs
DISP1=1: Enable HDMI1 output
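For scripted setups, a setting such as DISP1 could be changed without opening an editor. The sketch below assumes box.conf stores settings as plain KEY=VALUE lines, which the DISP1=1 notation suggests but the text does not confirm; set_conf_key is a hypothetical helper name. It is intended for simple values, since awk interprets backslash escapes in the value.

```shell
# Set KEY=VALUE in a config file: replace the line if the key exists, append otherwise.
# Assumption: one KEY=VALUE pair per line, with no spaces around "=".
set_conf_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through redirection so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example use:
#   set_conf_key /opt/bin/BoxDemo/box.conf DISP1 1
```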
Run:
bash /opt/bin/BoxDemo/run.sh
DINO v2 Monocular Depth Estimation
Uses Facebook's DINO v2 model for relative depth estimation from a single RGB camera.
Execution:
cd ~/ax-pipeline/bin
./sample_multi_demux_ivps_npu_multi_rtsp_hdmi_vo \
-p ./config/dinov2_depth.json \
-f ~/boxvideos/13.mp4
Supports H.264 video files or RTSP streams
[Figure: depth estimation results]
YOLOv5 Pedestrian Detection & Tracking
cd ~/ax-pipeline/bin
./sample_multi_demux_ivps_npu_multi_rtsp_hdmi_vo \
-p ./config/yolov5_seg.json \
-f ~/boxvideos/25.mp4
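Both ax-pipeline demos above call the same binary with a different config file and video, so a thin wrapper can cut down on retyping. This is a sketch under stated assumptions: run_pipeline and the AX_BIN variable are hypothetical names, the binary and config paths are the ones used in the commands above, and the video should be given as an absolute path because the wrapper changes directory first.

```shell
# Usage: run_pipeline <config name without .json> <absolute path to video>
# The parentheses run the body in a subshell, so the caller's working
# directory is left unchanged.
run_pipeline() (
    name=$1
    video=$2
    cd "${AX_BIN:-$HOME/ax-pipeline/bin}" || exit 1
    if [ ! -f "./config/$name.json" ]; then
        echo "missing config: ./config/$name.json" >&2
        exit 1
    fi
    if [ ! -f "$video" ]; then
        echo "missing video: $video" >&2
        exit 1
    fi
    ./sample_multi_demux_ivps_npu_multi_rtsp_hdmi_vo \
        -p "./config/$name.json" \
        -f "$video"
)

# Example use:
#   run_pipeline dinov2_depth "$HOME/boxvideos/13.mp4"
#   run_pipeline yolov5_seg "$HOME/boxvideos/25.mp4"
```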