Deploying AI Models on M4N

2025-07-10

Deploying Large Language and Multimodal Models

Obtain models and runtime environments from the following sources. Deployment instructions can be found in each repository's README.md.

Official AXERA Models: https://huggingface.co/AXERA-TECH

China Mirror Site: https://hf-mirror.com/AXERA-TECH

Model	Link	China Mirror Link
Qwen3:0.6b	Qwen3-0.6B-Int8	Qwen3-0.6B-Int8
DeepSeek-R1:1.5b	DeepSeek-R1-Distill-Qwen-1.5B	DeepSeek-R1-Distill-Qwen-1.5B
Qwen2.5:1.5b	Qwen2.5-1.5B-Instruct-GPTQ-Int8	Qwen2.5-1.5B-Instruct-GPTQ-Int8
SD1.5	lcm-lora-sdv1-5	lcm-lora-sdv1-5
InternVL2.5:1b	InternVL2_5-1B-Int8	InternVL2_5-1B-Int8

Important Note: All above models require system images compiled with SDK 1.45.0 or later to run large models. Please update your system accordingly. Our provided TFCard&eMMC images meet this requirement and reserve 6GB memory for model loading, capable of running 7B parameter int4 models.

Quick test with Qwen3-0.6b:

# Can replace with links to other model repositories
git clone https://hf-mirror.com/AXERA-TECH/Qwen3-0.6B

cd Qwen3-0.6B

# If ModuleNotFoundError occurs, refer to FAQ for details
python3 qwen3_tokenizer_uid.py

# Switch to corresponding execution script and restore necessary permissions
chmod +x main_ax650
sh run_qwen3_0.6b_int8_ctx_ax650.sh

Flashing OS Image

FAQ

wiki

Deploying AI Models on M4N

Deploying Large Language and Multimodal Models