Compile AI Models with RUHMI¶
Before model compilation, a quantized TensorFlow Lite model
(.tflite) must be prepared or sourced.
Note
Arm provides pre-trained TensorFlow Lite models on their ML Zoo GitHub repository.
-
Copy the model file to the shared directory being used by the Docker container.
-
Compile the model with RUHMI.
Note
- Remember to replace
mobilenet_v2_1.0_224_INT8.tflitewith the correct filepath of the model. - If RUHMI is being used natively, please update the command
to use the path to the
generate-model-data.pyscript. - Feel free to use a different output directory name instead
of
model_deployment_dir.
- Remember to replace
-
Upon completion, the compilation log will be displayed on the console and the MERA model data saved in the defined output directory.
2026-03-31 13:45:02,631 INFO - ------------------------------------------------------------
2026-03-31 13:45:02,631 INFO - Model being compiled with RUHMI: mobilenet_v2_1.0_224_INT8 (mobilenet_v2_1.0_224_INT8.tflite)
2026-03-31 13:45:02,651 INFO - *** mera v2.5.0+pkg.3782 ***
2026-03-31 13:45:02,651 INFO - Starting deployment of model 'mobilenet_v2_1.0_224_INT8'...
[2026-03-31 13:45:02.668] [info] Loading and converting TFLite model ...
[2026-03-31 13:45:02.668] [info] TFLite Model Version: 3
[2026-03-31 13:45:02.822] [info] Converting to canonical ...
[2026-03-31 13:45:02.846] [info] Running TFLite transformation passes ...
[2026-03-31 13:45:02.854] [info] Converting to target IR ...
[2026-03-31 13:45:02.861] [info] Dividing and exporting regions ...
[2026-03-31 13:45:02.864] [info] Model is fully captured to target ARM_ETHOS_U55
[2026-03-31 13:45:02.895] [info] Model successfully converted.
Info: Changing const_mem_area from Sram to OnChipFlash. This will use the same characteristics as Sram.
Warning: Input TensorFlow Lite network contains a zero length buffer (index = 0) which is semantically not empty. However, it will be treated as an empty buffer.
Network summary for model
Accelerator configuration Ethos_U55_256
System configuration RZG3E_SRAM_64
Memory mode Sram_Only
Accelerator clock 1000 MHz
Design peak SRAM bandwidth 7.45 GB/s
Design peak On-chip Flash bandwidth 7.45 GB/s
Total SRAM used 1470.00 KiB
Total On-chip Flash used 3536.23 KiB
CPU operators = 0 (0.0%)
NPU operators = 95 (100.0%)
Average SRAM bandwidth 5.05 GB/s
Input SRAM bandwidth 11.21 MB/batch
Weight SRAM bandwidth 0.00 MB/batch
Output SRAM bandwidth 6.66 MB/batch
Total SRAM bandwidth 17.87 MB/batch
Total SRAM bandwidth per input 17.87 MB/inference (batch size 1)
Average On-chip Flash bandwidth 1.65 GB/s
Input On-chip Flash bandwidth 0.00 MB/batch
Weight On-chip Flash bandwidth 5.65 MB/batch
Output On-chip Flash bandwidth 0.00 MB/batch
Total On-chip Flash bandwidth 5.82 MB/batch
Total On-chip Flash bandwidth per input 5.82 MB/inference (batch size 1)
Neural network macs 304452946 MACs/batch
Network Tops/s 0.17 Tops/s
NPU cycles 3159461 cycles/batch
SRAM Access cycles 2512149 cycles/batch
DRAM Access cycles 0 cycles/batch
On-chip Flash Access cycles 763033 cycles/batch
Off-chip Flash Access cycles 0 cycles/batch
Total cycles 3536879 cycles/batch
Batch Inference time 3.54 ms, 282.74 inferences/s (batch size 1)
2026-03-31 13:45:06,847 INFO - Compilation finished successfully. Took 00h00m04s
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
2026-03-31 13:45:06,860 INFO - Input data for tensor #0 has been saved to: model_deployment_dir/mobilenet_v2_1.0_224_INT8/input-0.bin
2026-03-31 13:45:06,872 INFO - Expected output data for tensor #0 has been saved to: model_deployment_dir/mobilenet_v2_1.0_224_INT8/expected-output-0.bin
2026-03-31 13:45:06,873 INFO - ------------------------------------------------------------
2026-03-31 13:45:06,874 INFO - The configuration YAML has been saved to: model_deployment_dir/config.yaml
2026-03-31 13:45:06,874 INFO - Please copy the model_deployment_dir directory to the board!
-
Copy the model files to the shared directory being used by the Docker container.
-
Compile the models with RUHMI.
python3 /generate-model-data.py -d model_deployment_dir -m ad_large_int8.tflite kws_micronet_l.tflite tiny_wav2letter_int8.tfliteNote
The models used in the example can be downloaded from:
-
Upon completion, the compilation log will be displayed on the console and the MERA data for all models saved in the defined output directory.
2026-03-31 13:23:39,633 INFO - ------------------------------------------------------------
2026-03-31 13:23:39,633 INFO - Model being compiled with RUHMI: ad_large_int8 (ad_large_int8.tflite)
2026-03-31 13:23:39,653 INFO - *** mera v2.5.0+pkg.3782 ***
2026-03-31 13:23:39,653 INFO - Starting deployment of model 'ad_large_int8'...
...
2026-03-31 13:23:40,600 INFO - Compilation finished successfully. Took 00h00m00s
...
2026-03-31 13:23:40,605 INFO - Input data for tensor #0 has been saved to: model_deployment_dir/ad_large_int8/input-0.bin
2026-03-31 13:23:40,607 INFO - Expected output data for tensor #0 has been saved to: model_deployment_dir/ad_large_int8/expected-output-0.bin
2026-03-31 13:23:40,607 INFO - ------------------------------------------------------------
2026-03-31 13:23:40,607 INFO - ------------------------------------------------------------
2026-03-31 13:23:40,607 INFO - Model being compiled with RUHMI: kws_micronet_l (kws_micronet_l.tflite)
2026-03-31 13:23:40,626 INFO - *** mera v2.5.0+pkg.3782 ***
2026-03-31 13:23:40,626 INFO - Starting deployment of model 'kws_micronet_l'...
...
2026-03-31 13:23:41,755 INFO - Compilation finished successfully. Took 00h00m01s
2026-03-31 13:23:41,759 INFO - Input data for tensor #0 has been saved to: model_deployment_dir/kws_micronet_l/input-0.bin
2026-03-31 13:23:41,761 INFO - Expected output data for tensor #0 has been saved to: model_deployment_dir/kws_micronet_l/expected-output-0.bin
2026-03-31 13:23:41,761 INFO - ------------------------------------------------------------
2026-03-31 13:23:41,762 INFO - ------------------------------------------------------------
2026-03-31 13:23:41,762 INFO - Model being compiled with RUHMI: tiny_wav2letter_int8 (tiny_wav2letter_int8.tflite)
2026-03-31 13:23:41,781 INFO - *** mera v2.5.0+pkg.3782 ***
2026-03-31 13:23:41,781 INFO - Starting deployment of model 'tiny_wav2letter_int8'...
...
2026-03-31 13:23:43,912 INFO - Compilation finished successfully. Took 00h00m02s
2026-03-31 13:23:43,923 INFO - Input data for tensor #0 has been saved to: model_deployment_dir/tiny_wav2letter_int8/input-0.bin
2026-03-31 13:23:43,931 INFO - Expected output data for tensor #0 has been saved to: model_deployment_dir/tiny_wav2letter_int8/expected-output-0.bin
2026-03-31 13:23:43,932 INFO - ------------------------------------------------------------
2026-03-31 13:23:43,934 INFO - The configuration YAML has been saved to: model_deployment_dir/config.yaml
2026-03-31 13:23:43,935 INFO - Please copy the model_deployment_dir directory to the board!
Note
The warning Input TensorFlow Lite network contains a zero length buffer ... message is caused due to the TensorFlow Lite converter
attaching semantic flag to buffer #0, which is identified with Vela.
Please ignore this warning as the compiler correctly identifies that buffer #0 should be treated as an empty buffer.