AI Demo说明文档
1. AI Demo开发框架介绍
1.1. AI Demo开发框架
为了帮助用户简化AI部分的开发,基于K230_CanMV提供的API接口,搭建了配套的AI 开发框架。框架结构如下图所示:
Camera默认输出两路图像:一路格式为YUV420,直接给到Display显示;另一路格式为RGB888,给到AI部分进行处理。AI部分主要实现任务的前处理、推理和后处理流程,得到后处理结果后将其绘制在osd image实例上,并送给Display叠加显示。
1.2. 接口介绍
1.2.1. PipeLine
我们将Media部分的代码封装在PipeLine类型中,通过固定的接口实现整个流程操作。
其中PipeLine类提供的接口包括:
- 初始化参数包括:
(1)rgb888p_size:list类型,预设给到AI部分的图像分辨率;如rgb888p_size=[1920,1080]。
(2)display_size:list类型,显示部分Display的分辨率;如display_size=[1920,1080]。
(3)display_mode:str类型,显示模式,包括"hdmi"和"lcd";如display_mode="hdmi"。
(4)debug_mode:int类型,耗时调试模式,如果大于0,打印操作耗时;如debug_mode=0。
- create(sensor=None,hmirror=None,vflip=None):创建PipeLine实例。
(1)sensor:可选参数,类型为Sensor对象,可自主配置;现有CanMV、01Studio和K230D Zero开发板已实现自动探测,默认直接调用create()即可。
(2)hmirror:默认为None,当主动设置时为bool类型(True/False),表示是否实现水平方向镜像显示。
(3)vflip: 默认为None,当主动设置时为bool类型(True/False),表示是否实现垂直方向翻转。
- get_frame():返回一帧ulab.numpy.ndarray类型图像数据,分辨率为rgb888p_size,排布为CHW。
- show_image():PipeLine实例中预设一帧OSD图像,该接口将成员变量osd_img显示在屏幕上。
- destroy():销毁PipeLine实例。
下面给出无AI部分的示例代码:
from libs.PipeLine import PipeLine, ScopedTiming
from media.media import *
import gc
import sys,os
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=[1920,1080], display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
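# 如需镜像或垂直翻转显示,可传入可选参数,示意写法:pl.create(hmirror=False, vflip=True)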
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
print(img.shape)
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
pl.destroy() # 销毁PipeLine实例
上述代码中,通过pl.get_frame()接口获取一帧分辨率为rgb888p_size的图像,类型为ulab.numpy.ndarray,排布为CHW。基于上面的代码获取到一帧图像供AI处理,您可以只关注AI推理部分的操作。
图像AI开发过程包括:图像预处理、模型推理、输出后处理的过程,我们将整个过程封装在Ai2d类和AIBase类中。
1.2.2. Ai2d
对于Ai2d类,我们给出了常见的几种预处理方法,包括crop/shift/pad/resize/affine。该类提供的接口包括:
- 初始化参数包括:
(1)debug_mode:int类型,耗时调试模式,如果大于0,打印操作耗时;如debug_mode=0。
- set_ai2d_dtype(input_format,output_format,input_type,output_type)
(1)input_format:ai2d预处理输入格式。
(2)output_format:ai2d预处理输出格式。
输入输出格式支持如下所示:
enum class ai2d_format
{
YUV420_NV12 = 0,
YUV420_NV21 = 1,
YUV420_I420 = 2,
NCHW_FMT = 3,
RGB_packed = 4,
RAW16 = 5,
}
输入格式 | 输出格式 | 备注 |
---|---|---|
YUV420_NV12 | RGB_planar/YUV420_NV12 | |
YUV420_NV21 | RGB_planar/YUV420_NV21 | |
YUV420_I420 | RGB_planar/YUV420_I420 | |
YUV400 | YUV400 | |
NCHW(RGB_planar) | NCHW(RGB_planar) | |
RGB_packed | RGB_planar/RGB_packed | |
RAW16 | RAW16/8 | 深度图,执行shift操作 |
(3)input_type:输入数据类型。
(4)output_type:输出数据类型。
下面是接口调用示例:
from libs.AI2D import Ai2d
import nncase_runtime as nn
import ulab.numpy as np
my_ai2d=Ai2d(debug_mode=1)
# 输入输出均为NCHW(RGB_planar)排布
my_ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 输入为RGB_packed(HWC),输出为NCHW(RGB_planar)
my_ai2d.set_ai2d_dtype(nn.ai2d_format.RGB_packed, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
- crop(start_x,start_y,width,height):预处理crop函数。
(1)start_x:宽度方向的起始像素,int类型;
(2)start_y: 高度方向的起始像素,int类型;
(3)width: 宽度方向的crop长度,int类型;
(4)height: 高度方向的crop长度,int类型;
my_ai2d.crop(0,0,200,300)
- shift(shift_val):预处理shift函数。
(1)shift_val:右移的比特数,int类型;
my_ai2d.shift(2)
- pad(paddings,pad_mode,pad_val):预处理padding函数。
(1)paddings:list类型,各维度两侧padding的大小,对于4维的图像(NCHW),该参数包含8个值,分别表示N/C/H/W四个维度两侧的padding大小,一般只在后两个维度做padding;
(2)pad_mode:只支持constant padding,直接设为0;
(3)pad_val:list类型,每个像素位置填充的值,比如[114,114,114]、[0,0,0]
my_ai2d.pad([0,0,0,0,5,5,15,15],0,[114,114,114])
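# 上例paddings的含义:N/C两个维度不填充,H方向上、下各填充5个像素,W方向左、右各填充15个像素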
- resize(interp_method,interp_mode):预处理resize函数。
(1)interp_method:resize插值方法,ai2d_interp_method类型,包括:nn.interp_method.tf_nearest、nn.interp_method.tf_bilinear、nn.interp_method.cv2_nearest、nn.interp_method.cv2_bilinear;
(2)interp_mode:resize模式,ai2d_interp_mode类型,包括:nn.interp_mode.none、nn.interp_mode.align_corner、nn.interp_mode.half_pixel;
my_ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
- affine(interp_method,cord_round,bound_ind,bound_val,bound_smooth,M):预处理affine函数。
(1)interp_method:Affine采用的插值方法,ai2d_interp_method类型,包括:nn.interp_method.tf_nearest、nn.interp_method.tf_bilinear、nn.interp_method.cv2_nearest、nn.interp_method.cv2_bilinear;
(2)cord_round:整数边界0或者1,uint32_t类型;
(3)bound_ind:边界像素模式0或者1,uint32_t类型;
(4)bound_val:边界填充值,uint32_t类型;
(5)bound_smooth:边界平滑0或者1,uint32_t类型;
(6)M:仿射变换矩阵对应的vector,list类型;仿射变换为 Y = [a_0, a_1; a_2, a_3]·X + [b_0, b_1],则 M=[a_0, a_1, b_0, a_2, a_3, b_1]。
affine_matrix=[0.2159457, -0.031286, -59.5312, 0.031286, 0.2159457, -35.30719]
my_ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,affine_matrix)
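# 上例矩阵含义(示意说明):近似为缩放约0.22、带轻微旋转,并平移(-59.53,-35.31)的仿射变换,常见于人脸对齐类预处理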
- build(ai2d_input_shape,ai2d_output_shape):ai2d构建函数,调用后使前面配置的预处理方法生效。
(1)ai2d_input_shape:ai2d输入shape,list类型;
(2)ai2d_output_shape:ai2d输出shape,list类型;
my_ai2d.build([1,3,224,224],[1,3,512,512])
- run(input_np):调用配置好的ai2d进行预处理的函数,返回一个tensor类型数据,可以直接给模型使用,也可以通过to_numpy()转换成ulab.numpy.ndarray类型的数据。
(1)input_np:ulab.numpy.ndarray类型,ai2d预处理的输入数据,shape和build函数中设置的ai2d_input_shape一致。
注意:
(1) Affine和Resize功能是互斥的,不能同时开启;
(2) Shift功能的输入格式只能是RAW16;
(3) pad_val是按通道配置的,对应的list元素个数要与channel数相等;
(4) 当配置了多个功能时,执行顺序是Crop->Shift->Resize/Affine->Pad,配置参数时注意要匹配;如果预处理顺序不符合该顺序,需要初始化多个Ai2d实例分步实现预处理过程(两个Ai2d实例串联的示意代码见本小节末尾)。
下面是一个完整的示例:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AI2D import Ai2d
from media.media import *
import nncase_runtime as nn
import gc
import sys,os
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=[512,512], display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
my_ai2d=Ai2d(debug_mode=0) #初始化Ai2d实例
# 配置resize预处理方法
my_ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理过程
my_ai2d.build([1,3,512,512],[1,3,640,640])
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
print(img.shape) # 原图shape为[1,3,512,512]
ai2d_output_tensor=my_ai2d.run(img) # 执行resize预处理
ai2d_output_np=ai2d_output_tensor.to_numpy() # 类型转换
print(ai2d_output_np.shape) # 预处理后的shape为[1,3,640,640]
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
pl.destroy() # 销毁PipeLine实例
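针对上面注意事项中的第(4)点,下面给出一个用两个Ai2d实例串联、实现"先resize再crop"的示意写法(其中的输入输出尺寸和img变量均为假设值,实际请按任务需要调整):
from libs.AI2D import Ai2d
import nncase_runtime as nn
import ulab.numpy as np
# 第一个实例只做resize:假设把[1,3,512,512]缩放到[1,3,640,640]
ai2d_resize = Ai2d(debug_mode=0)
ai2d_resize.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
ai2d_resize.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
ai2d_resize.build([1,3,512,512],[1,3,640,640])
# 第二个实例只做crop:从640x640的结果中裁出左上角320x320
ai2d_crop = Ai2d(debug_mode=0)
ai2d_crop.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
ai2d_crop.crop(0, 0, 320, 320)
ai2d_crop.build([1,3,640,640],[1,3,320,320])
# 串联执行:上一步输出的tensor先转成ulab.numpy.ndarray,再作为下一步的输入
# 假设img是一帧[1,3,512,512]的ulab.numpy.ndarray(例如pl.get_frame()的返回值)
resize_out = ai2d_resize.run(img)
crop_out = ai2d_crop.run(resize_out.to_numpy())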
1.2.3. AIBase
AIBase部分封装了实现模型推理的主要接口,也是进行AI开发主要关注的部分。用户需要按照自己demo的要求实现前处理和后处理部分。
AIBase提供的接口包括:
- 初始化参数包括:
(1)kmodel_path:str类型,kmodel路径,用于初始化kpu对象并加载kmodel;
(2)model_input_size:list类型,可选,模型输入分辨率,在单输入时起作用,格式为[width,height],如:model_input_size=[512,512];
(3)rgb888p_size:list类型,可选,AI得到的图像的分辨率,在单输入时起作用,格式为[width,height],如:rgb888p_size=[640,640];
(4)debug_mode:int类型,耗时调试模式,如果大于0,打印操作耗时;如debug_mode=0。
- get_kmodel_inputs_num():返回当前模型的输入个数;
- get_kmodel_outputs_num():返回当前模型的输出个数;
- preprocess(input_np):使用ai2d对input_np做预处理,如果不使用单个ai2d实例做预处理,需要在子类重写该函数。
(1)input_np:ulab.numpy.ndarray类型,ai2d预处理输入数据;
(2)返回tensor列表;如果重写该方法,请注意返回类型必须是tensor类型的列表;
- inference(tensors):对预处理后得到的kmodel的输入(类型为tensor)进行推理,得到多个输出(类型为ulab.numpy.ndarray);
(1)tensors:列表类型,模型的输入,可以是一个,也可以是多个;
(2)返回ulab.numpy.ndarray类型的列表;
Tips:
Image对象转ulab.numpy.ndarray:
import image
img.to_rgb888().to_numpy_ref() # 返回的array是HWC排布
ulab.numpy.ndarray转Image对象:
import ulab.numpy as np
import image
img_np = np.zeros((height,width,4),dtype=np.uint8)
img = image.Image(width, height, image.ARGB8888, alloc=image.ALLOC_REF, data=img_np)
ulab.numpy.ndarray转tensor类型:
import ulab.numpy as np
import nncase_runtime as nn
img_np = np.zeros((height,width,4),dtype=np.uint8)
tensor = nn.from_numpy(img_np)
tensor类型转ulab.numpy.ndarray:
import ulab.numpy as np
import nncase_runtime as nn
img_np = tensor.to_numpy()
- postprocess(results):模型输出后处理函数,该函数需要用户在任务子类重写,因为不同AI任务的后处理是不同的。
(1)results:list类型,list元素是ulab.numpy.ndarray类型,模型的推理输出。
- run(input_np):模型的前处理、推理、后处理流程,适用于单ai2d实例能解决的前处理的AI任务,其他任务需要用户在子类重写。
(1)input_np:ulab.numpy.ndarray类型,ai2d预处理输入数据;该数据通过ai2d预处理输出1个tensor,tensor通过模型推理得到输出列表results,results经过后处理过程得到AI结果。整体调用关系可参考本节末尾的示意伪代码。
- deinit():AIBase销毁函数。
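为便于理解上述接口之间的关系,AIBase的run()大致等价于下面的流程(示意伪代码,并非实际源码,仅用于说明preprocess、inference、postprocess的调用顺序):
def run(self, input_np):
    # 预处理:默认使用ai2d,将输入的ulab.numpy.ndarray转换为模型输入tensor列表
    tensors = self.preprocess(input_np)
    # 推理:KPU执行kmodel推理,返回ulab.numpy.ndarray列表
    results = self.inference(tensors)
    # 后处理:由具体任务的子类实现,返回该任务的AI结果
    return self.postprocess(results)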
1.2.4. ScopedTiming
ScopedTiming类在PipeLine.py模块内,是一个用来测量代码块执行时间的上下文管理器。上下文管理器通过定义包含__enter__和__exit__方法的类来创建:当在with语句中使用该类的实例时,__enter__在进入with块时被调用,__exit__在离开时被调用。
from libs.PipeLine import ScopedTiming
def test_time():
with ScopedTiming("test",1):
#####代码#####
# ...
##############
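例如,可以用ScopedTiming统计一段ulab.numpy计算的耗时(示意代码,计算内容仅为演示):
from libs.PipeLine import ScopedTiming
import ulab.numpy as np
# 第二个参数大于0时才会打印耗时,可直接传入debug_mode
with ScopedTiming("numpy sum", 1):
    a = np.ones((1000,), dtype=np.float)
    s = np.sum(a)
print(s)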
1.3. 应用方法和示例
1.3.1. 概述
用户可根据具体的AI场景自行编写任务类继承AIBase,可以将任务分为如下四类:单模型任务、多模型任务、自定义预处理任务、无预处理任务。不同任务需要编写不同的代码实现,具体如下图所示:
关于不同任务的介绍:
任务类型 | 任务描述 | 代码说明 |
---|---|---|
单模型任务 | 该任务只有一个模型,只需要关注该模型的前处理、推理、后处理过程,此类任务的前处理使用Ai2d实现,可能使用一个Ai2d实例,也可能使用多个Ai2d实例,后处理基于场景自定义。 | 编写自定义任务类,主要关注任务类的config_preprocess、postprocess、以及该任务需要的其他方法如:draw_result等。 如果该任务包含多个Ai2d实例,则需要重写preprocess,按照预处理的顺序设置预处理阶段的计算过程。 |
自定义预处理任务 | 该任务只有一个模型,只需要关注该模型的前处理、推理、后处理过程,此类任务的前处理不使用Ai2d实现,可以使用ulab.numpy自定义,后处理基于场景自定义。 | 编写自定义任务类,主要关注任务类的preprocess、postprocess、以及该任务需要的其他方法如:draw_result等 |
无预处理任务 | 该任务只有一个模型且不需要预处理,只需要关注该模型的推理和后处理过程,此类任务一般作为多模型任务的一部分,直接对前一个模型的输出做为输入推理,后处理基于需求自定义。 | 编写自定义任务类,主要关注任务类的run(模型推理的整个过程,包括preprocess、inference、postprocess中的全部或某一些步骤)、postprocess、以及该任务需要的其他方法如:draw_results等 |
多模型任务 | 该任务包含多个模型,可能是串联,也可能是其他组合方式。对于每个模型基本上属于前三种模型中的一种,最后通过一个完整的任务类将上述模型子任务统一起来。 | 编写多个子模型任务类,不同子模型任务参照前三种任务定义。不同任务关注不同的方法。 编写多模型任务类,将子模型任务类统一起来实现整个场景。 |
1.3.2. 单模型任务
单模型任务的伪代码结构如下:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# 自定义AI任务类,继承自AIBase基类
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# 配置resize预处理方法
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# 绘制结果到画面上,需要根据任务自己写
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径,这里要替换成当前任务模型
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### 其它参数########
...
######################
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义AI任务实例
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = my_ai.run(img) # 推理当前帧
my_ai.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
my_ai.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
下面以人脸检测为例给出示例代码:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import utime
import image
import random
import gc
import sys
import aidemo
# 自定义人脸检测类,继承自AIBase基类
class FaceDetectionApp(AIBase):
def __init__(self, kmodel_path, model_input_size, anchors, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode) # 调用基类的构造函数
self.kmodel_path = kmodel_path # 模型文件路径
self.model_input_size = model_input_size # 模型输入分辨率
self.confidence_threshold = confidence_threshold # 置信度阈值
self.nms_threshold = nms_threshold # NMS(非极大值抑制)阈值
self.anchors = anchors # 锚点数据,用于目标检测
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]] # sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]] # 显示分辨率,并对宽度进行16的对齐
self.debug_mode = debug_mode # 是否开启调试模式
self.ai2d = Ai2d(debug_mode) # 实例化Ai2d,用于实现模型预处理
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8) # 设置Ai2d的输入输出格式和类型
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0): # 计时器,如果debug_mode大于0则开启
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size # 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
top, bottom, left, right = self.get_padding_param() # 获取padding参数
self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [104, 117, 123]) # 填充边缘
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel) # 缩放图像
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]]) # 构建预处理流程
# 自定义当前任务的后处理,results是模型输出array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
post_ret = aidemo.face_det_post_process(self.confidence_threshold, self.nms_threshold, self.model_input_size[1], self.anchors, self.rgb888p_size, results)
if len(post_ret) == 0:
return post_ret
else:
return post_ret[0]
# 绘制检测结果到画面上
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
if dets:
pl.osd_img.clear() # 清除OSD图像
for det in dets:
# 将检测框的坐标转换为显示分辨率下的坐标
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
pl.osd_img.draw_rectangle(x, y, w, h, color=(255, 255, 0, 255), thickness=2) # 绘制矩形框
else:
pl.osd_img.clear()
# 获取padding参数
def get_padding_param(self):
dst_w = self.model_input_size[0] # 模型输入宽度
dst_h = self.model_input_size[1] # 模型输入高度
ratio_w = dst_w / self.rgb888p_size[0] # 宽度缩放比例
ratio_h = dst_h / self.rgb888p_size[1] # 高度缩放比例
ratio = min(ratio_w, ratio_h) # 取较小的缩放比例
new_w = int(ratio * self.rgb888p_size[0]) # 新宽度
new_h = int(ratio * self.rgb888p_size[1]) # 新高度
dw = (dst_w - new_w) / 2 # 宽度差
dh = (dst_h - new_h) / 2 # 高度差
top = int(round(0))
bottom = int(round(dh * 2 + 0.1))
left = int(round(0))
right = int(round(dw * 2 - 0.1))
return top, bottom, left, right
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径和其他参数
kmodel_path = "/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 其它参数
confidence_threshold = 0.5
nms_threshold = 0.2
anchor_len = 4200
det_dim = 4
anchors_path = "/sdcard/app/tests/utils/prior_data_320.bin"
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len, det_dim))
rgb888p_size = [1920, 1080]
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义人脸检测实例
face_det = FaceDetectionApp(kmodel_path, model_input_size=[320, 320], anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
face_det.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = face_det.run(img) # 推理当前帧
face_det.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
face_det.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
多个Ai2d实例时的伪代码如下:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# 自定义AI任务类,继承自AIBase基类
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d_resize = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d_resize.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 实例化Ai2d,用于实现模型预处理
self.ai2d_crop = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d_crop.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize和crop,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# 配置resize预处理方法
self.ai2d_resize.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程
self.ai2d_resize.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,640,640])
# 配置crop预处理方法
self.ai2d_crop.crop(0,0,320,320)
# 构建预处理流程
self.ai2d_crop.build([1,3,640,640],[1,3,320,320])
# 假设该任务需要crop和resize预处理,顺序是先resize再crop,该顺序不符合ai2d的处理顺序,因此需要设置两个Ai2d实例分别处理
def preprocess(self,input_np):
resize_tensor=self.ai2d_resize.run(input_np)
resize_np=resize_tensor.to_numpy()
crop_tensor=self.ai2d_crop.run(resize_np)
return [crop_tensor]
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# 绘制结果到画面上,需要根据任务自己写
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
# 重写deinit,释放多个ai2d资源
def deinit(self):
with ScopedTiming("deinit",self.debug_mode > 0):
del self.ai2d_resize
del self.ai2d_crop
super().deinit()
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径,这里要替换成当前任务模型
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### 其它参数########
...
######################
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义AI任务实例
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = my_ai.run(img) # 推理当前帧
my_ai.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
my_ai.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
1.3.3. 自定义预处理任务
对于需要重写前处理(不使用提供的ai2d类,自己手动写预处理)的AI任务伪代码如下:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# 自定义AI任务类,继承自AIBase基类
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 对于不使用ai2d完成预处理的AI任务,使用封装的接口或者ulab.numpy实现预处理,需要在子类中重写该函数
def preprocess(self,input_np):
#############
#注意自定义预处理过程
#############
return [tensor]
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# 绘制结果到画面上,需要根据任务自己写
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径,这里要替换成当前任务模型
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### 其它参数########
...
######################
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义AI任务实例
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = my_ai.run(img) # 推理当前帧
my_ai.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
my_ai.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
以关键词唤醒keyword_spotting为例:
from libs.PipeLine import ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
from media.pyaudio import * # 音频模块
from media.media import * # 软件抽象模块,主要封装媒体数据链路以及媒体缓冲区
import media.wave as wave # wav音频处理模块
import nncase_runtime as nn # nncase运行模块,封装了kpu(kmodel推理)和ai2d(图片预处理加速)操作
import ulab.numpy as np # 类似python numpy操作,但也会有一些接口不同
import aidemo # aidemo模块,封装ai demo相关前处理、后处理等操作
import time # 时间统计
import struct # 字节字符转换模块
import gc # 垃圾回收模块
import os,sys # 操作系统接口模块
# 自定义关键词唤醒类,继承自AIBase基类
class KWSApp(AIBase):
def __init__(self, kmodel_path, threshold, debug_mode=0):
super().__init__(kmodel_path) # 调用基类的构造函数
self.kmodel_path = kmodel_path # 模型文件路径
self.threshold=threshold
self.debug_mode = debug_mode # 是否开启调试模式
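# cache_np为模型的流式缓存输入,每次推理后在postprocess中用模型输出更新,作为下一次推理的输入之一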
self.cache_np = np.zeros((1, 256, 105), dtype=np.float)
# 自定义预处理,返回模型输入tensor列表
def preprocess(self,pcm_data):
pcm_data_list=[]
# 获取音频流数据
for i in range(0, len(pcm_data), 2):
# 每两个字节组织成一个有符号整数,然后将其转换为浮点数,即为一次采样的数据,加入到当前一帧(0.3s)的数据列表中
int_pcm_data = struct.unpack("<h", pcm_data[i:i+2])[0]
float_pcm_data = float(int_pcm_data)
pcm_data_list.append(float_pcm_data)
# 将pcm数据处理为模型输入的特征向量
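# 注意:fp为__main__中通过aidemo.kws_fp_create()创建的全局音频特征预处理句柄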
mp_feats = aidemo.kws_preprocess(fp, pcm_data_list)[0]
mp_feats_np = np.array(mp_feats).reshape((1, 30, 40))
audio_input_tensor = nn.from_numpy(mp_feats_np)
cache_input_tensor = nn.from_numpy(self.cache_np)
return [audio_input_tensor,cache_input_tensor]
# 自定义当前任务的后处理,results是模型输出array列表
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
logits_np = results[0]
self.cache_np= results[1]
max_logits = np.max(logits_np, axis=1)[0]
max_p = np.max(max_logits)
idx = np.argmax(max_logits)
# 如果分数大于阈值,且idx==1(即包含唤醒词),播放回复音频
if max_p > self.threshold and idx == 1:
return 1
else:
return 0
if __name__ == "__main__":
os.exitpoint(os.EXITPOINT_ENABLE)
nn.shrink_memory_pool()
# 设置模型路径和其他参数
kmodel_path = "/sdcard/app/tests/kmodel/kws.kmodel"
# 其它参数
THRESH = 0.5 # 检测阈值
SAMPLE_RATE = 16000 # 采样率16000Hz,即每秒采样16000次
CHANNELS = 1 # 通道数 1为单声道,2为立体声
FORMAT = paInt16 # 音频输入输出格式 paInt16
CHUNK = int(0.3 * 16000) # 每次读取音频数据的帧数,设置为0.3s的帧数16000*0.3=4800
reply_wav_file = "/sdcard/app/tests/utils/wozai.wav" # kws唤醒词回复音频路径
# 初始化音频预处理接口
fp = aidemo.kws_fp_create()
# 初始化音频流
p = PyAudio()
p.initialize(CHUNK)
MediaManager.init() #vb buffer初始化
# 用于采集实时音频数据
input_stream = p.open(format=FORMAT,channels=CHANNELS,rate=SAMPLE_RATE,input=True,frames_per_buffer=CHUNK)
# 用于播放回复音频
output_stream = p.open(format=FORMAT,channels=CHANNELS,rate=SAMPLE_RATE,output=True,frames_per_buffer=CHUNK)
# 初始化自定义关键词唤醒实例
kws = KWSApp(kmodel_path,threshold=THRESH,debug_mode=0)
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
pcm_data=input_stream.read()
res=kws.run(pcm_data)
if res:
print("====Detected XiaonanXiaonan!====")
wf = wave.open(reply_wav_file, "rb")
wav_data = wf.read_frames(CHUNK)
while wav_data:
output_stream.write(wav_data)
wav_data = wf.read_frames(CHUNK)
time.sleep(1) # 时间缓冲,用于播放回复声音
wf.close()
else:
print("Deactivated!")
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
input_stream.stop_stream()
output_stream.stop_stream()
input_stream.close()
output_stream.close()
p.terminate()
MediaManager.deinit() #释放vb buffer
aidemo.kws_fp_destroy(fp)
kws.deinit() # 反初始化
1.3.4. 无预处理任务
对于不需要预处理(直接输入推理)的AI任务伪代码如下:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# 自定义AI任务类,继承自AIBase基类
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# 对于无预处理的AI任务,需要在子类中重写run函数
def run(self,inputs_np):
# 先将ulab.numpy.ndarray列表转换成tensor列表
tensors=[]
for input_np in inputs_np:
tensors.append(nn.from_numpy(input_np))
# 调用AIBase内的inference函数进行模型推理
results=self.inference(tensors)
# 调用当前子类的postprocess方法进行自定义后处理
outputs=self.postprocess(results)
return outputs
# 绘制结果到画面上,需要根据任务自己写
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径,这里要替换成当前任务模型
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### 其它参数########
...
######################
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义AI任务实例
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = my_ai.run(img) # 推理当前帧
my_ai.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
my_ai.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
比如单目标跟踪(nanotracker.py)中的追踪模块,只需要将模板模型和实时推理模型的输出作为追踪模型的输入,不需要预处理:
class TrackerApp(AIBase):
def __init__(self,kmodel_path,crop_input_size,thresh,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# crop模型的输入尺寸
self.crop_input_size=crop_input_size
# 跟踪框阈值
self.thresh=thresh
# 跟踪框宽、高调整系数
self.CONTEXT_AMOUNT = 0.5
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 可以不定义
self.ai2d=Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 可以不定义
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
pass
# 重写run函数,因为没有预处理过程,所以原来run操作中包含的preprocess->inference->postprocess不合适,这里只包含inference->postprocess
def run(self,input_np_1,input_np_2,center_xy_wh):
input_tensors=[]
input_tensors.append(nn.from_numpy(input_np_1))
input_tensors.append(nn.from_numpy(input_np_2))
results=self.inference(input_tensors)
return self.postprocess(results,center_xy_wh)
# 自定义后处理,results是模型输出array的列表,这里使用了aidemo的nanotracker_postprocess列表
def postprocess(self,results,center_xy_wh):
with ScopedTiming("postprocess",self.debug_mode > 0):
det = aidemo.nanotracker_postprocess(results[0],results[1],[self.rgb888p_size[1],self.rgb888p_size[0]],self.thresh,center_xy_wh,self.crop_input_size[0],self.CONTEXT_AMOUNT)
return det
1.3.5. 多模型任务
这里以双模型串联推理为例,给出的伪代码如下:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# 自定义AI任务类,继承自AIBase基类
class MyAIApp_1(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# 配置resize预处理方法
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# 自定义AI任务类,继承自AIBase基类
class MyAIApp_2(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# 调用基类的构造函数
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# 模型文件路径
self.kmodel_path = kmodel_path
# 模型输入分辨率
self.model_input_size = model_input_size
# sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# 是否开启调试模式
self.debug_mode = debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# 配置resize预处理方法
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义当前任务的后处理,results是模型输出array列表,需要根据实际任务重写
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
class MyApp:
def __init__(self,kmodel1_path,kmodel2_path,kmodel1_input_size,kmodel2_input_size,rgb888p_size,display_size,debug_mode):
# 创建两个模型推理的实例
self.app_1=MyAIApp_1(kmodel1_path,kmodel1_input_size,rgb888p_size,display_size,debug_mode)
self.app_2=MyAIApp_2(kmodel2_path,kmodel2_input_size,rgb888p_size,display_size,debug_mode)
self.app_1.config_preprocess()
# 编写run函数,具体代码根据AI任务的需求编写,此处只是给出一个示例
def run(self,input_np):
outputs_1=self.app_1.run(input_np)
outputs_2=[]
for out in outputs_1:
self.app_2.config_preprocess(out)
out_2=self.app_2.run(input_np)
outputs_2.append(out_2)
return outputs_1,outputs_2
# 绘制
def draw_result(self,pl,outputs_1,outputs_2):
pass
######其他函数########
...
####################
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
rgb888p_size = [1920, 1080]
# 设置模型路径,这里要替换成当前任务模型
kmodel1_path = "test_kmodel1.kmodel"
kmodel1_input_size=[320,320]
kmodel2_path = "test_kmodel2.kmodel"
kmodel2_input_size=[48,48]
###### 其它参数########
...
######################
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义AI任务实例
my_ai = MyApp(kmodel1_path,kmodel2_path, kmodel1_input_size,kmodel2_input_size,rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
# 预处理已在MyApp的__init__和run中配置,此处无需再调用config_preprocess
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
outputs_1,outputs_2 = my_ai.run(img) # 推理当前帧
my_ai.draw_result(pl, outputs_1,outputs_2) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
my_ai.app_1.deinit() # 反初始化
my_ai.app_2.deinit()
pl.destroy() # 销毁PipeLine实例
下面以车牌检测为例给出示例代码:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
# 自定义车牌检测类
class LicenceDetectionApp(AIBase):
# 初始化函数,设置车牌检测应用的参数
def __init__(self, kmodel_path, model_input_size, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode) # 调用基类的初始化函数
self.kmodel_path = kmodel_path # 模型路径
# 模型输入分辨率
self.model_input_size = model_input_size
# 分类阈值
self.confidence_threshold = confidence_threshold
self.nms_threshold = nms_threshold
# sensor给到AI的图像分辨率
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# 显示分辨率
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
self.debug_mode = debug_mode
# Ai2d实例,用于实现模型预处理
self.ai2d = Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize,Ai2d支持crop/shift/pad/resize/affine
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义当前任务的后处理
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
# 对检测结果进行后处理
det_res = aidemo.licence_det_postprocess(results, [self.rgb888p_size[1], self.rgb888p_size[0]], self.model_input_size, self.confidence_threshold, self.nms_threshold)
return det_res
# 自定义车牌识别任务类
class LicenceRecognitionApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 车牌字符字典
self.dict_rec = ["挂", "使", "领", "澳", "港", "皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F", "G", "H", "J", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "_", "-"]
self.dict_size = len(self.dict_rec)
self.ai2d=Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了resize,Ai2d支持crop/shift/pad/resize/affine
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
output_data=results[0].reshape((-1,self.dict_size))
max_indices = np.argmax(output_data, axis=1)
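# 贪心解码(类似CTC):每个位置取得分最大的类别,类别0作为空白跳过,且与前一位置相同的类别不重复输出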
result_str = ""
for i in range(max_indices.shape[0]):
index = max_indices[i]
if index > 0 and (i == 0 or index != max_indices[i - 1]):
result_str += self.dict_rec[index - 1]
return result_str
# 车牌识别任务类
class LicenceRec:
def __init__(self,licence_det_kmodel,licence_rec_kmodel,det_input_size,rec_input_size,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
# 车牌检测模型路径
self.licence_det_kmodel=licence_det_kmodel
# 车牌识别模型路径
self.licence_rec_kmodel=licence_rec_kmodel
# 车牌检测模型输入分辨率
self.det_input_size=det_input_size
# 车牌识别模型输入分辨率
self.rec_input_size=rec_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
self.licence_det=LicenceDetectionApp(self.licence_det_kmodel,model_input_size=self.det_input_size,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
self.licence_rec=LicenceRecognitionApp(self.licence_rec_kmodel,model_input_size=self.rec_input_size,rgb888p_size=self.rgb888p_size)
self.licence_det.config_preprocess()
# run函数
def run(self,input_np):
# 执行车牌检测
det_boxes=self.licence_det.run(input_np)
# 将车牌部分抠出来
imgs_array_boxes = aidemo.ocr_rec_preprocess(input_np,[self.rgb888p_size[1],self.rgb888p_size[0]],det_boxes)
imgs_array = imgs_array_boxes[0]
boxes = imgs_array_boxes[1]
rec_res = []
for img_array in imgs_array:
# 对每一个检测到的车牌进行识别
self.licence_rec.config_preprocess(input_image_size=[img_array.shape[3],img_array.shape[2]])
licence_str=self.licence_rec.run(img_array)
rec_res.append(licence_str)
gc.collect()
return det_boxes,rec_res
# 绘制车牌检测识别效果
def draw_result(self,pl,det_res,rec_res):
pl.osd_img.clear()
if det_res:
point_8 = np.zeros((8),dtype=np.int16)
for det_index in range(len(det_res)):
for i in range(4):
x = det_res[det_index][i * 2 + 0]/self.rgb888p_size[0]*self.display_size[0]
y = det_res[det_index][i * 2 + 1]/self.rgb888p_size[1]*self.display_size[1]
point_8[i * 2 + 0] = int(x)
point_8[i * 2 + 1] = int(y)
for i in range(4):
pl.osd_img.draw_line(point_8[i * 2 + 0],point_8[i * 2 + 1],point_8[(i+1) % 4 * 2 + 0],point_8[(i+1) % 4 * 2 + 1],color=(255, 0, 255, 0),thickness=4)
pl.osd_img.draw_string_advanced( point_8[6], point_8[7] + 20, 40,rec_res[det_index] , color=(255,255,153,18))
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 车牌检测模型路径
licence_det_kmodel_path="/sdcard/app/tests/kmodel/LPD_640.kmodel"
# 车牌识别模型路径
licence_rec_kmodel_path="/sdcard/app/tests/kmodel/licence_reco.kmodel"
# 其它参数
rgb888p_size=[640,360]
licence_det_input_size=[640,640]
licence_rec_input_size=[220,32]
confidence_threshold=0.2
nms_threshold=0.2
# 初始化PipeLine,只关注传给AI的图像分辨率,显示的分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
lr=LicenceRec(licence_det_kmodel_path,licence_rec_kmodel_path,det_input_size=licence_det_input_size,rec_input_size=licence_rec_input_size,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_res,rec_res=lr.run(img) # 推理当前帧
lr.draw_result(pl,det_res,rec_res) # 绘制当前帧推理结果
pl.show_image() # 展示推理结果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
lr.licence_det.deinit()
lr.licence_rec.deinit()
pl.destroy()
1.4. 参考文档
1.4.1. k230 canmv文档
文档链接:Welcome to K230 CanMV’s documentation! — K230 CanMV 文档 (canaan-creative.com)
1.4.2. Ulab库支持
链接: ulab – Manipulate numeric data similar to numpy — Adafruit CircuitPython 9.1.0-beta.3 documentation
2. AI Demo
AI Demo分为两种类型:单模型、多模型,涵盖物体、人脸、人手、人体、车牌、OCR、音频(KWS、TTS)等方向;参考该文档,K230用户可以更快上手K230 AI应用的开发,实现预期效果。
Demo名称 | 场景 | 任务类型 |
---|---|---|
dynamic_gesture | 动态手势识别 | 多模型任务 |
eye_gaze | 注视估计 | 多模型任务 |
face_detection | 人脸检测 | 单模型任务 |
face_landmark | 人脸关键部位 | 多模型任务 |
face_mesh | 人脸3D网格 | 多模型任务 |
face_parse | 人脸解析 | 多模型任务 |
face_pose | 人脸姿态 | 多模型任务 |
face_recognition | 人脸识别 | 多模型任务 |
face_registration | 人脸注册 | 多模型任务 |
falldown_detection | 跌倒检测 | 单模型任务 |
finger_guessing | 猜拳游戏 | 多模型任务 |
hand_detection | 手掌检测 | 单模型任务 |
hand_keypoint_class | 手掌关键点分类 | 多模型任务 |
hand_keypoint_detection | 手掌关键点检测 | 多模型任务 |
hand_recognition | 手势识别 | 多模型任务 |
keyword_spotting | 关键词唤醒 | 单模型任务 |
licence_det | 车牌检测 | 单模型任务 |
licence_det_rec | 车牌识别 | 多模型任务 |
nanotracker | 单目标跟踪 | 多模型任务 |
object_detect_yolov8n | yolov8n目标检测 | 单模型任务 |
ocr_det | OCR检测 | 单模型任务 |
ocr_rec | OCR识别 | 多模型任务 |
person_detection | 人体检测 | 单模型任务 |
person_kp_detect | 人体关键点检测 | 多模型任务 |
puzzle_game | 拼图游戏 | 多模型任务 |
segment_yolov8n | yolov8分割 | 单模型任务 |
self_learning | 自学习 | 单模型任务 |
space_resize | 局部放大器 | 多模型任务 |
tts_zh | 中文文本转语音 | 多模型任务 |
2.1. 动态手势识别
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
from random import randint
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aicube
import random
import gc
import sys
# 自定义手掌检测任务类
class HandDetApp(AIBase):
def __init__(self,kmodel_path,labels,model_input_size,anchors,confidence_threshold=0.2,nms_threshold=0.5,nms_option=False, strides=[8,16,32],rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测标签
self.labels=labels
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# 检测锚框
self.anchors=anchors
self.strides = strides # 特征下采样倍数
self.nms_option = nms_option # NMS选项,如果为True做类间NMS,如果为False做类内NMS
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# Ai2d实例,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了padding和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# 计算padding参数并应用pad操作,以确保输入图像尺寸与模型输入尺寸匹配
top, bottom, left, right = self.get_padding_param()
self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [114, 114, 114])
# 使用双线性插值进行resize操作,调整图像尺寸以符合模型输入要求
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理过程,这里使用了aicube的anchorbasedet_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
dets = aicube.anchorbasedet_post_process(results[0], results[1], results[2], self.model_input_size, self.rgb888p_size, self.strides, len(self.labels), self.confidence_threshold, self.nms_threshold, self.anchors, self.nms_option)
# 返回手掌检测结果
return dets
# 计算padding参数,确保输入图像尺寸与模型输入尺寸匹配
def get_padding_param(self):
# 根据目标宽度和高度计算比例因子
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
input_width = self.rgb888p_size[0]
input_high = self.rgb888p_size[1]
ratio_w = dst_w / input_width
ratio_h = dst_h / input_high
# 选择较小的比例因子,以确保图像内容完整
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
# 计算新的宽度和高度
new_w = int(ratio * input_width)
new_h = int(ratio * input_high)
# 计算宽度和高度的差值,并确定padding的位置
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = int(round(dh - 0.1))
bottom = int(round(dh + 0.1))
left = int(round(dw - 0.1))
right = int(round(dw + 0.1))
return top, bottom, left, right
# 自定义手势关键点分类任务类
class HandKPClassApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 手掌关键点模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# crop参数列表
self.crop_params=[]
# debug模式
self.debug_mode=debug_mode
# Ai2d实例,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了crop和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 如果input_image_size为None,使用视频出图大小,否则按照自定义设置
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算crop参数
self.crop_params = self.get_crop_param(det)
# 设置crop预处理过程
self.ai2d.crop(self.crop_params[0],self.crop_params[1],self.crop_params[2],self.crop_params[3])
# 设置resize预处理过程
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# build预处理过程,参数为输入tensor的shape和输出tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
results=results[0].reshape(results[0].shape[0]*results[0].shape[1])
results_show = np.zeros(results.shape,dtype=np.int16)
results_show[0::2] = results[0::2] * self.crop_params[3] + self.crop_params[0]
results_show[1::2] = results[1::2] * self.crop_params[2] + self.crop_params[1]
# 根据输出计算手势
gesture=self.hk_gesture(results_show)
return results_show,gesture
# 计算crop参数
def get_crop_param(self,det_box):
x1, y1, x2, y2 = det_box[2],det_box[3],det_box[4],det_box[5]
w,h= int(x2 - x1),int(y2 - y1)
w_det = int(float(x2 - x1) * self.display_size[0] // self.rgb888p_size[0])
h_det = int(float(y2 - y1) * self.display_size[1] // self.rgb888p_size[1])
x_det = int(x1*self.display_size[0] // self.rgb888p_size[0])
y_det = int(y1*self.display_size[1] // self.rgb888p_size[1])
length = max(w, h)/2
cx = (x1+x2)/2
cy = (y1+y2)/2
ratio_num = 1.26*length
x1_kp = int(max(0,cx-ratio_num))
y1_kp = int(max(0,cy-ratio_num))
x2_kp = int(min(self.rgb888p_size[0]-1, cx+ratio_num))
y2_kp = int(min(self.rgb888p_size[1]-1, cy+ratio_num))
w_kp = int(x2_kp - x1_kp + 1)
h_kp = int(y2_kp - y1_kp + 1)
return [x1_kp, y1_kp, w_kp, h_kp]
# 求两个vector之间的夹角
def hk_vector_2d_angle(self,v1,v2):
with ScopedTiming("hk_vector_2d_angle",self.debug_mode > 0):
v1_x,v1_y,v2_x,v2_y = v1[0],v1[1],v2[0],v2[1]
v1_norm = np.sqrt(v1_x * v1_x+ v1_y * v1_y)
v2_norm = np.sqrt(v2_x * v2_x + v2_y * v2_y)
dot_product = v1_x * v2_x + v1_y * v2_y
cos_angle = dot_product/(v1_norm*v2_norm)
angle = np.acos(cos_angle)*180/np.pi
return angle
# 根据手掌关键点检测结果判断手势类别
def hk_gesture(self,results):
with ScopedTiming("hk_gesture",self.debug_mode > 0):
angle_list = []
for i in range(5):
angle = self.hk_vector_2d_angle([(results[0]-results[i*8+4]), (results[1]-results[i*8+5])],[(results[i*8+6]-results[i*8+8]),(results[i*8+7]-results[i*8+9])])
angle_list.append(angle)
thr_angle,thr_angle_thumb,thr_angle_s,gesture_str = 65.,53.,49.,None
if 65535. not in angle_list:
if (angle_list[0]>thr_angle_thumb) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
gesture_str = "fist"
elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]<thr_angle_s):
gesture_str = "five"
elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
gesture_str = "gun"
elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
gesture_str = "love"
elif (angle_list[0]>5) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
gesture_str = "one"
elif (angle_list[0]<thr_angle_s) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
gesture_str = "six"
elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]>thr_angle):
gesture_str = "three"
elif (angle_list[0]<thr_angle_s) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
gesture_str = "thumbUp"
elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
gesture_str = "yeah"
return gesture_str
# 自定义动态手势识别任务类
class DynamicGestureApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 注意:ai2d设置多个预处理时执行的顺序为:crop->shift->resize/affine->pad,如果不符合该顺序,需要配置多个ai2d对象;
# 如下模型预处理要先做resize再做crop,因此要配置两个Ai2d对象
self.ai2d_resize=Ai2d(debug_mode)
self.ai2d_resize.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
self.ai2d_crop=Ai2d(debug_mode)
self.ai2d_crop.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 动态手势识别模型输入tensors列表
self.input_tensors=[]
# 动态手势识别模型的输入tensor的shape
self.gesture_kmodel_input_shape = [[1, 3, 224, 224], # 动态手势识别kmodel输入分辨率
[1,3,56,56],
[1,4,28,28],
[1,4,28,28],
[1,8,14,14],
[1,8,14,14],
[1,8,14,14],
[1,12,14,14],
[1,12,14,14],
[1,20,7,7],
[1,20,7,7]]
# 预处理参数
self.resize_shape = 256
self.mean_values = np.array([0.485, 0.456, 0.406]).reshape((3,1,1)) # 动态手势识别预处理均值
self.std_values = np.array([0.229, 0.224, 0.225]).reshape((3,1,1)) # 动态手势识别预处理方差
self.first_data=None
self.max_hist_len=20
self.crop_params=self.get_crop_param()
# 配置预处理
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 配置resize和crop预处理
self.ai2d_resize.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
self.ai2d_resize.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.crop_params[1],self.crop_params[0]])
self.ai2d_crop.crop(self.crop_params[2],self.crop_params[3],self.crop_params[4],self.crop_params[5])
self.ai2d_crop.build([1,3,self.crop_params[1],self.crop_params[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 初始化动态手势识别模型输入列表
inputs_num=self.get_kmodel_inputs_num()
self.first_data = np.ones(self.gesture_kmodel_input_shape[0], dtype=np.float)
for i in range(inputs_num):
data = np.zeros(self.gesture_kmodel_input_shape[i], dtype=np.float)
self.input_tensors.append(nn.from_numpy(data))
# 重写预处理,因为该部分不是单纯的走一个ai2d做预处理,所以该函数需要重写
def preprocess(self,input_np):
# 先走resize,再走crop
resize_tensor=self.ai2d_resize.run(input_np)
crop_output_tensor=self.ai2d_crop.run(resize_tensor.to_numpy())
ai2d_output = crop_output_tensor.to_numpy()
self.first_data[0] = ai2d_output[0].copy()
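# 归一化:像素值缩放到[0,1]后,再减均值、除以方差(此处为常见的ImageNet统计值)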
self.first_data[0] = (self.first_data[0]*1.0/255 -self.mean_values)/self.std_values
self.input_tensors[0]=nn.from_numpy(self.first_data)
return
# run函数重写
def run(self,input_np,his_logit,history):
# 预处理
self.preprocess(input_np)
# 推理
outputs=self.inference(self.input_tensors)
# 使用当前帧的输出更新下一帧的输入列表
outputs_num=self.get_kmodel_outputs_num()
for i in range(1,outputs_num):
self.input_tensors[i]=nn.from_numpy(outputs[i])
# 返回后处理结果
return self.postprocess(outputs,his_logit,history)
# 自定义后处理
def postprocess(self,results,his_logit, history):
with ScopedTiming("postprocess",self.debug_mode > 0):
his_logit.append(results[0])
avg_logit = sum(np.array(his_logit))
idx_ = np.argmax(avg_logit)
idx = self.gesture_process_output(idx_, history)
if (idx_ != idx):
his_logit_last = his_logit[-1]
his_logit = []
his_logit.append(his_logit_last)
return idx, avg_logit
# 手势处理函数
def gesture_process_output(self,pred,history):
if (pred == 7 or pred == 8 or pred == 21 or pred == 22 or pred == 3 ):
pred = history[-1]
if (pred == 0 or pred == 4 or pred == 6 or pred == 9 or pred == 14 or pred == 1 or pred == 19 or pred == 20 or pred == 23 or pred == 24) :
pred = history[-1]
if (pred == 0) :
pred = 2
if (pred != history[-1]) :
if (len(history)>= 2) :
if (history[-1] != history[len(history)-2]) :
pred = history[-1]
history.append(pred)
if (len(history) > self.max_hist_len) :
history = history[-self.max_hist_len:]
return history[-1]
# 计算crop参数
def get_crop_param(self):
ori_w = self.rgb888p_size[0]
ori_h = self.rgb888p_size[1]
width = self.model_input_size[0]
height = self.model_input_size[1]
ratiow = float(self.resize_shape) / ori_w
ratioh = float(self.resize_shape) / ori_h
if ratiow < ratioh:
ratio = ratioh
else:
ratio = ratiow
new_w = int(ratio * ori_w)
new_h = int(ratio * ori_h)
top = int((new_h-height)/2)
left = int((new_w-width)/2)
return new_w,new_h,left,top,width,height
# 重写逆初始化
def deinit(self):
with ScopedTiming("deinit",self.debug_mode > 0):
del self.kpu
del self.ai2d_resize
del self.ai2d_crop
self.tensors.clear()
del self.tensors
gc.collect()
nn.shrink_memory_pool()
os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
time.sleep_ms(100)
# 自定义动态手势识别任务
class DynamicGesture:
def __init__(self,hand_det_kmodel,hand_kp_kmodel,gesture_kmodel,det_input_size,kp_input_size,gesture_input_size,labels,anchors,confidence_threshold=0.25,nms_threshold=0.3,nms_option=False,strides=[8,16,32],rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
# 手掌检测模型路径
self.hand_det_kmodel=hand_det_kmodel
# 手掌关键点模型路径
self.hand_kp_kmodel=hand_kp_kmodel
        # 动态手势识别模型路径
self.gesture_kmodel=gesture_kmodel
# 手掌检测模型输入分辨率
self.det_input_size=det_input_size
# 手掌关键点模型输入分辨率
self.kp_input_size=kp_input_size
# 动态手势识别模型输入分辨率
self.gesture_input_size=gesture_input_size
self.labels=labels
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.nms_option=nms_option
self.strides=strides
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# 动态手势识别贴图
        self.bin_width = 150 # 动态手势识别屏幕左上角状态标志图(bin文件)的短边尺寸
        self.bin_height = 216 # 动态手势识别屏幕左上角状态标志图(bin文件)的长边尺寸
shang_argb = np.fromfile("/sdcard/app/tests/utils/shang.bin", dtype=np.uint8)
self.shang_argb = shang_argb.reshape((self.bin_height, self.bin_width, 4))
xia_argb = np.fromfile("/sdcard/app/tests/utils/xia.bin", dtype=np.uint8)
self.xia_argb = xia_argb.reshape((self.bin_height, self.bin_width, 4))
zuo_argb = np.fromfile("/sdcard/app/tests/utils/zuo.bin", dtype=np.uint8)
self.zuo_argb = zuo_argb.reshape((self.bin_width, self.bin_height, 4))
you_argb = np.fromfile("/sdcard/app/tests/utils/you.bin", dtype=np.uint8)
self.you_argb = you_argb.reshape((self.bin_width, self.bin_height, 4))
        # 其他参数
self.TRIGGER = 0 # 动态手势识别应用的结果状态
self.MIDDLE = 1
self.UP = 2
self.DOWN = 3
self.LEFT = 4
self.RIGHT = 5
self.max_hist_len = 20 # 最多存储多少帧的结果
# debug_mode模式
self.debug_mode=debug_mode
self.cur_state = self.TRIGGER
self.pre_state = self.TRIGGER
self.draw_state = self.TRIGGER
self.vec_flag = []
self.his_logit = []
self.history = [2]
self.s_start = time.time_ns()
self.m_start=None
self.hand_det=HandDetApp(self.hand_det_kmodel,self.labels,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,nms_option=self.nms_option,strides=self.strides,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
self.hand_kp=HandKPClassApp(self.hand_kp_kmodel,model_input_size=self.kp_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
self.dg=DynamicGestureApp(self.gesture_kmodel,model_input_size=self.gesture_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
self.hand_det.config_preprocess()
self.dg.config_preprocess()
# run函数
def run(self,input_np):
if self.cur_state == self.TRIGGER:
# 手掌检测
det_boxes=self.hand_det.run(input_np)
boxes=[]
gesture_res=[]
for det_box in det_boxes:
# 筛选检测框
x1, y1, x2, y2 = det_box[2],det_box[3],det_box[4],det_box[5]
w,h= int(x2 - x1),int(y2 - y1)
if (h<(0.1*self.rgb888p_size[1])):
continue
if (w<(0.25*self.rgb888p_size[0]) and ((x1<(0.03*self.rgb888p_size[0])) or (x2>(0.97*self.rgb888p_size[0])))):
continue
if (w<(0.15*self.rgb888p_size[0]) and ((x1<(0.01*self.rgb888p_size[0])) or (x2>(0.99*self.rgb888p_size[0])))):
continue
# 手掌关键点预处理配置
self.hand_kp.config_preprocess(det_box)
# 手掌关键点检测
hk_results,gesture_str=self.hand_kp.run(input_np)
boxes.append(det_box)
gesture_res.append((hk_results,gesture_str))
return boxes,gesture_res
else:
# 动态手势识别
idx, avg_logit = self.dg.run(input_np, self.his_logit, self.history)
return idx,avg_logit
# 根据输出结果绘制效果
def draw_result(self,pl,output1,output2):
pl.osd_img.clear()
draw_img_np = np.zeros((self.display_size[1],self.display_size[0],4),dtype=np.uint8)
draw_img=image.Image(self.display_size[0], self.display_size[1], image.ARGB8888,alloc=image.ALLOC_REF,data=draw_img_np)
if self.cur_state == self.TRIGGER:
for i in range(len(output1)):
hk_results,gesture=output2[i][0],output2[i][1]
if ((gesture == "five") or (gesture == "yeah")):
v_x = hk_results[24]-hk_results[0]
v_y = hk_results[25]-hk_results[1]
angle = self.hand_kp.hk_vector_2d_angle([v_x,v_y],[1.0,0.0])
if (v_y>0):
angle = 360-angle
if ((70.0<=angle) and (angle<110.0)):
if ((self.pre_state != self.UP) or (self.pre_state != self.MIDDLE)):
self.vec_flag.append(self.pre_state)
if ((len(self.vec_flag)>10)or(self.pre_state == self.UP) or (self.pre_state == self.MIDDLE) or(self.pre_state == self.TRIGGER)):
draw_img_np[:self.bin_height,:self.bin_width,:] = self.shang_argb
self.cur_state = self.UP
elif ((110.0<=angle) and (angle<225.0)): # 手指向右(实际方向)
if (self.pre_state != self.RIGHT):
self.vec_flag.append(self.pre_state)
if ((len(self.vec_flag)>10)or(self.pre_state == self.RIGHT)or(self.pre_state == self.TRIGGER)):
draw_img_np[:self.bin_width,:self.bin_height,:] = self.you_argb
self.cur_state = self.RIGHT
elif((225.0<=angle) and (angle<315.0)): # 手指向下
if (self.pre_state != self.DOWN):
self.vec_flag.append(self.pre_state)
if ((len(self.vec_flag)>10)or(self.pre_state == self.DOWN)or(self.pre_state == self.TRIGGER)):
draw_img_np[:self.bin_height,:self.bin_width,:] = self.xia_argb
self.cur_state = self.DOWN
else: # 手指向左(实际方向)
if (self.pre_state != self.LEFT):
self.vec_flag.append(self.pre_state)
if ((len(self.vec_flag)>10)or(self.pre_state == self.LEFT)or(self.pre_state == self.TRIGGER)):
draw_img_np[:self.bin_width,:self.bin_height,:] = self.zuo_argb
self.cur_state = self.LEFT
self.m_start = time.time_ns()
self.his_logit = []
else:
idx,avg_logit=output1,output2[0]
if (self.cur_state == self.UP):
draw_img_np[:self.bin_height,:self.bin_width,:] = self.shang_argb
if ((idx==15) or (idx==10)):
self.vec_flag.clear()
if (((avg_logit[idx] >= 0.7) and (len(self.his_logit) >= 2)) or ((avg_logit[idx] >= 0.3) and (len(self.his_logit) >= 4))):
self.s_start = time.time_ns()
self.cur_state = self.TRIGGER
self.draw_state = self.DOWN
self.history = [2]
self.pre_state = self.UP
elif ((idx==25)or(idx==26)) :
self.vec_flag.clear()
if (((avg_logit[idx] >= 0.4) and (len(self.his_logit) >= 2)) or ((avg_logit[idx] >= 0.3) and (len(self.his_logit) >= 3))):
self.s_start = time.time_ns()
self.cur_state = self.TRIGGER
self.draw_state = self.MIDDLE
self.history = [2]
self.pre_state = self.MIDDLE
else:
self.his_logit.clear()
elif (self.cur_state == self.RIGHT):
draw_img_np[:self.bin_width,:self.bin_height,:] = self.you_argb
if ((idx==16)or(idx==11)) :
self.vec_flag.clear()
if (((avg_logit[idx] >= 0.4) and (len(self.his_logit) >= 2)) or ((avg_logit[idx] >= 0.3) and (len(self.his_logit) >= 3))):
self.s_start = time.time_ns()
self.cur_state = self.TRIGGER
self.draw_state = self.RIGHT
self.history = [2]
self.pre_state = self.RIGHT
else:
self.his_logit.clear()
elif (self.cur_state == self.DOWN):
draw_img_np[:self.bin_height,:self.bin_width,:] = self.xia_argb
if ((idx==18)or(idx==13)):
self.vec_flag.clear()
if (((avg_logit[idx] >= 0.4) and (len(self.his_logit) >= 2)) or ((avg_logit[idx] >= 0.3) and (len(self.his_logit) >= 3))):
self.s_start = time.time_ns()
self.cur_state = self.TRIGGER
self.draw_state = self.UP
self.history = [2]
self.pre_state = self.DOWN
else:
self.his_logit.clear()
elif (self.cur_state == self.LEFT):
draw_img_np[:self.bin_width,:self.bin_height,:] = self.zuo_argb
if ((idx==17)or(idx==12)):
self.vec_flag.clear()
if (((avg_logit[idx] >= 0.4) and (len(self.his_logit) >= 2)) or ((avg_logit[idx] >= 0.3) and (len(self.his_logit) >= 3))):
self.s_start = time.time_ns()
self.cur_state = self.TRIGGER
self.draw_state = self.LEFT
self.history = [2]
self.pre_state = self.LEFT
else:
self.his_logit.clear()
self.elapsed_time = round((time.time_ns() - self.m_start)/1000000)
if ((self.cur_state != self.TRIGGER) and (self.elapsed_time>2000)):
self.cur_state = self.TRIGGER
self.pre_state = self.TRIGGER
self.elapsed_ms_show = round((time.time_ns()-self.s_start)/1000000)
if (self.elapsed_ms_show<1000):
if (self.draw_state == self.UP):
draw_img.draw_arrow(1068,330,1068,130, (255,170,190,230), thickness=13) # 判断为向上挥动时,画一个向上的箭头
draw_img.draw_string_advanced(self.display_size[0]//2-50,self.display_size[1]//2-50,32,"向上")
elif (self.draw_state == self.RIGHT):
draw_img.draw_arrow(1290,540,1536,540, (255,170,190,230), thickness=13) # 判断为向右挥动时,画一个向右的箭头
draw_img.draw_string_advanced(self.display_size[0]//2-50,self.display_size[1]//2-50,32,"向右")
elif (self.draw_state == self.DOWN):
draw_img.draw_arrow(1068,750,1068,950, (255,170,190,230), thickness=13) # 判断为向下挥动时,画一个向下的箭头
draw_img.draw_string_advanced(self.display_size[0]//2-50,self.display_size[1]//2-50,32,"向下")
elif (self.draw_state == self.LEFT):
draw_img.draw_arrow(846,540,600,540, (255,170,190,230), thickness=13) # 判断为向左挥动时,画一个向左的箭头
draw_img.draw_string_advanced(self.display_size[0]//2-50,self.display_size[1]//2-50,32,"向左")
elif (self.draw_state == self.MIDDLE):
draw_img.draw_circle(1068,540,100, (255,170,190,230), thickness=2, fill=True) # 判断为五指捏合手势时,画一个实心圆
draw_img.draw_string_advanced(self.display_size[0]//2-50,self.display_size[1]//2-50,32,"中间")
else:
self.draw_state = self.TRIGGER
pl.osd_img.copy_from(draw_img)
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 手掌检测模型路径
hand_det_kmodel_path="/sdcard/app/tests/kmodel/hand_det.kmodel"
# 手部关键点模型路径
hand_kp_kmodel_path="/sdcard/app/tests/kmodel/handkp_det.kmodel"
# 动态手势识别模型路径
gesture_kmodel_path="/sdcard/app/tests/kmodel/gesture.kmodel"
# 其他参数
rgb888p_size=[1920,1080]
hand_det_input_size=[512,512]
hand_kp_input_size=[256,256]
gesture_input_size=[224,224]
confidence_threshold=0.2
nms_threshold=0.5
labels=["hand"]
anchors = [26,27, 53,52, 75,71, 80,99, 106,82, 99,134, 140,113, 161,172, 245,276]
    # 初始化PipeLine,只需设置传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
# 自定义动态手势识别任务实例
dg=DynamicGesture(hand_det_kmodel_path,hand_kp_kmodel_path,gesture_kmodel_path,det_input_size=hand_det_input_size,kp_input_size=hand_kp_input_size,gesture_input_size=gesture_input_size,labels=labels,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,nms_option=False,strides=[8,16,32],rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
output1,output2=dg.run(img) # 推理当前帧
dg.draw_result(pl,output1,output2) # 绘制推理结果
pl.show_image() # 展示推理结果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
dg.hand_det.deinit()
dg.hand_kp.deinit()
dg.dg.deinit()
pl.destroy()
2.2. 注视估计
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
import math
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
        # Ai2d实例,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了padding和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义任务后处理,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res
else:
return res[0]
# 计算padding参数
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
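    # padding参数计算示例(仅作说明):当rgb888p_size=[1920,1080]、model_input_size=[320,320]时,
    # ratio=min(320/1920,320/1080)=1/6,new_w=320,new_h=180,dh=(320-180)/2=70,
    # 返回的padding参数为[0,0,0,0,0,140,0,0],配合resize实现letterbox式的等比例缩放,
    # 填充值为config_preprocess中传入的[104,117,123]。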
# 自定义注视估计任务类
class EyeGazeApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 注视估计模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# Ai2d实例,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了crop和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算crop预处理参数
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
# 设置crop预处理
self.ai2d.crop(x,y,w,h)
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里调用了aidemo库的eye_gaze_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
post_ret = aidemo.eye_gaze_post_process(results)
return post_ret[0],post_ret[1]
# 自定义注视估计类
class EyeGaze:
def __init__(self,face_det_kmodel,eye_gaze_kmodel,det_input_size,eye_gaze_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸注视估计模型路径
self.eye_gaze_kmodel=eye_gaze_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸注视估计模型输入分辨率
self.eye_gaze_input_size=eye_gaze_input_size
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
# 人脸检测实例
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
# 注视估计实例
self.eye_gaze=EyeGazeApp(self.eye_gaze_kmodel,model_input_size=self.eye_gaze_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
# 人脸检测配置预处理
self.face_det.config_preprocess()
    # run方法
def run(self,input_np):
# 先进行人脸检测
det_boxes=self.face_det.run(input_np)
eye_gaze_res=[]
for det_box in det_boxes:
# 对每一个检测到的人脸做注视估计
self.eye_gaze.config_preprocess(det_box)
pitch,yaw=self.eye_gaze.run(input_np)
eye_gaze_res.append((pitch,yaw))
return det_boxes,eye_gaze_res
# 绘制注视估计效果
def draw_result(self,pl,dets,eye_gaze_res):
pl.osd_img.clear()
if dets:
for det,gaze_ret in zip(dets,eye_gaze_res):
pitch , yaw = gaze_ret
length = self.display_size[0]/ 2
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
center_x = (x + w / 2.0)
center_y = (y + h / 2.0)
dx = -length * math.sin(pitch) * math.cos(yaw)
target_x = int(center_x + dx)
dy = -length * math.sin(yaw)
target_y = int(center_y + dy)
pl.osd_img.draw_arrow(int(center_x), int(center_y), target_x, target_y, color = (255,255,0,0), size = 30, thickness = 2)
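    # 说明(对上面箭头绘制的解读,供参考):pitch、yaw为注视估计后处理返回的角度,
    # 以人脸框中心(center_x,center_y)为起点,按dx=-length*sin(pitch)*cos(yaw)、dy=-length*sin(yaw)
    # 把注视方向投影到屏幕平面(length取显示宽度的一半),再用draw_arrow画出注视方向。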
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸注视估计模型路径
eye_gaze_kmodel_path="/sdcard/app/tests/kmodel/eye_gaze.kmodel"
# 其他参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
eye_gaze_input_size=[448,448]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
    # 初始化PipeLine,只需设置传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
eg=EyeGaze(face_det_kmodel_path,eye_gaze_kmodel_path,det_input_size=face_det_input_size,eye_gaze_input_size=eye_gaze_input_size,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_boxes,eye_gaze_res=eg.run(img) # 推理当前帧
eg.draw_result(pl,det_boxes,eye_gaze_res) # 绘制推理效果
pl.show_image() # 展示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
eg.face_det.deinit()
eg.eye_gaze.deinit()
pl.destroy()
2.3. 人脸检测
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import utime
import image
import random
import gc
import sys
import aidemo
# 自定义人脸检测类,继承自AIBase基类
class FaceDetectionApp(AIBase):
def __init__(self, kmodel_path, model_input_size, anchors, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode) # 调用基类的构造函数
self.kmodel_path = kmodel_path # 模型文件路径
self.model_input_size = model_input_size # 模型输入分辨率
self.confidence_threshold = confidence_threshold # 置信度阈值
self.nms_threshold = nms_threshold # NMS(非极大值抑制)阈值
self.anchors = anchors # 锚点数据,用于目标检测
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]] # sensor给到AI的图像分辨率,并对宽度进行16的对齐
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]] # 显示分辨率,并对宽度进行16的对齐
self.debug_mode = debug_mode # 是否开启调试模式
self.ai2d = Ai2d(debug_mode) # 实例化Ai2d,用于实现模型预处理
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8) # 设置Ai2d的输入输出格式和类型
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0): # 计时器,如果debug_mode大于0则开启
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size # 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
top, bottom, left, right = self.get_padding_param() # 获取padding参数
self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [104, 117, 123]) # 填充边缘
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel) # 缩放图像
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]]) # 构建预处理流程
# 自定义当前任务的后处理,results是模型输出array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
post_ret = aidemo.face_det_post_process(self.confidence_threshold, self.nms_threshold, self.model_input_size[1], self.anchors, self.rgb888p_size, results)
if len(post_ret) == 0:
return post_ret
else:
return post_ret[0]
# 绘制检测结果到画面上
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
if dets:
pl.osd_img.clear() # 清除OSD图像
for det in dets:
# 将检测框的坐标转换为显示分辨率下的坐标
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
pl.osd_img.draw_rectangle(x, y, w, h, color=(255, 255, 0, 255), thickness=2) # 绘制矩形框
else:
pl.osd_img.clear()
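    # 坐标映射示例(仅作说明):检测框坐标在rgb888p_size尺度下给出,绘制前需映射到display_size尺度;
    # 例如rgb888p_size=[1920,1080]、display_mode="lcd"时display_size=[800,480],
    # 则x_display=x*800//1920,y_display=y*480//1080,w、h同理。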
# 获取padding参数
def get_padding_param(self):
dst_w = self.model_input_size[0] # 模型输入宽度
dst_h = self.model_input_size[1] # 模型输入高度
ratio_w = dst_w / self.rgb888p_size[0] # 宽度缩放比例
ratio_h = dst_h / self.rgb888p_size[1] # 高度缩放比例
ratio = min(ratio_w, ratio_h) # 取较小的缩放比例
new_w = int(ratio * self.rgb888p_size[0]) # 新宽度
new_h = int(ratio * self.rgb888p_size[1]) # 新高度
dw = (dst_w - new_w) / 2 # 宽度差
dh = (dst_h - new_h) / 2 # 高度差
top = int(round(0))
bottom = int(round(dh * 2 + 0.1))
left = int(round(0))
right = int(round(dw * 2 - 0.1))
return top, bottom, left, right
if __name__ == "__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 设置模型路径和其他参数
kmodel_path = "/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 其它参数
confidence_threshold = 0.5
nms_threshold = 0.2
anchor_len = 4200
det_dim = 4
anchors_path = "/sdcard/app/tests/utils/prior_data_320.bin"
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len, det_dim))
rgb888p_size = [1920, 1080]
# 初始化PipeLine,用于图像处理流程
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # 创建PipeLine实例
# 初始化自定义人脸检测实例
face_det = FaceDetectionApp(kmodel_path, model_input_size=[320, 320], anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
face_det.config_preprocess() # 配置预处理
try:
while True:
os.exitpoint() # 检查是否有退出信号
with ScopedTiming("total",1):
img = pl.get_frame() # 获取当前帧数据
res = face_det.run(img) # 推理当前帧
face_det.draw_result(pl, res) # 绘制结果
pl.show_image() # 显示结果
gc.collect() # 垃圾回收
except Exception as e:
sys.print_exception(e) # 打印异常信息
finally:
face_det.deinit() # 反初始化
pl.destroy() # 销毁PipeLine实例
2.4. 人脸关键部位
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# 检测任务锚框
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
    # 自定义后处理,results是模型输出的array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res
else:
return res[0]
# 计算padding参数
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
# 自定义人脸关键点任务类
class FaceLandMarkApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 关键点模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 目标矩阵
self.matrix_dst=None
self.ai2d=Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了affine,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算目标矩阵,并获取仿射变换矩阵
self.matrix_dst = self.get_affine_matrix(det)
affine_matrix = [self.matrix_dst[0][0],self.matrix_dst[0][1],self.matrix_dst[0][2],
self.matrix_dst[1][0],self.matrix_dst[1][1],self.matrix_dst[1][2]]
# 设置仿射变换预处理
self.ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,affine_matrix)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里使用了aidemo库的invert_affine_transform接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
pred=results[0]
            # (1)将人脸关键点输出变换到模型输入尺寸
half_input_len = self.model_input_size[0] // 2
pred = pred.flatten()
for i in range(len(pred)):
pred[i] += (pred[i] + 1) * half_input_len
# (2)获取仿射矩阵的逆矩阵
matrix_dst_inv = aidemo.invert_affine_transform(self.matrix_dst)
matrix_dst_inv = matrix_dst_inv.flatten()
# (3)对每个关键点进行逆变换
half_out_len = len(pred) // 2
for kp_id in range(half_out_len):
old_x = pred[kp_id * 2]
old_y = pred[kp_id * 2 + 1]
# 逆变换公式
new_x = old_x * matrix_dst_inv[0] + old_y * matrix_dst_inv[1] + matrix_dst_inv[2]
new_y = old_x * matrix_dst_inv[3] + old_y * matrix_dst_inv[4] + matrix_dst_inv[5]
pred[kp_id * 2] = new_x
pred[kp_id * 2 + 1] = new_y
return pred
def get_affine_matrix(self,bbox):
# 获取仿射矩阵,用于将边界框映射到模型输入空间
with ScopedTiming("get_affine_matrix", self.debug_mode > 1):
# 从边界框提取坐标和尺寸
x1, y1, w, h = map(lambda x: int(round(x, 0)), bbox[:4])
# 计算缩放比例,使得边界框映射到模型输入空间的一部分
scale_ratio = (self.model_input_size[0]) / (max(w, h) * 1.5)
# 计算边界框中心点在模型输入空间的坐标
cx = (x1 + w / 2) * scale_ratio
cy = (y1 + h / 2) * scale_ratio
# 计算模型输入空间的一半长度
half_input_len = self.model_input_size[0] / 2
# 创建仿射矩阵并进行设置
matrix_dst = np.zeros((2, 3), dtype=np.float)
matrix_dst[0, 0] = scale_ratio
matrix_dst[0, 1] = 0
matrix_dst[0, 2] = half_input_len - cx
matrix_dst[1, 0] = 0
matrix_dst[1, 1] = scale_ratio
matrix_dst[1, 2] = half_input_len - cy
return matrix_dst
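    # 仿射矩阵计算示例(仅作说明,人脸框取值为假设值):model_input_size=[192,192]时half_input_len=96;
    # 设检测框x1=600,y1=400,w=200,h=240,则scale_ratio=192/(240*1.5)≈0.533,
    # 人脸框中心(700,520)缩放后约为(373.3,277.3),平移量为96-373.3≈-277.3和96-277.3≈-181.3,
    # 即该仿射变换把人脸框中心对齐到192x192模型输入的中心。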
# 人脸标志解析
class FaceLandMark:
def __init__(self,face_det_kmodel,face_landmark_kmodel,det_input_size,landmark_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸标志解析模型路径
self.face_landmark_kmodel=face_landmark_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸标志解析模型输入分辨率
self.landmark_input_size=landmark_input_size
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
        # 人脸不同部位的关键点索引列表
self.dict_kp_seq = [
[43, 44, 45, 47, 46, 50, 51, 49, 48], # left_eyebrow
[97, 98, 99, 100, 101, 105, 104, 103, 102], # right_eyebrow
[35, 36, 33, 37, 39, 42, 40, 41], # left_eye
[89, 90, 87, 91, 93, 96, 94, 95], # right_eye
[34, 88], # pupil
[72, 73, 74, 86], # bridge_nose
[77, 78, 79, 80, 85, 84, 83], # wing_nose
[52, 55, 56, 53, 59, 58, 61, 68, 67, 71, 63, 64], # out_lip
[65, 54, 60, 57, 69, 70, 62, 66], # in_lip
[1, 9, 10, 11, 12, 13, 14, 15, 16, 2, 3, 4, 5, 6, 7, 8, 0, 24, 23, 22, 21, 20, 19, 18, 32, 31, 30, 29, 28, 27, 26, 25, 17] # basin
]
# 人脸关键点不同部位(顺序同dict_kp_seq)颜色配置,argb
self.color_list_for_osd_kp = [
(255, 0, 255, 0),
(255, 0, 255, 0),
(255, 255, 0, 255),
(255, 255, 0, 255),
(255, 255, 0, 0),
(255, 255, 170, 0),
(255, 255, 255, 0),
(255, 0, 255, 255),
(255, 255, 220, 50),
(255, 30, 30, 255)
]
# 人脸检测实例
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
# 人脸标志解析实例
self.face_landmark=FaceLandMarkApp(self.face_landmark_kmodel,model_input_size=self.landmark_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
# 配置人脸检测的预处理
self.face_det.config_preprocess()
# run函数
def run(self,input_np):
# 执行人脸检测
det_boxes=self.face_det.run(input_np)
landmark_res=[]
for det_box in det_boxes:
# 对每一个检测到的人脸解析关键部位
self.face_landmark.config_preprocess(det_box)
res=self.face_landmark.run(input_np)
landmark_res.append(res)
return det_boxes,landmark_res
# 绘制人脸解析效果
def draw_result(self,pl,dets,landmark_res):
pl.osd_img.clear()
if dets:
draw_img_np = np.zeros((self.display_size[1],self.display_size[0],4),dtype=np.uint8)
draw_img = image.Image(self.display_size[0], self.display_size[1], image.ARGB8888, alloc=image.ALLOC_REF,data = draw_img_np)
for pred in landmark_res:
# (1)获取单个人脸框对应的人脸关键点
for sub_part_index in range(len(self.dict_kp_seq)):
# (2)构建人脸某个区域关键点集
sub_part = self.dict_kp_seq[sub_part_index]
face_sub_part_point_set = []
for kp_index in range(len(sub_part)):
real_kp_index = sub_part[kp_index]
x, y = pred[real_kp_index * 2], pred[real_kp_index * 2 + 1]
x = int(x * self.display_size[0] // self.rgb888p_size[0])
y = int(y * self.display_size[1] // self.rgb888p_size[1])
face_sub_part_point_set.append((x, y))
# (3)画人脸不同区域的轮廓
if sub_part_index in (9, 6):
color = np.array(self.color_list_for_osd_kp[sub_part_index],dtype = np.uint8)
face_sub_part_point_set = np.array(face_sub_part_point_set)
aidemo.polylines(draw_img_np, face_sub_part_point_set,False,color,5,8,0)
elif sub_part_index == 4:
color = self.color_list_for_osd_kp[sub_part_index]
for kp in face_sub_part_point_set:
x,y = kp[0],kp[1]
draw_img.draw_circle(x,y ,2, color, 1)
else:
color = np.array(self.color_list_for_osd_kp[sub_part_index],dtype = np.uint8)
face_sub_part_point_set = np.array(face_sub_part_point_set)
aidemo.contours(draw_img_np, face_sub_part_point_set,-1,color,2,8)
pl.osd_img.copy_from(draw_img)
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸关键标志模型路径
face_landmark_kmodel_path="/sdcard/app/tests/kmodel/face_landmark.kmodel"
# 其它参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
face_landmark_input_size=[192,192]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
    # 初始化PipeLine,只需设置传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
flm=FaceLandMark(face_det_kmodel_path,face_landmark_kmodel_path,det_input_size=face_det_input_size,landmark_input_size=face_landmark_input_size,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_boxes,landmark_res=flm.run(img) # 推理当前帧
flm.draw_result(pl,det_boxes,landmark_res) # 绘制推理结果
pl.show_image() # 展示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
flm.face_det.deinit()
flm.face_landmark.deinit()
pl.destroy()
2.5. 人脸3D网格
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# 检测任务锚框
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型推理输出的array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res
else:
return res[0]
# padding参数计算
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
# 自定义人脸网格任务类
class FaceMeshApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 人脸网格模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 人脸mesh参数均值
self.param_mean = np.array([0.0003492636315058917,2.52790130161884e-07,-6.875197868794203e-07,60.1679573059082,-6.295513230725192e-07,0.0005757200415246189,-5.085391239845194e-05,74.2781982421875,5.400917189035681e-07,6.574138387804851e-05,0.0003442012530285865,-66.67157745361328,-346603.6875,-67468.234375,46822.265625,-15262.046875,4350.5888671875,-54261.453125,-18328.033203125,-1584.328857421875,-84566.34375,3835.960693359375,-20811.361328125,38094.9296875,-19967.85546875,-9241.3701171875,-19600.71484375,13168.08984375,-5259.14404296875,1848.6478271484375,-13030.662109375,-2435.55615234375,-2254.20654296875,-14396.5615234375,-6176.3291015625,-25621.919921875,226.39447021484375,-6326.12353515625,-10867.2509765625,868.465087890625,-5831.14794921875,2705.123779296875,-3629.417724609375,2043.9901123046875,-2446.6162109375,3658.697021484375,-7645.98974609375,-6674.45263671875,116.38838958740234,7185.59716796875,-1429.48681640625,2617.366455078125,-1.2070955038070679,0.6690792441368103,-0.17760828137397766,0.056725528091192245,0.03967815637588501,-0.13586315512657166,-0.09223993122577667,-0.1726071834564209,-0.015804484486579895,-0.1416848599910736],dtype=np.float)
# 人脸mesh参数方差
self.param_std = np.array([0.00017632152594160289,6.737943476764485e-05,0.00044708489440381527,26.55023193359375,0.0001231376954820007,4.493021697271615e-05,7.923670636955649e-05,6.982563018798828,0.0004350444069132209,0.00012314890045672655,0.00017400001524947584,20.80303955078125,575421.125,277649.0625,258336.84375,255163.125,150994.375,160086.109375,111277.3046875,97311.78125,117198.453125,89317.3671875,88493.5546875,72229.9296875,71080.2109375,50013.953125,55968.58203125,47525.50390625,49515.06640625,38161.48046875,44872.05859375,46273.23828125,38116.76953125,28191.162109375,32191.4375,36006.171875,32559.892578125,25551.1171875,24267.509765625,27521.3984375,23166.53125,21101.576171875,19412.32421875,19452.203125,17454.984375,22537.623046875,16174.28125,14671.640625,15115.6884765625,13870.0732421875,13746.3125,12663.1337890625,1.5870834589004517,1.5077009201049805,0.5881357789039612,0.5889744758605957,0.21327851712703705,0.2630201280117035,0.2796429395675659,0.38030216097831726,0.16162841022014618,0.2559692859649658],dtype=np.float)
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了crop和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算crop参数,并设置crop预处理
roi = self.parse_roi_box_from_bbox(det)
self.ai2d.crop(int(roi[0]),int(roi[1]),int(roi[2]),int(roi[3]))
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
return roi
# 自定义后处理,results是模型输出的array列表
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
param = results[0] * self.param_std + self.param_mean
return param
def parse_roi_box_from_bbox(self,bbox):
# 获取人脸roi
x1, y1, w, h = map(lambda x: int(round(x, 0)), bbox[:4])
old_size = (w + h) / 2
center_x = x1 + w / 2
center_y = y1 + h / 2 + old_size * 0.14
size = int(old_size * 1.58)
x0 = center_x - float(size) / 2
y0 = center_y - float(size) / 2
x1 = x0 + size
y1 = y0 + size
x0 = max(0, min(x0, self.rgb888p_size[0]))
y0 = max(0, min(y0, self.rgb888p_size[1]))
x1 = max(0, min(x1, self.rgb888p_size[0]))
y1 = max(0, min(y1, self.rgb888p_size[1]))
roi = (x0, y0, x1 - x0, y1 - y0)
return roi
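    # roi计算示例(仅作说明,人脸框取值为假设值):设x1=800,y1=300,w=200,h=240,
    # 则old_size=220,中心为(900,300+120+220*0.14=450.8),size=int(220*1.58)=347,
    # roi≈(726.5,277.3,347,347),即以人脸框为基础向下偏移并外扩取一个正方形区域,
    # 再经crop+resize得到120x120的人脸网格模型输入(face_mesh_input_size在主函数中为[120,120])。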
# 自定义人脸网格后处理任务类
class FaceMeshPostApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 人脸网格模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 重写预处理函数preprocess,因为该模型的预处理不是单纯调用一个ai2d能实现的,返回模型输入的tensor列表
def preprocess(self,param):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# face mesh post模型预处理,param解析
param = param[0]
trans_dim, shape_dim, exp_dim = 12, 40, 10
R_ = param[:trans_dim].copy().reshape((3, -1))
R = R_[:, :3].copy()
offset = R_[:, 3].copy()
offset = offset.reshape((3, 1))
alpha_shp = param[trans_dim:trans_dim + shape_dim].copy().reshape((-1, 1))
alpha_exp = param[trans_dim + shape_dim:].copy().reshape((-1, 1))
R_tensor = nn.from_numpy(R)
offset_tensor = nn.from_numpy(offset)
alpha_shp_tensor = nn.from_numpy(alpha_shp)
alpha_exp_tensor = nn.from_numpy(alpha_exp)
return [R_tensor,offset_tensor,alpha_shp_tensor,alpha_exp_tensor]
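    # 说明(供参考):上面把mesh模型输出的参数按12/40/10拆分,
    # 前12维reshape为3x4的位姿参数(3x3旋转R与3x1平移offset),
    # 接下来40维为形状系数alpha_shp,其余10维为表情系数alpha_exp,
    # 分别转成tensor后作为mesh后处理模型的4个输入。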
# 自定义模型后处理,这里调用了aidemo的face_mesh_post_process接口
def postprocess(self,results,roi):
with ScopedTiming("postprocess",self.debug_mode > 0):
x, y, w, h = map(lambda x: int(round(x, 0)), roi[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
roi_array = np.array([x,y,w,h],dtype=np.float)
aidemo.face_mesh_post_process(roi_array,results[0])
return results[0]
# 3D人脸网格
class FaceMesh:
def __init__(self,face_det_kmodel,face_mesh_kmodel,mesh_post_kmodel,det_input_size,mesh_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸3D网格模型路径
self.face_mesh_kmodel=face_mesh_kmodel
# 人脸3D网格后处理模型路径
self.mesh_post_kmodel=mesh_post_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸3D网格模型输入分辨率
self.mesh_input_size=mesh_input_size
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
# 人脸检测实例
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
# 人脸网格实例
self.face_mesh=FaceMeshApp(self.face_mesh_kmodel,model_input_size=self.mesh_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
# 人脸网格后处理实例
self.face_mesh_post=FaceMeshPostApp(self.mesh_post_kmodel,model_input_size=self.mesh_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
# 人脸检测预处理配置
self.face_det.config_preprocess()
# run函数
def run(self,input_np):
# 执行人脸检测
det_boxes=self.face_det.run(input_np)
mesh_res=[]
for det_box in det_boxes:
# 对检测到的每一个人脸配置预处理,执行人脸网格和人脸网格后处理
roi=self.face_mesh.config_preprocess(det_box)
param=self.face_mesh.run(input_np)
tensors=self.face_mesh_post.preprocess(param)
results=self.face_mesh_post.inference(tensors)
res=self.face_mesh_post.postprocess(results,roi)
mesh_res.append(res)
return det_boxes,mesh_res
# 绘制人脸解析效果
def draw_result(self,pl,dets,mesh_res):
pl.osd_img.clear()
if dets:
draw_img_np = np.zeros((self.display_size[1],self.display_size[0],4),dtype=np.uint8)
draw_img = image.Image(self.display_size[0], self.display_size[1], image.ARGB8888, alloc=image.ALLOC_REF,data = draw_img_np)
for vertices in mesh_res:
aidemo.face_draw_mesh(draw_img_np, vertices)
pl.osd_img.copy_from(draw_img)
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸网格模型路径
face_mesh_kmodel_path="/sdcard/app/tests/kmodel/face_alignment.kmodel"
# 人脸网格后处理模型路径
face_mesh_post_kmodel_path="/sdcard/app/tests/kmodel/face_alignment_post.kmodel"
# 其他参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
face_mesh_input_size=[120,120]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
    # 初始化PipeLine,只需设置传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
fm=FaceMesh(face_det_kmodel_path,face_mesh_kmodel_path,face_mesh_post_kmodel_path,det_input_size=face_det_input_size,mesh_input_size=face_mesh_input_size,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_boxes,mesh_res=fm.run(img) # 推理当前帧
fm.draw_result(pl,det_boxes,mesh_res) # 绘制推理结果
pl.show_image() # 显示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
fm.face_det.deinit()
fm.face_mesh.deinit()
fm.face_mesh_post.deinit()
pl.destroy()
2.6. 人脸解析
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算padding参数,并设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里调用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res
else:
return res[0]
# 计算padding参数
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
# 自定义人脸解析任务类
class FaceParseApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了affine,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算仿射变换矩阵并设置affine预处理
matrix_dst = self.get_affine_matrix(det)
self.ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,matrix_dst)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里将第一个输出返回
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
return results[0]
def get_affine_matrix(self,bbox):
# 获取仿射矩阵,用于将边界框映射到模型输入空间
with ScopedTiming("get_affine_matrix", self.debug_mode > 1):
# 设置缩放因子
factor = 2.7
# 从边界框提取坐标和尺寸
x1, y1, w, h = map(lambda x: int(round(x, 0)), bbox[:4])
# 模型输入大小
edge_size = self.model_input_size[1]
# 平移距离,使得模型输入空间的中心对准原点
trans_distance = edge_size / 2.0
# 计算边界框中心点的坐标
center_x = x1 + w / 2.0
center_y = y1 + h / 2.0
# 计算最大边长
maximum_edge = factor * (h if h > w else w)
# 计算缩放比例
scale = edge_size * 2.0 / maximum_edge
# 计算平移参数
cx = trans_distance - scale * center_x
cy = trans_distance - scale * center_y
# 创建仿射矩阵
affine_matrix = [scale, 0, cx, 0, scale, cy]
return affine_matrix
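    # 仿射矩阵计算示例(仅作说明,人脸框取值为假设值):face_parse_input_size=[320,320]时edge_size=320;
    # 设检测框w=200,h=240,则maximum_edge=2.7*240=648,scale=320*2/648≈0.988,
    # cx、cy使缩放后的人脸框中心平移到160(模型输入中心),相当于把以人脸框为中心、
    # 边长约1.35倍人脸长边的正方形区域映射为320x320的人脸解析模型输入。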
# 人脸解析任务
class FaceParse:
def __init__(self,face_det_kmodel,face_parse_kmodel,det_input_size,parse_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸解析模型路径
self.face_pose_kmodel=face_parse_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸解析模型输入分辨率
self.parse_input_size=parse_input_size
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
# 人脸检测任务类实例
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
# 人脸解析实例
self.face_parse=FaceParseApp(self.face_pose_kmodel,model_input_size=self.parse_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
# 人脸检测预处理配置
self.face_det.config_preprocess()
# run函数
def run(self,input_np):
# 执行人脸检测
det_boxes=self.face_det.run(input_np)
parse_res=[]
for det_box in det_boxes:
# 对检测到每一个人脸进行人脸解析
self.face_parse.config_preprocess(det_box)
res=self.face_parse.run(input_np)
parse_res.append(res)
return det_boxes,parse_res
# 绘制人脸解析效果
def draw_result(self,pl,dets,parse_res):
pl.osd_img.clear()
if dets:
draw_img_np = np.zeros((self.display_size[1],self.display_size[0],4),dtype=np.uint8)
draw_img=image.Image(self.display_size[0], self.display_size[1], image.ARGB8888,alloc=image.ALLOC_REF,data=draw_img_np)
for i,det in enumerate(dets):
                # (1)将人脸检测框坐标转换为显示分辨率下的坐标(人脸解析结果由face_parse_post_process直接绘制到draw_img_np上)
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
aidemo.face_parse_post_process(draw_img_np,self.rgb888p_size,self.display_size,self.parse_input_size[0],det.tolist(),parse_res[i])
pl.osd_img.copy_from(draw_img)
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸解析模型路径
face_parse_kmodel_path="/sdcard/app/tests/kmodel/face_parse.kmodel"
# 其他参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
face_parse_input_size=[320,320]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
    # 初始化PipeLine,只需设置传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
fp=FaceParse(face_det_kmodel_path,face_parse_kmodel_path,det_input_size=face_det_input_size,parse_input_size=face_parse_input_size,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_boxes,parse_res=fp.run(img) # 推理当前帧
fp.draw_result(pl,det_boxes,parse_res) # 绘制当前帧推理结果
pl.show_image() # 展示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
fp.face_det.deinit()
fp.face_parse.deinit()
pl.destroy()
2.7. 人脸姿态
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算padding参数,并设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res
else:
return res[0]
# 计算padding参数
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
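    # 补充说明(假设 rgb888p_size=[1920,1080]、model_input_size=[320,320],数值仅作演示):
    # ratio = min(320/1920, 320/1080) = 1/6,new_w=320,new_h=180,dw=0,dh=70,
    # 返回的padding为[0,0,0,0,0,140,0,0],即等比缩放后只在图像下方/右方补边(本例右侧补0),
    # 补边像素使用前面设置的[104,117,123],从而保持模型输入的原始宽高比。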
# 自定义人脸姿态任务类
class FacePoseApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 人脸姿态模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了affine,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,det,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算affine矩阵并设置affine预处理
matrix_dst = self.get_affine_matrix(det)
self.ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,matrix_dst)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,计算欧拉角
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
R,eular = self.get_euler(results[0][0])
return R,eular
def get_affine_matrix(self,bbox):
# 获取仿射矩阵,用于将边界框映射到模型输入空间
with ScopedTiming("get_affine_matrix", self.debug_mode > 1):
# 设置缩放因子
factor = 2.7
# 从边界框提取坐标和尺寸
x1, y1, w, h = map(lambda x: int(round(x, 0)), bbox[:4])
# 模型输入大小
edge_size = self.model_input_size[1]
# 平移距离,使得模型输入空间的中心对准原点
trans_distance = edge_size / 2.0
# 计算边界框中心点的坐标
center_x = x1 + w / 2.0
center_y = y1 + h / 2.0
# 计算最大边长
maximum_edge = factor * (h if h > w else w)
# 计算缩放比例
scale = edge_size * 2.0 / maximum_edge
# 计算平移参数
cx = trans_distance - scale * center_x
cy = trans_distance - scale * center_y
# 创建仿射矩阵
affine_matrix = [scale, 0, cx, 0, scale, cy]
return affine_matrix
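        # 说明:返回的列表是2x3仿射矩阵按行展开的结果,即[a, b, tx, c, d, ty],
        # 对应映射 x' = a*x + b*y + tx, y' = c*x + d*y + ty。这里a=d=scale、b=c=0,
        # 等价于把以检测框中心为中心、边长约为 maximum_edge/2(即1.35倍人脸框长边)的
        # 正方形区域缩放平移到 model_input_size 大小的模型输入图像上。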
def rotation_matrix_to_euler_angles(self,R):
# 将旋转矩阵(3x3 矩阵)转换为欧拉角(pitch、yaw、roll)
        # 计算 sqrt(R[0,0]^2 + R[1,0]^2),其值等于 |cos(yaw)|,用于判断奇异情况
sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
if sy < 1e-6:
            # 若该值接近0,说明 yaw 接近 ±90 度(万向锁),此时 roll 无法唯一确定,固定为0
pitch = np.arctan2(-R[1, 2], R[1, 1]) * 180 / np.pi
yaw = np.arctan2(-R[2, 0], sy) * 180 / np.pi
roll = 0
else:
# 计算 pitch、yaw、roll 的角度
pitch = np.arctan2(R[2, 1], R[2, 2]) * 180 / np.pi
yaw = np.arctan2(-R[2, 0], sy) * 180 / np.pi
roll = np.arctan2(R[1, 0], R[0, 0]) * 180 / np.pi
return [pitch,yaw,roll]
def get_euler(self,data):
# 获取旋转矩阵和欧拉角
R = data[:3, :3].copy()
eular = self.rotation_matrix_to_euler_angles(R)
return R,eular
# 人脸姿态任务类
class FacePose:
def __init__(self,face_det_kmodel,face_pose_kmodel,det_input_size,pose_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸姿态模型路径
self.face_pose_kmodel=face_pose_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸姿态模型输入分辨率
self.pose_input_size=pose_input_size
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
self.face_pose=FacePoseApp(self.face_pose_kmodel,model_input_size=self.pose_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
self.face_det.config_preprocess()
# run函数
def run(self,input_np):
# 人脸检测
det_boxes=self.face_det.run(input_np)
pose_res=[]
for det_box in det_boxes:
# 对检测到的每一个人脸做人脸姿态估计
self.face_pose.config_preprocess(det_box)
R,eular=self.face_pose.run(input_np)
pose_res.append((R,eular))
return det_boxes,pose_res
# 绘制人脸姿态角效果
def draw_result(self,pl,dets,pose_res):
pl.osd_img.clear()
if dets:
draw_img_np = np.zeros((self.display_size[1],self.display_size[0],4),dtype=np.uint8)
draw_img=image.Image(self.display_size[0], self.display_size[1], image.ARGB8888,alloc=image.ALLOC_REF,data=draw_img_np)
line_color = np.array([255, 0, 0 ,255],dtype=np.uint8) #bgra
for i,det in enumerate(dets):
# (1)获取人脸姿态矩阵和欧拉角
projections,center_point = self.build_projection_matrix(det)
R,euler = pose_res[i]
# (2)遍历人脸投影矩阵的关键点,进行投影,并将结果画在图像上
first_points = []
second_points = []
for pp in range(8):
sum_x, sum_y = 0.0, 0.0
for cc in range(3):
sum_x += projections[pp][cc] * R[cc][0]
sum_y += projections[pp][cc] * (-R[cc][1])
center_x,center_y = center_point[0],center_point[1]
x = (sum_x + center_x) / self.rgb888p_size[0] * self.display_size[0]
y = (sum_y + center_y) / self.rgb888p_size[1] * self.display_size[1]
x = max(0, min(x, self.display_size[0]))
y = max(0, min(y, self.display_size[1]))
if pp < 4:
first_points.append((x, y))
else:
second_points.append((x, y))
first_points = np.array(first_points,dtype=np.float)
aidemo.polylines(draw_img_np,first_points,True,line_color,2,8,0)
second_points = np.array(second_points,dtype=np.float)
aidemo.polylines(draw_img_np,second_points,True,line_color,2,8,0)
for ll in range(4):
x0, y0 = int(first_points[ll][0]),int(first_points[ll][1])
x1, y1 = int(second_points[ll][0]),int(second_points[ll][1])
draw_img.draw_line(x0, y0, x1, y1, color = (255, 0, 0 ,255), thickness = 2)
pl.osd_img.copy_from(draw_img)
def build_projection_matrix(self,det):
x1, y1, w, h = map(lambda x: int(round(x, 0)), det[:4])
# 计算边界框中心坐标
center_x = x1 + w / 2.0
center_y = y1 + h / 2.0
# 定义后部(rear)和前部(front)的尺寸和深度
rear_width = 0.5 * w
rear_height = 0.5 * h
rear_depth = 0
factor = np.sqrt(2.0)
front_width = factor * rear_width
front_height = factor * rear_height
front_depth = factor * rear_width # 使用宽度来计算深度,也可以使用高度,取决于需求
# 定义立方体的顶点坐标
temp = [
[-rear_width, -rear_height, rear_depth],
[-rear_width, rear_height, rear_depth],
[rear_width, rear_height, rear_depth],
[rear_width, -rear_height, rear_depth],
[-front_width, -front_height, front_depth],
[-front_width, front_height, front_depth],
[front_width, front_height, front_depth],
[front_width, -front_height, front_depth]
]
projections = np.array(temp)
# 返回投影矩阵和中心坐标
return projections, (center_x, center_y)
if __name__=="__main__":
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸姿态模型路径
face_pose_kmodel_path="/sdcard/app/tests/kmodel/face_pose.kmodel"
# 其它参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
face_pose_input_size=[120,120]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
    # 初始化PipeLine,这里只需关注传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
fp=FacePose(face_det_kmodel_path,face_pose_kmodel_path,det_input_size=face_det_input_size,pose_input_size=face_pose_input_size,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
img=pl.get_frame() # 获取当前帧
det_boxes,pose_res=fp.run(img) # 推理当前帧
fp.draw_result(pl,det_boxes,pose_res) # 绘制推理效果
pl.show_image() # 展示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
fp.face_det.deinit()
fp.face_pose.deinit()
pl.destroy()
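
上面 FacePoseApp 的后处理从旋转矩阵恢复欧拉角,其取角公式在数学上对应 R = Rz(roll)·Ry(yaw)·Rx(pitch) 的旋转顺序。下面给出一段在PC端用标准 numpy 验证该约定的小例子(仅作参考,不是板端代码):

# PC端示例:构造已知欧拉角的旋转矩阵,再用与上文相同的公式恢复角度(角度单位为度)
import numpy as np

def euler_to_matrix(pitch, yaw, roll):
    p, y, r = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

R = euler_to_matrix(10, 20, 30)
sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
yaw = np.degrees(np.arctan2(-R[2, 0], sy))
roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
print(pitch, yaw, roll)   # 约为 10.0 20.0 30.0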
2.8. 人脸识别
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
import math
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算padding参数,并设置padding预处理
self.ai2d.pad(self.get_pad_param(), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.rgb888p_size,results)
if len(res)==0:
return res,res
else:
return res[0],res[1]
def get_pad_param(self):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / self.rgb888p_size[0]
ratio_h = dst_h / self.rgb888p_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * self.rgb888p_size[0])
new_h = (int)(ratio * self.rgb888p_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
# 自定义人脸注册任务类
class FaceRegistrationApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
        # 112x112标准对齐人脸的5个关键点坐标(左眼、右眼、鼻尖、左/右嘴角)
self.umeyama_args_112 = [
38.2946 , 51.6963 ,
73.5318 , 51.5014 ,
56.0252 , 71.7366 ,
41.5493 , 92.3655 ,
70.7299 , 92.2041
]
self.ai2d=Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了affine,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,landm,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算affine矩阵,并设置仿射变换预处理
affine_matrix = self.get_affine_matrix(landm)
self.ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,affine_matrix)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
return results[0][0]
def svd22(self,a):
        # 2x2矩阵的闭式SVD分解:a按[a00,a01,a10,a11]展开,返回展平的U、奇异值s和展平的V
s = [0.0, 0.0]
u = [0.0, 0.0, 0.0, 0.0]
v = [0.0, 0.0, 0.0, 0.0]
s[0] = (math.sqrt((a[0] - a[3]) ** 2 + (a[1] + a[2]) ** 2) + math.sqrt((a[0] + a[3]) ** 2 + (a[1] - a[2]) ** 2)) / 2
s[1] = abs(s[0] - math.sqrt((a[0] - a[3]) ** 2 + (a[1] + a[2]) ** 2))
v[2] = math.sin((math.atan2(2 * (a[0] * a[1] + a[2] * a[3]), a[0] ** 2 - a[1] ** 2 + a[2] ** 2 - a[3] ** 2)) / 2) if \
s[0] > s[1] else 0
v[0] = math.sqrt(1 - v[2] ** 2)
v[1] = -v[2]
v[3] = v[0]
u[0] = -(a[0] * v[0] + a[1] * v[2]) / s[0] if s[0] != 0 else 1
u[2] = -(a[2] * v[0] + a[3] * v[2]) / s[0] if s[0] != 0 else 0
u[1] = (a[0] * v[1] + a[1] * v[3]) / s[1] if s[1] != 0 else -u[2]
u[3] = (a[2] * v[1] + a[3] * v[3]) / s[1] if s[1] != 0 else u[0]
v[0] = -v[0]
v[2] = -v[2]
return u, s, v
def image_umeyama_112(self,src):
# 使用Umeyama算法计算仿射变换矩阵
SRC_NUM = 5
SRC_DIM = 2
src_mean = [0.0, 0.0]
dst_mean = [0.0, 0.0]
for i in range(0,SRC_NUM * 2,2):
src_mean[0] += src[i]
src_mean[1] += src[i + 1]
dst_mean[0] += self.umeyama_args_112[i]
dst_mean[1] += self.umeyama_args_112[i + 1]
src_mean[0] /= SRC_NUM
src_mean[1] /= SRC_NUM
dst_mean[0] /= SRC_NUM
dst_mean[1] /= SRC_NUM
src_demean = [[0.0, 0.0] for _ in range(SRC_NUM)]
dst_demean = [[0.0, 0.0] for _ in range(SRC_NUM)]
for i in range(SRC_NUM):
src_demean[i][0] = src[2 * i] - src_mean[0]
src_demean[i][1] = src[2 * i + 1] - src_mean[1]
dst_demean[i][0] = self.umeyama_args_112[2 * i] - dst_mean[0]
dst_demean[i][1] = self.umeyama_args_112[2 * i + 1] - dst_mean[1]
A = [[0.0, 0.0], [0.0, 0.0]]
for i in range(SRC_DIM):
for k in range(SRC_DIM):
for j in range(SRC_NUM):
A[i][k] += dst_demean[j][i] * src_demean[j][k]
A[i][k] /= SRC_NUM
T = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
U, S, V = self.svd22([A[0][0], A[0][1], A[1][0], A[1][1]])
T[0][0] = U[0] * V[0] + U[1] * V[2]
T[0][1] = U[0] * V[1] + U[1] * V[3]
T[1][0] = U[2] * V[0] + U[3] * V[2]
T[1][1] = U[2] * V[1] + U[3] * V[3]
scale = 1.0
src_demean_mean = [0.0, 0.0]
src_demean_var = [0.0, 0.0]
for i in range(SRC_NUM):
src_demean_mean[0] += src_demean[i][0]
src_demean_mean[1] += src_demean[i][1]
src_demean_mean[0] /= SRC_NUM
src_demean_mean[1] /= SRC_NUM
for i in range(SRC_NUM):
src_demean_var[0] += (src_demean_mean[0] - src_demean[i][0]) * (src_demean_mean[0] - src_demean[i][0])
src_demean_var[1] += (src_demean_mean[1] - src_demean[i][1]) * (src_demean_mean[1] - src_demean[i][1])
src_demean_var[0] /= SRC_NUM
src_demean_var[1] /= SRC_NUM
scale = 1.0 / (src_demean_var[0] + src_demean_var[1]) * (S[0] + S[1])
T[0][2] = dst_mean[0] - scale * (T[0][0] * src_mean[0] + T[0][1] * src_mean[1])
T[1][2] = dst_mean[1] - scale * (T[1][0] * src_mean[0] + T[1][1] * src_mean[1])
T[0][0] *= scale
T[0][1] *= scale
T[1][0] *= scale
T[1][1] *= scale
return T
def get_affine_matrix(self,sparse_points):
# 获取affine变换矩阵
with ScopedTiming("get_affine_matrix", self.debug_mode > 1):
# 使用Umeyama算法计算仿射变换矩阵
matrix_dst = self.image_umeyama_112(sparse_points)
matrix_dst = [matrix_dst[0][0],matrix_dst[0][1],matrix_dst[0][2],
matrix_dst[1][0],matrix_dst[1][1],matrix_dst[1][2]]
return matrix_dst
# 人脸识别任务类
class FaceRecognition:
def __init__(self,face_det_kmodel,face_reg_kmodel,det_input_size,reg_input_size,database_dir,anchors,confidence_threshold=0.25,nms_threshold=0.3,face_recognition_threshold=0.75,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸识别模型路径
self.face_reg_kmodel=face_reg_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸识别模型输入分辨率
self.reg_input_size=reg_input_size
self.database_dir=database_dir
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.face_recognition_threshold=face_recognition_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
self.max_register_face = 100 # 数据库最多人脸个数
self.feature_num = 128 # 人脸识别特征维度
self.valid_register_face = 0 # 已注册人脸数
self.db_name= []
self.db_data= []
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,rgb888p_size=self.rgb888p_size,display_size=self.display_size,debug_mode=0)
self.face_reg=FaceRegistrationApp(self.face_reg_kmodel,model_input_size=self.reg_input_size,rgb888p_size=self.rgb888p_size,display_size=self.display_size)
self.face_det.config_preprocess()
# 人脸数据库初始化
self.database_init()
# run函数
def run(self,input_np):
# 执行人脸检测
det_boxes,landms=self.face_det.run(input_np)
recg_res = []
for landm in landms:
# 针对每个人脸五官点,推理得到人脸特征,并计算特征在数据库中相似度
self.face_reg.config_preprocess(landm)
feature=self.face_reg.run(input_np)
res = self.database_search(feature)
recg_res.append(res)
return det_boxes,recg_res
def database_init(self):
# 数据初始化,构建数据库人名列表和数据库特征列表
with ScopedTiming("database_init", self.debug_mode > 1):
db_file_list = os.listdir(self.database_dir)
for db_file in db_file_list:
if not db_file.endswith('.bin'):
continue
if self.valid_register_face >= self.max_register_face:
break
valid_index = self.valid_register_face
full_db_file = self.database_dir + db_file
with open(full_db_file, 'rb') as f:
data = f.read()
feature = np.frombuffer(data, dtype=np.float)
self.db_data.append(feature)
name = db_file.split('.')[0]
self.db_name.append(name)
self.valid_register_face += 1
def database_reset(self):
# 数据库清空
with ScopedTiming("database_reset", self.debug_mode > 1):
print("database clearing...")
self.db_name = []
self.db_data = []
self.valid_register_face = 0
print("database clear Done!")
def database_search(self,feature):
# 数据库查询
with ScopedTiming("database_search", self.debug_mode > 1):
v_id = -1
v_score_max = 0.0
# 将当前人脸特征归一化
feature /= np.linalg.norm(feature)
# 遍历当前人脸数据库,统计最高得分
for i in range(self.valid_register_face):
db_feature = self.db_data[i]
db_feature /= np.linalg.norm(db_feature)
# 计算数据库特征与当前人脸特征相似度
v_score = np.dot(feature, db_feature)/2 + 0.5
if v_score > v_score_max:
v_score_max = v_score
v_id = i
if v_id == -1:
# 数据库中无人脸
return 'unknown'
elif v_score_max < self.face_recognition_threshold:
# 小于人脸识别阈值,未识别
return 'unknown'
else:
# 识别成功
result = 'name: {}, score:{}'.format(self.db_name[v_id],v_score_max)
return result
# 绘制识别结果
def draw_result(self,pl,dets,recg_results):
pl.osd_img.clear()
if dets:
for i,det in enumerate(dets):
# (1)画人脸框
x1, y1, w, h = map(lambda x: int(round(x, 0)), det[:4])
x1 = x1 * self.display_size[0]//self.rgb888p_size[0]
y1 = y1 * self.display_size[1]//self.rgb888p_size[1]
w = w * self.display_size[0]//self.rgb888p_size[0]
h = h * self.display_size[1]//self.rgb888p_size[1]
pl.osd_img.draw_rectangle(x1,y1, w, h, color=(255,0, 0, 255), thickness = 4)
# (2)写人脸识别结果
recg_text = recg_results[i]
pl.osd_img.draw_string_advanced(x1,y1,32,recg_text,color=(255, 255, 0, 0))
if __name__=="__main__":
    # 注意:执行人脸识别之前,需要先运行人脸注册示例注册人脸身份,生成特征数据库
# 显示模式,默认"hdmi",可以选择"hdmi"和"lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸识别模型路径
face_reg_kmodel_path="/sdcard/app/tests/kmodel/face_recognition.kmodel"
# 其它参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
database_dir ="/sdcard/app/tests/utils/db/"
rgb888p_size=[1920,1080]
face_det_input_size=[320,320]
face_reg_input_size=[112,112]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
face_recognition_threshold = 0.75 # 人脸识别阈值
    # 初始化PipeLine,这里只需关注传给AI的图像分辨率和显示分辨率
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
fr=FaceRecognition(face_det_kmodel_path,face_reg_kmodel_path,det_input_size=face_det_input_size,reg_input_size=face_reg_input_size,database_dir=database_dir,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold,face_recognition_threshold=face_recognition_threshold,rgb888p_size=rgb888p_size,display_size=display_size)
try:
while True:
os.exitpoint()
with ScopedTiming("total", 1):
img=pl.get_frame() # 获取当前帧
det_boxes,recg_res=fr.run(img) # 推理当前帧
fr.draw_result(pl,det_boxes,recg_res) # 绘制推理结果
pl.show_image() # 展示推理效果
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
fr.face_det.deinit()
fr.face_reg.deinit()
pl.destroy()
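
上面 database_search 先对特征做L2归一化,再用点积(余弦相似度)经 /2 + 0.5 映射到[0,1]区间,并与 face_recognition_threshold 比较。下面用标准 numpy 给出一段独立的打分小示例,特征为随机假设值,仅作演示:

# PC端示例:归一化 + 余弦相似度打分(特征为随机假设值)
import numpy as np

feature = np.random.rand(128)        # 当前人脸特征,128维(假设)
db_feature = np.random.rand(128)     # 数据库中的某条特征(假设)
feature = feature / np.linalg.norm(feature)
db_feature = db_feature / np.linalg.norm(db_feature)
score = float(np.dot(feature, db_feature)) / 2 + 0.5   # 余弦相似度由[-1,1]映射到[0,1]
print(score, score > 0.75)           # 得分需超过face_recognition_threshold才判定识别成功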
2.9. 人脸注册
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import image
import aidemo
import random
import gc
import sys
import math
# 自定义人脸检测任务类
class FaceDetApp(AIBase):
def __init__(self,kmodel_path,model_input_size,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 检测模型输入分辨率
self.model_input_size=model_input_size
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
self.anchors=anchors
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
# 实例化Ai2d,用于实现模型预处理
self.ai2d=Ai2d(debug_mode)
# 设置Ai2d的输入输出格式和类型
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
self.image_size=[]
# 配置预处理操作,这里使用了pad和resize,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
# 初始化ai2d预处理配置,默认为sensor给到AI的尺寸,可以通过设置input_image_size自行修改输入尺寸
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
            self.image_size=[ai2d_input_size[1],ai2d_input_size[0]]
# 计算padding参数,并设置padding预处理
self.ai2d.pad(self.get_pad_param(ai2d_input_size), 0, [104,117,123])
# 设置resize预处理
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理,results是模型输出的array列表,这里使用了aidemo库的face_det_post_process接口
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
res = aidemo.face_det_post_process(self.confidence_threshold,self.nms_threshold,self.model_input_size[0],self.anchors,self.image_size,results)
if len(res)==0:
                return res,res
else:
return res[0],res[1]
def get_pad_param(self,image_input_size):
dst_w = self.model_input_size[0]
dst_h = self.model_input_size[1]
# 计算最小的缩放比例,等比例缩放
ratio_w = dst_w / image_input_size[0]
ratio_h = dst_h / image_input_size[1]
if ratio_w < ratio_h:
ratio = ratio_w
else:
ratio = ratio_h
new_w = (int)(ratio * image_input_size[0])
new_h = (int)(ratio * image_input_size[1])
dw = (dst_w - new_w) / 2
dh = (dst_h - new_h) / 2
top = (int)(round(0))
bottom = (int)(round(dh * 2 + 0.1))
left = (int)(round(0))
right = (int)(round(dw * 2 - 0.1))
return [0,0,0,0,top, bottom, left, right]
# 自定义人脸注册任务类
class FaceRegistrationApp(AIBase):
def __init__(self,kmodel_path,model_input_size,rgb888p_size=[1920,1080],display_size=[1920,1080],debug_mode=0):
super().__init__(kmodel_path,model_input_size,rgb888p_size,debug_mode)
# kmodel路径
self.kmodel_path=kmodel_path
# 人脸注册模型输入分辨率
self.model_input_size=model_input_size
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug模式
self.debug_mode=debug_mode
        # 112x112标准对齐人脸的5个关键点坐标(左眼、右眼、鼻尖、左/右嘴角)
self.umeyama_args_112 = [
38.2946 , 51.6963 ,
73.5318 , 51.5014 ,
56.0252 , 71.7366 ,
41.5493 , 92.3655 ,
70.7299 , 92.2041
]
self.ai2d=Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT,nn.ai2d_format.NCHW_FMT,np.uint8, np.uint8)
# 配置预处理操作,这里使用了affine,Ai2d支持crop/shift/pad/resize/affine,具体代码请打开/sdcard/app/libs/AI2D.py查看
def config_preprocess(self,landm,input_image_size=None):
with ScopedTiming("set preprocess config",self.debug_mode > 0):
ai2d_input_size=input_image_size if input_image_size else self.rgb888p_size
# 计算affine矩阵,并设置仿射变换预处理
affine_matrix = self.get_affine_matrix(landm)
self.ai2d.affine(nn.interp_method.cv2_bilinear,0, 0, 127, 1,affine_matrix)
# 构建预处理流程,参数为预处理输入tensor的shape和预处理输出的tensor的shape
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# 自定义后处理
def postprocess(self,results):
with ScopedTiming("postprocess",self.debug_mode > 0):
return results[0][0]
def svd22(self,a):
        # 2x2矩阵的闭式SVD分解:a按[a00,a01,a10,a11]展开,返回展平的U、奇异值s和展平的V
s = [0.0, 0.0]
u = [0.0, 0.0, 0.0, 0.0]
v = [0.0, 0.0, 0.0, 0.0]
s[0] = (math.sqrt((a[0] - a[3]) ** 2 + (a[1] + a[2]) ** 2) + math.sqrt((a[0] + a[3]) ** 2 + (a[1] - a[2]) ** 2)) / 2
s[1] = abs(s[0] - math.sqrt((a[0] - a[3]) ** 2 + (a[1] + a[2]) ** 2))
v[2] = math.sin((math.atan2(2 * (a[0] * a[1] + a[2] * a[3]), a[0] ** 2 - a[1] ** 2 + a[2] ** 2 - a[3] ** 2)) / 2) if \
s[0] > s[1] else 0
v[0] = math.sqrt(1 - v[2] ** 2)
v[1] = -v[2]
v[3] = v[0]
u[0] = -(a[0] * v[0] + a[1] * v[2]) / s[0] if s[0] != 0 else 1
u[2] = -(a[2] * v[0] + a[3] * v[2]) / s[0] if s[0] != 0 else 0
u[1] = (a[0] * v[1] + a[1] * v[3]) / s[1] if s[1] != 0 else -u[2]
u[3] = (a[2] * v[1] + a[3] * v[3]) / s[1] if s[1] != 0 else u[0]
v[0] = -v[0]
v[2] = -v[2]
return u, s, v
def image_umeyama_112(self,src):
# 使用Umeyama算法计算仿射变换矩阵
SRC_NUM = 5
SRC_DIM = 2
src_mean = [0.0, 0.0]
dst_mean = [0.0, 0.0]
for i in range(0,SRC_NUM * 2,2):
src_mean[0] += src[i]
src_mean[1] += src[i + 1]
dst_mean[0] += self.umeyama_args_112[i]
dst_mean[1] += self.umeyama_args_112[i + 1]
src_mean[0] /= SRC_NUM
src_mean[1] /= SRC_NUM
dst_mean[0] /= SRC_NUM
dst_mean[1] /= SRC_NUM
src_demean = [[0.0, 0.0] for _ in range(SRC_NUM)]
dst_demean = [[0.0, 0.0] for _ in range(SRC_NUM)]
for i in range(SRC_NUM):
src_demean[i][0] = src[2 * i] - src_mean[0]
src_demean[i][1] = src[2 * i + 1] - src_mean[1]
dst_demean[i][0] = self.umeyama_args_112[2 * i] - dst_mean[0]
dst_demean[i][1] = self.umeyama_args_112[2 * i + 1] - dst_mean[1]
A = [[0.0, 0.0], [0.0, 0.0]]
for i in range(SRC_DIM):
for k in range(SRC_DIM):
for j in range(SRC_NUM):
A[i][k] += dst_demean[j][i] * src_demean[j][k]
A[i][k] /= SRC_NUM
T = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
U, S, V = self.svd22([A[0][0], A[0][1], A[1][0], A[1][1]])
T[0][0] = U[0] * V[0] + U[1] * V[2]
T[0][1] = U[0] * V[1] + U[1] * V[3]
T[1][0] = U[2] * V[0] + U[3] * V[2]
T[1][1] = U[2] * V[1] + U[3] * V[3]
scale = 1.0
src_demean_mean = [0.0, 0.0]
src_demean_var = [0.0, 0.0]
for i in range(SRC_NUM):
src_demean_mean[0] += src_demean[i][0]
src_demean_mean[1] += src_demean[i][1]
src_demean_mean[0] /= SRC_NUM
src_demean_mean[1] /= SRC_NUM
for i in range(SRC_NUM):
src_demean_var[0] += (src_demean_mean[0] - src_demean[i][0]) * (src_demean_mean[0] - src_demean[i][0])
src_demean_var[1] += (src_demean_mean[1] - src_demean[i][1]) * (src_demean_mean[1] - src_demean[i][1])
src_demean_var[0] /= SRC_NUM
src_demean_var[1] /= SRC_NUM
scale = 1.0 / (src_demean_var[0] + src_demean_var[1]) * (S[0] + S[1])
T[0][2] = dst_mean[0] - scale * (T[0][0] * src_mean[0] + T[0][1] * src_mean[1])
T[1][2] = dst_mean[1] - scale * (T[1][0] * src_mean[0] + T[1][1] * src_mean[1])
T[0][0] *= scale
T[0][1] *= scale
T[1][0] *= scale
T[1][1] *= scale
return T
def get_affine_matrix(self,sparse_points):
# 获取affine变换矩阵
with ScopedTiming("get_affine_matrix", self.debug_mode > 1):
# 使用Umeyama算法计算仿射变换矩阵
matrix_dst = self.image_umeyama_112(sparse_points)
matrix_dst = [matrix_dst[0][0],matrix_dst[0][1],matrix_dst[0][2],
matrix_dst[1][0],matrix_dst[1][1],matrix_dst[1][2]]
return matrix_dst
# 人脸注册任务类
class FaceRegistration:
def __init__(self,face_det_kmodel,face_reg_kmodel,det_input_size,reg_input_size,database_dir,anchors,confidence_threshold=0.25,nms_threshold=0.3,rgb888p_size=[1280,720],display_size=[1920,1080],debug_mode=0):
# 人脸检测模型路径
self.face_det_kmodel=face_det_kmodel
# 人脸注册模型路径
self.face_reg_kmodel=face_reg_kmodel
# 人脸检测模型输入分辨率
self.det_input_size=det_input_size
# 人脸注册模型输入分辨率
self.reg_input_size=reg_input_size
self.database_dir=database_dir
# anchors
self.anchors=anchors
# 置信度阈值
self.confidence_threshold=confidence_threshold
# nms阈值
self.nms_threshold=nms_threshold
# sensor给到AI的图像分辨率,宽16字节对齐
self.rgb888p_size=[ALIGN_UP(rgb888p_size[0],16),rgb888p_size[1]]
# 视频输出VO分辨率,宽16字节对齐
self.display_size=[ALIGN_UP(display_size[0],16),display_size[1]]
# debug_mode模式
self.debug_mode=debug_mode
self.face_det=FaceDetApp(self.face_det_kmodel,model_input_size=self.det_input_size,anchors=self.anchors,confidence_threshold=self.confidence_threshold,nms_threshold=self.nms_threshold,debug_mode=0)
self.face_reg=FaceRegistrationApp(self.face_reg_kmodel,model_input_size=self.reg_input_size,rgb888p_size=self.rgb888p_size)
# run函数
def run(self,input_np,img_file):
self.face_det.config_preprocess(input_image_size=[input_np.shape[3],input_np.shape[2]])
det_boxes,landms=self.face_det.run(input_np)
if det_boxes:
if det_boxes.shape[0] == 1:
# 若是只检测到一张人脸,则将该人脸注册到数据库
db_i_name = img_file.split('.')[0]
for landm in landms:
self.face_reg.config_preprocess(landm,input_image_size=[input_np.shape[3],input_np.shape[2]])
reg_result = self.face_reg.run(input_np)
with open(self.database_dir+'{}.bin'.format(db_i_name), "wb") as file:
file.write(reg_result.tobytes())
print('Success!')
else:
                print('Only one face is allowed in a registration image')
else:
print('No person detected')
    def image2rgb888array(self,img): # 返回4维(1,C,H,W)的rgb888图像数组
# 将Image转换为rgb888格式
with ScopedTiming("fr_kpu_deinit",self.debug_mode > 0):
img_data_rgb888=img.to_rgb888()
# hwc,rgb888
img_hwc=img_data_rgb888.to_numpy_ref()
shape=img_hwc.shape
img_tmp = img_hwc.reshape((shape[0] * shape[1], shape[2]))
img_tmp_trans = img_tmp.transpose()
img_res=img_tmp_trans.copy()
# chw,rgb888
img_return=img_res.reshape((1,shape[2],shape[0],shape[1]))
return img_return
if __name__=="__main__":
# 人脸检测模型路径
face_det_kmodel_path="/sdcard/app/tests/kmodel/face_detection_320.kmodel"
# 人脸注册模型路径
face_reg_kmodel_path="/sdcard/app/tests/kmodel/face_recognition.kmodel"
# 其它参数
anchors_path="/sdcard/app/tests/utils/prior_data_320.bin"
database_dir="/sdcard/app/tests/utils/db/"
database_img_dir="/sdcard/app/tests/utils/db_img/"
face_det_input_size=[320,320]
face_reg_input_size=[112,112]
confidence_threshold=0.5
nms_threshold=0.2
anchor_len=4200
det_dim=4
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len,det_dim))
max_register_face = 100 #数据库最多人脸个数
feature_num = 128 #人脸识别特征维度
fr=FaceRegistration(face_det_kmodel_path,face_reg_kmodel_path,det_input_size=face_det_input_size,reg_input_size=face_reg_input_size,database_dir=database_dir,anchors=anchors,confidence_threshold=confidence_threshold,nms_threshold=nms_threshold)
try:
# 获取图像列表
img_list = os.listdir(database_img_dir)
for img_file in img_list:
#本地读取一张图像
full_img_file = database_img_dir + img_file
print(full_img_file)
img = image.Image(full_img_file)
img.compress_for_ide()
# 转rgb888的chw格式
rgb888p_img_ndarry = fr.image2rgb888array(img)
# 人脸注册
fr.run(rgb888p_img_ndarry,img_file)
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
fr.face_det.deinit()
fr.face_reg.deinit()
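# 说明:注册成功后,database_dir 下会为每张图片生成一个与图片同名的 .bin 特征文件
# (128维、np.float 类型),2.8 人脸识别示例的 database_init 会以文件名作为人名加载这些特征。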