Summary:
This article covers what you need to build effective models with FLEX AI.
Training Requirements and Recommendations:
- A minimum of 20 annotated objects or training images is needed for an effective detection model (at least 30 annotated images are recommended for the initial training of your detection model).
- About 100 annotated objects or training images are recommended for a more robust model.
- Draw the bounding box around the object as tightly as possible (try not to leave any margin).
- All target objects or training images must be labeled (i.e. frames without annotations cannot be used in a dataset).
- Ensure you mark every object in a frame that is the same type as your target object (the algorithm may not perform well if similar objects within a frame are not properly included when training custom object detection for your desired object).
Mark all instances of your object within the paused frame. If this information is not provided to the algorithm, missed detections can occur. The detector being trained needs to see all variations of the same type of object. If you do not mark an instance, the algorithm may conclude that the unmarked variation is not the type of object to be detected.
- Use video clips that contain several different angles, perspectives, and lighting of the Object Of Interest (OOI) (preferably from the same camera for which the training is being done).
- Use clips where the annotated object is not partially hidden so that the AI can learn what it really looks like (i.e. if you were teaching it what a person looks like, then annotating just an arm sticking out from behind a tree is not helpful).
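The dataset thresholds above can be expressed as a simple pre-upload sanity check. This is a hypothetical sketch, not part of FLEX AI: the product exposes no public API, so the dataset structure here (a dict mapping frame names to lists of bounding boxes) is purely an assumption for illustration.

```python
# Hypothetical sanity check against the thresholds stated above.
# The frame/box data structure is an assumption, not a FLEX AI format.

MIN_ANNOTATIONS = 20          # minimum for an effective model
RECOMMENDED_ANNOTATIONS = 100  # recommended for a more robust model

def check_dataset(frames):
    """frames: dict mapping frame name -> list of (x, y, w, h) boxes.

    Returns a list of problem descriptions (empty if the dataset
    meets the recommendations above).
    """
    problems = []

    # Every frame in the set must be labeled (no unannotated frames).
    unlabeled = [name for name, boxes in frames.items() if not boxes]
    if unlabeled:
        problems.append(f"{len(unlabeled)} frame(s) have no annotations: {unlabeled}")

    # Count annotated objects across all frames.
    total = sum(len(boxes) for boxes in frames.values())
    if total < MIN_ANNOTATIONS:
        problems.append(f"only {total} annotated objects; at least {MIN_ANNOTATIONS} needed")
    elif total < RECOMMENDED_ANNOTATIONS:
        problems.append(f"{total} annotated objects; {RECOMMENDED_ANNOTATIONS} recommended for robustness")

    return problems
```

A check like this would catch the most common dataset mistakes (unlabeled frames, too few annotations) before time is spent on a cloud training run.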
NOTE: After you click Train, the data you have created is sent to the cloud and
the algorithm is trained to detect your object. This process can take anywhere
from 30 minutes to 1 hour, depending on the number of training images you included.
Performance Issues:
The following may cause object detection to have performance issues:
- Challenging background conditions due to low light or changing weather (ex: rain, snow, sunshine).
- Objects appearing at a different size in the camera's field of view than the size used during training.
- Parts of the object are covered or obstructed.
- Objects are in high density crowds or occluded.
- Stacking (E.g.: a model trained on single shopping carts will have issues detecting carts stacked in a corral).
- Objects moving too fast.
Camera Recommendations:
- Video footage used to train the model should come from the camera on which you intend to deploy the custom object detection.
- Object detection models trained on video footage from one angle may not work when applied to a camera with a different angle, perspective, lighting, or other environmental conditions.
- The Field of View (FoV) should show your object with a minimum size of 20px x 20px.
- Installed cameras should be at normal video surveillance view (CCTV view of 45 degrees or larger).
- Ensure the firmware on each device is updated to the latest version.
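The 20px x 20px minimum object size above can also be checked programmatically when reviewing annotations. This is a hypothetical helper, assuming an (x, y, width, height) pixel box format; FLEX AI's internal annotation format is not documented here.

```python
# Hypothetical helper for the 20px x 20px minimum object size noted above.
# The (x, y, width, height) pixel box format is an assumption.

MIN_SIDE_PX = 20  # minimum object width and height in the camera's FoV

def box_large_enough(box):
    """box: (x, y, width, height) in pixels. True if it meets the minimum."""
    _, _, w, h = box
    return w >= MIN_SIDE_PX and h >= MIN_SIDE_PX
```

Objects smaller than this in the field of view are unlikely to train or detect reliably, so flagging undersized boxes early can save a training run.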
Refer to FLEX AI: Which cameras are compatible? for camera and device compatibility limitations.
Limitations:
FLEX AI has the following limitations:
- Cannot detect non-solid objects such as gas/vapor, liquid, smoke, etc.
- Cannot deploy more than one custom model to a camera at a time. Detecting multiple objects currently requires multiple cameras.
- Cannot support a single model that includes multiple objects (E.g.: hardhat + goggles + vest) and identify a missing item.
- Cannot be used for identification or recognition of an object (E.g.: specific people, faces, animals, etc.).
- Cannot be used for detecting the orientation of objects (E.g.: a cart that is facing left or right).
- Cannot distinguish colors. FLEX AI does not take color into account.
- Cannot be used for fine-grained object detection (ex: golden retriever vs. dog, Tesla Model Y vs. vehicle).
Processing Times:
- FLEX AI is a cloud-based application and requires internet access and processing time.
- The algorithm takes the annotated images you created, trains the object detection model to detect your desired object, processes the model to show you simulated performance on the videos you provided, and then packages the model to run on the camera.
- Each step can take time; the initial training can take about an hour (dependent on the number of marked images you provided).
- When training is completed, we process the model to detect the object in the videos provided in the project's video library. When the videos become available, their buttons become active with the Ready to View status.
- We package the model for the camera and the Download button becomes active when the file is ready.