This project, approved by CONACyT (Nr. 257295), is currently under development.
In the last 15 years, researchers in the Computer Vision (C.V.) and Robotics communities have worked on real-time detection and classification of objects from different categories. Progress has been enormous, especially in application domains such as image retrieval. However, many limitations remain. For instance, we still cannot simply tell a robot to go to an office, room, or kitchen and detect, classify, and calculate the pose of all the objects inside. This is a very complex, unsolved problem. In fields such as Robotics, objects must be located with high precision, rather than with only an approximate bounding box, as is usual in C.V. It is also necessary to detect them in real time and to have some basic general knowledge of each object, such as the areas where it can be grasped or the preferred position in which to place it.
When evaluating object detection and pose estimation, it is hard not to compare the results with the performance of the human brain. However, we typically make computer vision algorithms operate under much harder conditions. In C.V., methods are usually trained on static object categories, and the methods published so far are normally tested on a single dataset, or on very similar datasets, where overfitting is a common issue.
a) This project will start closing the huge gap between robotics, computer vision, and everyday tasks by pioneering algorithms based on Context-Free Grammars (C.F.G.), language relations, and 3D object representation.
b) A second objective is to create a knowledge base, available online, that serves as a dictionary running in real time for every device that needs object information. It will be a repository of real 3D object models and their related general information, for the purpose of accurate object detection and pose estimation.
- Innovative methods for combining natural language with visual information.
- Innovative methods using dynamic trees, branch and bound, and C.F.G.
- Pioneering algorithms for simultaneous 3D object detection, segmentation and metadata transfer.
- Accurate algorithms for scanning objects on average computers.
- A dynamic hierarchical knowledge base of visual information, verbs, and natural language.