Method for Clustering and Identification of Elements of Real Engineering Sites on The Basis of Dynamic Logic

Currently there is a contradiction between availability of various new equipment, which provides a stream of digital video data, in particular in the form of point clouds from mobile laser scanning, and the lack of adequate efficient methods of information extraction and analysis. This project is aimed at resolving this contradiction on the basis of neural modeling field theory and dynamic logic (DL) proposed by L. I. Perlovsky. The main result of the project will be a method of extracting information from digital video data in the form of hybrid clouds of mobile laser scanning data points for their analysis based on neural modeling field theory and DL. The success of this project depends on the successful integration of approaches from various fields of science and technology (interdisciplinarity): artificial intelligence, pattern and object recognition, logic, algorithm theory. The significance of the development of the proposed method is to create a fundamental theoretical basis for new application algorithms and software in the field of autonomous driving, “smart city” projects, ensuring safety for sites of various purposes, etc. The scientific novelty of the proposed method is that it will solve, by a fairly new method, the relevant problem of extracting and analyzing information from a not particularly traditional type of digital video data represented by a hybrid cloud of laser scanning points. This will allow to significantly expand

can lead to new formalizations of the cognition process. Thus in [4] a generalization of the theory of NMF and DL is obtained in the form of phenomena and cognition DL. These logical systems are formulated in the most general terms: the relationship of generality, uncertainty, simplicity; the problem of maximizing similarity with empirical content; the learning method. Perlovsky L. I. notes that presently available neural networks are too simple to explain human cognition, and the NMF seek not only to improve methods for solving practical problems, but also to take a step towards explaining cognition.
The structure of NMF corresponds to our knowledge of the neural organization of the brain. The versatility of cognition and the wonder of the human brain are explained in part by dynamic logic. With standard Aristotelian (exact/"crisp") logic we encounter a lack of adaptability and a combinatorial explosion/complexity of computation (in particular, the "curse of dimensionality" as well).
Logic, according to Aristotle, is a tool for expressing previously made decisions, not a mechanism of cognition, i.e. making new decisions.
It is this multifaceted mechanism of NMF/DL and its underlying theory that we propose to be applied to various tasks related to the processing and extraction of information from various digital video data. The following is an overview of current research in the field of intelligent methods of extracting information from various digital video data with an analysis of the problems faced by researchers.
At the same time, to even more clearly determine the relevance of the proposed method for the development of solutions to current problems, we present the prospects of using the method based on the NMF/DL to solve such problems.
Thus, [5] describes the different methods used in biometrics and lists the works with indication of used methods, features, and data sources. Among the open problems mentioned in [5] are statistical modeling and mathematical analysis of "soft" biometrics, including the processing of big data (for example, over 1.2 billion subjects in the Indian project UIDAI) and improving the accuracy and performance of such an algorithm. It is in fact the NMF/DL method that is capable of working quickly with big data without losing accuracy. In addition, it allows one to create statistical models.
Work [6] addresses the problem of recognition of road signs and presents a table of works with indication of used methods, signs, and an assessment of recognition success, as well as the authors' own methods. This topic is an important part of one of the most important challenges of the 21st century: autonomous driving. The topic is also important for the inventory of signs on the country's roads and the tracking of their condition [6][7][8]. As with biometrics, there are certain obstacles to effective recognition: sign variability, lighting, shadow, orientation, etc. The authors of [6] work with a test base of 11000 signs. The NMF/DL method, should it have a dozen such bases, could help solve the problem of processing in an acceptable timeframe, and this would increase the accuracy along with performance. Moreover, part of the task of autonomous driving is simultaneous detection of a sign in real time (and other objects: pedestrians, other cars, trees, traffic lights, etc.), its recognition, tracking, combining new information with information about all objects and information from all other sensors of the car, and finally, on the basis of all this, to give the car appropriate instructions. In addition to the large amount of data as such, the difficulty lies in the fact that all of this data must be linked together. The "associative" aspect of the NMF/DL method will solve this problem. A close cousin of autonomous driving is UAV (unmanned aerial vehicles) autonomous flight, for which it is also important to quickly recognize objects and landscape and make decisions. In addition to military applications, UAVs are also useful for scanning cities from a height to create their 3D models. In [7] the authors deal with this The DL algorithm is able to solve both problems (LIDAR and 2D images from cameras). Then it would be possible to compare the accuracy of the two approaches, and, in the case of a clear victory for LIDAR, use LIDAR instead of cameras, as DL will allow to do so effectively, despite the high volume of data. In [7] the authors point to the instability of existing UAVs, which results in images being blurred and poorly structured. The "associative" aspect of the NMF/DL method will solve this problem as well. In addition to photo and video recording of road signs and entire cities, civil and military engineers also record roadways, buildings, factories, bridges and other infrastructure to monitor their condition and identify defects: cracks, breakdowns, vibrations, etc. Photo and video recording allow to more massively monitor the condition of objects, as well as to avoid risking lives of climbers who have to climb bridges, tall buildings, etc. Scientists have developed many approaches to the processing of such digital video data. A detailed review of the approaches is given in [8]: convolutional neural networks, deep learning, etc. Work [8] also describes the difficulties and challenges for scientists in this field, including the fact that a person still copes with the task of recognizing objects better than a computer program, thanks to knowledge of the situation context, correct identification of a structure's damaged part, knowledge of its importance in the design, etc. We propose linking all these aspects together using the NMF/DL method.
While [5] addresses biometric markers of the body such as the face, moles and tattoos, in [9] the authors cite their own method of processing the face in the context of medico-emotional well-being based on principal component analysis, a deep belief network and local direction-based robust features. Article [10] proposes a method, and article [11] an overview of the topic of recognizing abnormal crowd behavior. Such technology can help the police to recognize upcoming riots on city streets. Article [12] considers the recognition of human actions and suggests methods. This technology will be useful in airports and railway stations, where it is necessary to track individual actions of various individuals, and not the whole crowd. It will also find use in medicine and sports. The NMF/DL method has great prospects in improving the efficiency of solving such problems. The huge volume of unstructured video data from a plethora of cameras at railway stations and airports requires a radically new approach to their processing. This issue is touched upon in [13]. The NMF/DL method can potentially cope with this due to its performance. In [14] the issue of extraction of useful information from various data of past floods is discussed. In [15] the authors propose a way to significantly reduce the amount of data: to extract planes from it, since buildings and many objects often consist of them. In addition to planes, other features can be extracted; more on this in [16]. Work [17] develops a method for extracting text from video.
As stated in all aforementioned publications, any digital photo and video data typically contain different kinds of noise and clutter: leaves on the road, raindrops on a street sign, dust in the air, reflections, etc. In the NMF/DL method it is possible to specify a noise/clutter model which would take upon itself the corresponding noise points, such that the remainder of the points can be "cleanly" associated with the main models. Works [18][19][20]

Methods and Materials
Data from several sources were used. Outside scans of a   [1][2][3]. We used the clustering-specific version (also developed by Perlovsky) as opposed to the general one. The following Gaussian likelihood measure for conditional similarities was used: clusters. In the end, it means each point in the cloud will have been clustered twice by two separate instances of the DL algorithm. This method (we call it "5x5") not only gives much more sound results, but also ends up in faster calculation time. It also allows for much larger data sets and many more models M to be assigned, since in the 1 st computation instance the computer has to handle 5x less models compared to the "in one go" method, and in the 2 nd instance, in addition to handling 5x less models, it also handles 5x less input data in each computational branch of the instance.

Results and Discussion
Our experiment focuses on clustering in just the 3 spatial coordinates because color and intensity from a laser scan can have too great a negative influence on the final clustering. A demonstration of this is below in (Figure 1). For the original shots we use a color scale for the intensity parameter (not for RGB color). The coloring in the resulting shot shows clustering (i.e. each color corresponds to its own cluster). As seen, intensity of a point depends a lot on angle of incidence, which is useful for discerning a façade from a roof, but is bad when a façade wall is round, or in the case of the car door, where the lower and upper halves, which are at slightly different angles, give off different intensities of the reflected laser beam. The clustering that results when intensity differences are carried over into the models is unsatisfactory. It also appears that reflected beam intensity depends on the material the beam hits, as seen on the car wheels: the rubber tire has one intensity, while the metal hub has a different intensity. This does not result in a meaningful clustering.

Figure 1.
Furthermore, we can see that where different intensities are mixed together chaotically, as they are on the field, the algorithm has given a similarly noisy intertwining of clusters. This may find use (in conjunction with the aforementioned dependency on the material of an object) in a crop field, where different types of crops must be differentiated from each other or from parasites or from the ground, but we have not performed such testing here, instead focusing on objects of infrastructure. It may make sense, however, to try such testing. Hence, as mentioned, we mostly continue applying our clustering algorithm to only the 3 spatial dimensions.
The main objective is to break down an enormous point cloud into useful subsets. After that, it is entirely possible to run further instances of the algorithm on the obtained clustering result in more than just the 3 spatial dimensions, i.e. include color or intensity if it would seem to give a better result, or to altogether run a different type of algorithm on these subsets, such as one of object/pattern recognition.
In (Figure 2) below we demonstrate results of clustering the European village scan from Semantic3D into 25 clusters plus 1 noise cluster using the 5x5 method described in "Methods and materials", and compare it to the original "25 clusters (plus 1 noise cluster) in 1 instance" method. Noise cluster is always colored light gray. Several successes and mishaps can be seen. The bottom screenshot, when compared to the 1st one, shows that the 5x5 method generally results in a slightly sounder clustering, mainly manifested in objects not being broken up into random pieces as often. On the other hand, even in the final screenshot we still see parts of trees and buildings clustered together. Furthermore, we see one such cluster neighboring another such cluster containing parts of the same trees and buildings, and these 2 clusters are split by an approximately horizontal plane for no apparent reason. Granted, the algorithm is only one of clustering and not classification, hence it is not, in its bare form, smart enough to discern between objects.
It is important to mention the enormous effect that the positioning of the LiDAR scanner has on the clustering. In the final screenshot in (Figure 2) we see a green cluster in the middle -a roughly round patch of ground, no different than the ground around it. In the middle of it is where the LiDAR scanner was positioned.
Hence, nearby it there is a much higher density of points than further away. The difference in density within the image has a large effect on the final result, since cluster weights depend on the relative number of points in them. This is why we see an accumulation of small-volume clusters near the scanner position, with largervolume clusters farther away from the scanner. To resolve this, a density readjustment subfunction could be implemented within the algorithm, or, alternatively, an initial down sampling (or a voxeling) could be performed in input data pre-processing before implementing DL. Also, if done correctly, working simultaneously on several scans from different positions could improve the result.
Of the aforementioned, down sampling was tried, as was removing the ground, as well as a combination of these two approaches.
Surprisingly, none of these methods gave better results, though down sampling helped reduce the concentration of small-volume, high-density clusters near the scanner position.  Also, since the algorithm involves gradient ascent, it is possible that the algorithm stops at a local maximum instead of a global one, though this is generally avoided by the vague-to-crisp aspect of the DL algorithm. Nonetheless, we tested this phenomenon by running the algorithm 50 times with randomly initiated cluster centers each time and picking the result with the greatest value of the likelihood function. This did not lead to a noticeably better result. Our DL algorithm was also tested on a subset (1 room) of ISPRS data (Case Study 5). The LiDAR data did not include an RGB reading -only intensity, which we ignored. Below in (Figure 3) we demonstrate the results from 2 different POVs and using the same 2 methods as before. As before, noise is colored gray. The most striking thing about these results is how well the tabletops and several other entities are separated from other objects, even if the tabletops themselves are split into several clusters. A clearer demonstration of this is below in (Figure 4)   This may be attributable to the use of a GPU, as generally without a GPU (using standard RAM) the computation time was linear in the number of points, clusters and parameters. In fact, it is possible that calculation time with a GPU is sublinear up to a certain number of points/clusters/parameters, after which it approaches an asymptote, i.e. is linear, but this is not possible to test rigorously, since the GPU would have to be used in several full capacities.

Conclusion
This article describes the results of applying our DL-based clustering algorithm to LIDAR scans of a village street and inside an office space. The DL algorithm is able to work quickly with millions of points. All results presented here concern clustering in 3 spatial coordinates, i.e. ignoring the interfering color and intensity parameters, though these parameters can be useful in other cases, for example, in scans of agricultural fields. We use 2 clustering methods: in the first, segmentation is carried out simultaneously for 25 models (plus noise) in 1 instance (run) of the algorithm; the second (the so-called "5x5 method") also clusters into 25 models (plus noise); however, it does so in 2 consecutive instances of the algorithm, where the 1 st run performs an initial segmentation into 5 clusters (plus noise), and in the 2nd -each of the 5 initial nonnoise clusters is divided into a further 5. The "5x5" method with the original Bildstein image allows clustering to be performed 2.7 times faster (29.7 million points in about 8 minutes), and the result is not worse than the one obtained by the 1st (basic) method.
Reducing the number of iterations (and therefore the calculation time) by half reduces the shifted log-likelihood by 4%.
The best results are obtained when clustering the office space scan (ISPRS). The high-quality segmentation of the countertop and dividers stands out particularly. In the case of the village scan, we see satisfactory results. DL-clustering divides the space both into clusters that make sense and those that make less sense.
To improve the result, we tried down sampling, removing the ground, and a combination of these two approaches. However, no improvement is thence shown. We also consider the possibility that the algorithm reaches a non-global local maximum in its gradient ascent; however, these doubts are refuted with high certainty: we ran the algorithm 50 times with arbitrary initializations, and all results gave approximately the same final likelihood.