Do We Really Need a New Navigation-Noninvasive “on the Fly” Gesture-Controlled Incisionless Surgery?

Objectives: This study presents the use of our original contactless interface as a plug-in application for the OsiriX DICOM-viewer platform, using a hardware sensor device controller that supports hand/finger motions as input, with no hand contact, touching, or voice navigation. It would thus be possible to modify standard surgical parameters in 'on the fly' gesture-controlled incisionless surgical interventions.


Introduction
"The right treatment to the right patient at the right time is the simplest definition of personalized medicine of the 21 st century, all with a view to making an early diagnosis and deciding on an optimal treatment option" (D. Primorac). The best way to predict the future is to create it ourselves. Today, our understanding of the anatomy in humans could be deceived, where we can replace the true reality with the simulated reality [1][2][3] that enables precise, safer and faster diagnosis [4], as well as surgery, creating an impression of another 'external' world around the man [5,6]. Through the concept of simulated reality, almost the same environment can be achieved as in true reality. Traditionally, simulated reality is considered as an extension of virtual reality (VR), which is widely used in different telemedicine (TM) platforms, but in our work (Figure 1), we would like to elaborate the need of improving these VR tools in order to get better user experience both in preoperative virtual analysis and during the surgery. With all well-known benefits of adopting VR tools in surgeries, such as better planning, high quality data analysis and simulation, we would like to introduce our original proposal of touchless controlling VR tools in operation room (OR) to come as close as possible to the concept of simulated reality, as we have reported previously [2,5]. open source, operation system agnostic, approved for medical use and independent of hardware. Comparison with previous doctrine in human medicine clearly indicates that both preoperative/intraoperative manipulation with three-dimensional-volume rendering slices of the human anatomy per viam touchless surgical navigation system with simulation of virtual activities has become reality in the operation room. Navigating through narrow pathways in VE, we noticed that the camera could stay in the tissue [2,5], thus enabling a substantially different understanding of spatial relations (as exemplified by the nose/sinuses) between 2D and 3D images of human anatomy.
However, in view of possible criticisms, even in amicable discussions with colleagues, our innovative 'on the fly gesture-controlled' ear, nose and throat (ENT) diagnostics and surgery could be understood as a personal reflection of the standard, well-known navigation-noninvasive surgery, which requires additional citations to verify its claim of originality. That is why, in our activities, we employed the following:
a. Pre- and postoperatively, the most widely used Digital Imaging and Communications in Medicine (DICOM) standard, with advanced post-processing techniques in 2D/3D, as well as four-dimensional (4D), navigation (OsiriX MD);
b. A hardware (HW) sensor device that supports hand and finger motions as input, thus requiring no hand contact or touching (Intel RealSense Depth Camera and/or Leap Motion (LM Inc., San Francisco, CA, USA)); and
c. Our original, specially designed software (SW) that integrates the LM-controller with medical imaging systems [2,5], completely different from some products already described in the medical literature (Figure 2).

Figure 2:
Preoperative and intraoperative virtual analysis of the patient's anatomy/pathologic tissue, with a very clear distinction between the interpretation of gestures (LM) in 3D-VRen and in VE, per viam 'on the fly gesture-controlled' different parameters/manner of movements while navigating throughout the virtual space.
Rather than a fictional virtual world (VW), our 3D volume rendering (VRen) solution is driven by real inputs from gesture control and the manner of movements, enhancing simple VRen with the real needs and awareness of the medical specialist.
Accordingly, using our original, specially designed SW that integrates the LM-controller with medical imaging systems, developed by our information technology (IT) team, we can very precisely and successfully 'assess', pre- and perioperatively, all anatomic relations within the patient's head (with the permission of Klapan Medical Group Polyclinic, Zagreb, Croatia, EU; www.poliklinika-klapan.com), as seen previously in similar medical fields [6,7]. Data processing and visualization are done in real time, enabling multiple interactions with the simulation. Additionally, we tried to substitute artificially generated sensations for the real, standard daily information received by our senses, as we have reported previously [9].

We have minimized surgeon distraction, intervention time and anesthesia time while interacting with the positioning system per viam 'different types of gestures' and our special original plug-in application for the DICOM viewer (designed by our Bitmedix IT team; www.bitmedix.com), which provides 'navigation through' the VW, with the LM sensor used as an interface for camera positioning in 3D-VE/virtual surgery (VS) [1,2], and integrates speech recognition as a voice command (VC) solution in an original way [5]. Our gestures enabled very easy interaction with the real-world elements, pre- and intraoperatively, per viam natural movements while performing navigation (VE/VS) in the patient's head during surgery/telesurgery (TS) [2] (Figure 4). In this way, we enabled access to crucial surgical data without any need to touch any kind of interface, thus mitigating the risk of infections in the OR.
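For the VC part, the sketch below shows one way such voice commands could be wired on macOS, using the system NSSpeechRecognizer; this is an illustration only, and the engine and command vocabulary of our published VC solution [5] may differ:

```swift
import AppKit

// Illustrative only: system speech recognition mapped to contactless commands.
final class VoiceCommands: NSObject, NSSpeechRecognizerDelegate {
    private let recognizer = NSSpeechRecognizer() // nil if recognition is unavailable

    override init() {
        super.init()
        recognizer?.commands = ["zoom in", "zoom out", "freeze view"] // example phrases
        recognizer?.delegate = self
        recognizer?.startListening()
    }

    func speechRecognizer(_ sender: NSSpeechRecognizer,
                          didRecognizeCommand command: String) {
        // Map the spoken command to the same camera actions that the hand
        // gestures drive, keeping the whole interface contactless.
    }
}
```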

Our Contactless Diagnostics and Surgery
There is great potential in the medical utilization of virtual and augmented reality (AR) that has not yet been fully explored. In practice, the quality of the LM device enables a good user experience in these cases, but the additional complexity arising from nonspecific platform integration results in a steep learning curve, because the interface implementation is not fully natural to humans (mouse mimicking). This also decreases the quality of space perception in the 3D virtual rendered space, which is unacceptable in medical use, especially in the OR environment.
While these solutions address the issues of standard computer system interfaces (such as the mouse and keyboard) related to the potential risk of bacterial infection, they do not exploit the benefits of space perception and orientation provided by VR devices such as the LM sensor. In our solution, we propose integration at the platform level, which distinguishes our contactless interface from the others.
We have found that creating application-specific functions dedicated only to the events from the VR sensor simplifies the interaction and gives a better sense of control. The performance gains over solutions based on message hooking are substantial. The first reason is that the other functions of the system are not routed through the gesture-processing code, whereas a message hook must inspect every message passing through the system.
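As a minimal illustration of this design choice (with hypothetical types; the real plugin uses the Leap SDK listener mechanism described in the Results), gesture handling can be registered as a dedicated callback that only ever receives sensor frames, instead of a hook that must filter the whole event stream:

```swift
import Foundation

// Stand-in for a sensor frame (hypothetical; the real type comes from the Leap SDK).
struct SensorFrame {
    var palm: (x: Float, y: Float, z: Float)
    var grabStrength: Float
}

// Dedicated dispatch: only sensor frames enter this path, so mouse,
// keyboard and window events are never inspected by the gesture code.
final class GestureDispatcher {
    private var handlers: [(SensorFrame) -> Void] = []

    func register(_ handler: @escaping (SensorFrame) -> Void) {
        handlers.append(handler)
    }

    // Called from the device callback once per frame.
    func dispatch(_ frame: SensorFrame) {
        for handle in handlers { handle(frame) }
    }
}
```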

Medical Imaging
In all our studies, we used the Siemens MAGNETOM Espree 1.5T MRI and the SOMATOM Force MSCT scanner (Agram Special Hospital, Zagreb, Croatia, EU; https://www.agram-bolnica.hr/oprema/). Thanks to the advanced Total Imaging Matrix (TIM) technology, which provides more detailed data because it has more high-channel coils than other comparable devices and exploits the maximum signal-to-noise ratio, we are able to achieve more accurate modification of standard surgical parameters and navigation during the previously described innovative navigation-noninvasive contactless gesture-controlled surgery. The generated data are of appropriate quality to be extended with the additional use of AR/VR tools. For preoperative image reconstruction and display, we use Siemens syngo applications to maximize standardization in slice positioning, and specifically the syngo BLADE motion-insensitive Turbo Spin Echo sequence for motion correction.

Results
Technical and implementation details of our novel contactless hand-gesture applications in diagnostics and noninvasive surgery are as follows. For the application demo, we decided to develop a plugin for the OsiriX platform that enables a natural gesture interface for the VRen and VE viewer. The plugin is written in the Swift language for the macOS platform. To gain control over OsiriX, the provided application interface framework is used, through which the functions of the OsiriX platform can be accessed via the available header files. As OsiriX is written entirely in Objective-C, bridging headers were created to import its functions into Swift. The Leap software development kit (SDK) was also included in the project to obtain the hand gesture and position recognition functions.
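Under these assumptions (OsiriX's PluginFilter base class and the classic Leap Objective-C SDK imported to Swift through a bridging header), a minimal sketch of the plugin skeleton could look as follows; exact signatures may differ between OsiriX and SDK versions:

```swift
// Bridging header (hypothetical file name "Plugin-Bridging-Header.h"):
//   #import "PluginFilter.h"   // OsiriX plugin base class
//   #import "LeapObjectiveC.h" // classic Leap Motion Objective-C SDK
import Foundation

// OsiriX instantiates the plugin's principal class and calls
// filterImage(_:) when the plugin's menu item is selected.
class GesturePlugin: PluginFilter {
    private var controller: LeapController?

    override func filterImage(_ menuName: String!) -> Int {
        if controller == nil {
            controller = LeapController() // connect to the Leap device
        }
        return 0 // 0 reports success to the OsiriX plugin host
    }
}
```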
The first problem encountered was that the functions of the VR view needed to create the natural interface were protected within the class. To resolve this issue, a wrapper class was created, extending the VR-view class and providing an interface with static methods to gain access to the OsiriX VR-view controller. An instance of the LM-controller was then created, and the callbacks responding to the device connection events were set. When the device is connected, the frame handler starts to process the information received from it. Two different frame handlers were developed, one for the VRen use-case and one for the VE.
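A sketch of the wrapper idea, under the same assumptions; how the front-most VR-view controller is located is hypothetical, since it depends on OsiriX internals:

```swift
import AppKit

// Wrapper extending the OsiriX VR view; the static accessor exposes the
// view controller that the plugin otherwise cannot reach (hypothetical lookup).
class VRViewWrapper: VRView {
    static func frontmostVRView() -> VRView? {
        for window in NSApp.windows {
            if let vrController = window.windowController as? VRController {
                return vrController.view // the protected VR view, now reachable
            }
        }
        return nil
    }
}

// The classic Leap Objective-C SDK delivers events as NSNotifications;
// these callbacks respond to device connection and per-frame events.
class GestureListener: NSObject, LeapListener {
    func onConnect(_ notification: Notification!) {
        // Device connected: frames will start arriving.
    }
    func onFrame(_ notification: Notification!) {
        // Route the frame to the VRen or VE handler (see the sketches below).
    }
}
```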
The frame handlers are triggered when the right hand is present with a grab strength of 0.05. All inputs are then buffered to avoid over-sensitivity of the sensor.
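Our reading of this gating-and-buffering step, as a sketch with hypothetical helper names (LeapHand comes from the bridged Leap SDK): positions are accepted only while the right hand grabs at the 0.05 threshold, and buffered samples are averaged to damp the sensor's over-sensitivity.

```swift
// Hypothetical helper, not our plugin's exact code.
final class FrameSmoother {
    private var samples: [SIMD3<Float>] = []
    private let capacity = 8 // frames averaged; a tuning value, not from the paper

    // Returns a smoothed palm position, or nil while the gesture is inactive.
    func accept(_ hand: LeapHand) -> SIMD3<Float>? {
        // Trigger condition: the right hand is present with grab strength 0.05.
        guard hand.isRight, hand.grabStrength >= 0.05 else {
            samples.removeAll() // hand released: clear the buffer
            return nil
        }
        let p = hand.palmPosition
        samples.append(SIMD3(p.x, p.y, p.z))
        if samples.count > capacity { samples.removeFirst() }
        // Averaging the buffered inputs damps the over-sensitivity of the sensor.
        return samples.reduce(SIMD3<Float>(repeating: 0), +) / Float(samples.count)
    }
}
```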
In the VRen, the x-axis of the vector received from the sensor is translated to the azimuth of the camera, the y-axis to the elevation, and the z-axis to the zoom. This gives the effect of orbiting over the center of the object. To improve performance, the level of detail is lowered while the frame handler is triggered; when the hand exits the sensor working area, the level of detail is returned to its maximum value. For the VE, the idea was to achieve a fly-through effect. The x- and y-axes are used as the yaw and pitch of the camera, while the z-axis increases or decreases the total moving speed. The OsiriX application programming interface is used to get the current direction and position of the camera. The position of the camera is then translated by the total speed amount in the current direction of the camera, and the same translation is applied to the focal point of the camera.
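The two mappings can be summarized in a short, self-contained sketch; VirtualCamera is a hypothetical stand-in for the wrapped OsiriX VR view, whose real getter and setter calls differ between OsiriX versions:

```swift
// Hypothetical camera interface standing in for the wrapped OsiriX VR view.
protocol VirtualCamera: AnyObject {
    var azimuth: Float { get set }       // horizontal orbit angle
    var elevation: Float { get set }     // vertical orbit angle
    var zoom: Float { get set }
    var position: SIMD3<Float> { get set }
    var focalPoint: SIMD3<Float> { get set }
    var direction: SIMD3<Float> { get }  // unit viewing direction
    var lowResolution: Bool { get set }  // lowered level of detail while moving
}

let gain: Float = 0.01 // sensitivity; a tuning value, not from the paper
var flySpeed: Float = 0

// VRen use-case: orbit over the center of the object.
func handleVRenFrame(offset: SIMD3<Float>, camera: VirtualCamera) {
    camera.lowResolution = true          // full detail is restored when the hand leaves
    camera.azimuth   += offset.x * gain  // x-axis -> azimuth
    camera.elevation += offset.y * gain  // y-axis -> elevation
    camera.zoom      += offset.z * gain  // z-axis -> zoom
}

// VE use-case: the fly-through effect.
func handleVEFrame(offset: SIMD3<Float>, camera: VirtualCamera) {
    camera.azimuth   += offset.x * gain  // x-axis -> yaw
    camera.elevation += offset.y * gain  // y-axis -> pitch
    flySpeed         += offset.z * gain  // z-axis throttles the speed
    // Translate camera and focal point together along the current viewing
    // direction, so the camera advances without changing its orientation.
    let step = camera.direction * flySpeed
    camera.position   += step
    camera.focalPoint += step
}
```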
The plugin can be compiled in the Xcode IDE on macOS and then easily installed on the OsiriX platform. After installation, the plugin resides in the OsiriX environment, waiting for the device to connect. Once the device is connected, the frame handler waits for the VR or VE view to be opened; when the view is opened, the sensor is ready to use. This interface does not interfere with any other function of the OsiriX platform and can be used simultaneously with the classic mouse interface.

Discussion
"There is no reality. Only our personal view on the reality", even in the operation room. If this view of the surrounding world is correct, it is justified for us as physicians to wonder whether our consciousness/awareness is thus creating a new reality or new forms of the reality in diagnostics and surgery. As the formed consciousness "defines our overall comprehension and, briefly, everything that exists, we cannot get behind it" (Max Planck). In   In line with our previous experimental studies published in the past two decades [1,10,12] and experiences reported by other authors [13][14][15], our primary objective was to establish whether this new approach in visualization of human anatomy would avoid the risks associated with real endoscopy and minimize procedural difficulties when used prior to performing an actual endoscopic examination [16]. Therefore, bearing in mind the definition of VR ("impression of being present in a virtual environment, such as virtual/tele-VE of the patient's head that does not exist in reality is called VR", we tested the possibility to derive spatial cross-sections at selected anatomic cutting planes in rhino-surgery, which would provide an additional insight into the internal regions observed [6]. The intraoperative image of sinus and oro-dental region anatomy in the mentioned patient showed the use of this new approach in surgery to be highly appropriate, even in such a simple example of the pathologic substrate in ENT surgery. Just imagine that during the operation, the rhinosurgeon can simply 'navigate' through the oroantral fistula canal (which has been inconceivable to date), perform 3D-assessment of its relation with the roots of other teeth within the maxillary bone, visualize the fistula ostium mucosa in maxillary sinus and its relation to the rest of sinus mucosa, 'enter' the cyst and observe its extent, composition of its content, assess the medically justified degree and range of pathologic cystic tissue removal while preserving the healthy and other vital anatomic elements of the oroantral region of the patient's head. Performing the multiply repeatable different forms of virtual diagnostics very fast, followed by VE and finally VS, offering the possibility of repeating them endlessly, we can 'copy' the entire operative procedure conceived as performed in the VW of the patient's head anatomy, which will then be 'copied' in the same way in the future real operation on the real patient.
All this must be possible to perform without interrupting the real operative procedure: just by the surgeon's view, without moving his sight from the endoscope inserted into the body of the patient lying on the operative table, simply by a few free motions of his hand, without touching any surface in the OR.
Our experience shows that all this can be done easily during the operative procedure, even with some preoperative analysis if necessary (Figures 4 & 7), by use of navigation-noninvasive 'on the fly' gesture-controlled incisionless surgical interventions (e.g., based on our original Apple-based OsiriX-LM system). Thus, we can state with certainty that the use of this system is highly desirable for enabling contactless 'in the air' surgeon's commands and precise orientation in space, as well as the possibility of directing the real patient operation by 'copying' the previously performed VS (with or without navigation with a 3D-digitalizer/'robotic hand') in the nonexisting surgical world (Figure 8).

According to our long-standing use of this and similar approaches in clinical practice (diagnostics and surgery/TS) [12,17,18] and the experiences reported by other authors [19][20][21][22], we have realized that it is quite simple to enable the animated image of the course of surgery/telesurgery to be created in the form of navigation, i.e., the real patient operative field fly-through, as has been done from the very beginning (since 1998) in our computer-assisted TSs [6] (Figure 9). However, do physicians really believe that improving the accuracy of 3D models generated from 2D medical images is of greatest importance for the sustainable development of AR and VR in the OR? Additionally, one can conclude from "the standard surgical experience" that, in terms of future training and surgical practice, information technology developments in the OR might include a VS within the same graphical user interface, using the personal computer mouse as the scalpel or surgical knife, as commented in some papers [23] (without any discussion of sterility in the OR?).
Of course, the previously mentioned possible criticisms of the strategic planning process of our newest ENT contactless surgery have already been discussed [5,11,24]. As we have demonstrated, the navigation-OsiriX-LM approach suggests that real and virtual objects definitely need to be integrated by use of real 'in the air' control with simulation of virtual activities, which requires real-time visualization of 3D-VE motions [5,11], following the action of the surgeon, who may be moving in the VR area [2]. The rules of behavior in this imaginary world are very precisely and simply defined [25], per viam a region of interest on 3D volume-rendered MRI/MSCT slices (static and interactive dynamic 3D models). Additionally, we conclude as follows:

a. Impeccable knowledge of the head anatomy (or of other parts of the human body, if another type of surgery is in place) [2] can be achieved in the OR much better than just per viam
c. The real and the virtual operation fields are 'fused' in real time [5,28], with an acceptable degree of structure transparency, exclusion of particular structure visualization from the model, and model magnification and diminution [26,29]; and
d. Navigation-OsiriX-LM is a very helpful assistant, but not a replacement system, especially for inexperienced surgeons (Figure 10).

Figure 10:
Preoperative diagnostics/tele-diagnostics in surgery planning, with navigation through 'the route of the endoscope' in our patient with a cyst protruding from the maxillary sinus through the oroantral fistula into the oral cavity. In this case, a real-time system updated the 3D graphic visualization with the movements of the user (sufficient visualization is always shown on the screen).
However, upon critical scientific consideration of our assessments, we must also pose the following questions:

1. Does the 'predictability' of the future operation (no matter how surgically simple or complex), as presented above and unknown in surgery before, enable precise translation, i.e., 'mapping' of the strictly determined anatomic site in the real patient, as previously detected on the virtual 3D model?

2. Does this innovative noninvasive surgery of the future, previously realized in the virtual nonexisting space, really enable performing the future real operation within a spatial 'error' of less than 0.5 mm (e.g., as in the simplest analysis of the accuracy of the LM-controller) [27]?

3. What prerequisites, realizable in the future, should be fulfilled in our 'on the fly' gesture-controlled and incisionless VS interventions for their eventual utilization to meet the most demanding requirements in the OR?

4. What should be developed in this newly formed surgical philosophy ante finem for this approach to be easy to use in the daily surgical routine, better and, in line with human nature, superior to the currently widely accepted standard navigation surgery (e.g., navigation endoscopic sinus surgery (NESS) in ENT), while being less expensive than robotic surgery?

5. Our intention is to offer an alternative to closed SW systems for visual tracking, and we want to start an initiative to develop a SW framework that will interface with depth cameras and provide a set of standardized methods for medical applications, such as hand gestures and tracking, face recognition, navigation, etc. This SW should be:
a. Open source, operating system agnostic, approved for medical use, and independent of HW; and
b. In the future, part of medical swarm intelligence (SI) decentralized, self-organized systems in a variety of VR fields in clinical medicine and fundamental research (motion gestures in the OR and preoperative diagnosis demo) [30]; and

6. In our future plans, we will discuss additional aspects that are currently of great interest regarding the future concept of minimally invasive surgery (MIS) in rhinology and contactless surgery, as suggested by some of our colleagues, such as:
a. Computational fluid dynamics [30].

The ideas for the further development of the previously described system for "gesture-controlled incisionless surgical interventions" are presented in our next scientific paper, entitled "Does our current technological advancement represent the future in innovative contactless noninvasive sinus surgery in rhinology? What is next?"

Conclusion
Considering the multitude of long-standing activities of our medical institution/team, the implementation of novel treatment modalities (some of them invented and introduced by our team), the high level of treatment success and patient satisfaction, and, above all, the very efficient performance, we feel free to propose some useful suggestions, as follows:

a. Novel treatment modalities and technologies should be introduced and incorporated in medical care (navigation 3D surgery/tele-3D surgery, modification of the standard surgical procedures per viam 'on the fly' gesture-controlled incisionless surgical interventions, robotics);
b. The use of new procedures must be ethical, as well as cost-effective;
c. A SWOT (strengths, weaknesses, opportunities, threats) analysis should be performed by any medical institution prior to adopting new expensive technologies;
d. Along with appropriate business relationships, the leading officials of medical institutions should maintain a truly humane relationship with their staff members;
e. Novel telemedicine technologies should be employed whenever possible, led by the basic slogan of 21st century medicine: "send/exchange data, not patients"; and
f. We do believe that following these criteria is the main reason for the successful medical and business performance of our, or any other, medical institution.

a. Our original, specially designed SW integrates DICOM and a HW sensor device controller that supports hand and finger motions as input;
b. Our SW would be open source, operating system agnostic, approved for medical use, and independent of HW;
c. This kind of contactless surgery would be easy to use in the daily surgical routine, better and, in line with human nature, superior to the currently widely accepted standard navigation surgery; and
d. We offer an alternative to closed SW systems for visual tracking, which in the future will become part of medical SI self-organized systems in a variety of VR fields in clinical medicine.