Potential role of artificial intelligence in craniofacial surgery
Abstract
The field of artificial intelligence (AI) is rapidly advancing, and AI models are increasingly applied in the medical field, especially in medical imaging, pathology, natural language processing, and biosignal analysis. On the basis of these advances, telemedicine, which allows people to receive medical services outside of hospitals or clinics, is also developing in many countries. The mechanisms of deep learning used in medical AI include convolutional neural networks, residual neural networks, and generative adversarial networks. Herein, we investigate the possibility of using these AI methods in the field of craniofacial surgery, with potential applications including craniofacial trauma, congenital anomalies, and cosmetic surgery.
INTRODUCTION
Many new technologies, such as artificial intelligence (AI), three-dimensional printing, virtual and augmented reality, and robotic surgery, are being introduced to the medical field. Large amounts of data are available in the specialty of craniofacial surgery, including photographs, computed tomography (CT) images, and audio files. These data provide a basis for applying AI technology. To date, the most prominent application of AI in medicine is image recognition technology, as exemplified by AI models used in radiology and pathology. The applications of AI include image quality improvement and the identification, measurement, and classification of lesions. The performance of speech recognition and natural language processing has also substantially improved.
The US Food and Drug Administration (FDA) has approved several medical AI applications in fields including cardiology, endocrinology, radiology, neurology, internal medicine, ophthalmology, emergency medicine, and oncology; however, no AI models developed for plastic and reconstructive surgery have yet received FDA approval [1]. Nonetheless, medical AI holds considerable promise across the full range of medical specialties, including plastic surgery. In this review, we investigate the potential applications of medical AI in plastic surgery, especially in the subspecialty of craniofacial surgery.
DEFINITION OF AI
To understand the definition of AI, it is essential to be familiar with the conceptual relationships among AI, machine learning, and deep learning (Fig. 1). AI is a term with a very broad meaning, extending beyond the field of engineering to include political, economic, and social considerations. In contrast, machine learning refers to the implementation of AI within an engineering context. Machine learning encompasses various methods, including those based on artificial neural networks (ANNs). An ANN simulates human neurons and their hierarchically connected networks. The ANN concept was first proposed by McCulloch and Pitts [2] and made concrete by Rosenblatt [3], who suggested a structure, called a perceptron, that mimics the signal transduction process of human nerve cells. Hinton and Salakhutdinov [4] coined the term “deep neural network” (DNN) for a multilayer perceptron composed of several hidden layers and described the learning method of a DNN as “deep learning.” Deep learning, which refers to an ANN consisting of several deep layers, incorporates most characteristics of machine learning (Fig. 2).
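To make the perceptron-to-DNN idea concrete, the following is a minimal sketch of a multilayer perceptron in PyTorch; the layer sizes, activation function, and two-class output are illustrative assumptions, not a clinically validated design.

```python
import torch
import torch.nn as nn

# A minimal multilayer perceptron (MLP): stacked "perceptron" layers with
# nonlinear activations. The hidden layers between input and output are what
# make the network "deep". Layer sizes here are illustrative only.
class SimpleDNN(nn.Module):
    def __init__(self, n_inputs=10, n_hidden=32, n_classes=2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),   # input -> hidden layer 1
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),   # hidden layer 2
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),  # hidden -> output layer
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleDNN()
x = torch.randn(4, 10)   # a batch of 4 feature vectors
logits = model(x)        # shape: (4, 2)
```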
Medical applications based on deep learning include image recognition, classification, detection, and segmentation; face recognition; visual tracking; video classification; speech recognition; and natural language processing [5].
CONVOLUTIONAL NEURAL NETWORKS AND CRANIOFACIAL SURGERY
With advances in deep learning technology based on DNNs and the introduction of convolutional neural network (CNN) technology, a paradigm shift has occurred in the machine learning field dealing with image data. Before the modern CNN concept was formalized, LeCun et al. [6] introduced the convolution operation into the multilayer perceptron structure to classify input features and applied this methodology to the handwritten digit recognition problem. A CNN is an algorithm used for image pattern recognition that makes it possible to carry out the entire process from feature extraction to classification using a single model (Fig. 3). CNNs can be applied as an essential structure for detection and segmentation problems beyond simple image classification; hence, they have become a vitally important technology in deep learning studies dealing with medical images [7]. In imaging research, classification refers to determining the presence or absence of a specific object in an image, while detection involves both checking for the presence or absence of a specific object and determining its location. Segmentation refers to delineating the position of a specific object in an image at the pixel level. Research in this domain aims to construct a CNN that derives an output image from an original image, showing a mask of the object of interest in the original input image.
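As an illustration of how a single CNN model carries an image from feature extraction through to classification, consider the following minimal PyTorch sketch; the layer configuration and the single-channel 128×128 input (e.g., one CT slice) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal CNN: convolutional layers extract local image features, pooling
# reduces spatial resolution, and a final linear layer performs classification,
# so feature extraction and classification happen in a single model.
class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1-channel input
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size feature vector
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = TinyCNN()
scan = torch.randn(1, 1, 128, 128)  # dummy 128x128 single-channel image
print(model(scan).shape)            # torch.Size([1, 2])
```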
Many potential applications in the field of craniofacial trauma are based on CNNs. Most craniofacial surgeons diagnose craniofacial trauma based on the patient’s history, a physical examination, and imaging findings, such as CT and radiography [8,9]. CT is the most important imaging modality for diagnosing craniofacial trauma. AI can automatically diagnose craniofacial bone fractures using CNN-based classification or detection mechanisms. The effectiveness of CNN-based imaging diagnostics for fractures of the extremities, spine, and hip has already been reported in several studies [10-19]. In these studies, the pretrained CNN models used were ResNet-152, DenseNet, Inception v3, U-Net, VGG_16, VGG_19, Network-in-Network, VGG CNN S, and the BVLC Reference CaffeNet. The mean accuracy was 90.08% (range, 83%–98%), and the mean area under the curve was 0.98 (range, 0.95–1.00). Classification, detection, and segmentation using CNN models could likewise be applied to patients with facial bone fractures based on X-ray or facial CT images.
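A hedged sketch of how such pretrained models are typically reused: a torchvision backbone pretrained on ImageNet is given a new two-class head (fracture vs. no fracture) and fine-tuned. The dummy batch and hyperparameters are assumptions for illustration; the cited studies’ actual pipelines may differ, and the weights argument requires a recent torchvision version.

```python
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning sketch: start from an ImageNet-pretrained backbone and
# replace the final layer with a binary "fracture vs. no fracture" head.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # new classification head

optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (3-channel 224x224 images);
# real training would iterate over labeled facial CT or X-ray images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(backbone(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```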
Interestingly, several reports have compared the medical image reading performance of AI with that of doctors. In fundus photographs from adults with diabetes, an algorithm based on deep learning showed high sensitivity and specificity for detecting referable diabetic retinopathy, with performance on par with the diagnostic accuracy of ophthalmologists [20]. For classifying important dermatological diseases in general skin photos or dermoscopy photos, an AI algorithm showed a similar level of accuracy to that of experienced dermatologists [21]. The performance of an AI algorithm in finding malignant pulmonary nodules in chest radiography images exceeded the diagnostic accuracy of experienced chest radiologists [22]. In addition, several studies have reported that the image reading performance of AI algorithms was superior or similar to that of doctors [23,24]. Based on these studies, if an algorithm is trained on a sufficiently large set of CT or X-ray images of patients with facial bone fractures, it may be possible to achieve image reading performance similar to that of craniofacial surgeons.
With regard to orthognathic surgery, AI algorithms can be applied to maxillofacial imaging, treatment planning, custom orthodontics, surgical appliances, and treatment follow-up. Software that uses AI to perform automatic tracing through deep learning of cephalograms already exists [25]. At present, attempts are being made to use machine learning to perform surgical planning and to automatically produce CAD/CAM surgical appliances.
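One plausible way to frame automatic cephalometric tracing is as landmark regression; the sketch below assumes 19 landmarks and a toy backbone, and does not represent the design of the software cited above.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of automatic cephalometric tracing as landmark
# regression: a small CNN predicts (x, y) coordinates for a fixed set of
# landmarks on a lateral cephalogram. The backbone and the choice of 19
# landmarks are illustrative assumptions.
class CephLandmarkNet(nn.Module):
    def __init__(self, n_landmarks=19):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_landmarks * 2)  # (x, y) per landmark

    def forward(self, x):
        h = self.backbone(x).flatten(1)
        return self.head(h).view(-1, self.n_landmarks, 2)

net = CephLandmarkNet()
cephalogram = torch.randn(1, 1, 256, 256)  # dummy grayscale cephalogram
landmarks = net(cephalogram)               # shape: (1, 19, 2)
```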
CNNs can also be applied to vascular diseases in the craniofacial field. Several studies have reported that AI algorithms showed excellent performance in the automatic segmentation of areas affected by intracranial hemorrhage on brain CT images and in measurements of hemorrhage volume [26-29]. Accurate automatic segmentation of the extent and volume of vascular tumors or malformations could likewise be envisioned for vascular anomalies in the head and neck area [30].
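Once a segmentation mask is available, hemorrhage (or malformation) volume follows from simple voxel arithmetic; the sketch below uses NumPy with dummy mask dimensions and voxel spacing as illustrative assumptions.

```python
import numpy as np

# Once a CNN has produced a binary segmentation mask of a lesion on CT,
# its volume follows directly from the voxel count and voxel spacing.
mask = np.zeros((40, 512, 512), dtype=bool)  # (slices, rows, cols)
mask[18:22, 250:270, 250:270] = True         # dummy segmented lesion

voxel_spacing_mm = (5.0, 0.45, 0.45)         # slice thickness, row, col spacing
voxel_volume_ml = np.prod(voxel_spacing_mm) / 1000.0  # mm^3 -> mL

volume_ml = mask.sum() * voxel_volume_ml
print(f"Estimated lesion volume: {volume_ml:.1f} mL")
```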
RECURRENT NEURAL NETWORKS AND SPEECH AND NATURAL LANGUAGE PROCESSING: CLEFT PALATE AND VELOPHARYNGEAL INSUFFICIENCY
One might imagine an AI algorithm listening to speech input from a smartphone and delivering a quantitative estimate of improvements in articulation. With recent advances in AI for voice recognition, this fantasy is becoming a reality. Language input constitutes representative “time series” data. Unlike medical images, which are acquired at a single moment, these data are acquired continuously in a sequence over time and therefore possess an inherent order, which is why the term “time series” is used. It is difficult to reflect these time series characteristics in training with the aforementioned machine learning algorithms or ANN structures; therefore, machine learning algorithms that can effectively learn time series data have been proposed. The recurrent neural network (RNN) is the representative algorithm for this purpose, and various modifications have been developed that are now widely applied in both medical and nonmedical fields. In a general DNN, each datum is input once, and its value is transmitted only in the direction of the output layer through an operation in each layer; the operations at the nodes of each layer use only the values received from the previous layer. In contrast, in an RNN, values are input sequentially in units of time. To express the correlation between the values input at each point in time, the output value of a node is transmitted as an input to the node of the next layer and, at the same time, is copied and returned to the input of the same node at the next point in time (Fig. 4). This recurrent edge, the path through which a hidden-layer node’s output returns to that node at the next time step, is the most distinctive feature of RNNs.
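A minimal PyTorch sketch of this structure follows: the nn.RNN module feeds each hidden state back into the next time step, exactly the recurrent edge described above. The 13-dimensional input features and 100-step sequence are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal RNN sketch: at each time step the hidden state is computed from the
# current input AND the hidden state fed back from the previous step -- the
# "recurrent edge" described above. Dimensions are illustrative.
rnn = nn.RNN(input_size=13, hidden_size=32, batch_first=True)

sequence = torch.randn(1, 100, 13)  # 100 time steps of 13-dim features
outputs, h_last = rnn(sequence)     # outputs: (1, 100, 32); h_last: (1, 1, 32)
```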
Mundt et al. [31] analyzed the voices of patients with depression using a statistical regression model; the results suggested that voice could serve as a biomarker indicating the degree of response to treatment. Benba et al. [32] reported the possibility of voice-based diagnosis based on an analysis of the voice characteristics of patients with Parkinson’s disease through a machine learning algorithm. For children with cleft palate who have undergone palatoplasty, an RNN-based algorithm could be developed that uses spectral analysis to extract characteristic voice features from a picture or sentence pronunciation test. Cleft palate and velopharyngeal insufficiency are closely related to speech [33,34]. Although it has not yet been validated for children with cleft palate or velopharyngeal insufficiency, South Korea has a platform that uses AI algorithms to diagnose and conduct rehabilitation of speech disorders (https://www.talkytalky.kr/). This platform is currently only available in the Korean language, and a different AI algorithm would be needed to expand it to other languages (e.g., English, Japanese, Chinese, or Spanish).
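As a hypothetical sketch of the feature-extraction step, the snippet below uses librosa to compute mel-frequency cepstral coefficients (MFCCs), a common spectral voice representation; the file name, sampling rate, and choice of MFCCs are assumptions, not a specification of the platform above.

```python
import librosa
import numpy as np

# Sketch of extracting spectral voice features (here MFCCs) from a recorded
# pronunciation test; "recording.wav" is a placeholder path. Features like
# these could serve as the input sequence to an RNN-based speech model.
y, sr = librosa.load("recording.wav", sr=16000)     # mono waveform
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)

# Transpose to (time steps, features) for sequence models.
features = np.transpose(mfcc)
```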
GENERATIVE ADVERSARIAL NETWORKS AND CRANIOFACIAL SURGERY
Machine learning, which implements AI in software, refers to an approach in which a computer learns from data, finds patterns on its own, and learns to perform appropriate tasks. Machine learning is classified into supervised, unsupervised, and reinforcement learning. The aforementioned CNNs and RNNs are supervised learning algorithms. A generative adversarial network (GAN) is a representative example of an unsupervised learning algorithm. The GAN is a generative model published by Goodfellow et al. [35] that consists of a model responsible for classification (the discriminator) and a model responsible for generation (the generator). The generator and discriminator compete against each other, improving each other’s performance. This is often compared to a confrontation between police and counterfeiters: the banknote counterfeiter (generator) tries as hard as possible to deceive the police (discriminator), while the police try to distinguish counterfeit from real banknotes. Through this competition and continuous learning, the counterfeit bills eventually become virtually indistinguishable from real ones. For instance, the generator receives an input (e.g., the label “dog”) and creates an image, learning to deceive the discriminator so that the discriminator outputs 1, denoting a real image. In contrast, the discriminator alternately receives fake images created by the generator and real images that actually exist and learns to classify an image as 1 when it is real and 0 when it is fake (Fig. 5). The generator and the discriminator are trained alternately, and the generator gradually develops into a model that can simulate the real data distribution well. The loss function of a GAN, which encapsulates this learning process, is defined as follows [35]:
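\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
\]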
D(x) is the discriminator’s output, a value between 0 and 1; ideally, D(x) is 1 if the datum x is real and 0 if it is fake. Likewise, D(G(z)) is 1 if the discriminator judges the generated datum G(z) to be real and 0 if it judges it to be fake. The generator G learns to minimize V(D, G); to make the second term of the above equation as small as possible, log(1−D(G(z))) must be minimized, which means that 1−D(G(z)) must approach 0 and D(G(z)) must approach 1. In other words, the generator must be trained to generate fake data convincing enough for the discriminator to classify as genuine. Many different types of GAN structures have been developed, including cycle GAN, conditional GAN, and the progressive growing of GANs [36-38]. Among them, cycle GAN has a conversion structure that cycles the output of one domain to the input of another (Fig. 6).
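Before turning to cycle GAN, the alternating training described above can be summarized in a short PyTorch sketch; the toy two-dimensional “real” data distribution, network sizes, and learning rates are illustrative assumptions, and the generator update uses the widely adopted non-saturating variant of the loss (maximizing log D(G(z)) rather than directly minimizing log(1−D(G(z)))).

```python
import torch
import torch.nn as nn

# Minimal GAN training sketch: the discriminator learns to output 1 for real
# data and 0 for fakes, while the generator learns to make D(G(z)) approach 1.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                 # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 2) + torch.tensor([3.0, 3.0])  # toy "real" distribution
    z = torch.randn(64, 8)                                # random noise input
    fake = G(z)

    # Discriminator update: classify real as 1 and fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: fool the discriminator into outputting 1 for fakes
    # (the non-saturating variant of the generator loss).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```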
Cycle GAN is being studied as a method to freely convert between CT and magnetic resonance (MR) images in the medical field. In one study, CT images were generated from MR images using cycle GAN [39]. In the craniofacial surgery field, CT images could be generated using this algorithm. A recent large-scale population-based cohort study found that CT scan exposure in young individuals between 0 and 19 years of age was associated with an increased incidence of cancer in Koreans. The incidence of many types of lymphoid, hematopoietic, and solid cancers significantly increased after CT scan exposure, and the overall cancer incidence was higher among exposed than nonexposed individuals after adjusting for age and sex (incidence rate ratio, 1.54; 95% confidence interval, 1.45–1.63; p<0.001) [40]. Despite the high radiation risk in infants and children, CT images are an essential tool for the diagnosis and treatment of patients with cleft lip with or without cleft palate, alveolar cleft, craniosynostosis, and pediatric facial bone fractures. In cases where a CT scan would normally be required for an infant or child, it might instead be possible to obtain MR images and convert them into synthetic CT images using cycle GAN.
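The core of cycle GAN, the cycle-consistency constraint that makes unpaired MR-to-CT conversion trainable, can be sketched as follows; the placeholder one-layer “generators” and dummy images are assumptions standing in for full networks, and the adversarial losses are omitted for brevity.

```python
import torch
import torch.nn as nn

# Sketch of the cycle-consistency idea in cycle GAN for unpaired MR->CT
# conversion: translating MR to synthetic CT and back should recover the
# original MR image. G_mr2ct and G_ct2mr stand in for full generator networks.
G_mr2ct = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1))  # placeholder generators
G_ct2mr = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1))
l1 = nn.L1Loss()

mr = torch.randn(1, 1, 256, 256)  # dummy MR slice
ct = torch.randn(1, 1, 256, 256)  # dummy CT slice

fake_ct = G_mr2ct(mr)             # MR -> synthetic CT
recon_mr = G_ct2mr(fake_ct)       # synthetic CT -> reconstructed MR
fake_mr = G_ct2mr(ct)
recon_ct = G_mr2ct(fake_mr)

# Cycle-consistency loss (added to the usual adversarial losses, omitted here).
cycle_loss = l1(recon_mr, mr) + l1(recon_ct, ct)
```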
In aesthetic surgery, because GANs are image-generating algorithms, predicted postoperative photographs could be created from preoperative photos; in particular, postoperative photos of various possible results could be generated for each surgical method. GANs can also generate voices, making it possible to synthesize a target voice for a child who needs speech therapy, such that the child can listen to the target voice for training purposes. In the craniofacial surgery field, the recently published StyleGAN has shown the potential to create images with cosmetic deformities or congenital anomalies [41]. The images generated by GANs are not copyrighted; thus, they can be freely used in other studies.
At this point, GANs have the limitation that it is extremely difficult to control the attributes (e.g., gender, age, and hairstyle) of the images synthesized by the generator. Moreover, the quality of the generated images is inconsistent; in practice, unlike the results reported in previous articles, many unnatural images are produced. Nonetheless, GANs still have the potential to be used in various ways, and we think that they have greater potential for use in plastic surgery than in other medical fields.
BIOSIGNAL DATA AND HEAD AND NECK RECONSTRUCTION
Biomedical signals comprise observations of the physiological activities of organisms, ranging from gene and protein sequences to neural and cardiac rhythms [42]. With the development of monitoring techniques, such as electrocardiography and electroencephalography, a vast amount of biosignal data can be collected, and AI algorithms can be applied to these data. AI research using biosignals is being actively conducted, particularly on models that predict prognosis. One study reported an algorithm for screening for hyperkalemia through deep learning of electrocardiograms, and another reported a deep learning algorithm for predicting cardiac arrest in advance [43,44].
In head and neck reconstruction, monitoring of free flaps is performed at 1-hour intervals on postoperative day (POD) 1, at 2-hour intervals on POD 2, and at 4-hour intervals on PODs 3–7 by visual examination or Doppler ultrasonography [45]. Reports of flap monitoring using near-infrared spectroscopy (NIRS) in the plastic surgery field have recently been published [46-48]. The studies reported to date have described NIRS monitoring of flaps used for breast reconstruction; to the best of our knowledge, few reports have described flap monitoring using NIRS after head and neck reconstruction, although this application is also possible. An advantage of NIRS is that it can monitor muscle flaps with deep tissue (e.g., those utilized in facial palsy reconstruction); hence, it could be used to monitor buried free flaps in head and neck reconstruction that cannot be assessed visually. NIRS readings constitute biosignal data that can be collected in vast quantities. Accordingly, an algorithm that predicts flap failure in advance can be imagined, and the authors are preparing an AI study related to flap monitoring. If an algorithm could be created that predicts flap failure in advance through deep learning, surgeons could receive notifications of impending flap failure, prompting them to prepare to return the patient to the operating room.
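As a purely hypothetical sketch of the kind of algorithm we have in mind, an LSTM could map a window of NIRS tissue-oximetry readings to a probability of impending flap compromise; the signal layout, window length, and alert threshold are all assumptions for illustration, not a validated monitoring protocol.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: an LSTM over windows of NIRS tissue-oximetry readings
# outputs the probability of impending flap compromise. The signal layout
# (1 channel, 10-reading windows) and the 0.5 threshold are assumptions.
class FlapMonitor(nn.Module):
    def __init__(self, n_features=1, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time steps, features)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.head(h[-1]))   # probability of failure

monitor = FlapMonitor()
window = torch.randn(1, 10, 1)    # last 10 oximetry readings (dummy data)
if monitor(window).item() > 0.5:  # untrained model; illustrative only
    print("Alert: possible flap compromise -- review the patient.")
```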
LIMITATIONS OF AI IN MEDICINE
Although the GAN is a type of unsupervised learning, many AI technologies are still based on supervised learning, which entails several limitations. First, for AI to perform a specific task automatically, a large amount of labeled training data is required for each task, and models must be developed individually using task-specific training data. In addition, overfitting (i.e., excessive adaptation to the training data) can occur, and when AI models are applied in real-world medical contexts, their performance is often poor; the performance observed in the development stage is therefore difficult to generalize to the actual medical field. Another important limitation is that AI algorithms containing numerous variables cannot be intuitively understood by humans; thus, even if models achieve a certain level of performance, humans cannot fully understand their working principles. In the medical field, where algorithms must be applied to real people, it is difficult to embrace algorithms with opaque operating principles that humans cannot comprehend. Finally, the performance of an AI system depends not only on the sophistication of its algorithm, but also on whether it learns from a large amount of high-quality data. After ensuring patient safety and achieving a basic level of performance using high-quality data, the performance of a model should be improved by retraining on field data from multiple institutions.
CONCLUSION
Medical AI technology will ultimately play a key role in solving many of the problems facing healthcare. This will be possible through convergence and collaboration across various fields, including medicine, engineering, policy, and industry. Craniofacial surgeons do not need to understand every detail of AI technology at the level of engineers specializing in AI, but it is very important for them to understand its core characteristics and technical architecture and to know how to evaluate the performance and characteristics of developed technologies.
Notes
Conflict of interest
Kang Young Choi is an editorial board member of the journal but was not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflicts of interest relevant to this article were reported.
Author contribution
Conceptualization: Jeong Yeop Ryu. Data curation: Jeong Yeop Ryu. Formal analysis: Jeong Yeop Ryu. Methodology: Jeong Yeop Ryu, Ho Yun Chung. Project administration: Jeong Yeop Ryu. Visualization: Jeong Yeop Ryu. Writing - original draft: Jeong Yeop Ryu. Writing - review & editing: Kang Young Choi. Investigation: Jeong Yeop Ryu. Resources: Ho Yun Chung. Supervision: Kang Young Choi. Validation: Jeong Yeop Ryu. All authors read and approved the final manuscript.
Abbreviations
AI
artificial intelligence
ANN
artificial neural network
CNN
convolutional neural network
CT
computed tomography
DNN
deep neural network
FDA
Food and Drug Administration
GAN
generative adversarial network
MR
magnetic resonance
NIRS
near-infrared spectroscopy
POD
postoperative day
RNN
recurrent neural network