Towards Clinical Application of Artificial Intelligence in Ultrasound Imaging
1. Introduction
Ultrasound (US) imaging is superior to other medical imaging modalities in terms of its convenience, non-invasiveness, and real-time properties. In contrast, computed tomography (CT) carries a risk of radiation exposure, and magnetic resonance imaging (MRI) is non-invasive but costly and time-consuming. Therefore, US imaging is commonly used for screening as well as definitive diagnosis in numerous medical fields [1]. Recent advances in image rendering technologies and the miniaturization of ultrasonic diagnostic equipment have led to its use in point-of-care testing in emergency medical care, palliative care, and home medical care [2]. It is worth considering the combination of US diagnostic capabilities and laboratory tests as a multi-biomarker strategy for the prediction of clinical outcomes [3]. However, US imaging exhibits characteristic issues relating to image quality control. In CT and MRI, image acquisition is performed automatically for a given patient, with a fixed measurement time and consistent image settings. In contrast, US images are acquired through manual sweep scanning; thus, their quality depends on the skill levels of the examiners [4]. Furthermore, acoustic shadows caused by obstructions such as bones affect the image quality and diagnostic accuracy [5]. US diagnostic support technologies are therefore required to resolve these practical difficulties in normalizing sweep scanning techniques and image quality.
In recent years, artificial intelligence (AI), which includes machine learning and deep learning, has been developing rapidly, and AI is increasingly being adopted in medical research and applications [6–16]. Deep learning is a leading subset of machine learning, defined by non-programmed learning from large amounts of data with convolutional neural networks (CNNs) [17]. Such state-of-the-art technologies offer the potential to perform tasks more rapidly and accurately than humans in particular areas such as imaging and pattern recognition [18–20]. In particular, medical imaging analysis is compatible with AI, where classification, detection, and segmentation are used as the fundamental tasks in AI-based imaging analyses [21–23]. Furthermore, many AI-powered medical devices have been approved by the Food and Drug Administration (FDA) in the United States [24,25].
The abovementioned clinical issues have affected and slowed the progress of medical AI research and development in US imaging compared to other modalities [26,27]. Table 1 shows the AI-powered medical devices for US imaging that had been approved by the FDA as of April 2021 (https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm, accessed on 10 May 2021). Deep learning requires the availability of sufficient datasets on both normal and abnormal subjects for different diseases under high-quality control. It is necessary to assess the input data quality and to accumulate robust technologies, including effective data structuring and algorithm development, to facilitate the clinical implementation of AI devices. Another concern is the AI black box problem, whereby the decision-making process, namely the manner in which complicated synaptic weighting is performed in the hidden layers of CNNs, is unclear [28]. To construct valid AI-based US diagnostic technologies for clinical practice, examiners need to understand the rationale for a diagnosis and explain it objectively to patients when obtaining informed consent.
This review introduces the current efforts and trends of medical AI research in US imaging. Moreover, future perspectives are discussed to establish the clinical applications of AI for US diagnostic support.
2. US Image Preprocessing
US imaging typically exhibits low spatial resolution and numerous artifacts owing to ultrasonic diffraction. These characteristics affect not only the US examination and diagnosis but also AI-based image processing and recognition. Therefore, several methods have been proposed for US image preprocessing, which eliminates noise that obstructs accurate feature extraction before US image processing. In this section, we present two representative approaches: US image quality improvement and acoustic shadow detection.
Firstly, various techniques have been developed to improve US image quality at the time of image data acquisition by reducing speckle, clutter, and other artifacts [29]. Real-time spatial compound imaging, which uses ultrasonic beam steering of a transducer array to acquire several multiangle scans of an object, has been presented [30]. Furthermore, harmonic imaging, which exploits endogenously generated harmonics of a low-frequency transmit pulse to reduce attenuation and improve image contrast, was proposed [31]. Several methods for US image enhancement using traditional image processing have also been reported [32]. Despeckling is a representative research subject concerning the filtering or removal of punctate artifacts in US imaging [33]. In these methods, the cause of the image quality degradation is eliminated during the US image generation phase, or the noise characteristics are modeled along with the US image generation process following close examination. Current approaches to US image quality improvement using machine learning or deep learning include methods for improving despeckling performance [34,35] and enhancing the overall image quality [36]. Such data-driven methods offer the significant advantage that it is not necessary to create a model for each domain. However, substantial training data of the targeted high quality are required to improve US image quality, and because the preparation of such a dataset is generally difficult, critical issues arise in clinical application.
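To illustrate the traditional, model-based end of this spectrum, the following is a minimal sketch of a classical adaptive despeckling filter in the Lee family; it is not one of the cited methods [33–35], and the window size and the crude global noise estimate are simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(image, window=7):
    """Lee-type adaptive despeckling: blend each pixel with its local mean
    according to the local variance, suppressing speckle in flat regions
    while preserving edges where the variance is high."""
    image = image.astype(float)
    mean = uniform_filter(image, window)
    sq_mean = uniform_filter(image ** 2, window)
    variance = np.clip(sq_mean - mean ** 2, 0, None)
    noise_var = variance.mean()            # crude global speckle estimate (assumption)
    weight = variance / (variance + noise_var)
    return mean + weight * (image - mean)
```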
Secondly, acoustic shadow detection is another well-known US image preprocessing method. An acoustic shadow is one of the most representative artifacts, and it is caused by reflectors blocking the rectilinear propagation of ultrasonic beams from the transducer. Some artifacts are useful, such as the comet-tail artifact (B-line), which may provide diagnostic clues for COVID-19 infection in point-of-care lung US [37]. However, acoustic shadows are depicted in black, with missing information in that region, and obstruct the examination and AI-based image recognition of the target organs in US imaging. Therefore, performing acoustic shadow detection prior to US imaging analysis may enable a judgment to be made on whether an acquired image is suitable as input data. Traditional image processing methods for acoustic shadow detection include automatic geometrical and statistical methods using rupture detection of the brightness value along the scanning line [38], and random walk-based approaches [39,40]. In these methods, the parameters and models need to be carefully changed in response to a domain shift, whereas deep learning-based methods can be applied to a wider range of domains. Nevertheless, the preparation of the training dataset remains challenging, as the pixel-level annotation of acoustic shadows is highly costly and difficult owing to their translucency and blurred boundaries. Meng et al. employed weakly supervised estimation of confidence maps using labels indicating whether each image contains acoustic shadows [41,42]. Yasutomi et al. proposed a semi-supervised approach for integrating domain knowledge into a data-driven model using the pseudo-labeling of plausible synthetic shadows superimposed onto US images (Figure 1) [43].
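To make the pseudo-labeling idea concrete, the sketch below superimposes a simple wedge-shaped synthetic shadow onto a B-mode image, yielding an image-mask training pair. The wedge geometry and attenuation strength are illustrative assumptions, not the generation model of [43].

```python
import numpy as np

def superimpose_synthetic_shadow(image, apex_x, half_angle_deg=6.0, strength=0.8):
    """Darken a wedge emanating from the transducer (top of the image) to
    mimic an acoustic shadow; return the shadowed image and the pixel-level
    pseudo-label mask. A simplified sketch, not the cited method."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # angle of each pixel relative to a vertical ray from the wedge apex
    angles = np.degrees(np.arctan2(xs - apex_x, ys + 1))
    mask = np.abs(angles) < half_angle_deg
    shadowed = image * np.where(mask, 1.0 - strength, 1.0)
    return shadowed, mask.astype(np.uint8)
```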
3. Algorithms for US Imaging Analysis
In this section, we briefly present the fundamental machine learning algorithms for US imaging, which are shared with other medical imaging modalities. Thereafter, we focus on specialized algorithms for US imaging analysis that overcome the noisy artifacts as well as the instability of the viewpoint and cross-section owing to manual operation.
Classification, detection, and segmentation have generally been used as the fundamental algorithms in US imaging analysis (Figure 2). Classification estimates one or more labels for the entire image, and it has typically been used to identify the standard scanning planes for screening or diagnosis in US imaging analysis. ResNet [44] and the Visual Geometry Group network (VGG) [45] are examples of classification methods. Detection is mainly used to estimate lesions and anatomical structures. YOLO [46] and the single-shot multibox detector (SSD) [47] are popular detection algorithms. Segmentation is used for the further precise measurement of lesions and organ structures at the pixel level, as well as index calculations of lengths, areas, and volumes. U-Net [48] and DeepLab [49,50] are representative algorithms for segmentation. These standard algorithms are often used as baselines to evaluate the performance of specialized algorithms for US imaging analysis.
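As a concrete illustration of how such standard architectures serve as baselines, the sketch below adapts an ImageNet-pretrained ResNet-18 to a standard-plane classification task. The number of planes and the dummy input are placeholders rather than values from any cited study, and the weights enum assumes a recent torchvision release.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_PLANES = 5  # assumption: five standard scanning planes in the target protocol

# Baseline classifier: pretrained backbone, final layer swapped for the task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_PLANES)

dummy = torch.randn(1, 3, 224, 224)   # one RGB-replicated US frame
logits = model(dummy)                 # shape: (1, NUM_PLANES)
```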
We now introduce specialized algorithms for US imaging analysis that address the performance deterioration owing to noisy artifacts. Cropping–segmentation–calibration (CSC) [51] and the multi-frame + cylinder method (MFCY) [52] use time-series information to reduce noisy artifacts and to perform accurate segmentation in US videos (Figure 3). Deep attention networks have also been proposed for improved segmentation performance in US imaging, such as the attention-guided dual-path network [53] and a U-Net-based network combining a channel attention module and VGG [54]. A contrastive learning-based framework [55] and a framework based on the generative adversarial network (GAN) [56] with progressive learning have been reported to improve boundary estimation in US imaging [57].
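The attention modules in [53,54] differ in detail, but a squeeze-and-excitation-style channel attention block conveys the shared idea: reweight feature channels by globally pooled descriptors. The following is a generic sketch, not a reimplementation of the cited networks.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global-average-pool
    each feature map, pass the channel descriptor through a bottleneck MLP,
    and rescale the channels accordingly."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze -> (B, C)
        return x * w.view(b, c, 1, 1)          # excite: rescale channels
```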
The critical issues resulting from the instability of the viewpoint and cross-section often become apparent when clinical indexes are calculated using segmentation. One traditional US image processing approach is the reconstruction of three-dimensional (3D) volumes [58]. Direct segmentation methods for conventional 3D volumes, including 3D U-Net [59], are useful for accurate volume quantification; however, their labeling is very expensive and time-consuming. The interactive few-shot Siamese network uses a Siamese network and a recurrent neural network to train 3D segmentation from a few annotated two-dimensional (2D) US images [60]. Another research subject is the extraction of 2D US images containing standard scanning planes from a 3D US volume. The iterative transformation network was proposed to guide the current plane towards the location of the standard scanning planes in the 3D US volume [61]. Moreover, Duque et al. proposed a semi-automatic segmentation algorithm for a freehand 3D US volume, which is a continuum of 2D cross-sections, by employing an encoder–decoder architecture with 2D US images and several 2D labels [62]. We summarize the abovementioned segmentation algorithms for US imaging analysis in Table 2.
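Once a 3D segmentation mask is available, volume quantification itself reduces to counting voxels; a minimal sketch, assuming the scanner's voxel spacing is known:

```python
import numpy as np

def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary 3D segmentation mask in millilitres, given the
    voxel spacing (dz, dy, dx) in millimetres; 1 mL = 1000 mm^3."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0
```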
4. Medical AI Research in US Imaging
4.1. Oncology
4.1.1. Breast Cancer
Breast cancer is the most common cancer in women globally [63]. US imaging is used extensively for breast cancer screening in addition to mammography. Various efforts have been made to date regarding the classification of benign and malignant breast tumors in US imaging. Han et al. trained a CNN model architecture to differentiate between benign and malignant breast tumors [64]. The Inception model, which is a CNN model with batch normalization, exhibited equivalent or superior diagnostic performance compared to radiologists [65]. Byra et al. introduced a matching layer to convert grayscale US images into RGB to leverage the discriminative power of the CNN more efficiently [66]. Antropova et al. employed VGG and the support vector machine for classification using the CNN features and conventional computer-aided diagnosis features [67]. A mass-level classification method enabled the construction of an ensemble network by combining VGG and ResNet to classify a given mass using all views [68]. Considering that both thyroid and breast cancers exhibit several similar high-frequency US characteristics, Zhu et al. developed a generic VGG-based framework to classify thyroid and breast lesions in US imaging [69]. The model that was constructed with features extracted from all three transferred models achieved the highest overall performance [70]. The Breast Imaging Reporting and Data System (BI-RADS) provides guidance and criteria for physicians to determine breast tumor categories based on medical images in clinical settings. Zhang et al. proposed a novel network that integrates the BI-RADS features into task-oriented semi-supervised deep learning for accurate diagnosis using US images with a small training dataset [71]. Huang et al. developed the ROI-CNN (ROI identification network) and the subsequent G-CNN (tumor categorization network) to generate effective features for classifying the identified ROIs into five categories [72]. The Inception model achieved the best performance in predicting lymph node metastasis from US images in patients with primary breast cancer [73].
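The matching layer in [66] is learned jointly with the network; the simplest fixed alternative, shown below for orientation only, is to replicate the grayscale channel so that ImageNet-pretrained CNNs accept US input. This sketch is a stand-in baseline, not the cited layer.

```python
import torch

def gray_to_rgb(us_batch: torch.Tensor) -> torch.Tensor:
    """Replicate a single-channel US batch (B, 1, H, W) to three channels
    so that ImageNet-pretrained CNNs accept it; the matching layer of [66]
    instead learns this grayscale-to-RGB mapping."""
    return us_batch.repeat(1, 3, 1, 1)

frames = torch.rand(4, 1, 224, 224)   # dummy grayscale US frames
print(gray_to_rgb(frames).shape)      # torch.Size([4, 3, 224, 224])
```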
Yap et al. investigated the use of three deep learning approaches for breast lesion detection in US imaging. The performances were evaluated on two datasets, and a different method achieved the highest performance on each dataset [74]. An experimental study was performed to evaluate different CNN architectures for breast lesion detection and classification in US imaging, in which SSD for breast lesion detection and DenseNet [75] for classification exhibited the best performance [76].
Several ingenious segmentation methods for breast lesions in US imaging have been reported. Kumar et al. demonstrated the performance of the Multi-U-Net segmentation algorithm for suspicious breast masses in US imaging [77]. A novel automatic tumor segmentation method that combines a dilated fully convolutional network (FCN) with a phase-based active contour model was proposed [78]. The residual-dilated-attention-gate U-Net is based on the conventional U-Net, but the plain neural units are replaced with residual units to enhance the edge information [79]. Vakanski et al. introduced attention blocks into the U-Net architecture to learn feature representations that prioritize spatial regions with high saliency levels [80]. Singh et al. proposed automatic tumor segmentation in breast US images using a contextual-information-aware GAN architecture. The proposed model achieved competitive results compared to other segmentation models in terms of the Dice and intersection-over-union metrics [81].
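For reference, these two overlap metrics are computed as follows; a standard sketch, with a small epsilon added for empty-mask stability.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and intersection-over-union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou
```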
4.1.2. Thyroid Cancer
The incidence of thyroid cancer has been increasing globally as a result of overdiagnosis and overtreatment owing to the sensitive imaging techniques that are used for screening [82]. A CNN with an additional spatially constrained layer was proposed to develop a detection method suitable for papillary thyroid carcinoma in US imaging [83]. The Inception model achieved excellent diagnostic efficiency in differentiating between papillary thyroid carcinomas and benign nodules in US images. It could provide a more accurate diagnosis of nodules that were 0.5 to 1.0 cm in size, with microcalcification and a taller shape [84]. Ko et al. designed CNNs that exhibited comparable diagnostic performance to that of experienced radiologists in differentiating thyroid malignancy in US imaging [85]. Furthermore, a fine-tuning approach based on ResNet was proposed, which outperformed VGG in terms of the classification accuracy of thyroid nodules [86]. Li et al. used CNNs for the US image classification of thyroid nodules. Their model exhibited similar sensitivity and improved specificity in identifying patients with thyroid cancer compared to a group of skilled radiologists [82].
4.1.3. Ovarian Cancer
Ovarian cancer is the most lethal gynecological malignancy because it exhibits few early symptoms and generally presents at an advanced stage [87]. The screening methods for ovarian cysts using imaging techniques need to be improved to overcome the poor prognosis of ovarian cancer. Zhang et al. proposed an image diagnosis system for classifying ovarian cysts in color US images using the high-level deep features extracted by a fine-tuned CNN together with low-level rotation-invariant uniform local binary pattern features [88]. US imaging analysis using an ensemble model of CNNs demonstrated comparable diagnostic performance to human expert examiners in classifying ovarian tumors as benign or malignant [89].
4.1.4. Prostate Cancer
Feng et al. presented a 3D CNN model to detect prostate cancer in sequential contrast-enhanced US (CEUS) imaging. The framework consisted of three convolutional layers, two sub-sampling pooling layers, and one fully connected classification layer. Their method achieved a specificity of over 91% and an average accuracy of 90% over the targeted CEUS images for prostate cancer detection [90]. A random forest-based classifier for the multiparametric localization of prostate cancer lesions based on B-mode, shear wave elastography, and dynamic contrast-enhanced US radiomics was developed [91]. A segmentation method was proposed for the clinical target volume (CTV) in the transrectal US image-guided intraoperative process for permanent prostate brachytherapy. A CNN was employed to construct the CTV shape in advance from automatically sampled pseudo-landmarks, along with an encoder–decoder CNN architecture for low-level feature extraction. This method achieved a mean accuracy of 96% and a mean surface distance error of 0.10 mm [92].
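A compact 3D CNN of the kind described above can be sketched as follows, in the spirit of [90]: three 3D convolutions, two pooling stages, and one fully connected classifier. The filter counts, kernel sizes, and input shape are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    """Sketch of a compact 3D CNN for CEUS clips: three 3D convolutions,
    two pooling stages, and one fully connected classification layer."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),        # collapse time and space
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                   # x: (B, 1, T, H, W) CEUS clip
        return self.classifier(self.features(x).flatten(1))

clip = torch.randn(2, 1, 16, 64, 64)        # dummy clips
print(Small3DCNN()(clip).shape)             # torch.Size([2, 2])
```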
4.1.5. Other Cancers
Hassan et al. developed a stacked sparse auto-encoder and softmax classifier architecture for the US image classification of focal liver diseases into benign cyst, hemangioma, and hepatocellular carcinoma, along with the normal liver [93]. Schmauch et al. proposed a deep learning model based on ResNet for the detection and classification of focal liver lesions into the abovementioned diseases, as well as focal nodular hyperplasia and metastasis, in liver US images [94]. An ensemble model of CNNs was proposed for kidney US image classification into four classes, namely normal, cyst, stone, and tumor. This method achieved a maximum classification accuracy of 96% in testing with quality images and 95% in testing with noisy images [95].
4.2. Cardiovascular Medicine
4.2.1. Cardiology
Echocardiography is the most common imaging modality in cardiovascular medicine, and it is frequently used for the screening as well as the diagnosis and management of cardiovascular diseases [96]. Current technological innovations in echocardiography, such as the assessment of 3D US volumes and global longitudinal strain, are remarkable. Clinical evidence has been accumulating for the utilization of 3D echocardiography. However, 3D US volumes remain inferior to 2D US images in spatial and temporal resolution. To utilize these latest technologies, it is a prerequisite that examiners have the skill to acquire high-quality images in 2D echocardiography. In addition, echocardiography has become the primary point-of-care imaging modality for the early diagnosis of the cardiac symptoms of COVID-19 [97,98]. Therefore, it is expected that the clinical application of AI will improve the diagnostic accuracy and workflow in echocardiography. To our knowledge, echocardiography accounts for the largest number of the FDA-approved AI-powered medical devices applied to US imaging. Abdi et al. developed a CNN to reduce the user variability in data acquisition by automatically computing a score of the US image quality of the apical four-chamber view for examiner feedback [99]. Liao et al. proposed a quality assessment method for cardiac US images by modeling the label uncertainty in CNNs resulting from intra-observer variability in the labeling [100]. Deep learning-based view classification has also been reported. EchoNet could accurately identify the presence of pacemaker leads, an enlarged left atrium, and left ventricular (LV) hypertrophy by analyzing the local cardiac structures.
In this study, the LV end-systolic and end-diastolic volumes and ejection fraction (EF), as well as the systemic phenotypes of age, sex, weight, and height, were also estimated [101]. Zhang et al. proposed a deep learning-based pipeline for the fully automated analysis of cardiac US images, including view classification, chamber segmentation, measurements of the LV structure and function, and the detection of specific myocardial diseases [102].
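Once the end-diastolic and end-systolic volumes have been estimated, the EF itself follows directly from the standard definition; the volumes in the usage line below are illustrative, not study results.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left-ventricular ejection fraction (%) from the end-diastolic (EDV)
    and end-systolic (ESV) volumes: EF = 100 * (EDV - ESV) / EDV."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

print(ejection_fraction(120.0, 50.0))  # ~58.3% with illustrative volumes
```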
The assessment of regional wall motion abnormalities (RWMAs) is an important testing process in echocardiography, which can localize ischemia or infarction of the coronary arteries. Strain imaging, including the speckle tracking method, has been used extensively to evaluate LV function in clinical practice. Ahn et al. proposed an unsupervised motion tracking framework using U-Net [103]. Kusunose et al. compared the area under the curve (AUC) obtained by several CNNs and physicians for detecting the presence of RWMAs. The CNN achieved an equivalent AUC to that of an expert, which was significantly higher than that of resident physicians [104].
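Such reader-versus-model comparisons typically rest on the ROC AUC computed over per-case abnormality scores; a minimal sketch, with dummy labels and scores rather than data from [104]:

```python
from sklearn.metrics import roc_auc_score

# Dummy per-case ground-truth labels (1 = RWMA present) and model scores.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
cnn_scores = [0.1, 0.3, 0.8, 0.7, 0.9, 0.2, 0.6, 0.4]
print(roc_auc_score(y_true, cnn_scores))   # AUC over the dummy cases
```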
4.2.2. Angiology
Lekadir et al. proposed a CNN for extracting the optimal information to identify the different plaque constituents from carotid US images. The results of cross-validation experiments demonstrated a correlation of approximately 0.90 with the clinical assessment for the estimation of the lipid core, fibrous cap, and calcified tissue areas [105]. A deep learning model was developed for the classification of the carotid intima-media thickness to enable the reliable early detection of atherosclerosis [106]. Araki et al. introduced an automated segmentation system for both the near and far walls of the carotid artery using the grayscale US morphology of the plaque for stroke risk assessment [107]. A segmentation method that integrated the random forest and an auto-context model could segment the plaque effectively, in combination with the features extracted from US images as well as iteratively estimated probability maps [108]. The quantification of carotid plaques by measuring the vessel wall volume using the segmentation of the media-adventitia boundary (MAB) and lumen-intima boundary (LIB) is sensitive to temporal changes in the carotid plaque burden. Zhou et al. proposed a semi-automatic segmentation method based on carotid 3D US images using a dynamic CNN for MAB segmentation and an improved U-Net for LIB segmentation [109]. Biswas et al. performed boundary segmentation of the MAB and LIB, incorporating a machine learning-based joint coefficient method for fine-tuning of the border extraction, to measure the carotid intima-media thickness from carotid 2D US images [110]. The application of a CNN and an FCN to automated lumen detection and lumen diameter measurement was also presented [111]. A deep learning-based boundary detection and compensation technique enabled the segmentation of vessel boundaries by harnessing a CNN and wall motion compensation in the analysis of near-wall flow dynamics in US imaging [112]. Towards the cost-effective diagnosis of deep vein thrombosis, Kainz et al. employed a machine learning model for the detection and segmentation of the representative veins and the prediction of their vessel compression status [113].
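Given two segmented boundaries, the intima-media thickness is essentially their separation averaged along the vessel; a simplified sketch, assuming each boundary is represented as one depth value per image column and the pixel size is known:

```python
import numpy as np

def mean_thickness_mm(lib_y, mab_y, pixel_mm):
    """Mean wall thickness from lumen-intima and media-adventitia boundary
    depths (one y-coordinate per image column), converted to millimetres."""
    lib_y, mab_y = np.asarray(lib_y, float), np.asarray(mab_y, float)
    return float(np.mean(np.abs(mab_y - lib_y)) * pixel_mm)
```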
4.3. Obstetrics
US imaging plays the most important role in medical diagnostic imaging in the obstetrics field. The non-invasiveness and real-time properties of US imaging enable fetal morphological and functional evaluations to be performed effectively. US imaging is used for the screening of congenital diseases, the assessment of fetal development and well-being, and the detection of obstetric complications [114]. Transvaginal US enables the clear observation of the fetus and other organs, including the uterus, ovaries, and fallopian tubes, which are mainly located on the pelvic floor during the first trimester. Moreover, transabdominal US is useful for observing fetal growth over the gestational weeks.
During fetal US imaging, numerous anatomical structures with small shapes and movement are observed simultaneously in clinical practice. Medical AI research has been conducted on the development of algorithms that are applicable to the US imaging analysis of the fetus or fetal appendages. Dozen et al. improved the segmentation performance for the ventricular septum in fetal cardiac US videos using cropped and original image information in addition to time-series information [51]. CSC can be applied to the segmentation of other organs that are small and have dynamically changing shapes with heartbeats, such as the heart valves. Shozu et al. proposed a novel model-agnostic method to improve the segmentation performance for the thoracic wall in fetal US videos. This method was based on ensemble learning of the time-series information of US videos and the shape information of the thoracic wall [52]. Medical AI research has also been conducted on the measurement of fetal anatomical segments in US imaging [115–118]. The scale attention pyramid deep neural network using multi-scale information could fuse local and global information to infer skull boundaries that contained speckle noise or discontinuities. The elliptic geometric axes were modified by a regression network to obtain the fetal head circumference, biparietal diameter, and occipitofrontal diameter more accurately [119]. Kim et al. proposed a machine learning-based method for the automatic identification of the fetal abdominal circumference [120]. The localizing region-based active contour method, which was integrated with a hybrid speckle noise-reducing technique, was implemented for the automatic extraction and calculation of the fetal femur length [121]. A computer-aided detection framework for the automatic measurement of the fetal lateral ventricles [122] and amniotic fluid volume [123] was also developed. The fully automated and real-time segmentation of the placenta from 3D US volumes could potentially enable the use of the placental volume to screen for an increased risk of pregnancy complications [124].
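For orientation, once the skull has been fitted as an ellipse, the head circumference follows from the fitted axes; a sketch using Ramanujan's perimeter approximation, with illustrative diameters rather than values from [119]:

```python
import math

def head_circumference_mm(bpd_mm, ofd_mm):
    """Fetal head circumference from the biparietal (BPD) and
    occipitofrontal (OFD) diameters, approximating the skull as an
    ellipse via Ramanujan's perimeter formula."""
    a, b = bpd_mm / 2.0, ofd_mm / 2.0            # semi-axes
    h = ((a - b) ** 2) / ((a + b) ** 2)
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))

print(head_circumference_mm(90.0, 110.0))        # illustrative diameters (mm)
```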
The acquisition of optimal US images for diagnosis in fetal US imaging is dependent on the skill levels of the examiners [4]. Therefore, it is essential to evaluate whether the acquired US images have a suitable cross-section for diagnosis. Furthermore, when labeling a huge number of US images for AI-based image processing, it is necessary to classify the acquired US images and to assess whether their image quality is suitable for the input data. Burgos-Artizzu et al. evaluated a wide variety of CNNs for the automatic classification of a large dataset containing over 12,400 images from 1792 patients that were routinely acquired during maternal-fetal US screening [125]. An automatic recognition method using deep learning for the fetal facial standard planes, including the axial, coronal, and sagittal planes, was reported [126]. Moreover, the automated partitioning and characterization of an unlabeled full-length fetal US video into 20 anatomical or activity categories was performed [127]. A generic deep learning framework for the automatic quality control of fetal US cardiac four-chamber views [128], as well as a framework for tracking the key variables that describe the contents of each frame of freehand 2D US scanning videos of a healthy fetal heart [129], were developed. Wang et al. presented a deep learning framework for differentiating operator skills during fetal US scanning using probe motion tracking [130].
AI-based abnormality detection and classification in fetal US imaging remain challenging owing to the wide variety and relatively low incidence of congenital diseases. Xie et al. proposed deep learning algorithms for the segmentation and classification of normal and abnormal fetal brain US images in the standard axial planes. Furthermore, they provided heat maps for lesion localization using gradient-weighted class activation mapping [131]. An ensemble of neural networks, which was trained using 107,823 images from 1326 retrospective fetal cardiac US studies, could identify the recommended cardiac views as well as distinguish between normal hearts and complex congenital heart diseases. Segmentation models were also proposed to calculate standard fetal cardiothoracic measurements [132]. Komatsu et al. proposed the CNN-based architecture known as supervised object detection with normal data only (SONO) to detect 18 cardiac substructures and structural abnormalities in fetal cardiac US videos. The abnormality score was calculated using the probability of cardiac substructure detection. SONO enables abnormalities to be detected based on the difference from the correct anatomical localization of normal structures, thereby addressing the challenge of the low incidence of congenital heart diseases. Furthermore, in our previous work, the above probabilities were visualized as a barcode-like timeline. This timeline was useful in terms of AI explainability when detecting cardiac structural abnormalities in fetal cardiac US videos (Figure 4) [133].
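To convey the flavor of detection-based abnormality scoring, the sketch below flags an expected substructure as suspicious when it is never confidently detected across a clip. This is a deliberately simplified reading of the idea in [133], not the published scoring, and the structure names and probabilities are dummy values.

```python
import numpy as np

def abnormality_scores(detection_probs):
    """Given per-frame detection probabilities for each expected cardiac
    substructure (name -> array over frames), score each structure by how
    rarely it is confidently detected anywhere in the clip."""
    return {name: 1.0 - float(np.max(p)) for name, p in detection_probs.items()}

probs = {
    "ventricular_septum": np.array([0.10, 0.20, 0.15]),  # rarely seen -> suspicious
    "four_chambers": np.array([0.90, 0.95, 0.85]),       # reliably detected
}
print(abnormality_scores(probs))
```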
Deep learning-incorporated software improved the prediction performance for neonatal respiratory morbidity induced by respiratory distress syndrome or transient tachypnea of the newborn in fetal lung US imaging, as an example of AI-based fetal functional evaluation [134].
5. Discussion and Future Directions
In this review, we have introduced various areas of medical AI research with a focus on US imaging analysis to understand the global trends and future research subjects from both the clinical and basic perspectives. As in other medical imaging modalities, classification, detection, and segmentation are the fundamental tasks of AI-based image analysis. However, US imaging exhibits several issues in terms of image quality control. Thus, US image preprocessing needs to be performed, and ingenious algorithm combinations are required.
Acoustic shadow detection is a characteristic task in US imaging analysis. Although deep learning-based methods can be applied to a wide range of domains, the preparation of training datasets remains challenging. Therefore, weakly or semi-supervised methods offer the advantage of cost-effective labeling [41–43]. Towards the clinical application of acoustic shadow detection methods, examiners could evaluate in real time whether the currently acquired US imaging is suitable for diagnosis. If not, rescanning can be performed during the same examination session. This application may improve the workflow of examiners and reduce the patient burden. Several frameworks relating to specialized algorithms for US imaging analysis have been proposed, in which the time-series information in US videos [51,52] or a channel attention module [53,54] has been integrated with conventional algorithms to overcome the performance deterioration owing to noisy artifacts. Furthermore, the AI-based analysis of 3D US volumes is expected to resolve the problem of viewpoint and cross-section instability resulting from manual operation.
From a clinical perspective, breast cancer and cardiovascular diseases are the medical fields in which the most substantial research efforts in AI-based US imaging analysis have been made to date, resulting in more approved medical AI devices. Considering the clinical background of these two medical fields in which US imaging is commonly used, the potential exists to develop medical AI research and technologies in obstetrics as well. However, AI-based US imaging analysis there remains challenging, and few medical AI devices are available for this purpose. Therefore, deep learning-based methods that are applicable to cross-disciplinary studies and a wide range of domains need to be studied and incorporated. According to our review, several ingenious segmentation methods for target lesions or structures in US imaging may lend themselves to cross-disciplinary utilization among oncology, cardiovascular medicine, and obstetrics. For example, CSC can be applied to the segmentation of other small and deformable organs using the time-series information of US videos. Valid US diagnostic support technologies can be established in clinical practice by accumulating AI-based US image analyses. Automated image quality assessment and detection can lead to the development of scanning guides and training materials for examiners. Accurate volume quantification, as well as the measurement of lesions and indexes, can result in an improved workflow and a reduction in examiner bias. AI-based abnormality detection is expected to be used for the objective evaluation of lesions or abnormalities and for preventing oversights. However, it remains challenging to prepare sufficient datasets on both normal and abnormal subjects for the target diseases. To address this data preparation issue, it is possible to implement AI-based abnormality detection using the correct anatomical localization and morphologies of normal structures as a baseline [133].
Furthermore, AI explainability is key to the clinical application of AI-based US diagnostic support technologies. It is necessary for examiners to understand and explain their rationale for diagnosis to patients when obtaining informed consent. Class activation mapping is a popular technique for AI explainability, which enables the computation of class-specific heatmaps indicating the discriminative regions of the image that caused the particular class activity of interest [135]. Zhang et al. provided an interpretation for regression saliency maps, as well as an adaptation of the perturbation-based quantitative evaluation of explanation methods [136]. ExplainGAN is a generative model that produces visually perceptible decision-boundary-crossing transformations, which provide high-level conceptual insights that illustrate the manner in which a model makes decisions [137]. We proposed a barcode-like timeline to visualize the progress of the probability of substructure detection along with sweep scanning in US videos. This technique was demonstrated to be useful in terms of AI explainability when we detected cardiac structural abnormalities in fetal cardiac US videos. Moreover, the barcode-like timeline diagram is informative and understandable, thereby enabling examiners of all skill levels to consult with experts knowledgeably [133].
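A gradient-weighted variant of class activation mapping [135] can be sketched in a few lines with forward and backward hooks. The model, target layer, and input below are placeholders (a random tensor standing in for an RGB-replicated US frame), and the snippet assumes a recent PyTorch/torchvision for the backward-hook and weights APIs.

```python
import torch
from torchvision import models

# Minimal Grad-CAM sketch: weight the last convolutional feature maps by
# their pooled gradients to obtain a class-specific heatmap.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
feats, grads = {}, {}
layer = model.layer4                                   # target layer (choice is an assumption)
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)        # placeholder for an RGB-replicated US frame
score = model(x)[0].max()              # activity of the top-scoring class
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)    # pooled gradients per channel
cam = torch.relu((weights * feats["a"]).sum(dim=1))    # (1, H', W') raw heatmap
cam = cam / (cam.max() + 1e-7)                         # normalize to [0, 1]
```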
Towards the clinical application of medical AI algorithms and devices, it is important to understand the approval processes and regulations of the US FDA, the Japan Pharmaceuticals and Medical Devices Agency, and the responsible institutions of other countries. Furthermore, knowledge of the acts on the protection of personal information and the guidelines for handling all types of medical data, including the clinical information of patients and medical imaging data, should be kept up to date. Wu et al. compiled a comprehensive overview of the medical AI devices that have been approved by the FDA and pointed out the limitations of the evaluation process, which may mask the vulnerabilities of devices when they are applied to patients [25]. In the majority of evaluations, only retrospective studies have been performed. These authors recommended the performance evaluation of medical AI devices at multiple clinical sites, in prospective studies, and through post-market surveillance. Moreover, industry–academia–medicine collaboration is required to share valuable concepts in the development of medical AI devices for patients and examiners, and in their actual use in clinical practice.
The utilization of AI and internet of things (IoT) technologies, along with advanced networks such as 5G, will further accelerate infrastructure development in the medical field, including remote medical care and regional medical cooperation. The current COVID-19 pandemic has also provided an opportunity to promote such developments. US imaging is the most common medical imaging modality across an extensive range of medical fields. However, stronger support for examiners in terms of image quality control should be considered. The clinical implementation of AI-based US diagnostic support technologies is expected to correct the medical disparities between regions through examiner training or remote diagnosis using cloud-based systems.
Author Contributions: Conceptualization, M.K. and R.H.; investigation, M.K., A.S., A.D., K.S., S.Y. and R.H.; writing—original draft preparation, M.K., A.S., A.D., K.S., S.Y. and R.H.; writing—review and editing, M.K., A.S., A.D., K.S., S.Y., H.M., K.A., S.K. and R.H. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the subsidy for the Advanced Integrated Intelligence Platform (MEXT) and the commissioned projects income for the RIKEN AIP-FUJITSU Collaboration Center.
Institutional Review Board Statement: The studies were conducted according to the guidelines of the Declaration of Helsinki. The study for fetal ultrasound was approved by the Institutional Review Board (IRB) of RIKEN, Fujitsu Ltd., Showa University, and the National Cancer Center (approval ID: Wako1 29-4). The study for adult echocardiography was approved by the IRB of RIKEN, Fujitsu Ltd., Tokyo Medical and Dental University, and the National Cancer Center (approval ID: Wako3 2019-36).
Informed Consent Statement: The research protocol for each study was approved by the medical ethics committees of the collaborating research facilities. Data collection was conducted in an opt-out manner in the study for fetal ultrasound. Informed consent was obtained from all subjects involved in the study for adult echocardiography.
Data Availability Statement: Data sharing is not applicable owing to patient privacy rights.
Acknowledgments: We would like to thank all members of the Hamamoto Laboratory, who provided valuable advice and a comfortable research environment.
Conflicts of Interest: The authors declare no conflict of interest.