Evaluation of YouTube Exercise Videos Aimed at Preventing Falls in Older Adults: A Comparison of ChatGPT-4.5 and Human Experts
Date
2025
Publisher
Geriatrik Bilimler Derneği
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Objective: This study aims to determine the potential and limitations of artificial intelligence (AI) in evaluating YouTube videos intended for older adults by comparing ChatGPT-4.5's assessments with those of human experts. Materials and Methods: A YouTube search was conducted using the keyword “fall prevention exercises for elderly,” and the 100 most viewed videos were examined. Of these, 64 videos that met the criteria were included in the study. The comprehensiveness, overall quality [global quality scale (GQS)], and reliability [Quality Criteria for Consumer Health Information (DISCERN)] of the videos were evaluated by two independent physiotherapists and by ChatGPT-4.5. Agreement between the evaluators was tested using the Wilcoxon signed-rank test, the intraclass correlation coefficient (ICC), and Bland-Altman analyses. Results: No significant differences were found between ChatGPT-4.5 and the human experts in comprehensiveness (p=0.242) or GQS (p=0.083) scores, and a high level of agreement was observed (ICC=0.932 and 0.876, respectively). For DISCERN scores, ChatGPT-4.5 gave significantly higher scores than the human experts (p=0.005), although the level of agreement was rated excellent (ICC=0.942). Nevertheless, the Bland-Altman analysis showed a wide range of differences (limits of agreement: -4.9 to 7.18). Conclusion: ChatGPT-4.5 can be used as a reliable assessment tool for determining the comprehensiveness and quality of fall prevention exercise videos. For reliability scoring, however, it was concluded that AI should be used under expert supervision.
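As an illustration of the Bland-Altman limits of agreement reported in the abstract, the following is a minimal sketch that assumes the conventional definition (mean difference ± 1.96 times the standard deviation of the paired differences). The function name `bland_altman_limits` and the toy scores are hypothetical. They are not the study's code or data, and the record does not state which software the authors used.

```python
from statistics import mean, stdev


def bland_altman_limits(rater_a, rater_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    Assumes the conventional 95% limits of agreement:
    mean difference +/- z * sample SD of the differences.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("paired scores must have the same length")
    # Per-item difference between the two raters (a minus b).
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)   # systematic difference between raters
    sd = stdev(diffs)    # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical toy scores for illustration only, NOT data from the study.
ai_scores = [40, 52, 47, 60, 38]
expert_scores = [38, 50, 48, 55, 37]

bias, lower, upper = bland_altman_limits(ai_scores, expert_scores)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A positive bias in this sketch means the first rater scores higher on average. Wide limits indicate that individual scores can diverge substantially even when a correlation-based measure such as the ICC is high, which is the pattern the abstract describes for DISCERN.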
Keywords
Geriatrics and Gerontology
Source
Geriatrik Bilimler Dergisi
Volume
8
Issue
3












