Agun, Hayri Volkan
2026-02-08
2026-02-08
2025
0004-3702
1872-7921
https://doi.org/10.1016/j.artint.2025.104422
https://hdl.handle.net/20.500.12885/5621
In this study, active learning methods adapted for the selection of Turkish sentences are evaluated through language learning with neural models. Turkish is an agglutinative language with a complex morphology, where the linguistic properties of words are encoded in suffixes. Active learning methods based on regression, clustering, language models, distance metrics, and neural networks are applied to the selection of unlabeled sentences. A sentence corpus is selected from a larger corpus, with the same number of samples for each target word in the intrinsic and extrinsic evaluation tasks. The selected sentences are used to train SkipGram, CBOW, and self-attention LSTM language models, and the extracted embeddings are evaluated on semantic analogy, POS, and sentiment analysis tasks. The evaluation scores of the models trained on the samples selected by each active learning method are compared. The results for sentences selected by the language-model-based methods indicate an improvement over random selection based on a static vocabulary. These results also show that the selection affects the quality of unsupervised word embedding extraction even when the target vocabulary is kept the same. In addition to accuracy, the time efficiency of the language-model-based methods is shown to be better than that of the other methods, especially those based on neural network models and distance metrics.
en
info:eu-repo/semantics/closedAccess
Unsupervised active learning
Language models
Natural language processing
Unsupervised sentence selection for creating a representative corpus in Turkish: An active learning approach
Article
10.1016/j.artint.2025.104422
348
WOS:001578014300002
2-s2.0-105016585064
Q2
Q1