publications

publications by category in reverse chronological order.

2025

  1. COLING 2025
    How Transliterations Improve Crosslingual Alignment
    Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Ayyoob Imani, Orgest Xhelili, Haotian Ye, Chunlan Ma, François Yvon, and Hinrich Schütze
    In Proceedings of the 31st International Conference on Computational Linguistics 2025
  2. arXiv 2025
    On Relation-Specific Neurons in Large Language Models
    Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, and Hinrich Schütze
    arXiv preprint 2025
  3. arXiv 2025
    Tracing Multilingual Factual Knowledge Acquisition in Pretraining
    Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Felicia Körner, Ercong Nie, Barbara Plank, François Yvon, and Hinrich Schütze
    arXiv preprint 2025
  4. COLM 2025
    FineWeb2: One Pipeline to Scale Them All–Adapting Pre-Training Data Processing to Every Language
    Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec, Bettina Messmer, Negar Foroutan, Amir Hossein Kargaran, Colin Raffel, Martin Jaggi, Leandro Von Werra, and Thomas Wolf
    In Second Conference on Language Modeling 2025
  5. ACL 2025
    How Programming Concepts and Neurons Are Shared in Code Language Models
    Amir Hossein Kargaran, Yihong Liu, François Yvon, and Hinrich Schütze
    In Findings of the Association for Computational Linguistics 2025
  6. ACL 2025
    MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
    Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, and Hinrich Schütze
    In Findings of the Association for Computational Linguistics 2025

2024

  1. NeurIPS 2024
    GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
    Amir Hossein Kargaran, François Yvon, and Hinrich Schütze
    2024
  2. ACL 2024
    MaskLID: Code-Switching Language Identification through Iterative Masking
    Amir Hossein Kargaran, François Yvon, and Hinrich Schütze
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2024
  3. LREC 2024
    GlotScript: A Resource and Tool for Low Resource Writing System Identification
    Amir Hossein Kargaran, François Yvon, and Hinrich Schütze
    The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024
  4. MSR 2024
    GIRT-Model: Automated Generation of Issue Report Templates
    Nafiseh Nikeghbal, Amir Hossein Kargaran, and Abbas Heydarnoori
    In 21st IEEE/ACM International Conference on Mining Software Repositories (MSR) Apr 2024

2023

  1. EMNLP 2023
    GlotLID: Language Identification for Low-Resource Languages
    Amir Hossein Kargaran, Ayyoob Imani, François Yvon, and Hinrich Schütze
    In The 2023 Conference on Empirical Methods in Natural Language Processing Apr 2023
  2. ACL 2023
    Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
    Ayyoob ImaniGooghari, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André Martins, François Yvon, and Hinrich Schütze
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Apr 2023
  3. MSR 2023
    GIRT-Data: Sampling GitHub Issue Report Templates
    Nafiseh Nikeghbal, Amir Hossein Kargaran, Abbas Heydarnoori, and Hinrich Schütze
    In IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Apr 2023
  4. arXiv 2023
    MenuCraft: Interactive Menu System Design with Large Language Models
    Amir Hossein Kargaran, Nafiseh Nikeghbal, Abbas Heydarnoori, and Hinrich Schütze
    arXiv preprint Apr 2023

2022

  1. AACL 2022
    Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging
    Sajad Mirzababaei, Amir Hossein Kargaran, Hinrich Schütze, and Ehsaneddin Asgari
    In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) Apr 2022

2021

  1. WEBSCI 2021
    Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph
    Amir Hossein Kargaran, Mohammad Sadegh Akhondzadeh, Mohammad Reza Heidarpour, Mohammad Hossein Manshaei, Kave Salamatian, and Masoud Nejad Sattary
    In 13th ACM Web Science Conference Apr 2021
  2. ADCHEM 2021
    Analytical Derivation and Comparison of Alarm Similarity Measures
    Amir Hossein Kargaran, Amir Neshastegaran, Iman Izadi, and Ehsan Yazdian
    IFAC-PapersOnLine Apr 2021