Papers

Interpreting Mechanism

[New!🎉🎉] Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World Questions. ACL, 2025.

[New!🎉🎉] Interpret and Improve In-Context Learning via the Lens of Input-Label Mappings. ACL, 2025.

[New!🎉🎉] Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models. ACL, 2025.

Interpreting and Improving Large Language Models in Arithmetic Calculation. ICML (oral), 2024.

From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning. ICML, 2024.

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control. NeurIPS, 2024.


Model Enhancing

[New!🎉🎉] NeuronMerge: Merging Models via Functional Neuron Groups. ACL, 2025.

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs. ICLR, 2025.


Structurization

SAC-KG: Exploiting Large Language Models as Skilled Automatic Constructors for Domain Knowledge Graphs. ACL, 2024.

Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models. ICLR, 2025.