Large language models (LLMs) have achieved remarkable performance on a wide range of natural language processing tasks. Scientific text summarization is particularly difficult because of the technical nature of scientific literature. Evaluating LLMs on this task requires carefully designed benchmarks and evaluation criteria. Several res