
Can AI generate scientific texts? An analysis of AI-generated scientific texts

PERRUZZA, MARIA ELENA
2021/2022

Abstract

In recent years a field of research known as the Science of Science (SciSci) has taken hold: using digital data on academic inputs and outputs, it aims to understand the structure and modes of evolution of science. At the same time, advances in Artificial Intelligence for text generation have provided many tools that can support new technologies in scientific research. These two branches meet in this thesis work, since some Artificial Intelligence tools can be put to use by SciSci. Papers from the scientific literature underlie SciSci's studies, so one of the key questions is precisely how to build metrics by which one article can be evaluated against another.

This work studied the possible use of GPT-3 by SciSci. GPT-3, released by OpenAI in 2020, belongs to the family of Generative Pre-trained Transformers. In the first part of the thesis we carried out a preliminary study of how the model works in the context of scientific texts. A general understanding of the output it generates was necessary to evaluate its possible use. The analysis focused mainly on the parameters that can be set when querying the model, which may or may not affect the quality of the final output. We therefore first explored the full space of parameter settings and then studied how the generated output varies as the parameters vary, always remaining within the framework of scientific texts.

The second part required a database of scientific papers from the literature. The local database was built using the API of OpenAlex, an online bibliographic database. This collection of published papers made it possible to compare them with the text generated by GPT-3.
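The local database construction amounts to querying the public OpenAlex REST API for works. A minimal sketch of building such a query is shown below; the search string and page size are illustrative, since the abstract does not specify the actual filters used:

```python
from urllib.parse import urlencode

# Public OpenAlex endpoint for scholarly works.
OPENALEX_WORKS = "https://api.openalex.org/works"

def openalex_query_url(search, per_page=25):
    """Build a query URL for the OpenAlex /works endpoint.

    The query parameters here (full-text search, page size) are an
    assumption for illustration; the thesis may have used other filters.
    """
    return OPENALEX_WORKS + "?" + urlencode(
        {"search": search, "per-page": per_page}
    )
```

Fetching each URL (for example with `requests.get(url).json()["results"]`) returns work records whose titles and abstracts can then be stored locally for comparison with GPT-3's output.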
The final comparison was done by applying two text-evaluation metrics, which yielded distributions on which nonparametric statistical tests could be applied. These tests led us to conclude that some parameter choices make no appreciable difference in the results obtained from the output of GPT-3.
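One standard nonparametric test for comparing two score distributions is the Mann-Whitney U test. The abstract does not name the specific tests used, so the pure-Python sketch below is illustrative of the kind of rank-based comparison applied to the metric distributions:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Observations are ranked jointly, with tied values receiving their
    average rank; U is derived from the rank sum of the first sample.
    """
    combined = sorted((v, idx) for idx, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j to cover the whole block of tied values.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    r1 = sum(ranks[:len(x)])              # rank sum of the first sample
    return r1 - len(x) * (len(x) + 1) / 2  # U statistic for x
```

In practice one would compare U against its null distribution (or use `scipy.stats.mannwhitneyu`, which also returns a p-value) to decide whether two metric distributions differ significantly.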
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/52535