Publications
An overview of Semantic Change: Understanding the Phenomenon, Current Trends and Future Research Roadmap
This report results from over a year (2016) of discussions and knowledge sharing within the PERICLES Community of Practice on Evolving Semantics, which brought together a group of researchers and academics interested in research on, and applications of, evolving semantics. The document also touches upon recent research in the area of semantic change, starting with the role of semiotics in identifying semantic drift, moving on to content and community change in the media-art case study, and finally to the study and detection of semantic drift in ontologies.
Darányi, S., Wittek, P., Konstantinidis, K., Papadopoulos, S., Kontopoulos, E. (2016). A Physical Metaphor to Study Semantic Drift. In Proceedings of SuCCESS-16, 1st International Workshop on Semantic Change & Evolving Semantics (Vol. 1695)
In accessibility tests for digital preservation, over time we experience drifts of localized and labelled content in statistical models of evolving semantics represented as a vector field. This articulates the need to detect, measure, interpret and model outcomes of knowledge dynamics. To this end we employ a high-performance machine learning algorithm for the training of extremely large emergent self-organizing maps for exploratory data analysis. The working hypothesis we present here is that the dynamics of semantic drifts can be modelled on a relaxed version of Newtonian mechanics called social mechanics. By using term distances as a measure of semantic relatedness vs. their PageRank values indicating social importance and applied as variable ‘term mass’, gravitation as a metaphor to express changes in the semantic content of a vector field lends a new perspective for experimentation. From ‘term gravitation’ over time, one can compute its generating potential whose fluctuations manifest modifications in pairwise term similarity vs. social importance, thereby updating Osgood’s semantic differential. The dataset examined is the public catalogue metadata of Tate Galleries, London.
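The gravitation metaphor in the abstract above can be illustrated with a minimal sketch. It assumes a Newton-style form in which PageRank plays the role of term mass and vector distance the role of separation; the function names, the constant g and the toy values are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two term vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def term_gravitation(mass_a, mass_b, distance, g=1.0):
    """Magnitude of a Newton-style attraction between two terms.

    Assumption for illustration: mass = PageRank, distance = vector distance.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return g * mass_a * mass_b / distance ** 2


def term_potential(mass_a, mass_b, distance, g=1.0):
    """Potential whose derivative with respect to distance has the
    magnitude returned by term_gravitation."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return -g * mass_a * mass_b / distance


# Toy data (hypothetical): two terms located in a 2-D vector space.
vectors = {"portrait": (0.0, 0.0), "landscape": (3.0, 4.0)}
pagerank = {"portrait": 0.5, "landscape": 0.2}

d = euclidean(vectors["portrait"], vectors["landscape"])
force = term_gravitation(pagerank["portrait"], pagerank["landscape"], d)
potential = term_potential(pagerank["portrait"], pagerank["landscape"], d)
```

Recomputing these quantities for each time slice of a collection would give one way to track how "term gravitation" changes, in the spirit of the abstract.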
Read more at the University of Borås public repository
Kontopoulos, E., Moysiadis, T., Tsagiopoulou, M., Darányi, S., Wittek, P., Papakonstantinou, N., … Kompatsiaris, I. (2016). Studying the Cohesion Evolution of Genes Related to Chronic Lymphocytic Leukemia Using Semantic Similarity in Gene Ontology and Self-Organizing Maps. In Proceedings of SWAT4LS-16, 9th International Conference on Semantic Web Applications and Tools for Life Sciences
A significant body of work on biomedical text mining is aimed at uncovering meaningful associations between biological entities, including genes. This has the potential to offer new insights for research, uncovering hidden links between genes involved in critical pathways and processes. Recently, high-throughput studies have started to unravel the genetic landscape of chronic lymphocytic leukemia (CLL), the most common adult leukemia. CLL displays remarkable clinical heterogeneity, likely reflecting its underlying biological heterogeneity which, despite all progress, still remains insufficiently characterized and understood. This paper deploys an ontology-based semantic similarity combined with self-organizing maps for studying the temporal evolution of cohesion among CLL-related genes and the extracted information. Three consecutive time periods are considered and groups of genes are derived therein. Our preliminary results indicated that our proposed gene groupings are meaningful and that the temporal dimension indeed impacted the gene cohesion, leaving a lot of room for further promising investigations.
Read more at the University of Borås public repository
Meroño Peñuela, A., Wittek, P., Darányi, S. (2016). Visualizing the Drift of Linked Open Data Using Self-Organizing Maps. In Proceedings of Drift-a-LOD Workshop at the 20th International Conference on Knowledge Engineering and Knowledge Management
The urge for evolving the Web into a globally shared dataspace has turned the Linked Open Data (LOD) cloud into a massive platform containing 100 billion machine-readable statements. Several factors hamper a historical study of the evolution of the LOD cloud, and hence forecasting its future: its ever-growing scale, which makes a global analysis difficult; its Web-distributed nature, which challenges the analysis of its data; and the scarcity of regular and time-stamped archival dumps. Recently, a scalable implementation of self-organizing maps (SOM) has been developed to visualize the local topology of high-dimensional data. We use this methodology to address scalability issues, and the Dynamic Linked Data Observatory, a regular biweekly, centralized sample of the LOD cloud, as a time-stamped collection. We visualize the drift of Linked Datasets between 2012 and 2016, finding that datasets with high availability, high vocabulary reuse, and modeling with commonly used terms in the LOD cloud are better traceable across time.
Read more at the University of Borås public repository
Kompatsiaris, I., McNeill, J. (2016). PERICLES – Digital Preservation through Management of Change in Evolving Ecosystems. In The Success of European Projects Using New Information and Communication Technologies (pp. 51–74). Setubal, Portugal.
Management of change is essential to ensure the long-term reusability of digital assets. Change can be brought about in many ways, including through technological, user community and policy factors. Motivated by case studies in space science and time-based media, we consider the impact of change on complex digital objects comprising multiple interdependent entities, such as files, software and documentation. Our approach is based on modelling of digital ecosystems, in which abstract representations are used to assess risks to sustainability and support tasks such as appraisal. The paper is based on work of the EU FP7 PERICLES project on digital preservation, and presents some general concepts as well as a description of selected research areas under investigation by the project.
Read more at the University of Borås public repository
Wittek, P., Liu, Y.-H., Darányi, S., Gedeon, T., Lim, I. S. (2016). Risk and Ambiguity in Information Seeking: Eye Gaze Patterns Reveal Contextual Behaviour in Dealing with Uncertainty. Frontiers in Psychology, 7
Information foraging connects optimal foraging theory in ecology with how humans search for information. The theory suggests that, following an information scent, the information seeker must optimize the trade-off between exploration by repeated steps in the search space vs. exploitation, using the resources encountered. We conjecture that this trade-off characterizes how a user deals with uncertainty and its two aspects, risk and ambiguity in economic theory. Risk is related to the perceived quality of the actually visited patch of information, and can be reduced by exploiting and understanding the patch to a better extent. Ambiguity, on the other hand, is the opportunity cost of having higher quality patches elsewhere in the search space. The aforementioned trade-off depends on many attributes, including traits of the user: at the two extreme ends of the spectrum, analytic and wholistic searchers employ entirely different strategies. The former type focuses on exploitation first, interspersed with bouts of exploration, whereas the latter type prefers to explore the search space first and consume later. Based on an eye tracking study of experts’ interactions with novel search interfaces in the biomedical domain, we demonstrate that perceived risk shifts the balance between exploration and exploitation in either type of users, tilting it against vs. in favour of ambiguity minimization. Since the pattern of behaviour in information foraging is quintessentially sequential, risk and ambiguity minimization cannot happen simultaneously, leading to a fundamental limit on how good such a trade-off can be. This in turn connects information seeking with the emergent field of quantum decision theory.
Read more at the University of Borås public repository
Wittek, P., Darányi, S., Nelhans, G. (2016). Ruling out static latent homophily in citation networks. Scientometrics, 110(2), 765–777.
Citation and co-author networks offer an insight into the dynamics of scientific progress. We can also view them as representations of a causal structure, a logical process captured in a graph. From a causal perspective, we can ask questions such as whether authors form groups primarily due to their prior shared interest, or if their favourite topics are ‘contagious’ and spread through co-authorship. Such networks have been widely studied by the artificial intelligence community, and recently a connection has been made to nonlocal correlations produced by entangled particles in quantum physics—the impact of latent hidden variables can be analyzed by the same algebraic geometric methodology that relies on a sequence of semidefinite programming (SDP) relaxations. Following this trail, we treat our sample co-author network as a causal graph and, using SDP relaxations, rule out latent homophily as a manifestation of prior shared interest only, leading to the observed patternedness. By introducing algebraic geometry to citation studies, we add a new tool to existing methods for the analysis of content-related social influences.
Read more at the University of Borås public repository
Darányi, S., Wittek, P. (2015). Conceptual machinery of the mythopoetic mind: Attis, a case study. In Proceedings of QI-15, 9th International Quantum Interaction Symposium
In search of the right interpretation regarding a body of related content, we screened a small corpus of myths about Attis, a minor deity from the Hellenistic period in Asia Minor, to identify the noncommutativity of key concepts used in storytelling. Looking at the protagonist's typical features, our experiment showed incompatibility with regard to his gender and downfall. A crosscheck for entanglement found no violation of a Bell inequality, its best approximation being on the border of the local polytope.
Read more at the University of Borås public repository
Darányi, S., Wittek, P., Konstantinidis, K., Papadopoulos, S. (2015). A Potential Surface Underlying Meaning? Presented at the Yandex School of Data Analysis Conference, Machine Learning: Prospects and Applications.
Machine learning algorithms utilizing gradient descent to identify concepts or more general learnables hint at a so-far ignored possibility, namely that local and global minima represent any vocabulary as a landscape against which evaluation of the results can take place. A simple example to illustrate this idea would be a potential surface underlying gravitation. However, to construct a gravitation-based representation of, e.g., word meaning, only the distance between localized items is a given in the vector space, whereas the equivalents of mass or charge are unknown in semantics. Clearly, the working hypothesis that physical fields could be a useful metaphor to study word and sentence meaning is an option, but our current representations are incomplete in this respect. For a start, consider that an RBF kernel has the capacity to generate a potential surface and hence create the impression of gravity, providing one with distance-based decay of interaction strength, plus a scalar scaling factor for the interaction, but of course no term masses. We are working on an experiment design to change that. Therefore, with certain mechanisms in neural networks that could host such quasi-physical fields, a novel approach to the modeling of mind content seems plausible, subject to scrutiny. Work in progress in another direction of the same idea indicates that by using certain algorithms, already emerged vs. still emerging content is clearly distinguishable, in line with Aristotle’s Metaphysics. The implications are that a model completed by “term mass” or “term charge” would enable the computation of the specific work equivalent of sentences or documents, and that via replacing semantics by other modalities, vector fields of more general symbolic content could exist as well.
Also, the perceived hypersurface generated by the dynamics of language use may be a step toward more advanced models, for example addressing the Hamiltonian of expanding semantic systems, or the relationship between reaction paths in quantum chemistry vs. sentence construction by gradient descent.
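The remark that an RBF kernel can generate a potential surface can be sketched as follows. This is an illustration under stated assumptions, not the authors' experiment design: each term location contributes an identical Gaussian well, with a single scalar scale and no per-term mass, exactly the limitation the abstract points out. The names rbf, potential, scale and gamma are hypothetical.

```python
import math


def rbf(x, center, gamma=1.0):
    """Gaussian RBF kernel value between a point and a term location."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-gamma * squared_distance)


def potential(x, centers, scale=1.0, gamma=1.0):
    """Potential at x: negative sum of RBF bumps, so each term is a well.

    The same scalar `scale` applies to every term, i.e. no term masses.
    """
    return -scale * sum(rbf(x, c, gamma) for c in centers)


# Toy term locations (hypothetical) in a 2-D vector space.
terms = [(0.0, 0.0), (2.0, 0.0)]

at_term = potential((0.0, 0.0), terms)    # inside a well
far_away = potential((10.0, 10.0), terms)  # interaction has decayed
```

The potential is lowest near term locations and decays toward zero with distance, which is the "impression of gravity" the abstract refers to.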
Read more at the University of Borås public repository
Wittek, P. (2015). Ncpol2sdpa – Sparse Semidefinite Programming Relaxations for Polynomial Optimization Problems of Noncommuting Variables. ACM Transactions on Mathematical Software
A hierarchy of semidefinite programming (SDP) relaxations approximates the global optimum of polynomial optimization problems of noncommuting variables. Generating the relaxation, however, is a computationally demanding task, and only problems of commuting variables have efficient generators. We develop an implementation for problems of noncommuting variables that creates the relaxation to be solved by SDPA, a high-performance solver that runs in a distributed environment. We further exploit the inherent sparsity of optimization problems in quantum physics to reduce the complexity of the resulting relaxations. Constrained problems with a relaxation of order two may contain up to a hundred variables. The implementation is available in Python. The tool helps solve problems such as finding the ground state energy or testing quantum correlations.
Read more at the University of Borås public repository
Wittek, P., Darányi, S., Kontopoulos, E., Moysiadis, T., Kompatsiaris, I. (2015). Monitoring Term Drift Based on Semantic Consistency in an Evolving Vector Field. In Proceedings of IJCNN-15
Based on the Aristotelian concept of potentiality vs. actuality allowing for the study of energy and dynamics in language, we propose a field approach to lexical analysis. Falling back on the distributional hypothesis to statistically model word meaning, we used evolving fields as a metaphor to express time-dependent changes in a vector space model by a combination of random indexing and evolving self-organizing maps (ESOM). To monitor semantic drifts within the observation period, an experiment was carried out on the term space of a collection of 12.8 million Amazon book reviews. For evaluation, the semantic consistency of ESOM term clusters was compared with their respective neighbourhoods in WordNet, and contrasted with distances among term vectors by random indexing. We found that at 0.05 level of significance, the terms in the clusters showed a high level of semantic consistency. Tracking the drift of distributional patterns in the term space across time periods, we found that consistency decreased, but not at a statistically significant level. Our method is highly scalable, with interpretations in philosophy.
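Random indexing, one of the two building blocks named in the abstract above, can be sketched in a few lines. This is a generic textbook version under assumed parameters (dimension, number of nonzero entries, window size, seed), not the configuration used in the paper, and it omits the ESOM stage entirely.

```python
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def random_indexing(sentences, dim=64, nonzeros=4, window=2, seed=0):
    """Build term vectors by summing the index vectors of neighbouring tokens."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        # Assign each new token a fixed random index vector.
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, nonzeros, rng)
        # Accumulate the index vectors of tokens inside the sliding window.
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[tok]
                    for d, val in enumerate(index[tokens[j]]):
                        ctx[d] += val
    return dict(context)


# Toy corpus (hypothetical): "c" and "d" occur in identical contexts.
corpus = [["a", "b", "c"], ["a", "b", "d"]]
term_vectors = random_indexing(corpus)
```

Terms that share contexts end up with similar vectors; rebuilding the vectors per time period gives the kind of evolving term space in which drift can be monitored.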
Read more at the University of Borås public repository
Wittek, P., Darányi, S., Liu, Y.-H. (2014). A Vector Field Approach to Lexical Semantics. Presented at the 8th International Conference on Quantum Interaction, Filzbach, Switzerland. June 30 - July 3, 2014.
We report work in progress on measuring "forces" underlying the semantic drift by comparing it with plate tectonics in geology. Based on a brief survey of energy as a key concept in machine learning, and the Aristotelian concept of potentiality vs. actuality allowing for the study of energy and dynamics in language, we propose a field approach to lexical analysis. Until there is evidence to the contrary, we assume that a classical field in physics is appropriate to model word semantics. The approach uses the distributional hypothesis to statistically model word meaning. We do not address the modelling of sentence meaning here. The computability of a vector field for the indexing vocabulary of the Reuters-21578 test collection by an emergent self-organizing map suggests that energy minima as learnables in machine learning presuppose concepts as energy minima in cognition. Our finding needs to be confirmed by a systematic evaluation.
Read more at the University of Borås public repository
Darányi, S., Wittek, P., Kitto, K. (2013). The Sphynx’s new riddle: How to relate the canonical formula of myth to quantum interaction. Presented at the 7th International Quantum Interaction Conference, Leicester, United Kingdom.
We introduce Claude Lévi-Strauss' canonical formula (CF), an attempt to rigorously formalise the general narrative structure of myth. This formula utilises the Klein group as its basis, but recent work draws attention to its natural quaternion form, which opens up the possibility that it may require a quantum-inspired interpretation. We present the CF in a form that can be understood by a non-anthropological audience, using the formalisation of a key myth (that of Adonis) to draw attention to its mathematical structure. The future potential formalisation of mythological structure within a quantum-inspired framework is proposed and discussed, with a probabilistic interpretation further generalising the formula.
Read more at the University of Borås public repository
Wittek, P., Koopman, B., Zuccon, G., Darányi, S. (2013). Combining Word Semantics within Complex Hilbert Space for Information Retrieval. Presented at the 7th International Quantum Interaction Conference, Leicester, United Kingdom.
Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex-valued Hilbert spaces, and in which terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector, whereas ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
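The representation described above, with distributional evidence in the real component and ontological information in the imaginary component, can be sketched with Python's built-in complex type. The similarity measure used here (magnitude of the normalised Hermitian inner product) is an assumption chosen for illustration; the abstract does not state the paper's actual scoring function, and all names and toy values are hypothetical.

```python
import math


def complex_term_vector(distributional, ontological):
    """Pack two real-valued views of a term into one complex vector."""
    if len(distributional) != len(ontological):
        raise ValueError("both views must have the same dimension")
    return [complex(re, im) for re, im in zip(distributional, ontological)]


def hermitian_inner(u, v):
    """Inner product on a complex Hilbert space (conjugate-linear in u)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def similarity(u, v):
    """Magnitude of the normalised Hermitian inner product (assumed measure)."""
    norm_u = math.sqrt(hermitian_inner(u, u).real)
    norm_v = math.sqrt(hermitian_inner(v, v).real)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return abs(hermitian_inner(u, v)) / (norm_u * norm_v)


# Toy term and query (hypothetical values).
term = complex_term_vector([1.0, 0.0], [0.0, 1.0])
query = complex_term_vector([1.0, 0.0], [0.0, 0.0])
score = similarity(term, query)
```

Because both components enter a single inner product, the two kinds of semantic evidence interact in the score rather than being combined after the fact.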
Read more at the University of Borås public repository
Wittek, P., Lim, I. S., Rubio-Campillo, X. (2013). Quantum Probabilistic Description of Dealing with Risk and Ambiguity in Foraging Decisions. Presented at the 7th International Quantum Interaction Conference, Leicester, United Kingdom.
A forager in a patchy environment faces two types of uncertainty: ambiguity regarding the quality of the current patch and risk associated with the background opportunities. We argue that the order in which the forager deals with these uncertainties has an impact on the decision whether to stay at the current patch. The order effect is formalised with a context-dependent quantum probabilistic framework. Using Heisenberg's uncertainty principle, we demonstrate that the two types of uncertainty cannot be simultaneously minimised, hence putting a formal limit on rationality in decision making. We show the applicability of the contextual decision function with agent-based modelling. The simulations reveal order-dependence. Given that foraging is a universal pattern that goes beyond animal behaviour, the findings help understand similar phenomena in other fields.
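The order effect mentioned above can be illustrated with the generic textbook construction: two noncommuting projectors applied to a state in different orders yield different probabilities. This sketch is not the paper's contextual decision function; the angles, the state and the function names are hypothetical.

```python
import math


def unit(theta):
    """Unit vector at angle theta in a real 2-D state space."""
    return (math.cos(theta), math.sin(theta))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(state, axis):
    """Orthogonal projection of the state onto the line spanned by axis."""
    coefficient = dot(axis, state)
    return tuple(coefficient * a for a in axis)


def sequential_probability(state, first, second):
    """Probability of answering 'yes' to `first` and then to `second`."""
    after = project(project(state, first), second)
    return dot(after, after)


state = unit(0.0)               # forager's initial belief state
question_a = unit(0.0)          # e.g. assessing the current patch
question_b = unit(math.pi / 4)  # e.g. assessing background opportunities

a_then_b = sequential_probability(state, question_a, question_b)
b_then_a = sequential_probability(state, question_b, question_a)
```

With these toy values the two orders give different probabilities, which is the kind of order dependence a classical joint probability cannot produce.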
Read more at the University of Borås public repository
Deliverables
D4.3 Content Semantics and Use Context Analysis Techniques
This document summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation and proposes novel approaches for extracting this information in a scalable manner.
D4.4 Modelling Contextualised Semantics
This document summarises the work conducted within task T4.4 of WP4, presenting our proposed models for semantically representing digital content and its respective context – the latter refers to any information coming from the environment of the digital object (DO) that offers better insight into the object’s status, its interrelationships with other content items and information about the object’s context of use.
D4.5 Context-aware Content Interpretation
This deliverable summarises the work conducted within task T4.5 of WP4, presenting the approaches proposed by PERICLES for contextualised content interpretation, aimed at gaining insightful contextualised views on content semantics.
D7.1 Report on Training Needs
Nasrine Olson, Elena Maceviciute, Tom Wilson
This document aims to present the results of the assessment of training needs and modes of training undertaken in the partner organisations TATE and B.USOC and related organisations. A study of training activities in other digital preservation related EU commissioned projects was also undertaken with the dual aim of learning from their experience and finding projects with which collaboration in training might be possible. A review of the limited existing research literature on training for digital preservation was also carried out.
D7.2 Training Material
Nasrine Olson
This deliverable accompanies the training material produced by the project, which is available online at http://pericles-project.eu/training-module/.