# Trends in Artificial Intelligence

### Article Outline

RESEARCH ARTICLE | VOLUME 2 | ISSUE 1 OPEN ACCESS

# Identifying Latent Semantics in Action Games for Player Modeling

Katia Lida Kermanidis

• Katia Lida Kermanidis 1*
• Department of Informatics, Ionian University, Greece

Kermanidis KL (2018) Identifying Latent Semantics in Action Games for Player Modeling. Trends Artif Intell 2(1):34-45.

Accepted: April 24, 2018 | Published Online: April 26, 2018

# Identifying Latent Semantics in Action Games for Player Modeling

## Abstract

Machine learning approaches to player modeling traditionally employ high-level game-knowledge-based features for representing game sessions, and often player behavioral features as well. The present work makes use of generic low-level features and latent semantic analysis for unsupervised player modeling, but mostly for revealing underlying hidden information regarding game semantics that is not easily detectable beforehand.

## Keywords

Action game semantics, Latent semantic analysis, Latent dirichlet allocation, k-means clustering, Low-level feature representation, Player modeling

## Introduction

Player modeling has been attracting the interest of game design and development experts for several years, as a means to increase player satisfaction and immersion. According to the inclusive reviews in [1,2], modeling techniques vary from empirical (data-driven) [3-7], where the application of machine learning or statistical analysis to gaming data enables predictions of playing styles, to theoretical (i.e. analytical), mostly applicable to board-like games, where search and optimization techniques are used to determine the moves towards the best outcome [8].

Regarding empirical approaches to player modeling, various learning techniques have been experimented with; Supervised, like support vector machines for predicting difficulty adjustment [9], Bayesian networks for classification [10], statistical analysis of the distribution of player actions [11], and unsupervised, like self-organizing maps [12], reinforcement learning [13], transfer learning [14] and preference learning [15]. Supervised techniques (stand-alone or in combination with unsupervised approaches) have been gaining in popularity [16-21] during the last three-four years, compared to purely unsupervised approaches [12,22,23], mostly due to their improved performance. Dimensionality reduction techniques, other than self-organizing maps, have been experimented with for unsupervised modeling: Linear Discriminant Analysis has been applied to arcade-style as well as combat-style games [24], where match data are annotated with the players' identity to enable the supervised application of Linear Discriminant Analysis, and then k-means clustering groups together players of the same gaming style.

All previous approaches use a limited number of high-level game and player features to perform modeling, that are game-dependent (vary from game to game, and a game expert is required to define them) and whose impact on the player model is to some extent a-priori sensed. The high-level features pose significant demands on knowledge resources, while they minimize expectations to extract new knowledge and unforeseen relations and dependencies between game and player features.

The present work proposes grouping similar playing styles together by modeling the semantics of the game domain. There are two possible ways for supplying domain knowledge [25]: By hand, making use of domain experts' know-how, and automatically, by deriving the semantics from large corpora of "word" sequences, i.e. sequences of concepts that carry units of meaning related to the domain. The first approach is more accurate, but domain-dependent, while the second (adopted in the present work) is useful when no hand-crafted knowledge is available.

In an attempt to disassociate the player modeling process from the necessity of an already known high-level knowledge-based feature set, the present paper proposes.

- The use of numerous low-level generic screen-distributed features, relatively easily derived from any action game, that are raw (i.e. they are unprocessed and don't undergo any averaging/rate counts etc.); Also, no sophisticated knowledge about the game is required (e.g. specific time lapses, specific player reactions, specific game frames and periods).

- The use of these features to model an action game domain knowledge, i.e. to semantically represent game states and game sessions.

- The application of Latent Semantic Analysis (LSA) to the game term-document matrix in order to enable the revelation of hidden, previously unknown semantic relations among players, game states and sessions.

- The comparison of LSA to other semantic modeling approaches, like Latent Dirichlet Allocation (LDA) to note their differences in addressing the data sparseness problem.

LSA has been applied with significant success to several domains, other than IR, like essay assessment in language learning [26], intelligent tutoring [27], text cohesion measurement [28], summary evaluation [29], text categorization [30]. Although all previously mentioned LSA applications have been performed on text corpora, some approaches have proposed its use in different non-textual knowledge domains like board game player modeling [31], complex problem solving [32], gene function prediction [33-35], web navigation behavior prediction [36], collaborative filtering [37], semantic description of images [38].

LDA has been applied to several distinct modeling applications, like topic detection [39,40], tag recommendation [41], entity resolution [42], human action recognition in video streams [43], spam recognition [44], satellite image clustering [45], flood event modeling [46], source code analysis [47].

## Latent Semantic Analysis

In Information Retrieval (IR) the most important data structure is the term-document matrix, where every matrix row corresponds to a vocabulary entry (the vocabulary or lexicon consists of the words/terms appearing in a document collection) and every matrix column corresponds to a document in the collection. Matrix cells may indicate term incidence (contain a Boolean value that denotes the presence or absence of a term in a document), term frequency (tf-denoting the number of times the term appears in a document) or more sophisticated weights that incorporate the discriminative power of every term, i.e. its ability to distinguish groups of documents based on their semantic similarity, like the widely used tf.idf metric, where idf is the logarithm of the inverse document frequency (df-the number of documents a term appears in). The larger the number of documents a word appears in, the weaker its discriminative power, i.e. its ability to semantically mark a document. The following formula defines tf.idf, N is the total number of documents in the collection.

In a realistic IR application scenario, the number of rows and columns of the term-document matrix may reach tens or hundreds of thousands. LSA [48] is a mathematical method, initially proposed for reducing the size of the term-document matrix in the IR setting, by applying Singular Value Decomposition (SVD). SVD decomposes the initial term-document matrix A into a product of three matrices and "transfers" matrix A into a new semantic space:

T is the matrix with rows the lexicon terms, and columns the dimensions of the new semantic space. The columns of DT represent the initial documents and its rows the new dimensions, while S is a diagonal matrix (all non-diagonal elements have a zero value) containing the singular values of A. Multiplication of the three matrices will reconstruct the initial matrix. The product can be computed in such a way that the singular values are positioned in the diagonal of S in descending order. The smaller the singular value, the less it affects the product outcome. By keeping only the first k, i.e. the k largest, singular values, and setting the remaining ones to zero and calculating the resulting product, a low-rank approximation Ak of the initial matrix A may be calculated as a least-squares best fit. The rank of the new matrix Ak is equal to the number k of selected singular values.

As an interesting side effect, dimensionality reduction decreases or increases the cell values of matrix Ak, compared to those of A. It may even set the occurrence of words to higher than zero for documents that they initially did not appear in. Thereby semantic relations between words and documents are revealed that were not apparent at first (latent). It needs to be noted that LSA is fully automatic, i.e. the latent semantic relations are learned in an unsupervised manner.

## LSA for Action Games

Unlike board-like and arcade games, action games have properties resembling those of complex/dynamic environments: Causality (actions/decisions often affect subsequent actions/decisions), time dependence (environmental circumstances that affect actions/decisions vary over time), and latent, implicit relations between domain properties that are not straightforward.

The first challenge is the identification of the set of domains "words", corresponding to the collection vocabulary in the IR domain. In board and board-like games, like tic-tac-toe or chess, domain "words" are easy to identify. Boards may be viewed as grids of cells and each cell state (e.g. X/O/empty in tic-tac-toe) constitutes a "word" [25]. In action games "words" are harder to identify. In the firefighting microworld of [32] "words" are actions like appliance moves, or water drops. The definition of a game "word" depends on the intended use of the model. If the intended use is behavior prediction, a "word" needs to model a player's action, as the player's sequence of actions (in a given context) defines his behavior. Words need to be primitive carriers of domain knowledge and every word represents a distinct game state (a game screenshot at a given moment) in a unique manner. In the present work two ways for defining words are proposed. The first (henceforth the "holistic" approach) makes use of high-level game features (e.g. position of player ship, number of enemy ships etc) and measures their value for each particular game state. The concatenation of all feature values in a given game state is the word. The second (henceforth the "grid" approach) considers each game state to be a grid of cells, i.e. the game screen is divided into rows and columns so that the resulting cell size can roughly fit the game elements (spaceships, weapons etc). Each cell may or may not contain one or more game elements, i.e. game state semantics is acquired by combining the semantics of each cell in a given state. The number of possible state conditions equals ${2}^{n*m}$ (m is the number of cells, and n the number of game elements). A word is the concatenation of all cell conditions for the given state.

The second challenge involves the definition of documents. In IR, documents are meaningful utterances that carry information regarding their topic, author, genre etc. Similarly, game documents need to constitute complete and meaningful player "utterances", i.e. well-formed sequences of words that constitute complete descriptions of player actions or context conditions. Herein these descriptions address entire game sessions as these carry a complete meaning and the mark of each individual player. A game session is represented by taking a sample of the game state at constant pre-defined time intervals (i.e. every 500 msecs) and registering the sequence of words ("holistic" or "grid") that describe the sample. Each sample represents a game state at the specified time point.

After words and documents have been defined, the term-document matrix may be constructed. In this matrix each row is a word (i.e. game state) and each column is a document (i.e. game session). Two matrices are experimented with: One where each matrix element denotes the tf and one where it denotes the tfidf weight of a particular word in a particular document. The term-document matrices are very sparse, i.e. only very few words appear in more than one document and very few words appear in a document more than once. LSA is applied to the matrices in an attempt to address this sparseness and extract the knowledge lying underneath it.

Having constructed the term-document matrices, each game session may be viewed as a learning vector and k-means clustering is applied to the initial and the reduced dataset so as to group similar game sessions together in an unsupervised manner.

## Latent Dirichlet Allocation

LDA [49] is a probabilistic process for modeling corpora. Documents in a collection are treated as distribution mixtures over 'topics'. 'Topics' are latent notions that are represented as word distributions P(w|t). Given a document d, the probability that it involves a mixture of topics t is P(t|d). The probability of a specific word wi given a document d, can be calculated as:

where ti is the latent topic and t is the number of topics, which is a-priori user-defined. LDA models the above distributions using Dirichlet priors from an unlabeled collection of documents. As a result, the process outputs a document-topic matrix and a term-topic matrix, i.e. a matrix containing membership probabilities of every document for every topic and a matrix with membership probabilities of every word for every topic respectively.

Unlike LSA, where the resulting feature-vectors are sometimes hard to explain, especially when they involve negative values, LDA probabilities on topics and documents are more straightforward to understand and make use of in other applications.

LDA has been chosen for modeling the game semantics in this work, compared to similar methods, like word2vec/doc2vec modeling, due to its higher interpretability (LDA results in topic probabilities, while word2vec results in a vector of real numbers that are hard to interpret) and its globalitya. In other words, LDA has the ability to model global dependencies (documents globally predict words), while, in word2vec, the modeled dependencies are local (one word predicts another word nearby). Given the complexity of the action game semantics, global dependencies need to be uncovered and are more useful.

## Experimental Setup: Space Debris

The videogame used for the purposes of data collection is based on Space Debris [22]. The action takes place within the confines of a single screen, with alien ships scrolling downwards. There are two types of enemy spaceships, the carrier (enemy 1) which is slow and can withstand more laser blasts, and a fighter (enemy 2) which is fast and easier to destroy. The player wins when he has successfully with stood the enemy ship waves for a predetermined time. The game environment is littered with floating asteroids which in their default state do not interact (i.e. collide) with any of the game spaceships. In order to do so, an asteroid has to be "energized" (hit by player weapon). Also floating are shield and life power-ups which the user can use to replenish his ship's shield and remaining lives. The player's ship is equipped with a laser cannon which she can use to shoot alien ships. The laser canon is weak and about 4-5 successful shots are required to destroy an enemy ship (except for the boss which requires many more). The laser can also be used to "energize" an asteroid and guide it to destroy an enemy ship. A screenshot of the game is shown in Figure 1.

The participants included 10 players (74 game sessions, 10532 game states). Each participant had a 5-minute trial period for becoming acquainted with Space Debris. Then, he/she was given a short questionnaire consisting of general gaming questions (about gaming experience, preferences etc.) and specific questions about Space Debris. The questionnaire is used as an initial resource to categorize the gaming style of the player into one of four categories: Novice (a player with little gaming experience and playing Space Debris without any particular style), tactical (a player keen on playing strategy or adventure games and when playing Space Debris makes wise use of the laser and power-ups), aggressive (a player keen on action games and when playing Space Debris fires constantly without frequent use of the power-ups), and defensive (a player keen on puzzle and internet games and when playing Space Debris does not fire or tries to avoid the enemies in order not to be killed). Afterwards, the participants played the game for 10 minutes, with the presence of a domain expert, who witnessed the player's gaming style and accepted or disputed the questionnaire's categorization. 29% of the game states belong to the novice class, 42% to the tactical, 19% to the aggressive and 10% to the defensive class. The class of a game state is given the value of the playing style a player exhibits on the particular game session.

For comparative analysis purposes, a first experiment was run using a dataset that is formed by a set of high-level features used traditionally in player modeling. Each feature-value vector represents an entire game session (i.e. 74 vectors in total), and the features (22 in number) denote

- The outcome of the game session (win/lose).

- The session score value.

- The mean number of available life and shield upgrades during the session.

- The total number of life and shield upgrades performed.

- The minimum and maximum horizontal and vertical position of the player during the session.

- The total number of lasers fired by the player and by the enemies during the session.

- The total number of enemies killed by lasers and asteroids.

- The total number of enemies escaped.

- The total number of enemies very close to the player.

- The total number of enemies close to the player.

- The total number of enemies on screen.

- The total number of asteroids hit.

- The total number of visible asteroids.

Classification tests were run with several state-of-the-art classifiers (i.e. pruned decision trees, random forests and Support Vector Machines) on the Weka machine learning workbench (https://www.cs.waikato.ac.nz/ml/weka/). Decision trees and random forests are used to demonstrate the features' impact on the gaming style, and the extent to which this knowledge is already known a-priori. As can be seen from the derived decision tree (Figure 2), the borderline of the player's position, the number of performed life and shield upgrades and the game result are closely related to the gaming style, information that is not so surprising, but can be sensed to some extent beforehand. The confusion matrix is shown in Figure 3. In this matrix rows show the actual gaming style derived from the questionnaire and the observation process described earlier (e.g. there are 25 + 2 + 2 + 0 = 29 actual tactical game sessions), and the columns show the style predicted by the decision tree (i.e. of the 29 actual tactical games, 25 are correctly classified as tactical, 2 as aggressive, 2 as novice and 0 as defensive).

Figure 4 shows the confusion matrix when applying random forests and Support Vector Machines to the data (both learning schemata led to the same results). The meta-learning random forest schema and the SVM classifier, more sophisticated than the simple decision tree classifier, both reach 100% accuracy, showing thereby the relevance of the chosen features for modeling the particular game. Random forests were run choosing randomly 12 features and forming 100 decision trees in every run [50]. Support Vector Machines were run using a first degree polynomial kernel function and the Sequential Minimal Optimization algorithm for training [51].

### "Holistic" data collection

The game features for the holistic dataset and their value range are listed next:

- The number of enemies very close to the player (denoting imminent threat) (min: 0, max: 5).

- The number of enemies close to the player (denoting danger) (min: 0, max: 9).

- The total number of enemies on screen (min: 0, max:13).

- The number of player lasers fired (min: 0, max: 5).

- The number of enemy 1 lasers fired (min: 0, max: 10).

- The number of enemy 2 lasers fired (min: 0, max: 10).

- The horizontal (X) and vertical (Y) position of the player (minX: -462, maxX: 462, minY: -334, maxY: 334).

- The number of life upgrades performed (min: 0, max: 1).

- The number of shield upgrades performed (min: 0, max: 1).

- The number of hit asteroids (min: 0, max: 6).

- The number of visible asteroids (min: 0, max: 8).

- The number of hit enemy 1 ships (min: 0, max: 8).

- The number of hit enemy 2 ships (min: 0, max: 8).

- The score value (min: 0, max: 8000).

- The number of the player's available life upgrades (min: 0, max: 3).

- The number of shields available to the player (min: 0, max: 100).

As explained earlier, a word is a string representing the concatenation of the values of all aforementioned features for a given game state, and the collected data resulted in 8458 distinct words. The feature values are discretized into at most ten value levels of equal distance. Even after discretization, the resulting term-document matrix consists of 98.4% zero elements, showing the sparseness problem.

### "Grid" data collection

The game screen is viewed as an 8 × 11 grid. Table 1 shows the distinct states that a cell may be in. More than one of the distinct states may be combined to form the cell state. A word is the concatenation of the states of the 88 cells. The vocabulary consists of 10263 words, and, in this case, zero elements in the term-document matrix reach 98.8%.

To lessen the sparseness problem, another dataset was created, by merging the 25 distinct cell states into ten (Table 2) and by reducing the granularity of the grid down to 4 × 5 = 20 cells. Vocabulary size drops down to 9694 words and the percentage of zero elements to 98.5%. The merging process inevitably leads to information loss, as different (though similar) game elements are treated as the same. All dataset files can be downloaded from http://di.ionio.gr/hilab.

### LSA for classification

All aforementioned term-document matrices are transposed so that rows indicate sessions and columns indicate game states. LSA is applied to all aforementioned term-document matrices. Experiments were run using 20, 40 and 60 latent semantic dimensions. The resulting D matrices depict the transformation of the game sessions (rows) into the new semantic space (columns). Performing classification experiments with the LSA datasets is not expected to lead to better results. As is claimed in the literature [52], running classification experiments using a dataset with latent attributes (dimensions) usually drops classification performance due to the unsupervised manner in which the latent dataset has been derived. Figure 5 shows the best results (confusion matrix) achieved with a latent dataset with the unpruned decision tree classifier.

Figure 6 shows the confusion matrix when using random forests (parameters are the same like before), and Figure 7 is the SVMs confusion matrix with a fourth degree polynomial kernel function. The prediction weakness of the latent dataset is evident even with sophisticated learners.

### LSA for clustering: Revealing underlying semantic information

In order to identify the significance of the application of LSA, a similarity detection process between game sessions needs to be performed, other than classification. Metrics like the cosine similarity are widely employed for this purpose in the literature. Herein k-means clustering is performed to the term-document matrices in order to group semantically similar game sessions together. Choosing the number of clusters to equal the number of gaming styles, i.e. four, enables classes-to-clusters evaluation. Classes-to-clusters evaluation results in a confusion matrix that depicts to what percent sessions of the same style (class) are grouped into the same cluster. The confusion matrix for all initial datasets (before applying LSA) is the same regardless of the dataset type (holistic, grid or merged grid) and of using tf or tfidf weights and is shown in Figure 8b. Rows indicate actual gaming styles and columns indicate the clusters these were assigned to. For comparison purposes, Figure 8a shows the corresponding results for the traditional dataset. The difference in clustering performance is significant, clearly due to the large number of features in the term-document datasets.

The remaining matrices ((c)-(t)) correspond to the LSA-transformed datasets. Though results are far from satisfactory when it comes to matching clusters to styles one-to-one (as already explained, this is not the primary concern of the proposed methodology anyway), several interesting findings are noteworthy. When the number of latent semantic dimensions reaches 60, clustering performance is the same as when no LSA is performed, i.e. no semantic information is hidden in the remaining singular values of the initial diagonal matrix that is not already captured in the first 60 singular values. Decreasing the number of singular values to 40 has a positive effect on clustering performance in most cases, while 20 dimensions seem to be insufficient to capture as well the semantic properties of the game. Regarding the holistic vs. the grid and the merged grid approach, it seems that the more "aggregate" (i.e. holistic) the nature of the dataset, the better the clustering performance, while the opposite holds for the "distributed" datasets. Intuitively, the latter require large amounts of data in order to overcome the bigger sparseness problem they face. Raw term frequencies seem to outperform tfidf scores in the "distributed" datasets, but not in the holistic dataset, where the df of the words varies significantly more and thereby makes their different discriminative power evident.

The subjective nature of the true play style assignment, the limited data, as well as the desire to investigate the semantic modeling process further, led to tests with a smaller number of clusters, i.e. 2. In this case, classes-to-clusters evaluation is not meaningful (a percentage of correctly clustered sessions is not provided). Figure 9 shows the formed clusters in relation to the game styles. What is interesting to observe is that for most datasets the majority of members of the first cluster are aggressive and tactical players, and of the second cluster defensive and novice players. The two formed clusters are interesting and can be explained, as novice players usually tend to play defensively, with no pattern or offensive strategy. On the other hand, aggressive and tactical players share the same confidence and are eager to win. The grid data captures this knowledge, while it is mostly lost in the merged grid, a sign of the information loss mentioned earlier. LSA helped reveal these unknown semantic similarities with low-level features even better than the high-level traditional dataset.

Another interesting finding is that the aggressive style seems to be the most difficult to model. While the other styles (especially the novice and defensive styles) show a more conforming behavior in the latent dimensions (almost no example has dimension values outside a specific limited range), some of the aggressive examples often show rebellious behavior and lie away from the majority of their co-clustered examples, given a latent dimension. This is partly attributed to the small number of examples of the particular class, and partly to the to-the-limit type of playing that aggressive players adopt, where borderline values of the latent features are often reached. This underlying semantics was not noticeable in the initial datasets but came to light with the LSA analysis.

For experimental purposes the DBScan [53] clusterer has also been applied, as it is claimed in the literature to cope well with noisy data. DBScan is density-based and forms high-density regions into clusters. Pre-defining the value of cluster radius and the minimum number of objects required to form a cluster, the algorithm does not need to know the number of clusters beforehand, unlike k-means. Objects that end up not belonging to any cluster are considered as noise and remain unclustered. In our experiments the minimum number of objects was set to 8, as it was the number of objects belonging to the minority class (aggressive), and several radius values were experimented with.

The only interesting findings with DBScan were achieved with the traditional dataset (Figure 10). The higher the radius value, the fewer instances remain unclustered. A radius of 0.8 leads to the optimal confusion matrix and, interestingly, novice and defensive examples are mostly grouped together, something that k-means was able to reveal in the latent space only. Here this knowledge is derived from the traditional dataset; However, 17 instances (i.e. 23%) are left unclustered. The initial and the latent dataset lead to instances forming one cluster only, regardless of the radius size. These datasets are more complex, and multi-density, and DBScan neglects entire clusters and assigns them as noise [54].

### Latent dirichlet allocation

Experiments with LDA were run on the Matlab Topic Modeling toolboxb that performs LDA with Gibbs sampling. The number of topics (T) was chosen to be 4 (the number of play styles) and 2 (for comparison purposes to LSA). According to the parameter default values suggested by the toolbox, the number of iterations was set to 500, the theta (topic distribution) and phi (topic-word distribution) hyperparameters on the Dirichlet priors were set to 50/T and 200/vocabulary-size respectively.

bhttp://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

Figure 11 shows the topic assignments of the game sessions for the initial grid and holistic datasets. For exploratory purposes, in matrices (a) and (d), the knowledge extracted from the LSA experiments is taken into account and results are depicted when only two styles are considered: Aggressive-tactical and defensive-novice.

Supplementary Figure 1 and Supplementary Figure 2 show the percentage of words that belong to each of the four topics (T1-T4) for every game session (session ids 1-74 are depicted in the x-axis) for the holistic and for the grid data respectively. As can be seen, for most sessions, differences between their word-topic distributions are not immense, sometimes they are not even noticeable, leading to the 'unclear' confusion matrices of Figure 11.

The dashed lines separate sessions between different players. At the player level, all players can be modeled using one or at most two topics, i.e. interestingly the vast majority of their sessions consist of words that belong to one or two topics at the most.

The increased sparseness in the data makes it hard for a probabilistic modeling algorithm like LDA to identify further sense-making concepts within the data in such a large feature space. Performance limitations of the Dirichlet-distribution based traditional LDA technique on sparse data has also been discussed elsewhere [55]. Furthermore, probabilistic modeling approaches usually rely on more data for more confident results.

## Conclusion

Through a set of IR-inspired low-level features, LSA-based dimensionality reduction techniques and also a probabilistic modeling approach (Latent Dirichlet Allocation), the semantic domain of an action game is modeled, revealing dependencies and relations that are not easily detectable through traditional aggregate high-level feature vector representations.

The future research prospects regarding the detection of hidden semantic knowledge are significant. A larger dataset with more players, the inclusion of time-and causality-related features, and further exploration of the obtained model (e.g. word similarity apart from session similarity researched herein) can lead to the detection of unknown knowledge that may contribute significantly to intelligent player-centered game design. It can also ensure performance of higher quality for the probabilistic modeling approaches.

## Abstract

Machine learning approaches to player modeling traditionally employ high-level game-knowledge-based features for representing game sessions, and often player behavioral features as well. The present work makes use of generic low-level features and latent semantic analysis for unsupervised player modeling, but mostly for revealing underlying hidden information regarding game semantics that is not easily detectable beforehand.