Structural and Semantic Similarity Measurement of UML Use Case Diagram

Reusing software has several benefits, including reduced cost and risk and accelerated development; its primary purpose is to improve software quality. In the early stages of software development, reusing existing software artifacts may increase these benefits because it builds on mature artifacts from previous projects. Diagrams are one kind of software artifact, and finding the level of similarity between diagrams helps in reusing them. This paper proposes a method for measuring the similarity of use case diagrams using structural and semantic aspects. For structural similarity measurement, Graph Edit Distance is used after transforming each actor and use case into a graph, while for semantic similarity measurement, WordNet, WuPalmer, and Levenshtein are used. The experiment was conducted on ten datasets from various projects. The results of the method were compared with assessments from experts. The agreement between the experts and the method was measured using Gwet's AC1 and the Pearson correlation coefficient. For diagram similarity, Gwet's AC1 is 0.60, which is categorized as "moderate" agreement, and the Pearson coefficient is 0.506, which indicates a significant correlation between the experts and the method. The results show that the proposed method can be used to find the similarity of diagrams, so that finding and reusing diagrams as software components can be optimized.


Introduction
Software reuse refers to a strategy for developing new software that uses previously developed software components [1,2,3,4]. These components can be code fragments, designs, test data, or cost estimates. The scale of software reuse may range from one line of code within a function up to a complete software package. Software reuse is classified into two types, i.e. systematic and accidental reuse. Systematic software reuse is a well-defined organizational process for developing software in which reusable resources are intentionally generated, composed, or obtained, and then reliably expended and preserved to achieve a high degree of reuse [5]. It improves the capability of the organization to deliver high-quality end-products in a timely and cost-effective manner. The end-products of systematic software reuse are considered more robust, better documented, and better tested than those of accidental reuse. Accidental software reuse is an arbitrary process of developing software in which reusable resources are generated, composed, or obtained, but only sporadically expended and hardly preserved. Accidental software reuse is simple, but the components may not be in their best form. Reusing components, specifically diagrams, could help speed up the product development process. It can also decrease the costs and risks involved [6]. Several kinds of information are used to find compatible reusable components [7,8,9], such as software requirements [10,11], code fragments [12,13], metadata [14], and designs [15,16,17]. Several methods or techniques are used to compare diagrams, i.e. Graph Matching Techniques, Case-Based Reasoning Techniques, Ontology-Based Techniques, Information Retrieval methods, and other specific methods [1]. Su and Bao [18] concentrated on the structural similarity of UML models by comparing their structure in XML format using a graph approach, whereas in [19], three types of information are used to measure the similarity of class diagrams based on their semantic similarity in WordNet.
Use case diagrams are UML diagrams that graphically define the functionality of a system in terms of actors, use cases, and relations [20]. A tool for storing, searching, and retrieving use case diagrams using ontologies and Semantic Web technology was implemented in [20]. This tool stores use case diagram information in an OWL ontology; it is implemented in Java and uses the SPARQL query language. Previous research by Fauzan et al. [16] adapted its predecessors [17,21]. They suggest that the structural and semantic similarities of two diagrams are suitable parameters for calculating use case diagram similarity. They used the WuPalmer lexical distance of neighboring components to calculate structural similarity. Both previous studies emphasized the use of semantic information from a diagram to measure the overall similarity of two diagrams.
This study primarily focuses on developing an approach to measure the similarity between two use case diagrams using structural and semantic aspects. To measure structural similarity, the proposed method models the use case diagram as a graph and applies a graph similarity method; for semantic similarity, it uses WuPalmer and Levenshtein. The rest of the paper is organized as follows. Section two describes the similarity measurement method in detail, elaborating on the semantic and structural similarity measurements. Section three describes the scenarios employed during testing and presents the results and their analysis. The last section concludes the research and suggests future work.

Similarity Measurement Method
Our similarity measurement method is composed of two main processes, i.e. diagram preprocessing and similarity measurement. The similarity measurement process comprises two aspects, i.e. semantic similarity and structural similarity. The semantic similarity between the two use case diagrams is calculated using the Greedy Algorithm. The structural similarity between the two use case diagrams is calculated using Graph Edit Distance.

Diagram Preprocessing
Diagram preprocessing mainly aims to extract the diagram metadata by converting the use case diagram into a graph. The use case diagram is modeled using an open-source UML modeling tool. Then, each model is exported to the XML Metadata Interchange (XMI) format. A parser has been developed that analyzes XMI files and converts them into graphs by extracting the property information of the components that compose the system. The components are actors, use cases, and their relations. For the sake of illustration, let us consider a use case diagram of an automatic teller machine (ATM), as shown in Figure 1.
The use case diagram in Figure 1 models the context diagram of the ATM system. Let the ATM system be called s1. The context diagram gives an overview of the system's interactions with objects outside the system. A use case in the context diagram represents a basic need that an actor has of the system. The ATM system has six main use cases, i.e. Check Balance, Deposit Fund, Withdraw Cash, Transfer Fund, Cash Register, and Maintenance. An actor is a role played by a set of objects outside the system that directly interact with the system. An object can be an end-user or another system that directly interacts with the system. An object may have one or more roles, but it can play only one role at a time. For example, Card Holder is an actor played by any customer who has a bank account and holds an ATM card. The directed arrow shows the relation between an actor and a use case. An active actor is an actor that triggers the use case. A passive actor is an actor that is involved in a use case. For example, the Card Holder has four use cases, i.e. Check Balance, Deposit Fund, Withdraw Cash, and Transfer Fund. In the Check Balance use case, the Card Holder is the active actor that triggers the use case, while the Bank is a passive actor that is involved in it.
A main use case may have a detailed description that shows its relations with its sub-use cases. Figure 2 shows the detailed description of the use cases Transfer Fund and Check Balance. It can be seen that both use cases have Print Transaction as their sub-use case. The Transfer Fund use case includes the Print Transaction use case, while the Check Balance use case is extended by the Print Transaction use case.
Given the use case diagram of the ATM system in Figure 1, an XMI file of the use case diagram can be obtained. Figure 3 shows a snapshot of the XMI script of the ATM system's use case diagram. Use cases and actors in the use case diagram are represented as package elements. A use case's package element is denoted by xmi:type="uml:UseCase", while an actor's package element is denoted by xmi:type="uml:Actor". Each package element has a unique identity. The association between actor Card Holder and use case Check Balance is represented by an ownedMember element with type uml:Association. The element has two ends, i.e. the actor Card Holder and the use case Check Balance (highlighted with a green background). Relations between use cases are depicted as extend, include, or generalization elements. The extension relation between Print Transaction and Check Balance is shown in Figure 4. Notice that the text with a green background is the Check Balance use case.
The next step is parsing the XMI file and representing its elements as a directed graph [6]. Let $g(V, E)$ be a graph with a set of vertices $V$ and a set of edges $E$. A vertex can be an actor or a use case. An edge represents an association among actors, between an actor and a use case, or among use cases. The graph representation of the ATM system is shown in Figure 5.
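The parsing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser built for this study: the inline XMI fragment, the namespace URIs, and the `packagedElement` and `ownedEnd` tag names are hypothetical, since UML tools differ in how they serialize these elements.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; real exports may use a different XMI version.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"
TYPE_ATTR = f"{{{XMI_NS}}}type"
ID_ATTR = f"{{{XMI_NS}}}id"

# Hypothetical, heavily simplified export of one actor, one use case,
# and the association between them.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1">
  <uml:Model name="ATM">
    <packagedElement xmi:type="uml:Actor" xmi:id="a1" name="Card Holder"/>
    <packagedElement xmi:type="uml:UseCase" xmi:id="u1" name="Check Balance"/>
    <ownedMember xmi:type="uml:Association" xmi:id="as1">
      <ownedEnd xmi:id="e1" type="a1"/>
      <ownedEnd xmi:id="e2" type="u1"/>
    </ownedMember>
  </uml:Model>
</xmi:XMI>"""


def parse_use_case_graph(xmi_text):
    """Return (vertices, edges) extracted from an XMI string.

    vertices maps an element id to (xmi type, name);
    edges is a list of (source id, target id, relation type).
    """
    root = ET.fromstring(xmi_text)
    vertices = {}
    edges = []
    for elem in root.iter():
        kind = elem.get(TYPE_ATTR)
        if kind in ("uml:Actor", "uml:UseCase"):
            vertices[elem.get(ID_ATTR)] = (kind, elem.get("name"))
        elif kind == "uml:Association":
            # The two ends of the association reference the connected elements.
            ends = [end.get("type") for end in elem.findall("ownedEnd")]
            if len(ends) == 2:
                edges.append((ends[0], ends[1], "association"))
    return vertices, edges


vertices, edges = parse_use_case_graph(SAMPLE_XMI)
```

Include, extend, and generalization relations would need analogous branches; they are omitted here to keep the sketch short.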

Graph Edit Distance
In this paper, inexact graph matching is performed using Graph Edit Distance. The graph edit distance between two graphs, $g_1$ and $g_2$, is measured by the amount of distortion needed to transform $g_1$ into $g_2$ [22]. In this method, graph modifications take the form of addition, deletion, and replacement of vertices and edges. Vertex replacement is based on the type of the vertices (i.e. actor and use case), and edge replacement is based on the type and direction of the edges (i.e. association, include, extend, and generalization). Equation 1 shows how to measure the distance between the two compared graphs.
\[ d_{\lambda_{\min}}(g_1, g_2) = \min_{\lambda} \sum_{e_i \in \lambda} c(e_i) \tag{1} \]
where $d_{\lambda_{\min}}(g_1, g_2)$ denotes the graph edit distance, i.e. the minimum-cost transformation of graph $g_1$ into $g_2$, $\lambda$ ranges over the sequences of modifications that transform $g_1$ into $g_2$, and $c(e_i)$ is the cost of each graph modification $e_i$. The cost of every operation in this paper is set to 1, although different values could be used to increase the cost of certain operations.
The process of comparing vertices and edges is based on values obtained from the "xmi:type" attribute of the XMI file. Edges are compared not only by type but also by direction. Consequently, for directed relationships such as include, extend, and generalization, the location and type of the origin and destination vertices also affect the total cost of the graph transformation process. The transformed graph will therefore have not only the same vertex and edge types but also the same edge directions. Figure 6 is an example of the transformation steps between two compared graphs. The graph $g_1$ (first graph) has five vertices, where $v_0$ has associations to $v_1$, $v_2$, $v_3$, and $v_4$. The graph $g_2$ (last graph) has three vertices, where $v_0$ has associations to $v_1$ and $v_4$. To transform $g_1$ into $g_2$, two actors ($v_2$ and $v_3$) and their two associations with vertex $v_0$ must be deleted. Therefore, the sum of the operation costs from $g_1$ to $g_2$ is 4. This operation cost is then converted into a number in the range between 0 and 1 with equation 2.
\[ struc(g_1, g_2) = \frac{100 - \dfrac{cost \cdot 100}{v + e}}{100} \tag{2} \]
where $cost$ is the operation cost, $v$ is the total number of vertices, and $e$ is the total number of edges of the compared graphs $g_1$ and $g_2$. In this example, $v = 5 + 3$ and $e = 4 + 2$, so from equation 2 the normalized value for $g_1$ and $g_2$ is $(100 - 4 \cdot 100 / 14) / 100 = 0.7143$.
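The Figure 6 example can be reproduced with a short sketch. It makes a simplifying assumption that is not part of the proposed method: vertex identifiers in the two graphs already correspond, so the cost reduces to counting the vertices and edges present in only one graph. A full graph edit distance would also search over vertex mappings and replacements. The function names are hypothetical.

```python
def aligned_edit_cost(v1, e1, v2, e2, unit_cost=1):
    """Edit cost when vertex ids already correspond (no mapping search).

    Every vertex or edge that appears in only one graph costs one
    deletion or addition.
    """
    return unit_cost * (len(v1 ^ v2) + len(e1 ^ e2))


def normalized_similarity(cost, v1, e1, v2, e2):
    """Map an edit cost into the range 0..1 using the sizes of both graphs."""
    size = len(v1) + len(v2) + len(e1) + len(e2)
    return 1 - cost / size


# Graphs from the Figure 6 example: g1 is a star with four leaves,
# g2 keeps only the leaves v1 and v4.
g1_vertices = {"v0", "v1", "v2", "v3", "v4"}
g1_edges = {("v0", "v1"), ("v0", "v2"), ("v0", "v3"), ("v0", "v4")}
g2_vertices = {"v0", "v1", "v4"}
g2_edges = {("v0", "v1"), ("v0", "v4")}

cost = aligned_edit_cost(g1_vertices, g1_edges, g2_vertices, g2_edges)
similarity = normalized_similarity(cost, g1_vertices, g1_edges, g2_vertices, g2_edges)
print(cost, round(similarity, 4))  # 4 0.7143
```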

Word Similarity
The semantic relationship between two concepts is often related to their distance in the WordNet lexical dictionary. WordNet-based measures have been used to determine the semantic similarity of class diagrams [23,19], sequence diagrams [21,17], and use case diagrams [16,20]. In this paper, the information about actors and use cases contained in the use case diagram is measured using a combination of WuPalmer and Levenshtein, where the Levenshtein distance is used if the WuPalmer calculation cannot be performed.
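The fallback rule can be sketched independently of any particular library. In the sketch below, `token_similarity`, the toy lexicon, and both scoring functions are hypothetical stand-ins; in practice the WordNet-based scorer could be built on NLTK's `wordnet.synsets` and `Synset.wup_similarity`, and the string-based scorer on a normalized Levenshtein distance.

```python
from typing import Callable, Optional


def token_similarity(
    w1: str,
    w2: str,
    wordnet_sim: Callable[[str, str], Optional[float]],
    string_sim: Callable[[str, str], float],
) -> float:
    """Use the WordNet-based score when available, else the string-based one.

    wordnet_sim must return None when either word is missing from the
    lexical database.
    """
    score = wordnet_sim(w1, w2)
    return string_sim(w1, w2) if score is None else score


# Toy stand-ins so that the sketch runs without external data.
TOY_LEXICON = {"card", "holder", "bank"}


def toy_wordnet_sim(w1: str, w2: str) -> Optional[float]:
    if w1 not in TOY_LEXICON or w2 not in TOY_LEXICON:
        return None  # signals "not found", which triggers the fallback
    return 1.0 if w1 == w2 else 0.4  # placeholder, not a real WuPalmer value


def toy_string_sim(w1: str, w2: str) -> float:
    return 1.0 if w1 == w2 else 0.0  # placeholder for a Levenshtein-based score


in_lexicon = token_similarity("card", "bank", toy_wordnet_sim, toy_string_sim)
fallback = token_similarity("atm", "bank", toy_wordnet_sim, toy_string_sim)
```

Passing the two scorers as arguments keeps the selection rule separate from the choice of lexical resource.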

Levenshtein Distance
Levenshtein distance is the smallest number of insertion, deletion, and substitution operations needed to change one string into another [24]. For example, the Levenshtein distance between the strings "synthesis" and "synthesize" is 2 because two operations are needed: substituting the character 's' with 'z' and adding the character 'e'. In this paper, equation 3 transforms the Levenshtein distance into a normalized number in the range 0 to 1.
where $lev$ is the Levenshtein distance value, and $len(w_i)$ and $len(w_j)$ are the string lengths of the words $w_i$ and $w_j$. The resulting similarity of the words "synthesis" and "synthesize" based on the Levenshtein distance is 0.867.
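The raw Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook implementation rather than the code used in this study, and it covers only the distance, not the normalization step.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("synthesis", "synthesize"))  # 2
```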

Greedy Algorithm
In this paper, all of the comparison values from the two compared diagrams are arranged in a matrix. Finding the best pairing within this matrix requires an algorithm. Khiaty in [23] proposed an algorithm based on the greedy algorithm, which is superior in matching time to the simulated annealing based algorithm. This method was then adapted by several researchers, such as [25,21], for measuring structural and semantic similarity.
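One common greedy strategy for pairing components from a similarity matrix is sketched below: repeatedly take the highest remaining score and exclude its row and column. This illustrates the general idea only; the exact procedure in [23] may differ, and the matrix values are hypothetical.

```python
def greedy_match(matrix):
    """Pair rows with columns by repeatedly picking the best remaining score.

    matrix[i][j] is the similarity between component i of the first
    diagram and component j of the second. Returns (row, column, score)
    tuples in the order they were selected.
    """
    cells = sorted(
        ((score, i, j) for i, row in enumerate(matrix) for j, score in enumerate(row)),
        key=lambda cell: cell[0],
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for score, i, j in cells:
        if i not in used_rows and j not in used_cols:
            pairs.append((i, j, score))
            used_rows.add(i)
            used_cols.add(j)
    return pairs


# Hypothetical scores: two components in the first diagram, three in the second.
scores = [
    [0.9, 0.4, 0.1],
    [0.8, 0.7, 0.2],
]
pairs = greedy_match(scores)
print(pairs)  # [(0, 0, 0.9), (1, 1, 0.7)]
```

The greedy choice is fast but not guaranteed to maximize the total score of all pairs, which is the usual trade-off against exhaustive or annealing-based matching.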

Diagram Similarity Measurement
Based on the two chosen aspects, structural and semantic, the main formula for obtaining the similarity between two compared diagrams is shown in equation 4. Since each aspect may have a different impact on the total similarity, the proposed method uses a weight for each similarity measurement.
\[ sim(d_1, d_2) = w_{struc} \cdot strucSim(d_1, d_2) + w_{sem} \cdot semSim(d_1, d_2) \tag{4} \]
where $w_{struc}$ and $w_{sem}$ are constants that represent the weights of the structural and semantic aspects, respectively, and $strucSim$ and $semSim$ are the results of the structural and semantic similarity measurements. The weights are given arbitrarily. The structural and semantic similarity measurements use weights for actors and use cases, as in equations 5 and 6.
\[ strucSim(d_1, d_2) = W_{ac} \cdot struc(\forall ac_i \in d_1, \forall ac_j \in d_2) + W_{uc} \cdot struc(\forall uc_i \in d_1, \forall uc_j \in d_2) \tag{5} \]
\[ semSim(d_1, d_2) = W_{ac} \cdot sem(\forall ac_i \in d_1, \forall ac_j \in d_2) + W_{uc} \cdot sem(\forall uc_i \in d_1, \forall uc_j \in d_2) \tag{6} \]
where $W_{ac}$ and $W_{uc}$ are the weights of actors and use cases, respectively, $struc$ is the result of the structural similarity measurement, $sem$ is the result of the semantic similarity measurement, and $\forall ac_i$ and $\forall uc_i$ denote all actors and all use cases, respectively, within ($\in$) diagrams $d_1$ and $d_2$.
Based on equations 5 and 6, each actor in the first diagram is matched with each actor in the second diagram and measured using Graph Edit Distance for structural similarity and a combination of WuPalmer and Levenshtein for semantic similarity. The calculation results are summed and then multiplied by the actor weight $W_{ac}$. The same step is applied to each use case in the first and second diagrams. The weights for actors and use cases are given arbitrarily, with values between 0 and 1 whose sum must be 1. These weights determine which component, actor or use case, is emphasized in the measurement.
To illustrate the calculation process, let us consider a second ATM system, shown in Figure 7. Let this second version of the ATM system be called s2. In s2, there is only one actor, i.e. Card Holder, and four use cases, i.e. Withdraw Fund, Show Balance on Screen, Print Balance, and Authenticate Card Holder. Withdraw Fund is the only use case that is directly connected to the Card Holder. Given this information, a graph representation of s2, called $g_2$, was generated, as shown in Figure 8. The next subsections explain how to calculate the structural and semantic similarities of the two diagrams. The weights $W_{ac}$ and $W_{uc}$ for both structural and semantic similarity were set to 0.5, while $w_{struc}$ and $w_{sem}$ were set to 0.7 and 0.3, respectively.

Structural Similarity Measurement
The first step in structural similarity measurement is calculating the structural similarity of each component type. To this end, each vertex within $g_1$ and $g_2$ is treated as a sub-graph. Given graph $g_1$, two sub-graphs for actors (Figure 9.a and 9.b) and five sub-graphs for use cases can be generated. Given graph $g_2$, there is one sub-graph for the actor (Figure 9.c) and there are four sub-graphs for use cases. Then, for each actor in $g_1$, the method calculates the similarity of its sub-graph with the sub-graph of each actor in $g_2$. The structural similarity between sub-graphs is calculated using Graph Edit Distance. Transforming sub-graph Card Holder ($sg_{11}$) in $g_1$ into sub-graph Card Holder ($sg_{21}$) in $g_2$ requires six operations, i.e. removing three vertices ($u_2$, $u_3$, and $u_4$) and removing three edges ($a_1$-$u_2$, $a_1$-$u_3$, and $a_1$-$u_4$). Therefore, the cost of transforming $sg_{11}$ into $sg_{21}$ is 6. Similarly, transforming sub-graph Bank ($sg_{12}$) in $g_1$ into sub-graph Card Holder ($sg_{21}$) in $g_2$ requires seven operations, i.e. removing three vertices ($u_2$, $u_3$, and $u_4$), removing three edges ($u_2$-$a_2$, $u_3$-$a_2$, and $u_4$-$a_2$), and one edge replacement (from $u_1$-$a_2$ to $a_2$-$u_1$). Therefore, the cost of transforming $sg_{12}$ into $sg_{21}$ is 7. Given their costs, the structural similarities can be calculated as follows:
\[ struc(a_1 : g_1, a_1 : g_2) = \frac{100 - \dfrac{6 \cdot 100}{5 + 4 + 2 + 1}}{100} = 0.5 \]
\[ struc(a_2 : g_1, a_1 : g_2) = \frac{100 - \dfrac{7 \cdot 100}{5 + 4 + 2 + 1}}{100} = 0.42 \]
Given these structural similarity scores, it can be concluded that actor Card Holder in $g_1$ is more structurally similar to actor Card Holder in $g_2$ than actor Bank in $g_1$ is. The structural similarity of the actors in $g_1$ and $g_2$ can be calculated as follows:
\[ struc(\forall ac_i \in g_1, \forall ac_j \in g_2) = \frac{2 \cdot 0.5}{2 + 1} = 0.33 \]
Structural similarity measurement is also conducted on the use cases' sub-graphs. Figure 10 shows the sub-graphs of the use cases in $g_1$ and $g_2$. Table 1 shows the structural similarity measurement of each pair.
The result shows that $u_1 : g_1$ is best matched with $u_1 : g_2$, $u_2 : g_1$ is best matched with $u_4 : g_2$, $u_3 : g_1$ is best matched with $u_3 : g_2$, and $u_5 : g_1$ is best matched with $u_2 : g_2$. Given the best pairs, the structural similarity of the use cases in $g_1$ and $g_2$ can be calculated as follows:
\[ struc(\forall uc_i \in g_1, \forall uc_j \in g_2) = \frac{2 \cdot (0.813 + 0.750 + 0.625 + 0.625)}{5 + 4} = 0.625 \]
Given the structural similarity scores of actors and use cases, the structural similarity between $g_1$ and $g_2$ can be calculated as follows:
\[ strucSim(g_1, g_2) = 0.5 \cdot 0.33 + 0.5 \cdot 0.625 = 0.478 \]
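The sub-graph step for actors can be sketched as follows. The edge lists are an assumption reconstructed from the description of the ATM example (Card Holder pointing to four use cases, which in turn point to Bank), relations between use cases are left out, and the edit costs 6 and 7 are taken from the text rather than computed.

```python
def vertex_subgraph(vertex, edges):
    """Sub-graph made of a vertex, its incident edges, and their endpoints."""
    sub_edges = {edge for edge in edges if vertex in edge}
    sub_vertices = {v for edge in sub_edges for v in edge} | {vertex}
    return sub_vertices, sub_edges


def structural_similarity(cost, first, second):
    """Normalize an edit cost by the combined size of two sub-graphs."""
    size = len(first[0]) + len(first[1]) + len(second[0]) + len(second[1])
    return 1 - cost / size


# Assumed edges of g1: Card Holder (a1) and Bank (a2) around use cases u1..u4.
g1_edges = {
    ("a1", "u1"), ("a1", "u2"), ("a1", "u3"), ("a1", "u4"),
    ("u1", "a2"), ("u2", "a2"), ("u3", "a2"), ("u4", "a2"),
}
# In g2 the only edge touching the actor links Card Holder to Withdraw Fund.
g2_edges = {("a1", "u1")}

sg11 = vertex_subgraph("a1", g1_edges)  # Card Holder in g1
sg12 = vertex_subgraph("a2", g1_edges)  # Bank in g1
sg21 = vertex_subgraph("a1", g2_edges)  # Card Holder in g2

card_holder_score = structural_similarity(6, sg11, sg21)
bank_score = structural_similarity(7, sg12, sg21)
print(round(card_holder_score, 2), round(bank_score, 2))  # 0.5 0.42
```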

Semantic Similarity Measurement
The first step of semantic similarity measurement is extracting tokens from the text of each component within each vertex. Each token goes through three text-preprocessing steps, i.e. stop-word removal, lower casing, and lemmatizing. To get the semantic similarity of actors, the method calculates the semantic similarity between the tokens of each actor in $g_1$ and the tokens of each actor in $g_2$. To calculate the semantic similarity between tokens, the WuPalmer and Levenshtein Distance algorithms are employed. To enable the WuPalmer calculation, both compared tokens must be found in the WordNet lexical database. If one of them is absent, the Levenshtein distance calculation is used instead. Different from [16] and [25], this paper does not use cosine similarity for semantic similarity calculation. The semantic similarity between pairs of actors can be calculated as follows:
\[ sem(a_1 : g_1, a_1 : g_2) = \frac{2 \cdot (1.0 + 1.0)}{2 + 2} = 1.0 \]
\[ sem(a_2 : g_1, a_1 : g_2) = \frac{2 \cdot 0.405}{2 + 1} = 0.27 \]
Given these semantic similarity scores, it can be concluded that actor Card Holder in $g_1$ is more semantically similar to actor Card Holder in $g_2$ than actor Bank in $g_1$ is. The semantic similarity of the actors in $g_1$ and $g_2$ can be calculated as follows:
\[ sem(\forall ac_i \in g_1, \forall ac_j \in g_2) = \frac{2 \cdot 1.0}{2 + 1} = 0.67 \]
Semantic similarity measurement is also conducted on use cases. To get the semantic similarity of use cases, the method calculates the semantic similarity between the tokens of each use case in $g_1$ and the tokens of each use case in $g_2$. The semantic similarity between pairs of use cases is calculated using the WuPalmer similarity measurement. Table 2 shows the semantic similarity measurement of each pair. The result shows that $u_3 : g_1$ is best matched with $u_1 : g_2$, $u_1 : g_1$ is best matched with $u_3 : g_2$, $u_4 : g_1$ is best matched with $u_2 : g_2$, and $u_5 : g_1$ is best matched with $u_4 : g_2$.
Given the best pairs, the semantic similarity of the use cases in $g_1$ and $g_2$ can be calculated as follows:
\[ sem(\forall uc_i \in g_1, \forall uc_j \in g_2) = \frac{2 \cdot (1.0 + 0.835 + 0.492 + 0.256)}{5 + 4} = 0.57 \]
Given the semantic similarity scores of actors and use cases, the semantic similarity between $g_1$ and $g_2$ can be calculated as follows:
\[ semSim(g_1, g_2) = 0.5 \cdot 0.67 + 0.5 \cdot 0.57 = 0.62 \]
Using equation 4 with structural and semantic weights of 0.5 each, the similarity score between the two graphs is $0.5 \cdot 0.478 + 0.5 \cdot 0.62 = 0.55$. With similarity values ranging from 0 to 1, where the highest value means the diagrams are equal, the similarity of s1 and s2 is considered moderate. Although the two diagrams have relatively high semantic similarity, there are significant differences in their structure.
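The arithmetic of the worked example can be checked with a few lines of code. The function names are hypothetical, and the best-match scores are copied from the text above; small differences from the reported values (for instance 0.479 rather than 0.478 for the structural score) arise because the text rounds intermediate results.

```python
def group_similarity(best_scores, n_first, n_second):
    """Twice the sum of best-match scores over the total component count."""
    return 2 * sum(best_scores) / (n_first + n_second)


def weighted(first, second, w_first=0.5, w_second=0.5):
    """Weighted sum of two scores; the weights are expected to sum to 1."""
    return w_first * first + w_second * second


# Best-match scores reported for the two ATM diagrams (2 vs 1 actors, 5 vs 4 use cases).
struc_actors = group_similarity([0.5], 2, 1)
struc_use_cases = group_similarity([0.813, 0.750, 0.625, 0.625], 5, 4)
sem_actors = group_similarity([1.0], 2, 1)
sem_use_cases = group_similarity([1.0, 0.835, 0.492, 0.256], 5, 4)

struc_sim = weighted(struc_actors, struc_use_cases)  # actor and use case weights 0.5
sem_sim = weighted(sem_actors, sem_use_cases)
total = weighted(struc_sim, sem_sim)  # structural and semantic weights 0.5

print(round(struc_sim, 3), round(sem_sim, 2), round(total, 2))
```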

Datasets
In this study, ten projects were collected. These projects were taken from several undergraduate student projects in a software engineering course. Table 3 shows the list of software projects. Each project has a different complexity in terms of the number of actors and use cases. They range from small (one actor and four use cases) to medium-sized projects.

Result and Discussion
A tool that implements the proposed method has been built. This tool processes use case diagrams, from parsing and analyzing XMI documents through to the testing process. It was built using a combination of TypeScript, Python, and libraries such as Python NLTK and xml-js. After building the tool, the next step was to redraw all datasets, consisting of ten diagrams from ten projects, using open-source UML modeling applications and to convert them into XMI. This process also rechecked the models to make sure that all components could be processed structurally and semantically. After this process, all XMI documents were parsed and analyzed using the tool.
To measure whether the proposed method can provide a sufficient result, a comparison with assessments from experts was conducted. In this paper, there are three experts, consisting of two academics and a practitioner in the field of use case diagram modeling, each of whom has used use case diagrams for at least two years. These experts assessed the similarity of 30 pairs of compared diagrams. The experts' assessments were obtained using a questionnaire containing all paired diagrams; each diagram pair was given an expert rating for each aspect (structural and semantic) on a scale of 1 to 5, where a greater number means the compared diagrams are more similar.
Because the two sets of numbers differ in type (the experts' questionnaire produces ordinal numbers from 1 to 5, while the proposed method produces interval numbers from 0 to 1), two kinds of calculations, Gwet's AC1 and Pearson's correlation, are used to measure the agreement between the experts and the method. The agreement results for the structural and semantic aspects are listed in Table 4, and the agreement results for diagram similarity based on both semantic and structural similarity are listed in Table 5.
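The two agreement measures named in this paper, Gwet's AC1 and Pearson's correlation, can be sketched as follows. The sketch uses the two-rater form of AC1; how three expert ratings were combined and how method scores were mapped to the 1 to 5 scale are not described here, so the equal-width binning in `to_scale` and all sample ratings are assumptions for illustration only.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def gwet_ac1(ratings_1, ratings_2, categories):
    """Gwet's AC1 for two raters over a fixed list of categories."""
    n = len(ratings_1)
    q = len(categories)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    # Average share of each category across both raters.
    shares = [(counts_1[k] + counts_2[k]) / (2 * n) for k in categories]
    chance = sum(p * (1 - p) for p in shares) / (q - 1)
    return (observed - chance) / (1 - chance)


def to_scale(score, levels=5):
    """Assumed equal-width binning of a 0..1 score onto 1..levels."""
    return min(levels, int(score * levels) + 1)


# Hypothetical ratings for four diagram pairs.
expert = [3, 4, 2, 5]
method_scores = [0.55, 0.71, 0.30, 0.93]
method = [to_scale(s) for s in method_scores]

print(method, gwet_ac1(expert, method, [1, 2, 3, 4, 5]), pearson(expert, method_scores))
```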
Based on the agreement values for the structural and semantic aspects in Table 4, agreement for both aspects increased as the weight for actors increased, whether measured with Gwet's AC1 or Pearson's correlation. This can be interpreted as experts tending to assess structural and semantic similarity based on the actors. Also based on Table 4, the agreement on the semantic aspect is not optimal; all Pearson values are below the critical value, which means there is no significant relationship between the experts' assessments and the method. This result was also reported in [16]. Therefore, the current semantic similarity method should be improved. For the structural aspect, the agreement is better than for the semantic aspect, with values within the "moderate" agreement category, so Graph Edit Distance as used in the proposed method can serve as a tool for measuring the structural similarity of diagrams.
Based on the agreement values for diagram similarity in Table 5, agreement generally increases as the structural weight increases. All values are categorized as "moderate" agreement for Gwet's AC1 and indicate a significant relationship based on Pearson's correlation. Based on the values in Tables 4 and 5, the proposed method is generally able to provide sufficient agreement, using either Gwet's AC1 or Pearson's correlation. However, the values obtained are not high; they fall in the moderate category. Therefore, it can be concluded that Graph Edit Distance for structural similarity, together with WuPalmer and Levenshtein for semantic similarity, can be used as one of the tools for measuring diagram similarity.

Conclusion
This paper has introduced a method for measuring the similarity between use case diagrams. For the ten datasets used, taken from various projects with varying numbers of actors and use cases, the level of agreement between the method and the experts is in the "moderate" category, at around 0.60. The experimental results also showed that the graph approach to structural similarity calculation can be used to evaluate the similarity of use case diagrams, as can be seen from the sufficient level of agreement between the experts and the method. The names of the components within the use case diagram are also suitable for measuring use case diagram similarity in the semantic aspect.
The results further indicate that the method can be used to find the similarity of diagrams, so that finding and reusing diagrams as software components can be optimized. Retrieving existing diagrams is very useful, especially when starting new software projects that have similar functionality and might therefore have similar use case diagrams. However, some problems remain; in particular, the proposed method is still not optimal in calculating semantic similarity because it quite often falls back to Levenshtein when a word is absent from the WordNet lexical database.
An important limitation is that this work covers only use case diagrams and may not work for other UML diagrams. First, further study should determine a set of weights that achieves the most accurate measurement value. Second, the authors plan to search for an alternative algorithm to improve the measurement of the semantic aspect when the name of a component is not listed in the WordNet lexical dictionary or when the name of a component consists of more than one word. Both conditions reduce the chance of finding the word's lexical meaning in WordNet.