| 1 | +--- |
| 2 | +title: "Analyzing Data" |
| 3 | +execute: |
| 4 | + echo: true |
| 5 | + message: false |
| 6 | +format: |
| 7 | + html: |
| 8 | + code-fold: false |
| 9 | +lightbox: true |
| 10 | +--- |
| 11 | + |
| 12 | +Our final step is to use network analysis to uncover thematic areas within the Medieval Art collection. |
| 13 | + |
| 14 | +We have a dataset containing 1725 unique pairs of tags. By transforming this dataset into a network graph, we can visualize how terms relate to one another: |
| 15 | + |
| 16 | +- Nodes represent individual tags. |
| 17 | +- Edges represent co-occurrences of tags within the same object, indicating a thematic connection. |
| 18 | +- Weights reflect how often each tag pair appears together. |
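
As a rough illustration of how such a weighted edge list can be derived, the sketch below counts tag co-occurrences using only Python's standard library. The `objects` list is made-up sample data, not records from the collection, and the workshop's actual processing code may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each inner list holds the tags of one object.
objects = [
    ["Horses", "Men", "Saint George"],
    ["Horses", "Men"],
    ["Christ", "Virgin Mary"],
]


def count_tag_pairs(tagged_objects):
    """Count how often each unordered pair of tags shares an object."""
    pair_counts = Counter()
    for tags in tagged_objects:
        # Sort and deduplicate so ("A", "B") and ("B", "A") count as one pair.
        for pair in combinations(sorted(set(tags)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Keys are edges (tag pairs); values are edge weights (co-occurrence counts).
edges = count_tag_pairs(objects)
for (source, target), weight in edges.items():
    print(source, target, weight)
```

Each key in `edges` corresponds to an edge between two nodes, and each count to that edge's weight.
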
| 19 | + |
| 20 | +While the full details of network construction are beyond the scope of this workshop, you can create and explore network graphs using tools like Gephi. The processed network data is available in the [project repository](https://github.com/jairomelo/intro2APIs-examples/tree/main/data) as [`network_medieval_art_tags.gexf`](https://github.com/jairomelo/intro2APIs-examples/blob/main/data/processed/network_medieval_art_tags.gexf). |
| 21 | + |
| 22 | +::: {.callout-tip collapse="true"} |
| 23 | +## What is GEXF? |
| 24 | + |
| 25 | +GEXF (Graph Exchange XML Format) is a widely used format for storing and sharing network graphs. Because it is structured as XML, it is both human-readable and compatible with network analysis tools such as Gephi and Cytoscape. |
| 26 | +::: |
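
Because GEXF is plain XML, it can be inspected with Python's standard library. The sketch below parses a tiny hand-written GEXF string; `SAMPLE_GEXF` is an illustrative stand-in, not the workshop file, and real exports usually carry extra attributes and visualization data. The `{*}` wildcard (Python 3.8+) matches elements regardless of the GEXF namespace version.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature GEXF document, used only for illustration.
SAMPLE_GEXF = """<gexf xmlns="http://gexf.net/1.3" version="1.3">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="Horses"/>
      <node id="1" label="Men"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" weight="2.0"/>
    </edges>
  </graph>
</gexf>"""


def read_gexf(xml_text):
    """Return (nodes, edges) parsed from a GEXF string."""
    root = ET.fromstring(xml_text)
    # Map each node id to its label.
    nodes = {n.get("id"): n.get("label") for n in root.findall(".//{*}node")}
    # Resolve edge endpoints to labels; a missing weight defaults to 1.
    edges = [
        (nodes[e.get("source")], nodes[e.get("target")], float(e.get("weight", 1)))
        for e in root.findall(".//{*}edge")
    ]
    return nodes, edges


nodes, edges = read_gexf(SAMPLE_GEXF)
print(nodes)
print(edges)
```
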
| 27 | + |
| 28 | +## Visualizing the network |
| 29 | + |
| 30 | +To better understand our network, we need to visualize how terms are connected. A graph layout allows us to represent nodes (tags) and edges (relationships) in a structured way, making it easier to explore clusters, central terms, and thematic areas. |
| 31 | + |
| 32 | +There are multiple tools available for network visualization, ranging from desktop applications to web-based platforms: |
| 33 | + |
| 34 | +- Gephi ([desktop](https://gephi.org/) | [Gephi Lite](https://gephi.org/gephi-lite/)) – A powerful open-source tool for network analysis. |
| 35 | +- Cytoscape ([desktop](https://cytoscape.org/)) – Commonly used in bioinformatics and large-scale networks. |
| 36 | +- Retina ([web-based](https://ouestware.gitlab.io/retina/beta/)) – A browser-friendly tool that allows interactive exploration. |
| 37 | + |
| 38 | +For this workshop, we will explore the network using Retina, which enables interactive exploration directly in your browser. |
| 39 | + |
| 40 | +::: {.callout-note collapse="true"} |
| 41 | +## Retina as a Web API for Network Visualization |
| 42 | + |
| 43 | +Unlike traditional network visualization tools, Retina stores the state of a view in URL parameters, which makes working with it similar to calling a Web API. Instead of adjusting settings only through a graphical interface, you can change the view by editing the URL: the parameters control layout, node size, color attributes, filtering, and focus points. The graph is rendered entirely in your browser, so no separate backend is needed. |
| 44 | + |
| 45 | +This means we can construct Retina URLs programmatically to generate custom views of our network, in the same way we build request URLs for an API. |
| 46 | + |
| 47 | +For example, the isolated-community view further down this page differs from the full-network view only by one extra parameter, `m-s.t=0`. |
| 48 | + |
| 49 | +::: |
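
A minimal sketch of building such a URL in Python is shown below. The base address, the GEXF location, and the parameter names (`c`, `s`, `m-s.t`) are copied from the URLs used on this page; their meaning is inferred from those URLs rather than from Retina's documentation, and `build_retina_url` is a hypothetical helper, not part of Retina.

```python
from urllib.parse import quote, urlencode

RETINA_BASE = "https://ouestware.gitlab.io/retina/beta/"
GEXF_URL = (
    "https://gist.githubusercontent.com/jairomelo/"
    "1aebfa947f5c84b8b43c7a1eb06857fa/raw/"
    "a5d4243596c17b6df8cab16f2ef09585fc520d32/network-5fafdb38-4d8.gexf"
)


def build_retina_url(gexf_url, mode="embed", params=None):
    """Assemble a Retina URL from a GEXF location and (key, value) pairs."""
    # A list of pairs (not a dict) allows repeated keys such as "sa[]".
    query = [("url", gexf_url)] + list(params or [])
    # Percent-encode values but keep the brackets used by the array keys.
    encoded = urlencode(query, quote_via=quote, safe="[]")
    return f"{RETINA_BASE}#/{mode}/?{encoded}"


# Same color/size attributes as the page's examples, filtered to class 0.
url = build_retina_url(GEXF_URL, params=[("c", "m-s"), ("s", "ei"), ("m-s.t", "0")])
print(url)
```

Passing `mode="graph"` instead of the default `"embed"` yields the full-page variant of the same view.
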
| 50 | + |
| 51 | +The layout for this visualization was prepared in Gephi and then exported to Retina, but you can also import the raw GEXF file into Retina and generate a similar graph from scratch. |
| 52 | + |
| 53 | +<iframe |
| 54 | + width="800" |
| 55 | + height="600" |
| 56 | + src="https://ouestware.gitlab.io/retina/beta/#/embed/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fjairomelo%2F1aebfa947f5c84b8b43c7a1eb06857fa%2Fraw%2Fa5d4243596c17b6df8cab16f2ef09585fc520d32%2Fnetwork-5fafdb38-4d8.gexf&c=m-s&s=ei&sa[]=h&sa[]=b&sa[]=cu&sa[]=t&sa[]=ei&sa[]=r&sa[]=co&ca[]=ec-s&ca[]=w-s&ca[]=m-s&ca[]=s-s&er=0.1&ec=s&ds=1" |
| 57 | + frameBorder="0" |
| 58 | + title="Retina" |
| 59 | + allowFullScreen |
| 60 | +></iframe> |
| 61 | + |
| 62 | +You can also [explore the network in a new tab](https://ouestware.gitlab.io/retina/beta/#/graph?url=https%3A%2F%2Fgist.githubusercontent.com%2Fjairomelo%2F1aebfa947f5c84b8b43c7a1eb06857fa%2Fraw%2Fa5d4243596c17b6df8cab16f2ef09585fc520d32%2Fnetwork-5fafdb38-4d8.gexf&c=m-s&s=ei&sa[]=h&sa[]=b&sa[]=cu&sa[]=t&sa[]=ei&sa[]=r&sa[]=co&ca[]=ec-s&ca[]=w-s&ca[]=m-s&ca[]=s-s&er=0.1&ec=s&ds=1). |
| 63 | + |
| 64 | +The network is fully interactive, allowing you to zoom in and out to examine specific areas or use the search bar to locate a particular term. Each node represents a tag from the dataset, and its color and size reflect important network properties. |
| 65 | + |
| 66 | +- Color is determined by the *modularity class*, which groups related terms into thematic communities. Terms that frequently appear together in the dataset will tend to cluster, forming distinct regions in the network. |
| 67 | +- Size is based on *Eigenvector Centrality*, a measure of influence within the network. Larger nodes indicate terms that are connected to many other well-connected terms. For instance, if a term like "Christ" or "Virgin Mary" appears prominently, it suggests that it plays a central role in shaping the overall structure of the network. |
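
To make the idea behind eigenvector centrality more concrete, here is a small power-iteration sketch in plain Python. The edge list is invented for illustration and uses unit weights. Gephi's implementation may normalize scores differently, so the numbers are not comparable with the values shown in Retina.

```python
import math


def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Approximate eigenvector centrality of an undirected weighted graph."""
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' weighted scores.
        # Keeping the own score is a common trick that avoids oscillation.
        updated = {
            node: scores[node] + sum(w * scores[m] for m, w in neighbors[node].items())
            for node in neighbors
        }
        # Rescale so the vector keeps unit length.
        norm = math.sqrt(sum(x * x for x in updated.values()))
        updated = {node: x / norm for node, x in updated.items()}
        if max(abs(updated[n] - scores[n]) for n in updated) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical edges (tag, tag, weight): a triangle plus one pendant node.
sample_edges = [
    ("Men", "Women", 1),
    ("Men", "Horses", 1),
    ("Women", "Horses", 1),
    ("Horses", "Saint George", 1),
]
scores = eigenvector_centrality(sample_edges)
for tag, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{tag}: {score:.3f}")
```

In this toy graph "Horses" ranks first because it is linked to every other node, including the two that are themselves well connected.
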
| 68 | + |
| 69 | +To further customize your view, open the menu on the left and experiment with color and size settings to highlight different attributes. By adjusting these parameters, you can gain new insights into how terms interact and uncover unexpected thematic relationships within the Medieval Art collection. |
| 70 | + |
| 71 | +## Exploring the network |
| 72 | + |
| 73 | +One of the challenges of using networks as a data visualization tool is the risk of creating a "hairball" — a dense, cluttered graph with too many connections, making it difficult to interpret patterns or insights. To address this issue, Retina offers built-in features that allow us to explore the network in more detail and isolate key themes. |
| 74 | + |
| 75 | +### Isolate communities |
| 76 | + |
| 77 | +Since each community in our network represents a thematic area, we can filter specific communities to explore the terms they contain and their connections to the broader network. |
| 78 | + |
| 79 | +::: {.callout-tip collapse="true"} |
| 80 | +## How to Isolate a Community in Retina |
| 81 | + |
| 82 | +A straightforward way to isolate communities is by using Retina’s "Explore" feature: |
| 83 | + |
| 84 | +1. Open the Explore menu. |
| 85 | +2. Click on a community to highlight it. |
| 86 | +3. Use the filter options to hide other nodes and focus on a single thematic area. |
| 87 | + |
| 88 | + |
| 89 | + |
| 90 | +::: |
| 91 | + |
| 92 | +Our analysis reveals 11 distinct communities, with three major groups containing most of the terms. Some communities are very small, consisting of just two or three nodes. |
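
Outside Retina, the same kind of summary and filter can be sketched in a few lines of Python. The tag-to-community mapping and the edges below are a small illustrative sample (tag names and links are assumptions, not an extract from the GEXF file), and `isolate_community` is a hypothetical helper.

```python
from collections import Counter

# Illustrative mapping of tags to modularity classes.
modularity_class = {
    "Horses": 0, "Flowers": 0, "Medallions": 0,
    "Men": 4, "Women": 4,
    "Christ": 3, "Virgin Mary": 3,
}
# Illustrative edges between tags.
edges = [
    ("Horses", "Flowers"),
    ("Horses", "Men"),
    ("Men", "Women"),
    ("Christ", "Virgin Mary"),
    ("Horses", "Medallions"),
]

# Number of tags per community, largest first.
community_sizes = Counter(modularity_class.values())
print(community_sizes.most_common())


def isolate_community(edge_list, membership, community):
    """Keep only the edges whose two endpoints belong to one community."""
    return [
        (u, v)
        for u, v in edge_list
        if membership[u] == community and membership[v] == community
    ]


community_0 = isolate_community(edges, modularity_class, 0)
print(community_0)
```
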
| 93 | + |
| 94 | +To explore this further, let’s isolate the largest community (Community 0) and analyze its structure in more detail: |
| 95 | + |
| 96 | +<iframe |
| 97 | + width="800" |
| 98 | + height="600" |
| 99 | + src="https://ouestware.gitlab.io/retina/beta/#/embed/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fjairomelo%2F1aebfa947f5c84b8b43c7a1eb06857fa%2Fraw%2Fa5d4243596c17b6df8cab16f2ef09585fc520d32%2Fnetwork-5fafdb38-4d8.gexf&c=m-s&s=ei&sa[]=h&sa[]=b&sa[]=cu&sa[]=t&sa[]=ei&sa[]=r&sa[]=co&ca[]=ec-s&ca[]=w-s&ca[]=m-s&ca[]=s-s&m-s.t=0&er=0.1&ec=s&ds=1" |
| 100 | + frameBorder="0" |
| 101 | + title="Retina" |
| 102 | + allowFullScreen |
| 103 | +></iframe> |
| 104 | + |
| 105 | +With this filter applied, we can better interpret the network. Community 0 appears to represent "Animals, Plants, and Objects", a broad category that connects mythical, religious, and historical figures with specific objects such as flowers, horses, and medallions. |
| 106 | + |
| 107 | +### Spotlight on "Horses" |
| 108 | + |
| 109 | +To further explore connections within Community 0, let’s isolate the term "Horses" using the search bar in Retina. |
| 110 | + |
| 111 | +Doing this reveals that "Horses" links to several thematic areas. It connects to mythical and historical figures such as Achilles, Alexander the Great, Andromache, Hercules, and Saint George, and it reaches into other communities: Community 4, which centers on "Men" and "Women", and Community 3, which focuses on "Christ" and "Virgin Mary". |
| 112 | + |
| 113 | +<iframe |
| 114 | + width="800" |
| 115 | + height="600" |
| 116 | + src="https://ouestware.gitlab.io/retina/beta/#/graph/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fjairomelo%2F1aebfa947f5c84b8b43c7a1eb06857fa%2Fraw%2Fa5d4243596c17b6df8cab16f2ef09585fc520d32%2Fnetwork-5fafdb38-4d8.gexf&c=m-s&s=b&n=Horses&sa[]=h&sa[]=b&sa[]=cu&sa[]=t&sa[]=ei&sa[]=r&sa[]=co&ca[]=ec-s&ca[]=w-s&ca[]=m-s&ca[]=s-s&m-s.t=0&er=0.1&ec=s&ds=1" |
| 117 | + frameBorder="0" |
| 118 | + title="Retina" |
| 119 | + allowFullScreen |
| 120 | +></iframe> |
| 121 | +
|
| 122 | +The Explore menu also displays several other metrics for the selected node. The table below summarizes each value and what it means in the context of our network. |
| 123 | + |
| 124 | +**Metric**|**Value (Horses)**|**Meaning in Context** |
| 125 | +:-----|:-----:|:----- |
| 126 | +Eccentricity|3|The greatest shortest-path distance from 'Horses' to any other node is 3 steps, so every tag it can reach is at most three links away. |
| 127 | +Closeness Centrality|0.56|Based on the average shortest-path distance from 'Horses' to the other nodes; a value of 0.56 suggests it is relatively close to the rest of the network. |
| 128 | +Harmonic Closeness Centrality|0.622|A variant of closeness that sums the reciprocals of the distances, so unreachable nodes contribute zero and the measure stays defined in disconnected networks. |
| 129 | +Betweenness Centrality|0.041|Low: few shortest paths between other nodes pass through 'Horses', so it is not a major bridge, although it still plays a minor role in connecting subgroups. |
| 130 | +Weighted Degree|174|The sum of the weights of all edges attached to 'Horses'. Since weights count co-occurrences, a high value suggests it is a key thematic term in the dataset. |
| 131 | +Modularity Class|0|'Horses' belongs to modularity class 0 (Community 0), one of the major thematic communities in the network. |
| 132 | +Stat Inf Class|14|Class 14 in a second partition of the network, which appears to come from Gephi's Statistical Inference community detection, an alternative to modularity. |
| 133 | +Clustering Coefficient|0.167|Low: only about 17% of the pairs of tags connected to 'Horses' are also connected to each other. |
| 134 | +Triangles|381|'Horses' takes part in 381 triangles (groups of three mutually connected tags). Given the low clustering coefficient, this count reflects its many neighbors more than a tightly knit neighborhood. |
| 135 | +Eigenvector Centrality|0.584|Relatively high, meaning 'Horses' is connected to other well-connected nodes and has a strong influence in the network. |
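
The distance-based metrics in the table can be reproduced on a toy graph with a breadth-first search. The sketch below uses common textbook definitions and an invented adjacency list; Gephi's exact normalization may differ, so treat the output as illustrative only.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def distance_metrics(adjacency, node):
    """Eccentricity, closeness, and harmonic closeness for one node."""
    distances = bfs_distances(adjacency, node)
    others = [d for n, d in distances.items() if n != node]
    return {
        # Longest shortest path to any reachable node.
        "eccentricity": max(others),
        # Inverse of the average distance to the reachable nodes.
        "closeness": len(others) / sum(others),
        # Mean reciprocal distance; unreachable nodes contribute zero.
        "harmonic_closeness": sum(1 / d for d in others) / (len(adjacency) - 1),
    }


# Hypothetical undirected graph given as an adjacency list.
adjacency = {
    "Horses": ["Men", "Saint George"],
    "Men": ["Horses", "Women"],
    "Women": ["Men"],
    "Saint George": ["Horses"],
}
metrics = distance_metrics(adjacency, "Horses")
print(metrics)
```
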