Ranking Relationships, Discovering Evidence

In my introductory post on the topic of social relationship identification (SRI), I talked about the need to develop a collaborative process where the machine guides the analyst in the discovery of relevant social relationships and learns from the context provided by the analyst. Both the analyst and the machine have important and complementary roles to play in the analytic process. The question is: what specific form should that collaborative process take?

We’ll begin by focusing on the learning task. How should we think of SRI? Is this fundamentally a classification or a ranking problem? Conclusions drawn by an analyst rest on trusted insights derived from data. The machine’s role is to accelerate the identification of those insights; the final interpretation lies with the analyst. If we choose to define the task as a classification problem, what happens when the analyst disagrees with aspects of the most probable network hypothesis? The analyst will request other probable hypotheses, thereby necessitating an ordering of hypotheses. Similarly, given the continual constraints on the time available for analysis, an ordering of the most probable hypotheses is key to maximizing the value of limited exploration. Classifiers can certainly be used to rank-order hypotheses, yet they will not excel at the task relative to functions optimized directly to rank. If prioritization is key to the utility of the system, as we contend, then SRI should be thought of fundamentally as a ranking task.

The next logical question is: what specifically are we learning to rank? To answer this question, we must further specify our model of interaction. We envisioned an analyst navigating the communications graph and fixating on an individual of interest to understand the social structure they are embedded in. By constraining analysis to a chosen ego network, which contains the individual of interest (ego) and their associates (alters), the complexity of the analysis becomes more manageable.

Within the context of an ego network, the SRI objectives are two-fold: (a) to discover the most probable communication relationships that exhibit the social relation of interest (e.g., manager-subordinate relations or friendship) and (b) to discover compelling evidence supporting the existence of the social relation, if such evidence exists. We refer to these tasks as relationship ranking and message ranking, respectively. When ranking within an ego network is complete, the analyst will have a prioritized ordering of the communication relationships to work through, along with an ordering of the messages in each relationship that highlights those deemed most fruitful to examine first. Workflow requires significant attention in this problem, as it is easy to lose context quickly even with the assistance of automation. That will be the topic of future posts.
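To make the two levels of ranking concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in this work: the Message type, the CUES word list, and both function names are hypothetical, and the keyword-counting score_message is only a stand-in for a learned ranking function. Scoring a relationship by its single strongest message is likewise just one possible choice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    text: str


# Hypothetical cue words for a manager-subordinate relation.
# Assumption: a real system would learn its scoring function from data.
CUES = ("approve", "report", "deadline")


def score_message(msg: Message) -> float:
    """Placeholder score: fraction of cue words that appear in the text."""
    words = msg.text.lower().split()
    return sum(cue in words for cue in CUES) / len(CUES)


def rank_ego_network(ego: str, messages: list[Message]):
    """Return [(alter, relationship_score, ranked_messages)], best first."""
    # Group messages by the alter on the other end of each exchange with ego.
    by_alter = defaultdict(list)
    for msg in messages:
        if msg.sender == ego:
            by_alter[msg.recipient].append(msg)
        elif msg.recipient == ego:
            by_alter[msg.sender].append(msg)

    ranked = []
    for alter, msgs in by_alter.items():
        # Message ranking: order the evidence within one relationship.
        ordered = sorted(msgs, key=score_message, reverse=True)
        # Relationship score: the strongest single piece of evidence (an assumption).
        ranked.append((alter, score_message(ordered[0]), ordered))

    # Relationship ranking: order the relationships within the ego network.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Calling rank_ego_network("ego", messages) yields the alters in priority order, each paired with its messages sorted so that the most promising evidence comes first. Messages that do not involve the ego are ignored.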

At this point, it is worth reflecting on the definition of a relationship to avoid misconceptions. In social network analysis, it is commonplace to abstract a network in the form of a static representation where time is not an explicit dimension. Here we choose to embrace time and acknowledge that social relationships are dynamic entities that evolve over time. A communication relationship between two individuals reflects aspects of that evolution in the messages that have been created over the course of the relationship. By analyzing these digital social artifacts, our goal is to understand what social relations are expressed in the artifacts generated over a given time period. The specification of a time period of interest is therefore a critical part of defining the communication relationships, or segments thereof, that the ranking process will operate on.
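As a rough sketch of what a relationship segment could look like in code, the following Python function selects the messages exchanged between two people within a chosen time window. The TimedMessage type, the relationship_segment function, and the half-open window convention (start inclusive, end exclusive) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimedMessage:
    sender: str
    recipient: str
    sent_at: datetime
    text: str


def relationship_segment(messages, person_a, person_b, start, end):
    """Messages between two people with start <= sent_at < end, oldest first."""
    pair = {person_a, person_b}
    segment = [
        m for m in messages
        # Direction does not matter: either person may be the sender.
        if {m.sender, m.recipient} == pair and start <= m.sent_at < end
    ]
    return sorted(segment, key=lambda m: m.sent_at)
```

The ranking process would then operate on the returned segment rather than on the full message history, so the same pair of individuals can be ranked differently in different time periods.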

The importance of time is clear when we reflect on the Enron scenario. We know that Tim Belden was a key actor in the development and execution of the financial schemes Enron employed to profit in the energy markets. Therefore, we want to understand who he was reporting to and who he was supervising during the California energy crisis. Having a workflow that allows us to navigate in time relative to events of interest while exploring the social context is key to understanding how events unfolded.

[Joint work with Lise Getoor (UMCP), Galileo Namata (UMCP), Jaime Montemayor (JHU/APL) and Mike Pekala (JHU/APL)]

[Previous post in the series: The Social Relationship: Issues of Representation]