
Int. J. Mobile Network Design and Innovation, Vol. X, No. Y, XXXX 1
Automatic identification of informal social groups
and places for geo-social recommendations
Ankur Gupta and Sanil Paul
Department of Computer Science,
New Jersey Institute of Technology,
University Heights, Newark, NJ 07102, USA
E-mail: ag59@njit.edu
E-mail: sp286@njit.edu
Quentin Jones
Department of Information Systems,
New Jersey Institute of Technology,
University Heights, Newark, NJ 07102, USA
E-mail: qjones@njit.edu
Cristian Borcea*
Department of Computer Science,
New Jersey Institute of Technology,
University Heights, Newark, NJ 07102, USA
E-mail: borcea@cs.njit.edu
*Corresponding author
Abstract: Mobile locatable devices can help identify previously unknown ad hoc or
semi-permanent groups of people and their meeting places. Newly identified groups or places
can be recommended to people to enhance their geo-social experience, while respecting privacy
constraints. For instance, new students can learn about popular hangouts on campus or faculty
members can learn about groups of students routinely having research discussions. This paper
presents a clustering algorithm based on user copresence that identifies such groups and places
even when group members participate in only a certain fraction of meetings. Simulation results
demonstrate that 90–96% of group members can be identified with negligible false positives when
the user meeting attendance is at least 50%. In experiments using one month of mobility traces
collected from smart phones running Intel’s PlaceLab location engine, the algorithm successfully
identified all groups that met regularly during that period. Additionally, the group places were
identified with good accuracy.
Keywords: mobile social computing; location aware recommender systems; group identification;
place identification.
Reference to this paper should be made as follows: Gupta, A., Paul, S., Jones, Q. and
Borcea, C. (XXXX) ‘Automatic identification of informal social groups and places for geo-social
recommendations’, Int. J. Mobile Network Design and Innovation, Vol. X, No. Y, pp.XXX–XXX.
Biographical notes: Ankur Gupta is a PhD candidate in the Department of Computer Science at
the New Jersey Institute of Technology (NJIT). His research interests include mobile and
ubiquitous computing, middleware and algorithms. He received an MS in Computer Science
from NJIT in 2005.
Sanil Paul is working toward an MS in Computer Science at NJIT. His research interests include
ubiquitous computing and location-aware systems.
Quentin Jones is an Assistant Professor at NJIT. He is the Director of NJIT’s SmartCampus
Project, an effort to explore location-aware community system design, utility and social impacts.
His research and teaching focus is social computing with an emphasis on the design of collaborative
environments. He has a PhD in Information Systems from Haifa University, Israel.
Cristian Borcea is an Assistant Professor in the Department of Computer Science at NJIT.
His research interests include mobile and ubiquitous computing, ad hoc networks, middleware
and distributed systems. He is a Member of ACM, IEEE and USENIX. Cristian received a PhD in
Computer Science from Rutgers University in 2004.
Copyright © XXXX Inderscience Enterprises Ltd.

1 Introduction
Internet-based social networking applications such as
Facebook (2004), MySpace (2003) and LinkedIn (2002)
have been hugely successful in the last few years.
Existing location technologies (Bahl and Padmanabhan,
2000; Enge and Misra, 1999; LaMarca et al., 2005;
Priyantha et al., 2000), which have proliferated on many mobile
devices such as smart phones, can be used to build on
this success and deliver location-aware social computing
applications. With research (Jones et al., 2007) showing
that users are increasingly willing to share their location
in return for services, these applications can provide
geo-social recommendations about people, places and events
of interest anytime, anywhere. The first steps in this direction
have already been taken by a number of context-aware
recommendation systems (Espinoza et al., 2001; Heijden
et al., 2005; Takeuchi and Sugimoto, 2005; Yang et al., 2008).
While these systems consider location or user preferences
when making recommendations, they do not take into account
group membership and associations between groups and
places. If captured and properly used, group membership
information can enhance the user profiles, thus improving
the quality of people-to-people recommendations (Jones
et al., 2004). Similarly, group-place associations can improve
the quality of place recommendations by enhancing the
semantics of the place with social information. However,
identifying social groups and their associated places is a
challenging task.
Social groups can be classified as either formal or informal.
Formal groups (e.g. students in a class, faculty members of
a department) have a formal organisational structure as well
as advertised meeting places and times. These groups and
their meeting places can easily be identified using web sites,
databases, notice boards or mailing lists. On the other hand,
informal groups are very hard to identify due to their volatile
or semi-permanent nature. Examples of informal groups
include a study group for a class, faculty that routinely have
lunch together, coworkers who play poker once in a while,
or neighbours who go together to the mall on Saturdays.
These groups tend to evolve out of collaborating individuals
with similar interests, and they are typically unknown to
people outside the group. Unlike formal groups, their
information (e.g. type, members, meeting places and times)
is not registered with an information database or service.
However, this information can be used, while respecting
privacy constraints, to provide valuable recommendations
that improve users’ geo-social experience. For example, new
students can learn about popular hangouts for social activities
on campus or faculty members can learn about groups of
students meeting to discuss a certain research topic.
This paper presents Group-Place Identification (GPI),
an algorithm for automatic identification of informal
social group members and group-place associations using
community mobility traces. GPI can be incorporated
in different location-aware social computing applications
that deliver geo-social recommendations. While users can
potentially provide data about informal social groups and
places, we believe that an automatic method is both more
complete and more accurate, for two reasons. Firstly, only
a small fraction of the users may enter these data
manually. Secondly, the information entered by users
can contain errors, whether introduced by mistake or maliciously. GPI
can use mobility traces acquired from any type of location
technology. From a user privacy perspective, however,
systems that compute the location on the mobile devices
(e.g. Enge and Misra, 1999; LaMarca et al., 2005) are
preferable because they give users control over which parts
of their mobility traces are shared.
So far, mobility traces have only been used in algorithms
that identify significant places for individual users, such
as Kang et al. (2004) and Hightower et al. (2005).
To the best of our knowledge, no work has been done
on using community mobility traces to identify social
groups and places that have importance for a group of
people. While place identification algorithms typically deem
a place significant based on repeated patterns of a user’s
presence at the place, identifying group members and
group-place associations is much harder because informal
groups do not have a clear pattern in terms of group
meeting times, group composition or group member
attendance. Therefore, GPI relies on repeated user
copresence at the same place to determine the group
members, and consequently the meeting places. The
underlying assumption is that group members have
a much higher Degree of Copresence (DCP) than
non-group members, where the DCP of two users is defined
as the total number of times they were copresent divided by
the total number of group meetings. Because group
members are typically present at only a fraction of the
meetings, and non-group members may also be present at
some meetings, the following question arises: what DCP
should GPI require between two users to consider them
members of the same group?
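To see why the required DCP is not obvious, consider a simplified model (an assumption made only for this illustration, not necessarily the model analysed in Section 4): over $M$ group meetings, users $u_i$ and $u_k$ attend each meeting independently with probabilities $p_i$ and $p_k$. The copresence count $\mathrm{CP}$ is then binomial with $M$ trials and success probability $p_i p_k$, so

```latex
\mathrm{DCP}(u_i,u_k) = \frac{\mathrm{CP}(u_i,u_k)}{M},
\qquad
\mathbb{E}\bigl[\mathrm{DCP}(u_i,u_k)\bigr] = \frac{M\,p_i\,p_k}{M} = p_i\,p_k .
```

Under this model, two members who each attend 50% of the meetings have an expected DCP of only 0.25, while a non-member who happens to be at the place during 10% of the meetings has an expected DCP of 0.05 with a member who attends 50% of them. A required DCP has to fall between such values, and the sampling noise around both expectations is what creates the trade-off between identification and false positives.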
We performed a theoretical analysis that determined
the optimal required DCP that allows GPI to balance the
trade-off between group member identification percentage
and false positives percentage (i.e. non-group members
wrongly identified as group members). Based on this
analysis, we also calculated the expected results of the GPI
algorithm. We also implemented GPI and ran extensive
simulations. The results were consistent with the expected
theoretical values. GPI was able to identify between 90 and
96% of group members with negligible false positives when
the average meeting attendance was at least 50%.
Finally, we used the GPI implementation to identify
groups and places on our campus using mobility traces
collected from students and faculty. To successfully
integrate GPI into a mobile computing and communication
infrastructure, it is essential that this infrastructure provides
support to collect accurate and continuous user location data
both indoors and outdoors. The hardware infrastructure has
to be cheap and easily deployable in order to enable location
collection across large areas; as such, software solutions
that take advantage of existing hardware infrastructure are
preferable. Furthermore, systems that compute the location
on mobile devices and allow users to decide when and which
parts of the mobility traces are shared encourage early
adoption of the technology by privacy-conscious users.
Considering these requirements, we chose the WiFi-based
Intel PlaceLab (LaMarca et al., 2005) location engine that
computes location on mobile devices using the position and
signal strength of visible access points. This system takes
advantage of existing access points, which are relatively
densely deployed in cities. Therefore, it can work both
indoors and outdoors across large urban areas. On our
campus, we have at least three visible access points almost
everywhere, and consequently, we obtained an accuracy of
10–15 m, which is good enough for GPI. However, one
major concern with this location engine is that it could cause
significant battery consumption, especially when location
is computed and delivered to a server frequently. Our
experiments (Anand et al., 2007) using iMate KJam smart
phones showed that the battery lasts for about 5–6 hr when
location is computed and delivered every 30 sec, which is
sufficient for GPI. This result demonstrated that GPI and
geo-social mobile recommendation applications are feasible
with current technologies. We then collected mobility traces
over a one-month period from smart phones carried by users
on our campus. GPI successfully identified all groups that
met regularly during that period. Additionally, the group
places extracted from these traces were identified with good
accuracy.
The rest of this paper is organised as follows.
Section 2 presents a number of applications that motivate
the importance of identifying group membership and
group-place associations. Section 3 describes our algorithm.
Section 4 presents the theoretical analysis and provides
guidelines for setting the constants of our algorithm as a function
of the environmental conditions. Section 5 shows simulation
and experimental results. Related work is discussed
in Section 6, and this paper concludes in Section 7.
2 Motivation
This section considers a college campus scenario to illustrate
the two main categories of applications that can benefit
from information about informal social group membership
and place-group associations. The GPI algorithm can assist
recommendation applications with information about groups,
such as
1 members and their profile information
2 type, which can possibly be inferred from user profiles
and the meeting place
3 meeting times.
Additionally, it can provide information about places,
such as
1 types of groups meeting at a place and their corresponding
meeting times
2 statistical information about groups that meet at a place,
such as the total number of groups and the average size of
groups.
Group/person recommendations
For students: group membership information is leveraged
to build social networks. For example, if a student
needs help with a math assignment, an application
can analyse her social network and discover that one
of the members of her poetry reading group has
a friend who is a math major; subsequently, the
math major will be recommended to the person who
needs help. A different application is social matching
that provides recommendations for dating partners on
campus. For instance, people who are members of
the same groups are excluded from recommendations
(i.e. they know each other already), while people who
are members of similar groups and visit similar places are
ranked higher in recommendations.
For faculty: a faculty member looking to recruit new
students to work in his/her lab is recommended a group of
students who meet routinely to discuss research papers.
For administration: the identified groups are used for
group-centric information dissemination. For example,
research groups are notified about upcoming seminars
in their research area, and groups of students regularly
present on basketball courts are notified about an
upcoming intra-mural basketball tournament.
Place recommendations
For students: a new student finds out information about
popular spots for social activities on campus. For instance,
a CS student could find out that the game room of the
student centre is generally occupied by other CS students
on Tuesday evenings.
For faculty: a faculty member uses information about the
places where students from his department hang out to
post fliers about an upcoming course.
For administration: the administration discovers places
that need improvement on campus by checking the
statistical information about places (e.g. type, size and
demographics of the groups that meet at a place). For
instance, the settings and ambiance in certain rooms of the
student centre can be modified according to the number
of students who spend time there.
3 The GPI algorithm
GPI takes as input the users’ mobility traces obtained via
any location technology. The mobility traces of the users
consist of an array of location points indexed by time. To
have enough data for GPI, mobility traces should be collected
over an extended period of time. The goal of GPI is to analyse
these traces to identify the members of informal groups and
the meeting places of these groups. To understand what type
of group information GPI can extract from mobility traces,
we start by presenting a characterisation of typical informal
groups.
Member structure: the number of group members can
vary greatly. For instance, a study group could have
3–5 members, a basketball group could have
10–15 people, and a group of people routinely attending
seminars on wireless networks could have as many as
30–50 people. Additionally, members are typically shared
among groups, and they join and leave groups frequently.
Member attendance: group members do not follow a pattern
for meeting attendance, with the attendance frequency
typically varying from 100% to 50%. Consequently, the
number of members at the group meetings keeps varying
over time.
Meeting time: unlike with the formal groups, there is
no guarantee that informal groups meet regularly
(e.g. weekly at the same time).
Meeting place: groups are expected to share meeting
places over time, such as different study groups in the
library. Even worse, different groups can meet at the same
place simultaneously. For instance, two different groups
of students regularly have lunch in the same part of the
cafeteria.
Since these characteristics emphasise the lack of patterns
of informal groups, we decided that the only characteristic
amenable to automatic identification is member copresence
at the group place. Routine copresence among group
members is almost guaranteed even though it might vary over
time. Therefore, GPI’s challenge is to first detect repeated
copresence among users and then to analyse it to determine
the group members and the group places.
Figure 1 presents the pseudo-code for our algorithm. GPI
starts by identifying the important places for individual users.
For this purpose, we use the clustering algorithm proposed
by Kang et al. (2004). This algorithm performs time-based
clustering on users’ mobility traces; it starts by analysing
the trace points ordered by timestamps and adds them to
a cluster as long as the next point is within a permissible
distance d of the existing cluster. The cluster is closed if
the trace points move away from it. If the duration of such a
cluster is significant (more than time t), the cluster represents
a significant visit. The newly identified place is represented
by the average of the geographical coordinates of these points.
We set the distance threshold d to 30 m and the time threshold
t to 10 min as recommended by the authors.
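The time-based clustering step can be sketched as follows. This is a simplified reading of the description above, not Kang et al.'s exact code: it assumes trace points have already been projected to planar coordinates in metres, measures the distance d from the running centroid of the open cluster, and closes the cluster as soon as a single point falls outside d (a real implementation may want to tolerate isolated outlier points). The names `Point`, `Place` and `individual_places` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    t: float  # timestamp in seconds
    x: float  # position in metres (local planar frame, assumed)
    y: float


@dataclass
class Place:
    x: float      # centroid of the clustered points
    y: float
    start: float  # first timestamp in the cluster
    end: float    # last timestamp in the cluster


def _centroid(points):
    n = len(points)
    return sum(p.x for p in points) / n, sum(p.y for p in points) / n


def individual_places(trace, d=30.0, t=600.0):
    """Time-based clustering of a time-ordered trace into significant visits.

    A point joins the open cluster if it lies within d metres of the
    cluster's centroid; otherwise the cluster is closed and kept only if
    it lasted more than t seconds.
    """
    places, cluster = [], []

    def close(points):
        if points and points[-1].t - points[0].t > t:
            cx, cy = _centroid(points)
            places.append(Place(cx, cy, points[0].t, points[-1].t))

    for p in trace:
        if cluster:
            cx, cy = _centroid(cluster)
            if math.hypot(p.x - cx, p.y - cy) <= d:
                cluster.append(p)
                continue
            close(cluster)
        cluster = [p]
    close(cluster)
    return places
```

With d = 30 m and t = 10 min, a 15-minute stay at one spot yields a place, whereas a 4-minute stop elsewhere does not.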
For each place that a user (say $u_i$) visited, we check if
there are groups associated with this place. The function
Figure 1 GPI algorithm pseudo-code
Inputs
  U = (u_1 ... u_n)     Input set of all users
  M = (m_1 ... m_n)     Mobility traces for users (u_1 ... u_n)
Constants
  t      Minimum time duration for significant cluster
  d      Maximum distance between clusters
  d_cp   Maximum distance between copresent users
  t_cp   Minimum time overlap for user visits to determine copresence
  MI     Maximum number of iterations
  EVF    Estimated group member visit frequency
  RCP    Required degree of copresence to determine a group member
  MVC    Minimum visit count to determine a potential group place
The Algorithm
  For each user u_i in U
    SP_i = IndividualPlaces(m_i, t, d)   /* Set of significant places for u_i */
    For each place P_ij in SP_i
      DGM = empty   /* Set of discovered group members */
      CI = 1        /* Current number of iterations to identify group members */
      Call IdentifyGroupMembers(u_i, P_ij)
      While CI ≤ MI
        Pick u_k    /* Random unprocessed user from DGM */
        Call IdentifyGroupMembers(u_k, P_ij)
        CI = CI + 1
      Call IdentifyMultipleGroups(DGM)
      Output DGM
Function IdentifyGroupMembers(u_i, P)
  NV = NumberOfVisits(u_i, P)
  If NV ≥ MVC
    EGM = NV/EVF   /* Estimated total group meetings */
    Add u_i to DGM
    For each user u_k in U
      CP = CoPresenceCount(u_i, u_k, P, d_cp, t_cp)
      If RCP ≤ CP/EGM
        Add u_k to DGM
  Remove data for user u_i at place P from SP_i
  Mark u_i as processed in DGM
IdentifyGroupMembers uses copresence information to
identify the group members. This function first checks if the
user $u_i$ has a significant number of visits at the place (say $P$) to
ensure that the algorithm has sufficient visit data for analysis.
This is done by setting a constant for the minimum number
of visits, Minimum Visit Count (MVC). Setting constants in
GPI is an essential part of the algorithm given the volatile
nature of informal social groups. With changing operational
environments, the constants can be set differently to achieve
better performance. Section 4 discusses the criteria used
to set the values of all constants in GPI. If the number of
visits NV of $u_i$ at $P$ is at least MVC, the function calculates the
estimated number of group meetings, EGM = NV/EVF, based on the Estimated
Visit Frequency (EVF). Estimation of the group meetings is
required because it is not possible to determine the actual
number of group meetings from the place visit data of a user.
Next, for each other user $u_k$, the function analyses her
place visit data to check potential copresence with $u_i$ at $P$.
This information is used to build a copresence matrix with
respect to $u_i$ and $P$ as illustrated in Table 1. For copresence
to be considered in the matrix, the distance between the
identified places for two users should be less than $d_{cp}$ and
the time overlap between the visits should be at least $t_{cp}$. The
function uses the copresence matrix to compute the DCP of
$u_i$ with all the other users. The DCP is defined as the total
number of times two users are copresent divided by the total
number of group meetings, here estimated by EGM. If the calculated DCP between
$u_i$ and $u_k$ is greater than the Required Degree of Copresence
(RCP), $u_k$ is added to the set Discovered Group Members
(DGM). Finally, the function removes the data for $u_i$ at place
$P$ and marks the user as processed such that the algorithm
will not analyse $u_i$ at $P$ again.
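A minimal sketch of the copresence test and of IdentifyGroupMembers for a single user and place follows. It assumes visits are stored as planar coordinates in metres with start and end times in seconds, treats `others` as the other users' significant visits, and leaves $d_{cp}$, $t_{cp}$, MVC, EVF and RCP as parameters because this section does not fix their values. All names are hypothetical; adding $u_i$ itself to DGM, as Figure 1 does, is left to the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    x: float      # place position in metres (local planar frame, assumed)
    y: float
    start: float  # arrival time in seconds
    end: float    # departure time in seconds


def copresent(a, b, d_cp, t_cp):
    """Visits are copresent if their places are closer than d_cp metres
    and the visits overlap in time by at least t_cp seconds."""
    near = math.hypot(a.x - b.x, a.y - b.y) < d_cp
    overlap = min(a.end, b.end) - max(a.start, b.start)
    return near and overlap >= t_cp


def identify_group_members(ui_visits, others, *, d_cp, t_cp, mvc, evf, rcp):
    """Sketch of IdentifyGroupMembers for one user u_i at one place P.

    ui_visits: u_i's visits at P.
    others:    dict mapping each other user to a list of that user's visits.
    Returns the copresence matrix (one row per visit of u_i, one 0/1
    column per other user, sorted by id) and the users kept as members.
    """
    nv = len(ui_visits)
    if nv < mvc:
        return [], set()          # too few visits at P to analyse
    egm = nv / evf                # estimated total group meetings
    users = sorted(others)
    matrix = [[int(any(copresent(v, w, d_cp, t_cp) for w in others[u]))
               for u in users]
              for v in ui_visits]
    members = set()
    for col, u in enumerate(users):
        cp = sum(row[col] for row in matrix)   # copresence count with u
        if cp / egm >= rcp:                    # DCP reaches RCP
            members.add(u)
    return matrix, members
```

Each row of the returned matrix corresponds to one row of a table like Table 1.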
Table 1 Copresence matrix for user $u_i$ at place $P$, where 1
implies copresence with another user and 0 otherwise
Visit number ($u_i$ at $P$)    $u_1$   $u_2$   $u_3$   $u_4$
1                              1       1       1       1
2                              0       1       1       0
3                              1       1       0       0
4                              1       1       0       1
5                              1       0       1       0
6                              0       1       0       1
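Table 1 can be turned into DCP values with a few lines of code. The EVF of 0.75 and the RCP of 0.45 below are hypothetical values chosen only for illustration; with six visits of $u_i$ at $P$, EGM = 6/0.75 = 8.

```python
# Table 1: rows are the visits of u_i at P, columns are u_1..u_4
# (1 = copresent with u_i during that visit).
TABLE_1 = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]


def dcp_from_matrix(matrix, evf):
    """DCP of u_i with each column user: copresence count / EGM."""
    nv = len(matrix)                          # visits of u_i at P
    egm = nv / evf                            # estimated total group meetings
    counts = [sum(col) for col in zip(*matrix)]
    return [c / egm for c in counts]


dcp = dcp_from_matrix(TABLE_1, evf=0.75)      # hypothetical EVF
print(dcp)  # [0.5, 0.625, 0.375, 0.375]

RCP = 0.45                                    # hypothetical RCP
members = [f"u{k + 1}" for k, v in enumerate(dcp) if v >= RCP]
print(members)  # ['u1', 'u2']
```

Because DCP = CP × EVF / NV, the choice of EVF matters: with EVF = 0.5 the same matrix gives EGM = 12 and a DCP of about 0.33 for $u_1$, which falls below this RCP, whereas EVF = 1.0 gives EGM = 6 and lifts every user to a DCP of at least 0.5. An underestimated EVF therefore tends to miss members, and an overestimated one tends to admit non-members.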
In the main part of the algorithm, the function
IdentifyGroupMembers is repeated with an unprocessed user
from the set DGM to discover more group members. This is
necessary because it is possible that certain members were
not present at the group meetings when the first user was
present, but they were sufficiently copresent with the new
user picked in this iteration. However, the probability of
encountering such users decreases significantly with every
subsequent iteration. To reduce the running time, this
process is repeated at most Maximum Iterations (MI) times, where MI
is less than the number of users, because no new group members
are expected to be identified if more iterations are executed.
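The iterative expansion of DGM in the main loop of Figure 1 can be sketched as below. `identify` is a hypothetical stand-in for IdentifyGroupMembers at a fixed place: it returns the analysed user together with everyone whose DCP with that user reaches RCP, or an empty set if the user has fewer than MVC visits. The sketch also stops early when no unprocessed member remains, a case the pseudo-code leaves implicit.

```python
import random


def discover_group(seed, identify, mi, rng=None):
    """Grow the set of Discovered Group Members (DGM) for one place.

    seed:     the user u_i whose significant place is being analysed.
    identify: hypothetical stand-in for IdentifyGroupMembers(u, P); returns
              {u} plus the users whose DCP with u reaches RCP, or an empty
              set if u has fewer than MVC visits at P.
    mi:       MI, the bound on the iteration counter CI (as in Figure 1).
    """
    rng = rng or random.Random()
    dgm = set(identify(seed))
    processed = {seed}
    ci = 1
    while ci <= mi:
        pending = sorted(dgm - processed, key=str)
        if not pending:
            break                      # every discovered member was analysed
        uk = rng.choice(pending)       # random unprocessed member of DGM
        dgm |= identify(uk)
        processed.add(uk)
        ci += 1
    return dgm
```

Each extra iteration can only add users who are sufficiently copresent with an already discovered member, which is why a small MI is usually enough.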
Finally, GPI analyses DGM to check for multiple groups
at the same place by calling IdentifyMultipleGroups. In
rare cases, it is possible that the users in DGM belong to
two or more different groups at the same place. This may
happen when there are multiple groups at the same place,
and several shared members have sufficient copresence with
members of all the groups. However, it is easy to detect and
divide such groups considering the observation that, besides
the shared members, members of one group do not have
enough copresence with members of another group. For
example, suppose that there are two groups ($u_1$, $u_2$, $u_3$) and
($u_3$, $u_4$, $u_5$) that routinely hang out at the same place $P$. Then
$u_1$ and $u_2$ have significant copresence with each other and $u_3$,
but not with $u_4$ and $u_5$. Similarly, $u_4$ and $u_5$ have significant
copresence only with each other and $u_3$. We successfully
tested our procedure to split groups, but we do not present
the details due to the lack of space.
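Since the paper omits its splitting procedure, the following is only one possible realisation of the observation above, not the authors' method: members who are sufficiently copresent with every other member of DGM are treated as shared, the remaining members are split into connected components of the copresence graph, and the shared members are added to each component. As written, it treats any two non-shared members who are not linked, directly or through other non-shared members, as belonging to different groups.

```python
from collections import deque


def split_groups(dgm, copresent):
    """Split DGM into groups (hypothetical procedure, not the authors').

    dgm:       iterable of user ids.
    copresent: set of frozensets {u, v} with significant copresence.
    """
    members = sorted(dgm)

    def linked(u, v):
        return frozenset((u, v)) in copresent

    # Shared members have significant copresence with every other member.
    shared = {u for u in members
              if all(linked(u, v) for v in members if v != u)}
    rest = [u for u in members if u not in shared]
    if not rest:
        return [set(members)]          # one fully connected group

    groups, seen = [], set()
    for start in rest:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:                   # breadth-first search over non-shared members
            u = queue.popleft()
            component.add(u)
            for v in rest:
                if v not in seen and linked(u, v):
                    seen.add(v)
                    queue.append(v)
        groups.append(component | shared)
    return groups
```

For the example above, $u_3$ is the only shared member, and the remaining members form the components {$u_1$, $u_2$} and {$u_4$, $u_5$}, which yields the two original groups.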
Once the algorithm completes, we need to define the
identified group place P . We compute the average of the
geographical coordinates of all trace points of all visits by all
users at P (let this be C). C is defined as a point, but most
applications are interested in well-defined places rather than
points. P is determined by looking at the actual geographies
around the point C. For example, if C falls inside an office
building, P is defined as all the rooms that overlap with a
circular area of radius E around C, where E is the maximum
error in determining C (i.e. this error is introduced by the
location technology). If the application needs to associate a
place with only one room, then P is considered to be the
room that contains C.
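The mapping from the point C to a place can be sketched as follows, assuming a hypothetical floor plan in which each room is an axis-aligned rectangle in the same planar coordinates (metres) as the trace points. A room is selected when its rectangle overlaps the circle of radius E around C; the single-room variant returns the room containing C. The `Room` type and all function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Room:
    name: str
    xmin: float   # axis-aligned footprint in metres (hypothetical floor plan)
    ymin: float
    xmax: float
    ymax: float


def centroid(points):
    """Average of all (x, y) trace points of all visits at P: the point C."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def overlaps_circle(room, cx, cy, radius):
    # Nearest point of the rectangle to the circle centre.
    nx = min(max(cx, room.xmin), room.xmax)
    ny = min(max(cy, room.ymin), room.ymax)
    return math.hypot(cx - nx, cy - ny) <= radius


def group_place(points, rooms, error_radius):
    """All rooms overlapping the circle of radius E around C."""
    cx, cy = centroid(points)
    return [r.name for r in rooms if overlaps_circle(r, cx, cy, error_radius)]


def containing_room(points, rooms):
    """Single-room variant: the room whose footprint contains C, if any."""
    cx, cy = centroid(points)
    for r in rooms:
        if r.xmin <= cx <= r.xmax and r.ymin <= cy <= r.ymax:
            return r.name
    return None
```

For instance, with C = (3, 3), a room spanning (0, 0) to (5, 5), a neighbouring room starting at x = 5 and E = 15 m (a plausible value given the 10–15 m accuracy reported in Section 1), both rooms are returned; with E = 1 m only the first one is.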
GPI executes offline, and as such, its running time is
not critical for the applications. Nevertheless, we analysed
its complexity to estimate how long it would take to identify
groups and places for a large user population. The asymptotic
running time of the algorithm is $O(n^2 \times v^2 + nt)$, where $n$ is
the number of users, $v$ is the maximum number of significant
visits for a user and $t$ is the maximum number of mobility
trace points for a user. For instance, let us assume that the
user population is 10,000, and we collect location data for
every user every 10 sec, for 6 hr a day, over a one-month
period. Running on a medium-size server, GPI will complete
in several hours, which is acceptable considering that it is
executed rarely.
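Plugging this example into the complexity bound gives a sense of scale. The number of trace points per user follows directly from the stated sampling, assuming a 30-day month; the paper does not give $v$, so the value of about three significant visits per user per day used here is purely hypothetical, and the products are operation counts up to constant factors, not running times.

```python
TRACKED_SECONDS_PER_DAY = 6 * 3600   # 6 hr of tracking per day
SAMPLE_PERIOD_S = 10                 # one location point every 10 sec
DAYS = 30                            # one-month period (assumed 30 days)

n = 10_000                                                   # users
t = TRACKED_SECONDS_PER_DAY // SAMPLE_PERIOD_S * DAYS        # trace points per user
v = 3 * DAYS    # hypothetical: about 3 significant visits per user per day

print(t)            # 64800
print(n * t)        # 648000000
print(n**2 * v**2)  # 810000000000
```

Under these assumptions, the $n^2 v^2$ copresence term exceeds the $nt$ clustering term by roughly three orders of magnitude, so the number of significant visits per user, rather than the sampling rate, is what drives the running time.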
4 Analysis of constants in GPI
As discussed in the previous section, GPI uses six constants
(RCP, MVC, EVF, MI, $d_{cp}$, $t_{cp}$) that significantly affect the
performance of the algorithm. It is important to note that
the values of these constants do not change once they have
been set for a certain environment. However, with changing
operational environments, it is possible to achieve better
identification results by altering the values of these constants.
For example, if we know that people meet more frequently
in a particular environment, we can set the estimated group
member visit frequency, EVF, higher. Similarly, we can set
the MVC higher, if we know that groups meet very frequently.
Our goal in this section is to provide the reader with an
understanding of how these constants affect the algorithm,

References
RADAR: an in-building RF-based user location and tracking system
Statistical Inference
The Cricket location-support system
Place Lab: device positioning using radio beacons in the wild
Using GPS to learn significant locations and predict movement across multiple users