Automatic identification of informal social groups and places for geo-social recommendations

doi:10.1504/IJMNDI.2007.017320

Int. J. Mobile Network Design and Innovation, Vol. X, No. Y, XXXX


Ankur Gupta and Sanil Paul

Department of Computer Science,

New Jersey Institute of Technology,

University Heights, Newark, NJ 07102, USA

E-mail: ag59@njit.edu

E-mail: sp286@njit.edu

Quentin Jones

Department of Information Systems,

New Jersey Institute of Technology,

University Heights, Newark, NJ 07102, USA

E-mail: qjones@njit.edu

Cristian Borcea*

Department of Computer Science,

New Jersey Institute of Technology,

University Heights, Newark, NJ 07102, USA

E-mail: borcea@cs.njit.edu

*Corresponding author

Abstract: Mobile locatable devices can help identify previously unknown ad hoc or semi-permanent groups of people and their meeting places. Newly identified groups or places can be recommended to people to enhance their geo-social experience, while respecting privacy constraints. For instance, new students can learn about popular hangouts on campus, or faculty members can learn about groups of students who routinely hold research discussions. This paper presents a clustering algorithm based on user copresence that identifies such groups and places even when group members attend only a fraction of the meetings. Simulation results demonstrate that 90–96% of group members can be identified with negligible false positives when user meeting attendance is at least 50%. In experiments with one month of mobility traces collected from smart phones running Intel’s PlaceLab location engine, the algorithm identified all groups that met regularly during that period, and it identified the group places with good accuracy.

Keywords: mobile social computing; location aware recommender systems; group identification; place identification.

Reference to this paper should be made as follows: Gupta, A., Paul, S., Jones, Q., and Borcea, C. (XXXX) ‘Automatic identification of informal social groups and places for geo-social recommendations’, Int. J. Mobile Network Design and Innovation, Vol. X, No. Y, pp.XXX–XXX.

Biographical notes: Ankur Gupta is a PhD candidate in the Department of Computer Science at the New Jersey Institute of Technology (NJIT). His research interests include mobile and ubiquitous computing, middleware and algorithms. He received an MS in Computer Science from NJIT in 2005.

Sanil Paul is working toward an MS in Computer Science at NJIT. His research interests include ubiquitous computing and location-aware systems.

Quentin Jones is an Assistant Professor at NJIT. He is the Director of NJIT’s SmartCampus Project, an effort to explore location-aware community system design, utility and social impacts. His research and teaching focus is social computing with an emphasis on the design of collaborative environments. He has a PhD in Information Systems from Haifa University, Israel.

Cristian Borcea is an Assistant Professor in the Department of Computer Science at NJIT. His research interests include mobile and ubiquitous computing, ad hoc networks, middleware and distributed systems. He is a Member of ACM, IEEE and Usenix. Cristian received a PhD in Computer Science from Rutgers University in 2004.


1 Introduction

Internet-based social networking applications such as Facebook (2004), MySpace (2003) and LinkedIn (2002) have been hugely successful in the last few years. Existing location technologies (Bahl and Padmanabhan, 2000; Enge and Misra, 1999; LaMarca et al., 2005; Priyantha et al., 2000), which have proliferated on mobile devices such as smart phones, can build on this success by enabling location-aware social computing applications. With research (Jones et al., 2007) showing that users are increasingly willing to share their location in return for services, these applications can provide geo-social recommendations about people, places and events of interest anytime, anywhere. The first steps in this direction have already been taken by a number of context-aware recommendation systems (Espinoza et al., 2001; Heijden et al., 2005; Takeuchi and Sugimoto, 2005; Yang et al., 2008). While these systems consider location or user preferences when making recommendations, they do not take into account group membership or associations between groups and places. If captured and properly used, group membership information can enrich user profiles, thus improving the quality of people-to-people recommendations (Jones et al., 2004). Similarly, group-place associations can improve the quality of place recommendations by enriching the semantics of a place with social information. However, identifying social groups and their associated places is a challenging task.

Social groups can be classified as either formal or informal. Formal groups (e.g. students in a class, faculty members of a department) have a formal organisational structure as well as advertised meeting places and times. These groups and their meeting places can easily be identified using web sites, databases, notice boards or mailing lists. Informal groups, on the other hand, are very hard to identify due to their volatile or semi-permanent nature. Examples of informal groups include a study group for a class, faculty members who routinely have lunch together, coworkers who play poker once in a while, or neighbours who go to the mall together on Saturdays. These groups tend to evolve out of collaborating individuals with similar interests, and they are typically unknown to people outside the group. Unlike formal groups, their information (e.g. type, members, meeting places and times) is not registered with any information database or service. However, this information can be used, while respecting privacy constraints, to provide valuable recommendations that improve users’ geo-social experience. For example, new students can learn about popular hangouts for social activities on campus, or faculty members can learn about groups of students meeting to discuss a certain research topic.

This paper presents Group-Place Identification (GPI), an algorithm for the automatic identification of informal social group members and group-place associations from community mobility traces. GPI can be incorporated into different location-aware social computing applications that deliver geo-social recommendations. While users could potentially provide data about informal social groups and places themselves, we believe that an automatic method is more accurate for two reasons. First, only a small fraction of the users may enter these data manually. Second, user-entered information can contain errors, introduced either by mistake or maliciously. GPI can use mobility traces acquired from any type of location technology. From a user privacy perspective, however, systems that compute the location on the mobile devices (e.g. Enge and Misra, 1999; LaMarca et al., 2005) are preferable because they give users control over which parts of their mobility traces are shared.

So far, mobility traces have only been used in algorithms that identify significant places for individual users, such as those of Kang et al. (2004) and Hightower et al. (2005). To the best of our knowledge, no work has used community mobility traces to identify social groups and the places that are important to a group of people. While place identification algorithms typically deem a place significant based on a user’s repeated presence there, identifying group members and group-place associations is much harder because informal groups do not follow a clear pattern in terms of meeting times, group composition or member attendance. Therefore, GPI relies on repeated user copresence at the same place to determine the group members and, consequently, the meeting places. The underlying assumption is that group members have a much higher Degree of Copresence (DCP) with each other than with non-group members, where the DCP of two users is the total number of times they were copresent divided by the total number of group meetings. Because group members typically attend only a fraction of the meetings, and non-group members may occasionally be present at meetings, the following question arises: what DCP should GPI require between group members?

We performed a theoretical analysis to determine the optimal required DCP, which allows GPI to balance the trade-off between the percentage of group members identified and the percentage of false positives (i.e. non-group members wrongly identified as group members). Based on this analysis, we also calculated the expected results of the GPI algorithm. We then implemented GPI and ran extensive simulations, whose results were consistent with the expected theoretical values. GPI identified between 90 and 96% of group members with negligible false positives when the average meeting attendance was at least 50%.

Finally, we used the GPI implementation to identify groups and places on our campus using mobility traces collected from students and faculty. To integrate GPI successfully into a mobile computing and communication infrastructure, the infrastructure must support the collection of accurate and continuous user location data both indoors and outdoors. The hardware infrastructure has to be cheap and easily deployable to enable location collection across large areas; software solutions that take advantage of existing hardware infrastructure are therefore preferable. Furthermore, systems that compute the location on the mobile devices and allow users to decide when, and which parts of, their mobility traces are shared encourage early adoption of the technology by privacy-conscious users.

Considering these requirements, we chose the WiFi-based Intel PlaceLab (LaMarca et al., 2005) location engine, which computes location on mobile devices using the positions and signal strengths of visible access points. This system takes advantage of existing access points, which are relatively densely deployed in cities; therefore, it can work both indoors and outdoors across large urban areas. On our campus, at least three access points are visible almost everywhere, and consequently we obtained an accuracy of 10–15 m, which is good enough for GPI. However, one major concern with this location engine is that it can cause significant battery consumption, especially when location is computed and delivered to a server frequently. Our experiments (Anand et al., 2007) using iMate KJam smart phones showed that the battery lasts for about 5–6 h when location is computed and delivered every 30 s, which is sufficient for GPI. This result demonstrates that GPI and geo-social mobile recommendation applications are feasible with current technologies. We then collected mobility traces over a one-month period from smart phones carried by users on our campus. GPI successfully identified all groups that met regularly during that period, and the group places extracted from these traces were identified with good accuracy.

The rest of this paper is organised as follows. Section 2 presents a number of applications that motivate the importance of identifying group membership and group-place associations. Section 3 describes our algorithm. Section 4 presents the theoretical analysis and provides guidelines for setting the constants of our algorithm as a function of the environment conditions. Section 5 shows simulation and experimental results. Related work is discussed in Section 6, and Section 7 concludes the paper.

2 Motivation

This section considers a college campus scenario to illustrate the two main categories of applications that can benefit from information about informal social group membership and place-group associations. The GPI algorithm can assist recommendation applications with information about groups, such as

1 members and their profile information
2 type, which can possibly be inferred from user profiles and the meeting place
3 meeting times.

Additionally, it can provide information about places, such as

1 types of groups meeting at a place and their corresponding meeting times
2 statistical information about groups that meet at a place, such as the total number of groups and the average size of groups.

Group/person recommendations

• For students: group membership information is leveraged to build social networks. For example, if a student needs help with a math assignment, an application can analyse her social network and discover that one of the members of her poetry reading group has a friend who is a math major; the math major is then recommended to the student who needs help. A different application is social matching, which provides recommendations for dating partners on campus. For instance, people who are members of the same groups are excluded from recommendations (i.e. they already know each other), while people who are members of similar groups and visit similar places are ranked higher in recommendations.
• For faculty: a faculty member looking to recruit new students to work in his/her lab is recommended a group of students who meet routinely to discuss research papers.
• For administration: the identified groups are used for group-centric information dissemination. For example, research groups are notified about upcoming seminars in their research area, and groups of students regularly present on basketball courts are notified about an upcoming intramural basketball tournament.

Place recommendations

• For students: a new student finds out about popular spots for social activities on campus. For instance, a CS student could find out that the game room of the student centre is generally occupied by other CS students on Tuesday evenings.
• For faculty: a faculty member uses information about the places where students from his/her department hang out to post fliers about an upcoming course.
• For administration: the administration discovers places on campus that need improvement by checking statistical information about places (e.g. type, size and demographics of the groups that meet at a place). For instance, the settings and ambiance of certain rooms in the student centre can be modified according to the number of students who spend time there.

3 The GPI algorithm

GPI takes as input the users’ mobility traces, obtained via any location technology. Each user’s mobility trace consists of a sequence of location points indexed by time. To have enough data for GPI, mobility traces should be collected over an extended period of time. The goal of GPI is to analyse these traces to identify the members of informal groups and the meeting places of these groups. To understand what type of group information GPI can extract from mobility traces, we start by characterising typical informal groups.

• Member structure: the number of group members can vary greatly. For instance, a study group could have 3–5 members, a basketball group could have 10–15 people, and a group of people routinely attending seminars on wireless networks could reach 30–50 people. Additionally, members are typically shared among groups, and they join and leave groups frequently.
• Member attendance: group members do not follow a pattern of meeting attendance, with attendance frequency typically varying from 50% to 100%. Consequently, the number of members present at group meetings varies over time.
• Meeting time: unlike formal groups, informal groups are not guaranteed to meet regularly (e.g. weekly at the same time).
• Meeting place: groups are expected to share meeting places over time, such as different study groups in the library. Even worse, different groups can meet at the same place simultaneously. For instance, two different groups of students may regularly have lunch in the same part of the cafeteria.

Since these characteristics emphasise the lack of patterns in informal groups, we decided that the only characteristic amenable to automatic identification is member copresence at the group place. Routine copresence among group members is almost guaranteed, even though it might vary over time. Therefore, GPI’s challenge is first to detect repeated copresence among users and then to analyse it to determine the group members and the group places.

Figure 1 presents the pseudo-code for our algorithm. GPI starts by identifying the important places for individual users. For this purpose, we use the clustering algorithm proposed by Kang et al. (2004). This algorithm performs time-based clustering on users’ mobility traces: it analyses the trace points in timestamp order and adds each point to the current cluster as long as the point is within a permissible distance d of the cluster. The cluster is closed when the trace points move away from it. If the duration of the cluster is significant (more than time t), the cluster represents a significant visit, and the newly identified place is represented by the average of the geographical coordinates of the cluster’s points. We set the distance threshold d to 30 m and the time threshold t to 10 min, as recommended by the authors.
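The time-based clustering step (IndividualPlaces in Figure 1) can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not Kang et al.’s code: each trace point is taken to be a (timestamp in seconds, x, y) tuple in a local metric frame, distance is measured from the running cluster centroid, and the handling of outlier points in the original algorithm is omitted. The names individual_places and _centroid are hypothetical; the defaults are the d = 30 m and t = 10 min given above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]          # (timestamp_s, x_m, y_m), assumed layout
Visit = Tuple[float, float, float, float]   # (start_s, end_s, cx_m, cy_m)


def _centroid(points: List[Point]) -> Tuple[float, float]:
    n = len(points)
    return sum(p[1] for p in points) / n, sum(p[2] for p in points) / n


def individual_places(trace: List[Point], d: float = 30.0,
                      t: float = 600.0) -> List[Visit]:
    """Grow a cluster while each new point lies within d metres of the
    cluster centroid; keep clusters lasting more than t seconds."""
    visits: List[Visit] = []
    cluster: List[Point] = []
    for p in sorted(trace):                  # process points in time order
        if cluster:
            cx, cy = _centroid(cluster)
            if math.hypot(p[1] - cx, p[2] - cy) <= d:
                cluster.append(p)
                continue
            # The trace moved away: close the cluster, keep it if long enough.
            if cluster[-1][0] - cluster[0][0] > t:
                visits.append((cluster[0][0], cluster[-1][0], cx, cy))
        cluster = [p]                        # start a new cluster
    if cluster and cluster[-1][0] - cluster[0][0] > t:
        cx, cy = _centroid(cluster)
        visits.append((cluster[0][0], cluster[-1][0], cx, cy))
    return visits


# Usage: 15 min at one spot, then 5 min about 500 m away.
trace = [(s, 0.0, 0.0) for s in range(0, 901, 60)]
trace += [(s, 500.0, 0.0) for s in range(960, 1261, 60)]
print(individual_places(trace))  # [(0, 900, 0.0, 0.0)]
```

Only the first stay exceeds t, so it is the only significant visit; the 5-minute stay is discarded.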

For each place $P_{ij}$ that a user $u_i$ visited, GPI checks whether there are groups associated with this place by calling the function IdentifyGroupMembers.

Figure 1 GPI algorithm pseudo-code

Inputs
  U = (u_1 ... u_n)   → Input set of all users
  M = (m_1 ... m_n)   → Mobility traces for users (u_1 ... u_n)

Constants
  t     → Minimum time duration for a significant cluster
  d     → Maximum distance of a trace point from the current cluster
  d_cp  → Maximum distance between copresent users
  t_cp  → Minimum time overlap for user visits to determine copresence
  MI    → Maximum number of iterations
  EVF   → Estimated group member visit frequency
  RCP   → Required degree of copresence to determine a group member
  MVC   → Minimum visit count to determine a potential group place

The Algorithm
  For each user u_i in U
    SP_i = IndividualPlaces(m_i, t, d)        /* Set of significant places for u_i */
    For each place P_ij in SP_i
      DGM = empty                              /* Set of discovered group members */
      CI = 1                                   /* Current number of iterations to identify group members */
      Call IdentifyGroupMembers(u_i, P_ij)
      While CI ≤ MI and DGM contains an unprocessed user
        Pick u_k                               /* Random unprocessed user from DGM */
        Call IdentifyGroupMembers(u_k, P_ij)
        CI = CI + 1
      Call IdentifyMultipleGroups(DGM)
      Output DGM

Function IdentifyGroupMembers(u_i, P)
  NV = NumberOfVisits(u_i, P)
  If NV ≥ MVC
    EGM = NV / EVF                             /* Estimated total group meetings */
    Add u_i to DGM
    For each user u_k in U, u_k ≠ u_i
      CP = CoPresenceCount(u_i, u_k, P, d_cp, t_cp)
      If CP / EGM ≥ RCP
        Add u_k to DGM
  Remove data for user u_i at place P from SP_i
  Mark u_i as processed in DGM


IdentifyGroupMembers uses copresence information to identify the group members. This function first checks whether user $u_i$ has a significant number of visits at the place (say $P$), to ensure that the algorithm has sufficient visit data for analysis. This is done by setting a constant for the minimum number of visits, the Minimum Visit Count (MVC). Setting constants is an essential part of GPI given the volatile nature of informal social groups: in different operational environments, the constants can be set differently to achieve better performance. Section 4 discusses the criteria used to set the values of all constants in GPI. If the number of visits NV of $u_i$ at $P$ is at least MVC, the function calculates the estimated number of group meetings, $EGM = NV/EVF$, based on the Estimated Visit Frequency (EVF). The number of group meetings must be estimated because it cannot be determined from the place visit data of a single user. Next, for each other user $u_k$, the function analyses her place visit data to check for potential copresence with $u_i$ at $P$. This information is used to build a copresence matrix with respect to $u_i$ and $P$, as illustrated in Table 1. For copresence to be recorded in the matrix, the distance between the places identified for the two users must be less than $d_{cp}$, and the time overlap between their visits must be at least $t_{cp}$. The function uses the copresence matrix to compute the DCP of $u_i$ with every other user, defined as the total number of times the two users are copresent (CP) divided by the total number of group meetings, i.e. $DCP(u_i, u_k) = CP/EGM$. If the DCP between $u_i$ and $u_k$ is at least the Required Degree of Copresence (RCP), $u_k$ is added to the set of Discovered Group Members (DGM). Finally, the function removes the data for $u_i$ at place $P$ and marks the user as processed, so that the algorithm will not analyse $u_i$ at $P$ again.
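The copresence test and the count CP can be illustrated with a short Python sketch. It assumes each significant visit is stored as (start_s, end_s, x, y), with (x, y) the visit’s place, and that CP counts the visits of $u_i$ at $P$ during which $u_k$ was copresent, i.e. one column sum of a copresence matrix such as Table 1. The data layout, the function names copresent and copresence_count, and the sample thresholds $d_{cp}$ = 30 m and $t_{cp}$ = 10 min are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Visit = Tuple[float, float, float, float]  # (start_s, end_s, x_m, y_m), assumed layout


def copresent(a: Visit, b: Visit, d_cp: float, t_cp: float) -> bool:
    """Places closer than d_cp metres and a time overlap of at least t_cp s."""
    close = math.hypot(a[2] - b[2], a[3] - b[3]) < d_cp
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    return close and overlap >= t_cp


def copresence_count(visits_i: List[Visit], visits_k: List[Visit],
                     d_cp: float, t_cp: float) -> int:
    """CP: number of u_i's visits at P during which u_k was copresent."""
    return sum(1 for vi in visits_i
               if any(copresent(vi, vk, d_cp, t_cp) for vk in visits_k))


day = 86_400
visits_i = [(0, 3600, 0, 0), (day, day + 3600, 0, 0),
            (2 * day, 2 * day + 3600, 0, 0)]
visits_k = [(600, 3000, 10, 5),                 # ~11 m away, 40 min overlap
            (day + 3300, day + 6900, 0, 0),     # only 5 min overlap
            (2 * day, 2 * day + 3600, 200, 0)]  # 200 m away
print(copresence_count(visits_i, visits_k, d_cp=30.0, t_cp=600.0))  # 1
```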

Table 1 Copresence matrix for user $u_i$ at place $P$, wherein 1 implies copresence with another user and 0 otherwise

Visit number ($u_i$ at $P$)   u_1   u_2   u_3   u_4
1                             1     1     1     1
2                             0     1     1     0
3                             1     1     0     0
4                             1     1     0     1
5                             1     0     1     0
6                             0     1     0     1
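With the copresence matrix in hand, the DCP of $u_i$ with each user is a column sum divided by $EGM = NV/EVF$. The Python sketch below applies this to the values of Table 1. The constants EVF = 0.75 and RCP = 0.5 are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Rows: visits of u_i at P; columns: u1..u4 (values from Table 1).
MATRIX: List[List[int]] = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
USERS = ["u1", "u2", "u3", "u4"]


def degree_of_copresence(matrix: List[List[int]], users: List[str],
                         evf: float) -> Dict[str, float]:
    nv = len(matrix)      # NV: visits of u_i at P
    egm = nv / evf        # EGM: estimated total group meetings
    return {u: sum(row[c] for row in matrix) / egm   # CP / EGM
            for c, u in enumerate(users)}


dcp = degree_of_copresence(MATRIX, USERS, evf=0.75)         # hypothetical EVF
members = [u for u, value in dcp.items() if value >= 0.5]   # hypothetical RCP
print(dcp)      # {'u1': 0.5, 'u2': 0.625, 'u3': 0.375, 'u4': 0.375}
print(members)  # ['u1', 'u2']
```

Here $u_i$ has 6 visits, so EGM = 6/0.75 = 8 estimated meetings; $u_1$ and $u_2$ reach the threshold, while $u_3$ and $u_4$ (3 copresences each) do not.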

In the main part of the algorithm, the function IdentifyGroupMembers is repeated with an unprocessed user from the set DGM to discover more group members. This is necessary because certain members may not have been present at the group meetings attended by the first user, yet may have been sufficiently copresent with the new user picked in this iteration. However, the probability of encountering such users decreases significantly with every subsequent iteration. To reduce the running time, this process is repeated at most Maximum Iterations (MI) times (fewer than the number of users), because no new group members are expected to be identified by further iterations.
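The iterative expansion of DGM at one place can be sketched in Python as follows. To keep the sketch self-contained, it assumes (our simplification, not the paper’s) that one precomputed DCP value is available per unordered pair of users at $P$ and that the set of users with at least MVC visits at $P$ is known in advance; in GPI itself the DCP is computed relative to the user being processed. The function discover_group and its parameters are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Optional, Set


def discover_group(seed: str, users: Set[str],
                   dcp: Dict[FrozenSet[str], float], eligible: Set[str],
                   rcp: float, mi: int,
                   rng: Optional[random.Random] = None) -> Set[str]:
    """Expand the discovered group members (DGM) at one place from `seed`."""
    rng = rng or random.Random(0)
    dgm: Set[str] = set()        # discovered group members
    processed: Set[str] = set()  # users already analysed at this place

    def identify_group_members(u: str) -> None:
        if u in eligible:                        # NV >= MVC
            dgm.add(u)
            for k in users - {u}:
                if dcp.get(frozenset((u, k)), 0.0) >= rcp:
                    dgm.add(k)
        processed.add(u)

    identify_group_members(seed)
    ci = 1
    while ci <= mi:
        pending = sorted(dgm - processed)        # unprocessed members
        if not pending:
            break                                # nothing left to expand
        identify_group_members(rng.choice(pending))
        ci += 1
    return dgm


users = {"a", "b", "c", "d", "e"}
pairs = {frozenset(("a", "b")): 0.8, frozenset(("b", "c")): 0.7,
         frozenset(("a", "c")): 0.2, frozenset(("d", "e")): 0.9}
print(sorted(discover_group("a", users, pairs, users, rcp=0.5, mi=0)))  # ['a', 'b']
print(sorted(discover_group("a", users, pairs, users, rcp=0.5, mi=3)))  # ['a', 'b', 'c']
```

User c is rarely copresent with a but often with b, so it is found only once b is processed in a later iteration, which is exactly the case the iterations are meant to cover.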

Finally, GPI analyses DGM to check for multiple groups at the same place by calling IdentifyMultipleGroups. In rare cases, the users in DGM may belong to two or more different groups at the same place. This may happen when multiple groups meet at the same place and several shared members have sufficient copresence with members of all the groups. However, such groups are easy to detect and divide, based on the observation that, apart from the shared members, members of one group do not have enough copresence with members of another group. For example, suppose that two groups $(u_1, u_2, u_3)$ and $(u_3, u_4, u_5)$ routinely hang out at the same place $P$. Then $u_1$ and $u_2$ have significant copresence with each other and with $u_3$, but not with $u_4$ and $u_5$. Similarly, $u_4$ and $u_5$ have significant copresence only with each other and with $u_3$. We successfully tested our procedure to split groups, but we do not present the details due to lack of space.
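Since the splitting procedure is not detailed here, the Python sketch below shows one hypothetical way to act on the stated observation; it is not the authors’ procedure. It treats as shared any member that is significantly copresent with every other member of DGM, splits the remaining members into connected components of the copresence graph, and adds the shared members to every component. The function split_groups and the adjacency-set representation are our own assumptions.

```python
from typing import Dict, List, Set


def split_groups(dgm: Set[str], copresent: Dict[str, Set[str]]) -> List[Set[str]]:
    """Hypothetical split of DGM into groups that share some members."""
    shared = {u for u in dgm if dgm - {u} <= copresent.get(u, set())}
    rest = dgm - shared
    if not rest:                       # everyone is copresent with everyone
        return [set(dgm)]
    groups: List[Set[str]] = []
    unvisited = set(rest)
    while unvisited:                   # depth-first search per component
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            u = stack.pop()
            for k in copresent.get(u, set()) & unvisited:
                unvisited.discard(k)
                component.add(k)
                stack.append(k)
        groups.append(component | shared)
    return groups


# The example from the text: groups (u1, u2, u3) and (u3, u4, u5) at P.
copres = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"},
          "u3": {"u1", "u2", "u4", "u5"},
          "u4": {"u3", "u5"}, "u5": {"u3", "u4"}}
groups = split_groups(set(copres), copres)
print(sorted(sorted(g) for g in groups))  # [['u1', 'u2', 'u3'], ['u3', 'u4', 'u5']]
```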

Once the algorithm completes, we need to define the identified group place $P$. We compute the average $C$ of the geographical coordinates of all trace points of all visits by all users at $P$. $C$ is a point, but most applications are interested in well-defined places rather than points, so $P$ is determined by looking at the actual geography around $C$. For example, if $C$ falls inside an office building, $P$ is defined as all the rooms that overlap with a circular area of radius $E$ around $C$, where $E$ is the maximum error in determining $C$ (i.e. the error introduced by the location technology). If the application needs to associate a place with only one room, then $P$ is considered to be the room that contains $C$.
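Mapping the point $C$ to rooms can be sketched in Python as below. The sketch assumes, purely for illustration, that rooms are axis-aligned rectangles (xmin, ymin, xmax, ymax) in the same metric frame as the trace points; real floor plans would need polygon geometry. The function names and the sample rooms are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed room model


def centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """C: mean of all trace points of all visits at P."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def rooms_overlapping(c: Tuple[float, float], e: float,
                      rooms: Dict[str, Rect]) -> List[str]:
    """Rooms whose rectangle intersects the disc of radius E around C."""
    cx, cy = c
    hits = []
    for name, (x0, y0, x1, y1) in rooms.items():
        # Closest point of the rectangle to C, compared with radius E.
        nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
        if (nx - cx) ** 2 + (ny - cy) ** 2 <= e ** 2:
            hits.append(name)
    return hits


def room_containing(c: Tuple[float, float], rooms: Dict[str, Rect]) -> Optional[str]:
    cx, cy = c
    for name, (x0, y0, x1, y1) in rooms.items():
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return name
    return None


rooms = {"A": (0, 0, 6, 6), "B": (6, 0, 12, 6), "C": (20, 0, 26, 6)}
c = centroid([(2, 2), (4, 2), (3, 5)])
print(c)                                   # (3.0, 3.0)
print(rooms_overlapping(c, 10.0, rooms))   # ['A', 'B']
print(room_containing(c, rooms))           # A
```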

GPI executes off-line, and as such, its running time is not critical for the applications. Nevertheless, we analysed its complexity to estimate how long it would take to identify groups and places for a large user population. The asymptotic running time of the algorithm is $O(n^2 v^2 + n t)$, where $n$ is the number of users, $v$ is the maximum number of significant visits for a user and $t$ is the maximum number of mobility trace points for a user. For instance, assume that the user population is 10,000 and that we collect location data for every user every 10 s, for 6 h a day, over a one-month period. Running on a medium-sized server, GPI would complete in several hours, which is acceptable considering that it is executed rarely.
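Plugging this scenario into the two terms of the bound gives a rough sense of scale. This is our own back-of-the-envelope check, assuming a 30-day month; $v$ is not stated in the text, so the quadratic term is left symbolic.

$$
t = \frac{30 \times 6 \times 3600\ \mathrm{s}}{10\ \mathrm{s}} = 64{,}800, \qquad
n t = 10^{4} \times 64{,}800 = 6.48 \times 10^{8}, \qquad
n^{2} v^{2} = 10^{8}\, v^{2}
$$

Under these assumptions the $n^2 v^2$ term exceeds $n t$ as soon as $v \ge 3$, so the pairwise copresence comparisons, rather than the clustering of trace points, dominate the running time.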

4 Analysis of constants in GPI

As discussed in the previous section, GPI uses six constants (RCP, MVC, EVF, MI, $d_{cp}$, $t_{cp}$) that significantly affect the performance of the algorithm. It is important to note that the values of these constants do not change once they have been set for a certain environment. However, when the operational environment changes, better identification results can be achieved by altering the values of these constants. For example, if we know that people meet more frequently in a particular environment, we can set the estimated group member visit frequency, EVF, higher. Similarly, we can set MVC higher if we know that groups meet very frequently.
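Why EVF has to match the environment follows from the definitions in Section 3: since $EGM = NV/EVF$, for fixed counts $DCP = CP \cdot EVF / NV$ grows linearly with EVF. The Python sketch below uses hypothetical counts (NV = 6 visits of $u_i$ at $P$, CP = 3 copresent visits) to show the effect; the function name dcp_from_counts is our own.

```python
def dcp_from_counts(cp: int, nv: int, evf: float) -> float:
    """DCP = CP / EGM, where EGM = NV / EVF estimates the group meetings."""
    egm = nv / evf
    return cp / egm


# Hypothetical counts: u_i visited P 6 times; u_k was copresent 3 times.
for evf in (0.5, 0.75, 1.0):
    print(evf, dcp_from_counts(cp=3, nv=6, evf=evf))
# 0.5 0.25
# 0.75 0.375
# 1.0 0.5
```

An EVF set too high inflates every DCP and lets more users pass a given RCP, while an EVF set too low can push genuine members below it.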

Our goal in this section is to provide the reader with an understanding of how these constants affect the algorithm,


References

RADAR: an in-building RF-based user location and tracking system

Statistical Inference

The Cricket location-support system

Place lab: device positioning using radio beacons in the wild

Using GPS to learn significant locations and predict movement across multiple users
