dc.description.abstracteng | User attributes refer to a person's various demographic characteristics,
like income, education, job, age, gender, socioeconomic status (SES),
etc. User attributes play an important role in many research areas like
sociology and education. Recently, companies have become more
and more interested in user attributes because these attributes are also
valuable to many emerging applications, such as personalized recommendation, customized marketing and precision advertising. For example, previous work leverages users' age, gender and occupation
to improve the performance of personalized recommendation.
Manual surveys are the traditional way to collect user attributes,
but they are highly expensive and time-consuming. Many researchers
therefore try to infer user attributes from various kinds of user-generated data,
like people's tweets or cellphone records. Compared with the survey
method, these proposed machine-learning-based user attribute inference
(UAI) methods are much quicker and cheaper. However, there are still
many open challenges: introducing new kinds of user-generated data
sources into attribute inference; improving the accuracy of multi-attribute
prediction from limited data sources; and improving the performance of user-attribute-enhanced (UAE) tasks through UAI methods.
For the first challenge, socioeconomic status (SES) inference based on
human mobility data is chosen as a case study of introducing a new
data source into UAI. The SES of a person or family reflects
that entity's social and economic rank in society. This attribute can support applications such as bank loan decisions and provide
measurable inputs for related studies like social stratification, social
welfare and business planning. Traditionally, estimating SES for a large
population is performed by national statistical institutes through a large
number of household interviews. Recently, researchers have begun to estimate
individual-level SES from people's social media data. However, these
methods do not work when people's cyberspace data are unavailable.
We therefore need to keep introducing new data sources, especially
widely recorded real-world user behavior such as human mobility. In
this work, we leverage Smart Card Data (SCD) for public transport
systems, which records the temporal and spatial mobility behavior of
a large population of users. More specifically, we develop S2S, a deep
learning-based method for estimating people's SES based on their SCD.
Essentially, S2S models two types of SES-related features, namely the
temporal-sequential feature and general statistical feature, and leverages
deep learning for SES estimation. We evaluate our approach on a real-world
dataset, Shanghai subway SCD, which involves millions of users. The
results show that the proposed method can use mobility data for SES
inference and clearly outperforms several state-of-the-art methods in terms
of various evaluation metrics.
For the next challenge, home-location-based inference of multiple socioeconomic
attributes (SEAs) is selected as an example problem of improving the accuracy of multi-attribute inference with limited input
information. Inferring people's SEAs, including income, occupation and education level, is an important problem for
applications like personalized recommendation and targeted advertising.
Some methods have been proposed to estimate SEAs when rich user
information, such as tweet content collected over a long period, is available. However, the
accuracy of these methods may suffer if researchers can only obtain
limited information about users (e.g., little or no tweet content). Besides,
with limited budget and time, researchers may have to estimate as many
attributes as possible from a single limited data source. Multi-SEA inference based on
limited information is even harder. Here we choose home location as an example of a limited data source. The longitude and latitude of a home
location are often used as a supplementary data source in UAI work. The accuracy of existing methods is seriously degraded if only users'
home locations are available. In this work, we try to predict a person's income level,
family income level, occupation type and education level from their
home location. We collect people's home locations and socioeconomic
attributes through a survey involving 9 provinces and 85 cities of China.
Then we design new basic features by enriching home location with
the knowledge from real estate websites, government statistics websites,
online map services, etc. To learn a shared representation from input
features as well as attribute-specific representations for different SEAs,
we propose a multi-task learning method with attention mechanism,
which is called H2SEA. The factorization machine-based embedding
component of H2SEA can also generate additional interaction
features based on the input basic features. Extensive experimental results
show that the proposed H2SEA model outperforms alternative models
for SEA inference in terms of various evaluation metrics, such as AUC,
F-measure, and specificity.
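To illustrate how a factorization machine can derive interaction features from basic input features, the following minimal sketch computes the standard second-order factorization machine term. It is not the H2SEA embedding component itself, whose details are not given here; the function names, the feature values and the latent vectors are all hypothetical.

```python
from itertools import combinations


def fm_pairwise_interactions(x, v):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j.

    x: list of n basic feature values (hypothetical inputs).
    v: list of n latent vectors, each of length k (hypothetical embeddings).
    Uses the usual O(n*k) identity:
        0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    """
    k = len(v[0])
    total = 0.0
    for f in range(k):
        weighted = [v[i][f] * x[i] for i in range(len(x))]
        s = sum(weighted)
        s_sq = sum(w * w for w in weighted)
        total += 0.5 * (s * s - s_sq)
    return total


def fm_pairwise_naive(x, v):
    """Same quantity computed pair by pair, for checking the fast version."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        total += dot * x[i] * x[j]
    return total


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]
    v = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
    print(fm_pairwise_interactions(x, v), fm_pairwise_naive(x, v))
```

Each pair of basic features contributes one interaction weighted by the inner product of their latent vectors, which is how such a component can produce interaction features without hand-crafting every feature pair.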
The first two works focus on improving the performance of
UAI itself in different scenarios. In the final work, we expand the focus
to improving UAE tasks with the help of UAI. There are two kinds of
tasks relying on user attributes. User-attribute-based (UAB) tasks
cannot be carried out without user attributes. For
UAE tasks, attributes are not necessary but can be used to enhance
performance.
From the first two challenges, we can see that designing an accurate UAI
method requires a lot of work, including data mining and model design.
UAE researchers would therefore often rather give up the benefits of UAI to
lower the cost, especially if the missing rates of attributes are high or
many kinds of attributes are missing.
In this thesis, we take collaborative filtering (CF) recommender systems as a case study of UAE tasks. CF recommendation methods mainly
rely on historical user-item interactions, which may suffer from the interaction sparsity problem. Therefore, some algorithms have been proposed
to leverage user/item attributes (e.g., user location or item brand) to
enhance the recommendation performance. However, in real-world
datasets, user/item attributes are often missing for reasons like privacy
concerns. CF recommender systems usually use unknown tags or zeros
as simple substitutes for missing attributes instead of leveraging UAI. In
the final work, we first conduct empirical experiments to quantify how much
recommendation performance is affected when simple
substitutes are used for missing attributes. Then we discuss how UAI can alleviate the
negative impact of missing attributes. Although recommendation and UAI are usually studied separately, we argue that they can
be both seen as graph node representation learning tasks based on node
interactions. We develop a novel multi-task Attribute-Enhanced Graph
Convolutional Network (AEGCN) method, which enhances recommendation through auxiliary UAI tasks. The auxiliary attribute inference tasks
pass estimated attribute information to the recommendation task, improving recommendation performance when attributes are incomplete. More
specifically, we define recommendation and profiling on one user-item
bipartite graph. The two kinds of tasks share one graph convolutional
network (GCN) to learn hidden user/item representations. The
user/item representations are then used for profiling, while their combination
is used to predict users' preferences for items. Extensive experimental
results on three real-world datasets demonstrate that AEGCN is simple
yet effective at handling missing attributes. Compared with attribute-enhanced
CF models, AEGCN achieves comparable performance when the attributes are complete, and significant improvements as the missing
rate increases.
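The idea of learning shared user/item representations on one bipartite graph can be sketched as a single neighborhood-averaging step, followed by a dot-product preference score. This is a simplified illustration, not AEGCN: it has no learned weights, no nonlinearity and no attribute inference heads, and all names and toy vectors are hypothetical.

```python
def propagate(interactions, user_emb, item_emb):
    """One mean-aggregation step on a user-item bipartite graph.

    interactions: list of (user, item) pairs (observed interactions).
    user_emb / item_emb: dicts mapping node id -> list of floats.
    Each node's new vector is the mean of its own vector and the
    vectors of its neighbors on the other side of the graph.
    """
    user_nbrs = {u: [] for u in user_emb}
    item_nbrs = {i: [] for i in item_emb}
    for u, i in interactions:
        user_nbrs[u].append(i)
        item_nbrs[i].append(u)

    def mean(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)]

    new_user = {u: mean([user_emb[u]] + [item_emb[i] for i in nbrs])
                for u, nbrs in user_nbrs.items()}
    new_item = {i: mean([item_emb[i]] + [user_emb[u] for u in nbrs])
                for i, nbrs in item_nbrs.items()}
    return new_user, new_item


def preference_score(user_vec, item_vec):
    """Combine a user and an item representation with a dot product."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


if __name__ == "__main__":
    users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
    items = {"i1": [1.0, 1.0]}
    new_users, new_items = propagate([("u1", "i1")], users, items)
    print(preference_score(new_users["u1"], new_items["i1"]))
```

In a multi-task setting of the kind described above, the same propagated vectors would feed both the preference score and separate attribute classifiers, so that attribute supervision can shape the representations used for recommendation.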
This thesis chooses mobility-based SES prediction, home-location-based SEA
prediction and CF recommender systems as case studies of three open
challenges of UAI. The three challenges studied in this thesis belong to a general effort to expand UAI from single-attribute prediction to multi-attribute prediction and finally to a multi-task framework that includes
both UAI and UAE tasks. | de |