The challenges of big data are broad and hard to pin down, but Ph.D. students and faculty at the Lerner College of Business and Economics are working to answer one central question: How can we identify and classify the real people and business opportunities behind these numbers?
Xin Ji is a third-year Ph.D. student in Lerner’s financial services analytics (FSAN) Ph.D. program. After earning her master’s in hospitality business management in 2014, Ji was drawn to the first-of-its-kind FSAN program because of its interdisciplinary approach to complex, intertwined questions.
“My curiosity in data science initially drew me to continue [on to] a doctoral degree in financial services analytics. I wanted to gain insights into data science and acquire practical toolkits that can help bridge data and business opportunities,” Ji said.
Under the guidance of Adam Fleischhacker, Ji focuses her dissertation research on identifying interpretable and accurate clustering methods that generate actionable insights from vast amounts of data.
“One natural way of clustering observations, be [they of] customers, products or markets,” Ji said, “is to focus on attributes that make each observation unique and to then declare observations that share unique attributes as part of the same group. One can imagine these groups in various settings; example clusters might be interpreted as multi-infant travel passengers, billionaire-led hedge funds or gluten-free bakery shoppers.”
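The grouping idea Ji describes can be illustrated with a minimal Python sketch. This is not her algorithm: the function name `cluster_by_rare_attributes`, the `max_share` threshold and the shopper records are all hypothetical, and "unique" is approximated here as "held by at most half of all observations."

```python
from collections import Counter, defaultdict

def cluster_by_rare_attributes(records, max_share=0.5):
    """Group records by the set of attribute values that are rare in the data.

    Illustrative assumption: a value counts as "unique" when at most
    max_share of all records carry it.
    """
    n = len(records)
    # Count how often each (attribute, value) pair occurs across all records.
    counts = Counter((k, v) for rec in records.values() for k, v in rec.items())
    clusters = defaultdict(list)
    for name, rec in records.items():
        # Keep only the pairs that are rare enough to be distinctive.
        rare = frozenset(
            (k, v) for k, v in rec.items() if counts[(k, v)] / n <= max_share
        )
        # Records sharing the same distinctive pairs land in the same group.
        clusters[rare].append(name)
    return dict(clusters)

# Made-up shoppers, for illustration only.
shoppers = {
    "s1": {"diet": "gluten-free", "store": "bakery"},
    "s2": {"diet": "gluten-free", "store": "bakery"},
    "s3": {"diet": "none", "store": "bakery"},
    "s4": {"diet": "none", "store": "bakery"},
    "s5": {"diet": "none", "store": "bakery"},
}
clusters = cluster_by_rare_attributes(shoppers)
```

In this toy data, the two gluten-free shoppers form one group because they share a distinctive attribute, while the remaining shoppers have no distinctive attribute and fall into a catch-all group. A fixed frequency threshold is the simplest possible choice and ignores correlation between attributes, which is one of the harder problems the research targets.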
Ji has targeted three objectives:
- To algorithmically identify clusters that share some form of uniqueness in settings where multiple numeric and categorical data attributes are correlated,
- To verify that identified clusters mimic groups perceived by human classifiers in real-world decision settings, and
- To ensure both the scalability and interpretability of the clustering algorithm in large data settings.
“Our idea is to show how numerical transaction data (e.g., credit card transactions), geo-location data (e.g., addresses) and social media tags (e.g., Yelp tags like ‘breakfast’ or ‘donut shop’) can be combined to identify competitors using an automated algorithm that outperforms all existing algorithms,” Ji said. “We have manually collected transaction data at many of these restaurants and have demonstrated a proof-of-concept using that data.”
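To make the idea of combining the three data types concrete, here is a rough Python sketch that scores how likely two restaurants are to compete. It is an assumption-laden illustration, not the method Ji and Fleischhacker developed: the function `competitor_score`, the equal weighting of the three signals, the 5 km radius, the `avg_ticket` field and the sample restaurants are all hypothetical.

```python
import math

def jaccard(a, b):
    """Share of tags two businesses have in common (0 to 1)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def competitor_score(a, b, radius_km=5.0):
    """Average of tag, location and price similarity (assumed equal weights)."""
    tag_sim = jaccard(a["tags"], b["tags"])
    # Location similarity falls linearly to zero at radius_km.
    geo_sim = max(0.0, 1 - haversine_km(a["loc"], b["loc"]) / radius_km)
    # Price similarity: ratio of the smaller to the larger average ticket.
    high = max(a["avg_ticket"], b["avg_ticket"])
    price_sim = min(a["avg_ticket"], b["avg_ticket"]) / high if high else 0.0
    return (tag_sim + geo_sim + price_sim) / 3

# Made-up restaurants, for illustration only.
donut_a = {"tags": {"breakfast", "donut shop"}, "loc": (39.68, -75.75), "avg_ticket": 6.0}
donut_b = {"tags": {"breakfast", "donut shop", "coffee"}, "loc": (39.69, -75.75), "avg_ticket": 7.0}
steakhouse = {"tags": {"steakhouse", "dinner"}, "loc": (39.75, -75.55), "avg_ticket": 55.0}

near_score = competitor_score(donut_a, donut_b)
far_score = competitor_score(donut_a, steakhouse)
```

With these invented inputs, the two nearby donut shops score far higher than the donut shop and the distant steakhouse. Averaging three hand-built similarities is only a baseline for intuition; it makes no claim to match the performance described in the quote.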