I am working on a project to categorize transaction data and to understand our customers' lifestyle (income/expense). Normally you don't need to work on the full historical data; the most recent n months (n = 12 or n = 18) are enough to understand a customer's financial profile.
Storage and querying are not that difficult for transactional data (an RDBMS does well at maintaining a record-based data set), and the data is almost static, with no updates or deletes. As someone else mentioned, the only consideration is how to partition the data, e.g. by date or by transaction type.
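To make the "recent n months" window and date partitioning concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the record layout (`date`, `amount`, `type`) and the helper names `months_between` and `recent_by_month` are made up for this example, and in an RDBMS the same thing would normally be done with a date-range predicate on a date-partitioned table.

```python
from collections import defaultdict
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month is ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def recent_by_month(transactions, as_of: date, n_months: int = 12):
    """Keep transactions from the last `n_months` calendar months,
    grouped into one partition per (year, month)."""
    partitions = defaultdict(list)
    for tx in transactions:
        age = months_between(tx["date"], as_of)
        if 0 <= age < n_months:
            partitions[(tx["date"].year, tx["date"].month)].append(tx)
    return dict(partitions)


# Hypothetical sample records; the field names are assumptions.
transactions = [
    {"date": date(2023, 1, 15), "amount": -42.50, "type": "card"},
    {"date": date(2024, 3, 2), "amount": 3200.00, "type": "transfer"},
    {"date": date(2024, 5, 20), "amount": -18.90, "type": "card"},
]

recent = recent_by_month(transactions, as_of=date(2024, 6, 1), n_months=12)
```

With these sample records, the January 2023 transaction falls outside the 12-month window and the other two end up in separate monthly partitions.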
The challenge for us is how to define the categories, and then how to put each transaction into the correct category. For example, for a single transaction we are interested in:
- income or expense
- which type of merchant it involves (groceries, to understand the household size; or luxury goods, to understand the lifestyle)
- whether it is a transaction with another institution (sending money to another bank, or paying life insurance to another company; banks in Australia always have insurance and superannuation businesses)
- purchase sequence (to build a model of which purchase sequences are most likely to lead to a home loan, e.g. regularly saving money, paying money to a real estate agent)
- life stage and events (planning a wedding, planning to have a baby, planning an overseas trip)
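As a rough illustration of the first two attributes in the list above (income/expense and merchant type), here is a minimal rule-based sketch in Python. It is not our actual method: the keyword table `MERCHANT_RULES`, the `categorize` function, the `description`/`amount` fields, and the convention that positive amounts are income are all assumptions made up for this example. Real categorization needs far richer rules or a trained model.

```python
# Hypothetical keyword rules; a real system needs a much richer merchant mapping.
MERCHANT_RULES = {
    "groceries": ("supermarket", "grocer"),
    "luxury": ("jewellery", "boutique"),
    "real_estate": ("real estate", "realty"),
    "insurance": ("insurance",),
}


def categorize(tx):
    """Tag one transaction with a direction and a coarse merchant type."""
    description = tx["description"].lower()
    # Assumed sign convention: positive amounts are income, everything else expense.
    direction = "income" if tx["amount"] > 0 else "expense"
    merchant_type = "other"
    # The first rule whose keyword appears in the description wins.
    for category, keywords in MERCHANT_RULES.items():
        if any(keyword in description for keyword in keywords):
            merchant_type = category
            break
    return {"direction": direction, "merchant_type": merchant_type}


grocery = categorize({"description": "CITY SUPERMARKET 123", "amount": -54.20})
salary = categorize({"description": "Salary ACME PTY LTD", "amount": 3000.00})
agent = categorize({"description": "ABC Real Estate Agency", "amount": -1500.00})
```

Sequence-based and life-event attributes cannot be derived from a single record like this; they need the tagged transactions ordered per customer over time.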
I will not go into the details, but I just want to let you know what Big Data does and what it does not do.