NLP/ML/Data Analytics | BuildingBlocksTech

Building blocks technologies proprietary

NLP/ML/Data Analytics

Objective of this effort:

Use Machine Learning to improve the quality and consistency of the collected data
Use Natural Language Processing to extract additional data, such as salary information and geographic location
Use Data Analytics to produce statistics, track trends, and feed an interactive visualization

Current status:

ML to classify content and improve data quality: 0% complete

NLP to separate desirable data from noise: ~70% complete

NLP to extract and categorize data elements: ~50% complete

Data Analytics to compute statistics and track trends: ~20% complete

Tasks:

ML to classify data as real job postings vs. advertisements vs. link lists
- Data for training a supervised classifier is available
- Basic Naive Bayes classifier is suggested, but not required
- Data set is small (< 40K samples)
- Data format is flat text stored as JSON
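
As a starting point for the classification task, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The label names ("job", "ad", "links") and the toy training texts are assumptions made for illustration; they are not taken from the actual data set, and a library implementation may be preferable in practice.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # number of documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy samples; real training data would come from the JSON files.
train = [
    ("Senior Python developer needed, full time position with benefits", "job"),
    ("Hiring registered nurse for night shift, apply with resume", "job"),
    ("Buy cheap watches now, limited offer, click here", "ad"),
    ("Win a free vacation, click now for this limited offer", "ad"),
    ("Home | About | Careers | Contact | Sitemap", "links"),
    ("Jobs by city | Jobs by category | Sitemap | Contact", "links"),
]
clf = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("Full time developer position, apply with resume"))
```

With fewer than 40K samples, a model this simple trains in seconds, which makes it a reasonable baseline before trying anything heavier.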
NLP to filter "static" text from the core, meaningful text of a job posting
- Data examples are available
- Data format is flat text stored as JSON
- One approach might be to compute similarity between postings and then filter out the text that is highly similar across them
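
The similarity idea can be sketched as follows, assuming that "static" text shows up as lines that recur nearly verbatim across postings. The function names, the 0.9 similarity threshold, and the 0.5 recurrence share are illustrative assumptions, not tuned values.

```python
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.9):
    """True when two lines are near-duplicates by SequenceMatcher ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def strip_static_text(postings, threshold=0.9, min_share=0.5):
    """Drop lines that recur, nearly verbatim, in at least min_share of the other postings."""
    split = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in postings]
    cleaned = []
    for i, lines in enumerate(split):
        others = [o for j, o in enumerate(split) if j != i]
        kept = []
        for line in lines:
            # Count how many other postings contain a near-duplicate of this line.
            hits = sum(any(is_similar(line, o, threshold) for o in other) for other in others)
            if not others or hits / len(others) < min_share:
                kept.append(line)
        cleaned.append("\n".join(kept))
    return cleaned

# Hypothetical postings that share a header and footer line.
postings = [
    "Acme Corp is an equal opportunity employer.\n"
    "Seeking a data engineer with Spark experience.\n"
    "Apply online at our careers page.",
    "Acme Corp is an equal opportunity employer.\n"
    "Looking for a nurse practitioner in pediatrics.\n"
    "Apply online at our careers page!",
    "Acme Corp is an Equal Opportunity Employer.\n"
    "Warehouse associate needed for weekend shifts.\n"
    "Apply online at our careers page.",
]
core = strip_static_text(postings)
```

This compares every line against every other posting, so the cost grows quadratically. At the full data size it would likely need hashing or sampling in front of the pairwise comparison.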
Figure out an NLP solution for accurately detecting and extracting geographic information
Figure out an NLP solution for accurately detecting and extracting salary information
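
For salary extraction, one possible baseline is a regular expression. The sketch below assumes US-style amounts written with a leading "$"; the pattern, the function names, and the output fields are all illustrative, and real postings will need broader coverage.

```python
import re

# Assumption: amounts look like "$85,000", "$25.50", or "$60k".
AMOUNT = r"\$\s?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)([kK]\b)?"
RANGE_SEP = r"\s?(?:-|\u2013|to)\s?"
PERIOD = r"(?:\s?(?:per|an|a|/)\s?(hour|hr|year|yr|annum|month|week)\b)?"
SALARY_RE = re.compile(AMOUNT + "(?:" + RANGE_SEP + AMOUNT + ")?" + PERIOD, re.IGNORECASE)

# Map abbreviations onto one canonical period name.
PERIOD_NAMES = {"hr": "hour", "yr": "year", "annum": "year"}

def to_number(digits, k_suffix):
    """Convert '85,000' or '60' plus an optional 'k' suffix to a float."""
    value = float(digits.replace(",", ""))
    return value * 1000 if k_suffix else value

def extract_salaries(text):
    """Return a list of {low, high, period} dicts for each salary mention found."""
    results = []
    for m in SALARY_RE.finditer(text):
        low = to_number(m.group(1), m.group(2))
        high = to_number(m.group(3), m.group(4)) if m.group(3) else low
        period = m.group(5)
        if period:
            period = PERIOD_NAMES.get(period.lower(), period.lower())
        results.append({"low": low, "high": high, "period": period})
    return results

print(extract_salaries("Salary: $85,000 - $95,000 per year"))
```

A pattern like this misses salaries written in words or without a currency symbol, so it is better treated as a first pass than as a complete solution.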
Data Analytics: develop a solution for computing statistics and tracking trends across N simultaneous dimensions of the data
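
One way to think about statistics across N dimensions is to group records by any chosen subset of fields and summarize a value per group. The sketch below uses only the standard library. The field names ("month", "state", "salary") and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def aggregate(records, dimensions, value_field):
    """Group records by the given dimensions and summarize value_field per group."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)  # one key per combination of dimension values
        groups[key].append(rec[value_field])
    return {key: {"count": len(vals), "mean": mean(vals)} for key, vals in groups.items()}

# Hypothetical records, as they might look after NLP extraction.
records = [
    {"month": "2024-01", "state": "TX", "salary": 70000},
    {"month": "2024-01", "state": "TX", "salary": 80000},
    {"month": "2024-01", "state": "CA", "salary": 90000},
    {"month": "2024-02", "state": "TX", "salary": 76000},
]

by_month_state = aggregate(records, ["month", "state"], "salary")
by_month = aggregate(records, ["month"], "salary")  # a simple trend over time
```

Because the dimensions are passed as a list, the same function covers one, two, or N dimensions. Sorting the result keys by the time field gives a trend series that could feed the visualization.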