Building Blocks: Proprietary Technologies
NLP/ML/Data Analytics
Objective of this effort:
- Use Machine Learning to improve the quality and consistency of the collected data
- Use Natural Language Processing to extract additional data, such as salary information and geographic location
- Use Data Analytics to produce statistics, track trends, and feed the results into an interactive visualization
Current status:
ML to classify content and improve data quality: 0% complete
NLP to separate desirable data from noise: ~70% complete
NLP to extract and categorize data elements: ~50% complete
Data Analytics to compute statistics and track trends: ~20% complete
Tasks:
- ML to classify data as a real job posting vs. an advertisement vs. a list of links
  - Data for training a supervised classifier is available
  - A basic Naive Bayes classifier is suggested, but not required
  - The data sample is small (< 40K samples)
  - Data is flat text in JSON format
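The classification task above could start from a multinomial Naive Bayes baseline. The sketch below uses only the Python standard library and assumes, hypothetically, that each JSON record carries a `text` field and a `label` field with the values `job_posting`, `advertisement`, or `link_list`; the real schema is not described here, and the six training records are toy examples, not project data. A library implementation such as scikit-learn's `MultinomialNB` would be the more usual choice in practice.

```python
import json
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of documents per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training records in the assumed JSON shape.
RAW = """[
  {"text": "Senior Java developer wanted, full-time position, apply with resume", "label": "job_posting"},
  {"text": "We are hiring a nurse, responsibilities include patient care, apply today", "label": "job_posting"},
  {"text": "Buy now! 50% off discount sale on shoes, limited offer", "label": "advertisement"},
  {"text": "Huge sale, free shipping, discount coupon, buy today", "label": "advertisement"},
  {"text": "Links: jobs page, careers page, more links, next page", "label": "link_list"},
  {"text": "Related links: page 1, page 2, page 3, more links", "label": "link_list"}
]"""

records = json.loads(RAW)
clf = NaiveBayesClassifier().fit([r["text"] for r in records], [r["label"] for r in records])
print(clf.predict("Hiring a Python developer, apply with your resume"))  # job_posting
```

With real data, the labeled records would be split into training and held-out sets so that accuracy per class can be measured before the classifier is used to filter collected content.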
- NLP to separate "static" text from the core, meaningful text of a job posting
  - Data examples are available
  - Data is flat text in JSON format
  - One approach might be to compute the similarity between passages across postings and filter out text with a high degree of similarity (i.e., text that repeats from posting to posting)
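A sketch of the similarity idea, assuming that static text (site headers, footers, legal notices) repeats across postings from the same source. It splits each posting into lines, compares every line with the lines of the other postings using Jaccard similarity of word sets, and drops lines that closely match a line in at least `min_matches` other postings. The line-level split, the 0.8 threshold, `min_matches=2`, and the sample postings are illustrative assumptions, not tuned values.

```python
import re


def word_set(line):
    """Set of lowercase alphanumeric words in a line."""
    return frozenset(re.findall(r"[a-z0-9]+", line.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def strip_static_text(postings, threshold=0.8, min_matches=2):
    """Remove lines that recur (approximately) in other postings.

    A line is treated as static when it has Jaccard similarity >= threshold
    with some line in at least `min_matches` other postings.
    """
    split = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in postings]
    sets = [[word_set(ln) for ln in lines] for lines in split]
    cleaned = []
    for i, lines in enumerate(split):
        kept = []
        for j, line in enumerate(lines):
            matches = sum(
                1
                for k, other in enumerate(sets)
                if k != i and any(jaccard(sets[i][j], o) >= threshold for o in other)
            )
            if matches < min_matches:
                kept.append(line)
        cleaned.append("\n".join(kept))
    return cleaned


# Hypothetical postings from one source sharing a header and a footer.
postings = [
    "Acme Careers | Home | About\nSeeking a backend engineer with Go experience.\nCopyright Acme Inc. All rights reserved.",
    "Acme Careers | Home | About\nWarehouse associate needed for night shifts.\nCopyright Acme Inc. All rights reserved.",
    "Acme Careers | Home | About\nPart-time bookkeeper, flexible hours.\nCopyright Acme Inc. All rights reserved.",
]
print(strip_static_text(postings))
```

The pairwise comparison is quadratic in the number of lines, which is acceptable for a per-source batch; at larger scale, a technique such as MinHash with locality-sensitive hashing could replace the exhaustive Jaccard loop.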
- Figure out an NLP solution for accurately detecting and extracting geographic information
- Figure out an NLP solution for accurately detecting and extracting salary information
- Data Analytics: develop a solution for computing statistics and tracking trends across N simultaneous dimensions of the data
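For the geographic-extraction task, one simple baseline is a rule-based pass that matches "City, ST" patterns and keeps only valid two-letter state codes. This sketch assumes US-style locations only; the regular expression and the function name `extract_locations` are hypothetical. A production solution would more likely combine a full place-name gazetteer (for example GeoNames) with a trained named-entity recognizer.

```python
import re

# Two-letter US state codes plus DC.
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI", "ID",
    "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO",
    "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA",
    "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}

# One to three capitalized words, a comma, then two uppercase letters.
CITY_STATE = re.compile(r"\b((?:[A-Z][a-z]+[ -]){0,2}[A-Z][a-z]+),\s*([A-Z]{2})\b")


def extract_locations(text):
    """Return (city, state) pairs whose state code is a valid US code."""
    return [
        (city, state)
        for city, state in CITY_STATE.findall(text)
        if state in US_STATES
    ]


text = "Position based in San Francisco, CA; also hiring in Austin, TX. Not a location: Springfield, ZZ."
print(extract_locations(text))
```

A pattern like this also accepts false positives such as "Remote, OK", which is one reason the city side would likely need a gazetteer lookup or an NER model as well.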
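For the salary-extraction task, a regular-expression baseline can cover common US formats such as "$50,000 - $70,000 per year", "$25.50/hr", and "$80-100K", normalizing them to a minimum, a maximum, and a pay period. The pattern, the `PERIODS` mapping, and `extract_salaries` are hypothetical and only handle dollar-prefixed amounts; other currencies and wording ("competitive pay", "DOE") would need additional rules or a learned extractor.

```python
import re

# "$" amount, optional "k" suffix, optional range, optional pay period.
SALARY = re.compile(
    r"\$\s*(?P<low>\d[\d,]*(?:\.\d+)?)\s*(?P<low_k>k\b)?"
    r"(?:\s*(?:-|to)\s*\$?\s*(?P<high>\d[\d,]*(?:\.\d+)?)\s*(?P<high_k>k\b)?)?"
    r"(?:\s*(?:/|per\s+|an?\s+)(?P<period>hour|hr|year|yr|annum|month|week))?",
    re.IGNORECASE,
)

PERIODS = {"hour": "hour", "hr": "hour", "year": "year", "yr": "year",
           "annum": "year", "month": "month", "week": "week"}


def _to_number(num, k_suffix):
    """Parse '50,000' or '80' (+ 'k') into a float amount."""
    value = float(num.replace(",", ""))
    return value * 1000 if k_suffix else value


def extract_salaries(text):
    """Return a list of {'min', 'max', 'period'} dicts found in text."""
    results = []
    for m in SALARY.finditer(text):
        # In "$80-100K" the "K" on the upper bound is assumed to apply to both ends.
        low_k = m.group("low_k") or m.group("high_k")
        low = _to_number(m.group("low"), low_k)
        high = _to_number(m.group("high"), m.group("high_k")) if m.group("high") else low
        period = m.group("period")
        results.append({
            "min": low,
            "max": high,
            "period": PERIODS[period.lower()] if period else None,
        })
    return results


print(extract_salaries("Pay: $50,000 - $70,000 per year"))
print(extract_salaries("Starting at $25.50/hr"))
```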
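For the analytics task, a minimal sketch is to group records by any combination of dimension fields, summarize a numeric metric per group, and derive a period-over-period trend by adding a time field as the last grouping dimension. The field names (`month`, `state`, `category`, `salary`), the choice of mean and median, and the sample records are assumptions for illustration; as the number of dimensions and records grows, a DataFrame library or an OLAP-style store would be the more usual choice.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate(records, dimensions, metric):
    """Group records by a list of dimension fields and summarize `metric`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[d] for d in dimensions)].append(rec[metric])
    return {
        key: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in groups.items()
    }


def trend(records, time_field, dimensions, metric):
    """Per-group series of (time, mean metric, change vs. previous period)."""
    stats = aggregate(records, dimensions + [time_field], metric)
    series = defaultdict(list)
    for key, s in stats.items():
        series[key[:-1]].append((key[-1], s["mean"]))
    result = {}
    for group, points in series.items():
        points.sort()  # order by time value
        result[group] = [
            (t, value, None if i == 0 else value - points[i - 1][1])
            for i, (t, value) in enumerate(points)
        ]
    return result


# Hypothetical records after classification and extraction.
records = [
    {"month": "2024-01", "state": "CA", "category": "engineering", "salary": 120000},
    {"month": "2024-01", "state": "CA", "category": "engineering", "salary": 130000},
    {"month": "2024-02", "state": "CA", "category": "engineering", "salary": 140000},
    {"month": "2024-01", "state": "TX", "category": "nursing", "salary": 70000},
    {"month": "2024-02", "state": "TX", "category": "nursing", "salary": 72000},
]
print(aggregate(records, ["state"], "salary"))
print(trend(records, "month", ["state", "category"], "salary"))
```

Because the dimensions are passed as a list, the same two functions cover any N-dimensional slice (for example state by category by month), and their output could feed the interactive visualization mentioned in the objectives.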