Building blocks technologies proprietary

NLP/ML/Data Analytics

Objectives of this effort:

  • Use Machine Learning to improve the quality and consistency of the collected data

  • Use Natural Language Processing to extract additional data such as salary information and geographic location

  • Use data analytics to produce statistics, track trends, and feed an interactive visualization

Current status:

  • ML to classify content and improve data quality: 0% complete

  • NLP to separate desirable data from noise: ~70% complete

  • NLP to extract and categorize data elements: ~50% complete

  • Data analytics to compute statistics and track trends: ~20% complete

Tasks:

  • ML to classify each item as a real job posting, an advertisement, or a list of links

    • Data for training a supervised classifier is available

    • Basic Naive Bayes classifier is suggested, but not required

    • Data sample size is small (< 40K samples)

    • Data format is flat text in JSON format

  • NLP to separate "static" text from the core, meaningful text of a job posting

    • Data examples are available

    • Data format is flat text in JSON format

    • One approach might be to compute the similarity between pieces of text and then filter out the text that shows a high degree of similarity

  • Develop an NLP solution for accurately detecting and extracting geographic information

  • Develop an NLP solution for accurately detecting and extracting salary information

  • Data analytics: develop a solution for computing statistics and tracking trends across N simultaneous dimensions of the data
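
For the classification task, the suggested Naive Bayes baseline could look roughly like the sketch below. It is a minimal multinomial Naive Bayes with add-one smoothing, written with the Python standard library only. The "text" and "label" field names, the label values, and the sample records are all assumptions made for illustration; the actual JSON layout of the training data is not specified here.

```python
import json
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # skip words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up records; the "text" and "label" keys are assumed, not taken from the real data.
raw = """[
 {"text": "Hiring senior Python engineer, full time, apply with resume", "label": "job"},
 {"text": "Seeking registered nurse for night shift, benefits included", "label": "job"},
 {"text": "Buy cheap watches now, limited offer, click here", "label": "ad"},
 {"text": "Save big on car insurance today, click now", "label": "ad"},
 {"text": "Home | About | Careers | Contact | Sitemap | Privacy", "label": "links"},
 {"text": "Jobs by city | Jobs by category | Sitemap | Contact", "label": "links"}
]"""
records = json.loads(raw)
model = NaiveBayes().fit([r["text"] for r in records], [r["label"] for r in records])
prediction = model.predict("Apply now: full time engineer position with benefits")
```

With fewer than 40K samples, a baseline like this trains in seconds, so it is cheap to compare against other classifiers before committing to one.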
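
For the "static" text task, the similarity idea could be sketched as follows. This sketch assumes that static text shows up as lines that are nearly identical across different postings, which is one possible reading of the suggested approach. It uses `difflib.SequenceMatcher` from the standard library; the 0.9 threshold and the sample postings are made up for illustration.

```python
from difflib import SequenceMatcher

def filter_static_text(postings, threshold=0.9):
    """Drop every line that closely matches a line in any other posting."""
    split = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in postings]
    cleaned = []
    for i, lines in enumerate(split):
        # All lines from every posting except the current one.
        others = [ln for j, other in enumerate(split) if j != i for ln in other]
        kept = [
            ln for ln in lines
            if not any(
                SequenceMatcher(None, ln.lower(), o.lower()).ratio() >= threshold
                for o in others
            )
        ]
        cleaned.append("\n".join(kept))
    return cleaned

# Made-up postings: the first and last lines play the role of "static" text.
postings = [
    "Acme Corp is an equal opportunity employer.\n"
    "Senior welder needed in Tulsa.\n"
    "Copyright 2015 Acme Corp",
    "Acme Corp is an equal opportunity employer.\n"
    "Part-time bookkeeper, remote.\n"
    "Copyright 2016 Acme Corp",
]
cleaned = filter_static_text(postings)
```

The pairwise comparison is quadratic in the number of lines, so on the full data set it would likely need a cheaper first pass, such as hashing normalized lines and counting exact repeats.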
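
For the geographic information task, a pattern-based baseline can give a first measure of how much is recoverable without a full NLP pipeline. The sketch below assumes US-style "City, ST" mentions and checks the state code against a small lookup set. The `US_STATES` subset and the sample sentence are illustrative assumptions only.

```python
import re

# Tiny illustrative subset of US state codes; a real list would hold all of them.
US_STATES = {"CA", "NY", "TX", "OK", "WA"}

# One to three capitalized words, a comma, then a two-letter uppercase code.
CITY_STATE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+){0,2}), ([A-Z]{2})\b")

def extract_locations(text):
    """Return (city, state) pairs whose state code is in US_STATES."""
    return [(city, st) for city, st in CITY_STATE.findall(text) if st in US_STATES]

locations = extract_locations(
    "Openings in San Francisco, CA and Tulsa, OK. Contact HR, PO Box 12."
)
```

A pattern like this will also pick up capitalized words that happen to precede a city name, and it misses locations written without a state code, so it is best treated as a baseline to measure a real solution against.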
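
For the salary information task, a regular expression is a reasonable starting point. The sketch below assumes dollar amounts, an optional "k" suffix, an optional range ("-" or "to"), and an optional pay period. These conventions and the sample sentence are assumptions; real postings will contain forms this pattern does not cover.

```python
import re

# One amount: digits with optional thousands separators or decimals, optional "k".
NUM = r"(\d[\d,]*(?:\.\d+)?)\s?(k\b)?"
SALARY = re.compile(
    r"\$\s?" + NUM                                     # lower (or only) amount
    + r"(?:\s*(?:-|to)\s*\$?\s?" + NUM + r")?"         # optional upper bound
    + r"(?:\s*(?:/|per|an?)\s*(hour|hr|year|yr)\b)?",  # optional pay period
    re.IGNORECASE,
)

def to_number(digits, k):
    """Convert a matched amount such as '60,000' or '90' (+ 'k') to a float."""
    value = float(digits.replace(",", ""))
    return value * 1000 if k else value

def extract_salaries(text):
    """Return a list of {'low', 'high', 'period'} dicts found in the text."""
    results = []
    for m in SALARY.finditer(text):
        low = to_number(m.group(1), m.group(2))
        high = to_number(m.group(3), m.group(4)) if m.group(3) else low
        period = (m.group(5) or "unspecified").lower()
        results.append({"low": low, "high": high, "period": period})
    return results

salaries = extract_salaries(
    "Pay: $25/hr. Salary range $60,000 - $75,000 per year, or $90k to $110k."
)
```

Normalizing every match to a low/high pair plus a period keeps the output uniform, which should make it easier to feed into the statistics step.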
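
For the data analytics task, one simple way to get statistics across N simultaneous dimensions is to count records for every combination of the chosen dimensions. The sketch below does this with the standard library. The dimension names ("month", "state", "category") and the records are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def count_by_dimensions(records, dimensions):
    """Count records for every non-empty combination of the given dimensions."""
    tables = {}
    for r in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            # Key each record by its values along the selected dimensions.
            tables[dims] = Counter(tuple(rec[d] for d in dims) for rec in records)
    return tables

# Made-up records; the field names are assumptions.
records = [
    {"month": "2015-01", "state": "CA", "category": "engineering"},
    {"month": "2015-01", "state": "CA", "category": "nursing"},
    {"month": "2015-02", "state": "CA", "category": "engineering"},
    {"month": "2015-02", "state": "TX", "category": "engineering"},
]
tables = count_by_dimensions(records, ["month", "state", "category"])

# Trend along one dimension: posting counts per month, in chronological order.
monthly = sorted(tables[("month",)].items())
```

The number of tables grows as 2^N - 1, so for more than a handful of dimensions it would make sense to compute only the combinations the visualization actually requests.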
