Project highlights
About the client
GlossaryTech is an HR technology startup created to streamline resume screening. As the IT industry grows, recruitment has become more challenging: new technologies emerge every month, and it gets increasingly difficult for recruiters to keep up with all the tech terms used in the ICT field. A tool that automatically extracts essential information from a candidate's profile and explains IT terminology could prove to be a lifesaver for many recruitment managers.
To develop such a sophisticated solution, GlossaryTech reached out to Flyaps. In close collaboration, we built two AI-powered solutions – a CV scanner and a Chrome extension that help to source tech talent and learn IT-related terms at the same time. Starting with the glossary of IT terms, we proceeded to the CV scanner website that automatically categorizes candidate skills and gives definitions of all the technologies mentioned.
Ultimately, we enhanced this tool by creating a Chrome extension that swiftly scans web pages such as LinkedIn. Within seconds, it classifies the mentioned technologies, determines whether they are associated with back-end, front-end, DevOps, or mobile development, and provides easy-to-understand definitions. Now featured by Google, the extension is used by recruiters at Tesla, Microsoft, and Amazon.
What makes the tool so useful, and how did we develop a plugin downloaded by more than 25,000 users? Read on for the most interesting details.
CLIENT REQUEST
Our client needed a team of senior software engineers to build an AI-based Chrome extension that makes it extremely easy for IT recruiters to evaluate the expertise of candidates.
The IT industry is developing at an insane pace. EJB, JSF, JPA, JBoss – new libraries and tools emerge every month, creating new tech terms. This makes candidate screening a complicated process – even for experienced recruiters, it becomes more and more difficult to keep all the technical terms in mind.
This is why GlossaryTech was built – the extension is designed to help recruitment managers with the resume screening process. Within seconds of analyzing a web page, the tool automatically sorts the technologies mentioned by a developer, highlights them in the resume, and provides their definitions. This makes it much simpler and faster for recruiters to evaluate whether a candidate is suitable for the role.
Problem
The solution had to solve a major problem:
Understand human language data and recognize technical terms, including multi-word expressions (e.g., Ruby on Rails, React Native).
Approach
To make the text analysis accurate, we implemented an NLP algorithm using the NLTK Python library and developed our own tokenizer, achieving correct text analysis in mere seconds.
Solution
The main challenge in automatic text analysis is developing an algorithm capable of extracting crucial data and accurately interpreting it. Algorithms may miss important information because of unstructured documents and variations in meaning. With tech terms, the data analysis becomes even harder – for example, how do you determine whether the word "go" in a developer's resume is a verb or a programming language?
To tackle these challenges, we developed an NLP algorithm that uses our tech glossary as a reference to precisely comprehend the meaning and context of language. By relying on machine learning, NLP can quickly identify the necessary information and categorize it correctly, thus expediting the resume screening process.
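As a rough illustration of how a glossary can serve as the reference point, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the glossary entries, the categories assigned to them, the `case_sensitive` flag, and the `lookup` helper are hypothetical and are not taken from GlossaryTech's code.

```python
# Hypothetical glossary: the terms, categories, and the "case_sensitive" flag
# are illustrative assumptions, not GlossaryTech's actual data model.
GLOSSARY = {
    "Go": {"category": "back-end", "case_sensitive": True},
    "Swift": {"category": "mobile", "case_sensitive": True},
    "Docker": {"category": "DevOps", "case_sensitive": False},
    "React": {"category": "front-end", "case_sensitive": False},
}

# Case-insensitive index, built only for terms that are safe to match in any casing.
_LOWER_INDEX = {
    term.lower(): term
    for term, meta in GLOSSARY.items()
    if not meta["case_sensitive"]
}


def lookup(word):
    """Return (canonical_term, category) for a glossary hit, else None."""
    if word in GLOSSARY:  # exact match works for every term
        return word, GLOSSARY[word]["category"]
    canonical = _LOWER_INDEX.get(word.lower())
    if canonical is not None:  # relaxed match for unambiguous terms only
        return canonical, GLOSSARY[canonical]["category"]
    return None


words = "Let's go write it in Go and ship it with docker".split()
hits = [hit for hit in map(lookup, words) if hit]
print(hits)  # [('Go', 'back-end'), ('Docker', 'DevOps')]
```

Casing alone is not enough in practice: a sentence that starts with "Go write the tests" would still be a false positive, which is where signals such as part-of-speech tags become useful.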
To build this algorithm, we used NLTK, a leading Python library for building programs that work with human language data.
One of the main benefits of using NLTK is its versatility and flexibility – it supports a wide range of tasks, including tokenization, stemming, lemmatization, part-of-speech tagging, and sentiment analysis. To make it even more effective, we optimized it for the specifics of our plugin.
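To give a feel for these building blocks, the sketch below uses three stock NLTK components: `wordpunct_tokenize`, `MWETokenizer`, and `PorterStemmer`. It only shows what the library offers out of the box and is not the configuration used in the plugin; the sample sentence and the list of multi-word expressions are made up for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import MWETokenizer, wordpunct_tokenize

text = "Built APIs with Ruby on Rails and mobile apps in React Native."

# Regex-based word/punctuation split; it needs no downloaded NLTK data.
words = wordpunct_tokenize(text)

# Merge known multi-word expressions back into single tokens.
# The expressions listed here are illustrative, not a real glossary.
mwe_tokenizer = MWETokenizer(
    [("Ruby", "on", "Rails"), ("React", "Native")],
    separator=" ",
)
tokens = mwe_tokenizer.tokenize(words)
print(tokens)
# ['Built', 'APIs', 'with', 'Ruby on Rails', 'and', 'mobile', 'apps', 'in', 'React Native', '.']

# Stemming reduces inflected forms to a common stem.
stemmer = PorterStemmer()
print(stemmer.stem("running"))  # run
```

Note that `MWETokenizer` matches token sequences exactly, so "ruby on rails" in lowercase would not be merged here.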
Although open-source libraries can work well in many cases, they don’t allow full control over the analysis process. To make GlossaryTech faster and more accurate at data extraction, we decided to build our own tokenizer.
This tokenizer is general-purpose and works especially well for the ICT domain. It is optimized for identifying particular entities or phrases, and it handles not only single words but also multi-word phrases, such as "Ruby on Rails".
Developing our own tokenizer also allowed us to have greater control over the text preprocessing, resulting in a more consistent and higher-quality analysis.
Here are some things you can do with our custom tokenizer:
    >>> from funnel.tokenizer import Tokenizer
    >>> mwes = [
    ...     ('Text analyzer', {'description': 'Some cool stuff'}),
    ...     ('Python', {'description': 'A programming language'}),
    ...     ('Go', {'description': 'A programming language'})
    ... ]
    >>> tokenizer = Tokenizer(language='english', mwes=mwes)
    >>> text = 'This is a text analyzer written in Python. Let`s go write it in Go.'
    >>> tokens = tokenizer.tokenize(text)
    >>> print(tokens)
    [This, is, a, text analyzer, written, in, Python, ., Let`s, go, write, it, in, Go, .]
    >>> for t in tokens:
    ...     print(t.text, t.span, *t.tags, *t.lemmas, t.is_match, t.info if t.info else '')
    ...
    This (0, 4) DET ['this'] False
    is (5, 7) VERB ['be'] False
    a (8, 9) DET ['a'] False
    text analyzer (10, 23) ADJ NOUN ['text'] ['analyzer'] True {'description': 'Some cool stuff'}
    written (24, 31) VERB ['write'] False
    in (32, 34) ADP ['in'] False
    Python (35, 41) NOUN ['Python'] True {'description': 'A programming language'}
    . (41, 42) PUNCT ['.'] False
    Let`s (42, 48) NOUN ['Let`s'] False
    go (48, 51) VERB ['go'] False {'description': 'A programming language'}
    write (52, 57) VERB ['write'] False
    it (58, 60) PRON ['it'] False
    in (61, 63) ADP ['in'] False
    Go (64, 66) NOUN ['Go'] True {'description': 'A programming language'}
    . (66, 67) PUNCT ['.'] False
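The session above relies on our own `funnel` package. For readers who want to experiment with the core idea, here is a stand-alone sketch that uses only the Python standard library. It is a simplified stand-in, not the actual implementation: the `MWES` dictionary, the regular expression, and the greedy longest-match strategy are all assumptions chosen for illustration, and the sketch performs no part-of-speech tagging or lemmatization.

```python
import re

# Hypothetical glossary entries, keyed by lowercase word tuples.
MWES = {
    ("text", "analyzer"): {"description": "Some cool stuff"},
    ("python",): {"description": "A programming language"},
}
MAX_LEN = max(len(key) for key in MWES)


def tokenize(text):
    """Yield (token_text, (start, end), info) with multi-word terms merged."""
    # Words (optionally with an inner apostrophe) or single punctuation marks.
    words = [
        (m.group(), m.start(), m.end())
        for m in re.finditer(r"\w+(?:['`]\w+)*|[^\w\s]", text)
    ]
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink to one word.
        for size in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w, _, _ in words[i:i + size])
            if key in MWES or size == 1:
                start, end = words[i][1], words[i + size - 1][2]
                yield text[start:end], (start, end), MWES.get(key)
                i += size
                break


for token, span, info in tokenize("This is a text analyzer written in Python."):
    print(token, span, info or "")
```

Because this sketch matches case-insensitively, adding an entry such as "Go" would also flag the verb "go"; telling the two apart requires extra signals such as casing rules or part-of-speech tags.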
We used open-source technology and built a custom tokenizer to develop an NLP algorithm, resulting in the ultimate AI-based recruiting assistant. This powerful tool significantly improves tech talent sourcing and assists in learning technical terms.
We've recently added support for Microsoft Edge and are now looking to improve the tokenizer even more. We've also started to receive some requests for custom plugin development from other companies looking to implement similar functionality.
Result
GlossaryTech earned a “Featured” badge and has more than 25,000 downloads and 6,800 monthly active users.
The plugin is used by recruiters at global companies such as Amazon, Disney, Tesla, and Cisco.
With an IT glossary and term categorization, the plugin significantly accelerates candidate sourcing, saving recruiters valuable time.
I liked that they really thought about the product and it made me feel like there was someone on the other side of the call that was really putting themselves in my shoes. They inquire about what’s going on with my user traction and also monitor our Google Analytics. They really care about the software they produce in terms of how it affects my business and act as our internal developers by striving to learn about the product. I can rely on Flyaps as a kind of external CTO, and I give them all the freedom to decide what to do on the technical side.
Flyaps has hands-on expertise in implementing custom neural networks and AI-based tools to solve specific business problems.