IUT - News classification algorithms
Implementation and optimization of text classification algorithms

Context
Project carried out during the first semester of the BUT Computer Science program, in collaboration with Manu Thuillier.
Objective
Implementation and optimization of automatic text classification algorithms to categorize news articles into 5 categories: politics, culture, environment/technology, economics, and sports.
Implementation
Weight-based classification
First approach: a weighted score is computed per category from the words of each article:
- Word normalization (case, plurals, vowels)
- Binary search optimization
- Common word filtering
- ~65% accuracy
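The steps listed above can be illustrated with a minimal Python sketch. It is not the project's code: the function names, the stop-word list, the naive plural rule, and the "highest total score wins" decision are all assumptions made for illustration. The vowel handling mentioned above is left out because its details are not given here.

```python
from bisect import bisect_left

# Hypothetical stop-word list; the project's actual list is not known.
STOP_WORDS = {"the", "a", "of", "and", "in", "to"}

def normalize(word):
    """Lowercase, strip surrounding punctuation, drop a trailing plural 's' (naive rule)."""
    word = word.strip(".,;:!?\"'()").lower()
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word

def build_lexicon(weights):
    """Turn {word: weight} into two parallel lists sorted by word.

    Keys are assumed to be already normalized.
    """
    pairs = sorted(weights.items())
    return [w for w, _ in pairs], [v for _, v in pairs]

def lookup(words, weights, word):
    """Binary search in the sorted word list; unknown words weigh 0."""
    i = bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return weights[i]
    return 0

def classify(text, lexicons):
    """Sum the weights per category and return the best-scoring category."""
    scores = {}
    for category, (words, weights) in lexicons.items():
        total = 0
        for raw in text.split():
            word = normalize(raw)
            if word and word not in STOP_WORDS:
                total += lookup(words, weights, word)
        scores[category] = total
    return max(scores, key=scores.get)

lexicons = {
    "sports": build_lexicon({"match": 3, "team": 2}),
    "politics": build_lexicon({"vote": 3, "minister": 2}),
}
print(classify("The teams won the match", lexicons))  # -> sports
```

Keeping each lexicon sorted makes a lookup cost O(log m) for m known words, which is the point of the binary search step.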
K-nearest neighbors
Second approach using vector representations:
- Articles represented as weight vectors
- Distance calculations between vectors
- Category determined by a majority vote among the k nearest neighbors
- ~75% accuracy with k=5
- Optimized complexity to O(n log n)
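As a rough illustration of the KNN steps above, here is a short Python sketch. The sparse-dictionary vector format, the Euclidean distance, and the function names are assumptions; the distance metric and the optimization actually used in the project are not specified here, so this sketch simply selects the k smallest distances with a heap.

```python
import heapq
import math
from collections import Counter

def distance(u, v):
    """Euclidean distance between two sparse vectors stored as {word: weight}."""
    keys = u.keys() | v.keys()
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def knn_classify(query, training, k=5):
    """Return the majority label among the k training vectors closest to query.

    training is a list of (vector, label) pairs.
    """
    nearest = heapq.nsmallest(k, training, key=lambda item: distance(query, item[0]))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, invented for the example.
training = [
    ({"match": 1.0}, "sports"),
    ({"match": 1.0, "team": 1.0}, "sports"),
    ({"goal": 1.0, "team": 1.0}, "sports"),
    ({"vote": 1.0}, "politics"),
    ({"vote": 1.0, "law": 1.0}, "politics"),
]
print(knn_classify({"team": 1.0, "match": 1.0}, training, k=3))  # -> sports
```

An odd k such as 5 reduces the chance of tied votes, although ties remain possible with five categories.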
Results
The performance analysis detailed in the attached report shows that KNN (~75% accuracy with k=5) outperforms the weight-based approach (~65% accuracy), while the optimizations keep its computational complexity reasonable.