Background
I am a second-year Data Science master’s student at Rochester Institute of Technology and an honorary recipient of the Fulbright Foreign Student Scholarship Grant.
My capstone project, Just in Time Prediction of Unknown Classes, is supervised by Professor Abu Islam; my objective is to build N-way few-shot learning with a Siamese network to predict unknown classes not seen at training time.
I also work with Professor Shanchieh Yang on exploring the intersection of machine learning and cybersecurity, specifically addressing the issue of imbalanced data in intrusion detection systems and proposing solutions to improve their performance.
Additionally, I am a member of the Computational Linguistics and Speech Processing Lab (CLaSP), led by Professor Cecilia O. Alm.
Research Interests
My research interests include multimodal machine learning, interactive machine learning, affective computing, natural language and speech processing, federated learning, and reinforcement learning.
Education
Master’s degree in Data Science, 2021 - in progress
Rochester Institute of Technology, Rochester, NY, USA
Master’s degree in Industrial Engineering and Systems of Management, 2017 - 2019
American University of Armenia, Yerevan, Armenia
Bachelor’s degree in Theory of Economics, 2010 - 2016
Armenian State University of Economics, Yerevan, Armenia
Work Experience
Research Assistant, Jan 2022 - Present
Department of Computer Engineering, RIT, Rochester, NY, USA
- Conducted research on machine learning in cybersecurity and examined the impact of imbalanced data on the performance of intrusion detection systems
- Proposed solutions to address the imbalance problem, including undersampling and oversampling with SMOTE
- Used preprocessing techniques, including PCA, and applied clustering techniques, such as DBSCAN, to merge minority classes
Data Science Intern, May 2022 - Aug 2022
Plat.ai, Los Angeles, CA, USA
- Recommended, based on statistical analysis, a set of attributes to maximize the model’s accuracy
- Identified clients likely to repay loans using random forest and logistic regression models, achieving an AUC score of 0.85
- Developed a machine learning deployment pipeline using Docker and a REST API, automating data preprocessing, data cleaning, and feature extraction and optimizing real-time prediction
Machine Learning Engineer, Sep 2019 - Aug 2021
BetConstruct, Yerevan, Armenia
- Reduced traffic load by 30% by identifying sessions with non-human-like behavior using hierarchical clustering methods
- Applied supervised tree-based models such as XGBoost and Random Forest to predict bots with an accuracy of 93%
- Analyzed user sessions using Elasticsearch and Kibana and provided reports on user groups overloading the network traffic
Teaching Associate of Programming for Data Science Courses, July 2019 - Jun 2020
American University of Armenia, Yerevan, Armenia
- Created a set of code slides and problem sets as extracurricular materials for the Programming for Data Science course
- Assisted students in building Shiny dashboards and optimizing code for their course projects using Python and R
- Led problem-solving sessions with classes of more than 30 students and organized additional office hours to discuss topics
Data Science Intern, Feb 2019 - May 2020
Armenian National SDG Lab (United Nations), Yerevan, Armenia
- Extracted over 100,000 tourist reviews about Armenia using data extraction tools
- Performed topic extraction and sentiment analysis using NLP tools and supported the team in developing Travelinsights.ai
- Presented the tourism-related problems discovered in different fields to the Tourism Committee of Armenia
Selected Projects
- Dialogue and Text Summarization on SAMSum, XSum datasets: Developed a text/dialogue summarization project using pre-trained BART and PEGASUS transformer models. Conducted initial text processing and fine-tuned the model weights to improve performance. Deployed the models using Hugging Face.
- Refining Duplicate Contribution Detection in Pull-Based Projects: Implemented a duplicate contribution detection algorithm for pull-based projects by calculating the cosine similarity of code changes, titles, and descriptions of pull requests. Optimized the algorithm using the recall-rate method for efficient duplicate detection.
- Lyrics Analyzer and Song Adviser using Machine Learning in Python: Preprocessed the text data of 7,000 songs using NLTK and TextBlob, used TF-IDF for feature extraction, and implemented a Naive Bayes classifier. Developed a song recommendation program using Selenium and sentiment analysis based on the user’s mood and preferred genre.
Skills
- Languages: English, Armenian, Russian
- Software and Programming: Python, R, Java, SQL, MongoDB
- Libraries: Scikit-learn, TensorFlow, Keras, PyTorch, OpenAI Gym, NLTK, spaCy
- Tools: Pandas, Matplotlib, NumPy, Seaborn, MATLAB, Jupyter Notebook, conda, pipenv, cookiecutter, Flask
- Developer Tools: Git, Docker, AWS, Jira, REST API, FastAPI, Linux
- Deep Learning: Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, Transformers, Dropout, BatchNorm, Xavier/He initialization
- Statistical Models: Regression (Linear, Time-Series, Logistic, Ridge, Lasso, Elastic Net), Decision Tree based models, Hypothesis Testing, A/B testing, SVM, PCA, LDA, k-NN, K-means Clustering, Conjoint Analysis