theoretically good with computers

recent updates

(2025-11) I successfully defended my thesis, Practical and Theoretical Aspects of Tokenization, and will graduate in March.
(2025-10) I've moved back to the US and started full time again at Google on the Gboard team.
(2025-08) Tokenization as Finite-State Transduction was accepted to Computational Linguistics. [Paper] [Code]
(2025-08) I gave an invited tutorial at DLT 2025 on Subword Tokenization and Formal Language Theory. The slides and reading list can be found here.
(2025-08) Team 🍞 strikes again. We placed 2nd in the 2025 FedCSIS Chess Puzzle Prediction Challenge. [Paper] [Code]
(2025-06) Pitfalls, Subtleties, and Techniques in Automata-Based Subword-Level Constrained Generation was accepted at TokShop @ ICML. This was joint work with David Pohl, Junyoung Lee, and Naoaki Okazaki. [Paper]
(2025-03) Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation was accepted to LoResMT @ NAACL. This work was co-led by Junyoung Lee and me, with Sangwhan Moon and Naoaki Okazaki. [Paper] [Code]
(2024-10) I'm on Bluesky @mcognetta.bsky.social.
(2024-09) Team 🍞 (Tyler Woodruff, Oleg Filatov, and myself) won the IEEE BigData Cup: Predicting Chess Puzzle Difficulty Challenge. [Paper] [Code]
(2024-09) Distributional Properties of Subword Regularization (with Vilém Zouhar and Naoaki Okazaki) was accepted to EMNLP. [Paper] [Code]
(2024-02) Two Counterexamples to Tokenization and the Noiseless Channel (with Vilém Zouhar, Sangwhan Moon, and Naoaki Okazaki) was accepted at LREC-COLING 2024. [Paper] [Code]
(2023-07) I presented LotteryTickets.jl: Sparsify Your Flux Models at JuliaCon 2023. The recording is here, and the slides and repo are here.
(2023-05) I presented Parameter-Efficient Korean Character-Level Language Modeling at EACL 2023. The paper can be found here.
(2022-04) I've moved to Tokyo to join the Okazaki Lab for my PhD. I'll continue working part time at Google Tokyo.

I am a Senior Software Engineer on the Gboard team at Google and a PhD student in NLP in the Okazaki Lab at the Tokyo Institute of Technology. Through most of my PhD, I was also a student researcher at Google Tokyo (also on the Gboard team). Prior to this, I was a software engineer at Google (again on Gboard). I did my MS in Computer Science at Yonsei University and my BS in Discrete Mathematics with a minor in Korean at Georgia Tech.

I am always open to chatting about interesting topics. Please feel free to send me an email ([lastname].[firstname]@gmail.com).

I am (not exhaustively) interested in:

Automata Theory
Scientific Computing
Languages (especially Korean and Esperanto)
Combinatorics
Open Source Software
High School Level Computer Science Education

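About Me (자기소개)

I am Marco, a developer on Google's Gboard team and a PhD student in the Okazaki Lab at the Tokyo Institute of Technology. During my PhD, I also worked as a student researcher on the Gboard team at Google Tokyo. I majored in Discrete Mathematics with a minor in Korean at Georgia Tech, and then completed my MS in Computer Science in the Theory of Computation Lab at Yonsei University.

I'm always happy to talk with anyone about interesting topics. Please contact me by email ([lastname].[firstname]@gmail.com).

Topics I especially enjoy include:

Automata Theory
Numerical Analysis
Languages (especially Korean and Esperanto)
Combinatorics
Open Source Software
High School Level Computer Science Education

Most posts are written in English, but to practice my Korean, I occasionally write posts in Korean or translate English posts into Korean.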

marco cognetta theoretically good with computers