marco cognetta theoretically good with computers

recent updates

  • (2025-10) I've moved back to the US and started again full time at Google on the Gboard team.

  • (2025-08) Tokenization as Finite-State Transduction was accepted to Computational Linguistics. [Paper] [Code]

  • (2025-08) I gave an invited tutorial at DLT2025 on Subword Tokenization and Formal Language Theory. The slides and reading list can be found here.

  • (2025-08) Team 🍞 strikes again. We placed 2nd in the 2025 FedCSIS Chess Puzzle Prediction Challenge. [Paper] [Code]

  • (2025-06) Pitfalls, Subtleties, and Techniques in Automata-Based Subword-Level Constrained Generation was accepted at TokShop @ ICML. This was joint work with David Pohl, Junyoung Lee, and Naoaki Okazaki. [Paper]

  • (2025-03) Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation was accepted to LoResMT @ NAACL. This was led by Junyoung Lee and myself along with Sangwhan Moon and Naoaki Okazaki. [Paper] [Code]

  • (2024-10) I'm on Bluesky @mcognetta.bsky.social.

  • (2024-09) Team 🍞 (Tyler Woodruff, Oleg Filatov, and myself) won the IEEE BigData Cup: Predicting Chess Puzzle Difficulty Challenge. [Paper] [Code]

  • (2024-09) Distributional Properties of Subword Regularization (with Vilém Zouhar and Naoaki Okazaki) was accepted to EMNLP. [Paper] [Code]

  • (2024-02) Two Counterexamples to Tokenization and the Noiseless Channel (with Vilém Zouhar, Sangwhan Moon, and Naoaki Okazaki) was accepted at LREC-COLING 2024. [Paper] [Code]

  • (2023-07) I presented LotteryTickets.jl: Sparsify Your Flux Models at JuliaCon2023. The recording is here, and the slides and repo are here.

  • (2023-05) I presented Parameter-Efficient Korean Character-Level Language Modeling at EACL2023. The paper can be found here.

  • (2022-11) I've joined Mastodon on the sigmoid.social instance, which is focused on the ML/AI research community. My profile is @mc@sigmoid.social.

  • (2022-04) I've moved to Tokyo to join the Okazaki Lab for my PhD. I'll continue working part time at Google Tokyo.

recent posts (all posts)

about me

I am a Senior Software Engineer on the Gboard team at Google and a PhD student in NLP in the Okazaki Lab at the Tokyo Institute of Technology. Through most of my PhD, I was also a student researcher at Google Tokyo (also on the Gboard team). Prior to this, I was a software engineer at Google (again on Gboard). I did my MS in Computer Science at Yonsei University and my BS in Discrete Mathematics with a minor in Korean at Georgia Tech.

I am always open to chatting about interesting topics. Please feel free to send me an email ([lastname].[firstname]@gmail.com).

I am (not exhaustively) interested in:

  • Automata Theory

  • Scientific Computing

  • Languages (especially Korean and Esperanto)

  • Combinatorics

  • Open Source Software

  • High School Level Computer Science Education

self-introduction

I'm Marco, a developer on the Gboard team at Google and a PhD student in the Okazaki Lab at the Tokyo Institute of Technology. During my PhD, I also worked as a student researcher on the Gboard team at Google Tokyo. I majored in Discrete Mathematics with a minor in Korean at Georgia Tech, and afterwards completed my master's degree in Computer Science in the Theory of Computation Lab at Yonsei University.

If you have an interesting topic in mind, I'm happy to talk with anyone at any time. Please contact me by email ([lastname].[firstname]@gmail.com).

The topics I especially enjoy are:

  • Automata Theory

  • Numerical Analysis

  • Languages (especially Korean and Esperanto)

  • Combinatorics

  • Open Source Software

  • High School Level Computer Science Education

I write most posts in English, but to practice Korean I occasionally write posts in Korean or translate my English posts into Korean.