Jekyll2018-11-06T15:34:29+00:00https://mcognetta.github.io//Theoretically Good With ComputersComputer Science, TheoreticallymcognettaSecond Conference Paper2018-11-01T03:50:00+00:002018-11-01T03:50:00+00:00https://mcognetta.github.io/2018/11/01/second-conference-paper<p>My second conference paper, entitled “Incremental Computation of Infix Probabilities for Probabilistic Finite Automata”, was accepted to EMNLP and I traveled to Brussels last week to present it. This was written with my advisor, Dr. Yo-Sub Han,
and an undergraduate intern (who now works for Google Korea), Soon Chan Kwon. In it, we solve an open problem of incrementally computing infix probabilities in PFAs. That is, suppose you are given some string $w = w_1w_2\ldots w_n$ and a PFA $\mathcal{P}$ and you want to know the infix probability of each prefix of $w$, is there a faster way then just some naive infix probability calculation on each of the prefixes individually? In other words, is there a way to use the infix probability calculation of $w_1w_2\ldots w_k$ to speed up the computation of the infix probability of $w_1w_2\ldots w_kw_{k+1}$?</p>
<p>The naive method involved computing a DFA for the language of all strings with $w$ as an infix (this can be done using KMP in linear time to the string length),
intersecting it with the PFA, and computing the weight of all strings in the intersection. For a string $w$ and a PFA with $|Q|$ states, this approach takes
$O(|w||Q|)^m)$ time, where $m$ is the optimal matrix multiplication constant. To get the infix probabilities of all prefixes of $w$, this approach can be used to get an $O(|w|(|w||Q|)^m)$ time algorithm.</p>
<p>We describe a procedure that converts regular expressions into matrix calculations on the transition matrices of a PFA. Using this and a state elimination technique,
we find an algorithm that (using dynamic programming) generates transition matrices that evaluate to the infix probability of the current infix being considered.
This approach takes $O(|w|^3 |Q|^m)$ time, which is asymptotically faster than the previous best time.</p>
<p>There are still several open problems, namely finding an incremental method when the PFA is replaced with a Probabilistic Context Free Grammar or doing this
incremental algorithm in an online manner (where we do not know all of $w$ ahead of time and instead receive it character by character).</p>
<p>The paper can be found <a href="http://aclweb.org/anthology/D18-1293">here</a> and the poster can be found <a href="https://mcognetta.github.io/assets/pdfs/emnlpposter.pdf">here</a>.</p>mcognettaMy second conference paper, entitled “Incremental Computation of Infix Probabilities for Probabilistic Finite Automata”, was accepted to EMNLP and I traveled to Brussels last week to present it. This was written with my advisor, Dr. Yo-Sub Han, and an undergraduate intern (who now works for Google Korea), Soon Chan Kwon. In it, we solve an open problem of incrementally computing infix probabilities in PFAs. That is, suppose you are given some string $w = w_1w_2\ldots w_n$ and a PFA $\mathcal{P}$ and you want to know the infix probability of each prefix of $w$, is there a faster way then just some naive infix probability calculation on each of the prefixes individually? In other words, is there a way to use the infix probability calculation of $w_1w_2\ldots w_k$ to speed up the computation of the infix probability of $w_1w_2\ldots w_kw_{k+1}$? The naive method involved computing a DFA for the language of all strings with $w$ as an infix (this can be done using KMP in linear time to the string length), intersecting it with the PFA, and computing the weight of all strings in the intersection. For a string $w$ and a PFA with $|Q|$ states, this approach takes $O(|w||Q|)^m)$ time, where $m$ is the optimal matrix multiplication constant. To get the infix probabilities of all prefixes of $w$, this approach can be used to get an $O(|w|(|w||Q|)^m)$ time algorithm. We describe a procedure that converts regular expressions into matrix calculations on the transition matrices of a PFA. Using this and a state elimination technique, we find an algorithm that (using dynamic programming) generates transition matrices that evaluate to the infix probability of the current infix being considered. This approach takes $O(|w|^3 |Q|^m)$ time, which is asymptotically faster than the previous best time. There are still several open problems, namely finding an incremental method when the PFA is replaced with a Probabilistic Context Free Grammar or doing this incremental algorithm in an online manner (where we do not know all of $w$ ahead of time and instead receive it character by character). The paper can be found here and the poster can be found here.First Conference Paper2018-04-29T03:50:00+00:002018-04-29T03:50:00+00:00https://mcognetta.github.io/2018/04/29/first-conference-paper<p>My first conference paper entitled “Online Stochastic Pattern Matching” was accepted to CIAA’18. This paper was written with my advisor, Dr. Yo-Sub Han.</p>
<p>In this paper, we extend a well known pattern matching problem where one is given a regular expression and a large text and wants to locate all matching substrings. Instead of a regular expression, we have a stochastic language modelled by a probabilistic finite automaton. We wish to find all substrings of a text that have probability greater than some threshold probability $p$.</p>
<p>We describe a method to solve this problem even when the text is given as a stream of characters and we are allowed some constant sized buffer (in the size of the text). This is possible as the length of the longest string with probability at least $p$ depends only on the distribution and not the size of the text. We describe a new upper bound on the longest string with probability at least $p$ that is several orders of magnitude better than the previously best known bounds in practice. We also provide several heuristics for further speeding up the problem and discuss an asymptotically faster algorithm when the stochastic language is modelled by a deterministic probabilistic finite automaton.</p>
<p><a href="https://mcognetta.github.io/assets/pdfs/ciaa_ospm.pdf">A draft of the paper can be found here.</a></p>mcognettaMy first conference paper entitled “Online Stochastic Pattern Matching” was accepted to CIAA’18. This paper was written with my advisor, Dr. Yo-Sub Han. In this paper, we extend a well known pattern matching problem where one is given a regular expression and a large text and wants to locate all matching substrings. Instead of a regular expression, we have a stochastic language modelled by a probabilistic finite automaton. We wish to find all substrings of a text that have probability greater than some threshold probability $p$. We describe a method to solve this problem even when the text is given as a stream of characters and we are allowed some constant sized buffer (in the size of the text). This is possible as the length of the longest string with probability at least $p$ depends only on the distribution and not the size of the text. We describe a new upper bound on the longest string with probability at least $p$ that is several orders of magnitude better than the previously best known bounds in practice. We also provide several heuristics for further speeding up the problem and discuss an asymptotically faster algorithm when the stochastic language is modelled by a deterministic probabilistic finite automaton. A draft of the paper can be found here.