marco cognetta theoretically good with computers

Lichess Combined Puzzle-Game Database

TLDR: A database containing Lichess puzzles joined with their complete game information can be found here. A direct link to download the complete database is here: MEGA link.

Recently, I have been wrapping the Lichess API in Julia (there are a few existing options in other languages, if you are interested). To test that everything is working well, I wanted to try a few small projects. The first one I am doing is related to the Puzzle Database, which contains ~2.9 million puzzles generated from real Lichess games (the repo is here, and an interesting article about automatic puzzle generation using Chess.jl can be found here).

The puzzle database provides a URL specifying the game that a puzzle was derived from as well as what ply the puzzle starts at, but it does not contain the actual game information (PGN, player ratings, result, etc.). All of the game move information could be extracted from the Lichess Game Database, but there are more than 3.6 billion games (939 GB on disk) at the time of writing. Furthermore, the game database contains only a standard PGN entry for each game, but the API exposes more detailed game information, such as clock times per move, evaluations, Lichess-specific metadata, and commentary.
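As a small illustration of the link between the two datasets, here is a minimal Python sketch that splits a puzzle's `GameUrl` into a game ID and a starting ply. The helper name `parse_game_url` is my own, and the sketch assumes the URL shape seen in the example entry further down (game ID as the first path segment, an optional side such as `black`, and the ply after `#`).

```python
from urllib.parse import urlparse


def parse_game_url(game_url: str) -> tuple[str, int]:
    """Split a puzzle GameUrl into (game_id, ply).

    Assumes URLs shaped like https://lichess.org/<id>[/black]#<ply>.
    """
    parts = urlparse(game_url)
    # The first path segment is the game ID; a second segment, if present,
    # only names the side the puzzle is viewed from.
    game_id = parts.path.strip("/").split("/")[0]
    # The fragment after '#' is the ply at which the puzzle starts.
    ply = int(parts.fragment) if parts.fragment else 0
    return game_id, ply


print(parse_game_url("https://lichess.org/sVgQxr8Q/black#16"))
# ('sVgQxr8Q', 16)
```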

I elected to use the Lichess API's batch export of games by ID. It extracts a batch of games directly from the Lichess database, and it was a nice test for my Julia API wrapper.
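For readers who want to try the same approach without the Julia wrapper, the following is a rough Python sketch of a batched export, not the code I actually ran. It assumes the export-by-IDs endpoint accepts a POST with comma-separated game IDs in the body and returns one JSON object per line when asked for `application/x-ndjson`. The query flags listed are my recollection of the API's options and should be checked against the current Lichess API documentation. The batch size of 300 is the endpoint limit discussed below, and the helper names and pause length are hypothetical.

```python
import json
import time
import urllib.request

# Assumed endpoint for exporting games by ID; verify against the API docs.
EXPORT_URL = "https://lichess.org/api/games/export/_ids"
# Assumed flag names; the exact set used for the published database may differ.
DEFAULT_FLAGS = "pgnInJson=true&clocks=true&evals=true&opening=true&literate=true"


def chunked(ids, size=300):
    """Yield consecutive slices of at most `size` game IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_request(batch, flags=DEFAULT_FLAGS):
    """Build (but do not send) the POST request for one batch of IDs."""
    return urllib.request.Request(
        f"{EXPORT_URL}?{flags}",
        data=",".join(batch).encode("utf-8"),
        headers={"Accept": "application/x-ndjson", "Content-Type": "text/plain"},
        method="POST",
    )


def export_games(ids, pause_seconds=1.0):
    """Fetch games one batch at a time, never issuing concurrent calls."""
    for batch in chunked(ids):
        with urllib.request.urlopen(build_request(batch)) as response:
            for line in response:  # one JSON-encoded game per line
                if line.strip():
                    yield json.loads(line)
        time.sleep(pause_seconds)  # be polite between sequential calls
```

Keeping the request construction separate from the network call makes the batching logic easy to check offline before spending hours on the real export.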

Unfortunately, the batch-extraction endpoint only allows batches of 300 games, and Lichess requests that you make only one API call at a time. For a sense of scale, about 2.9 million game IDs in batches of 300 is roughly 9,700 calls; at 5 seconds per call, pulling all of the games would take more than 13 hours (it actually ended up taking substantially longer: over 30 hours).

So that others don’t also have to go through this, I joined the games with the puzzle database and published the result to a public GitHub repo. It contains all 2.9 million puzzles as well as their games, which were exported with all of the API flags enabled. They are provided in a single bz2-compressed ndjson file. Each line contains a JSON object with two top-level fields, game and puzzle. The game field holds the game information extracted from the API, and the puzzle field holds the puzzle information extracted from the puzzle database, with field names derived from the puzzle database’s headers. An example entry is given below:

{"puzzle":{"Themes":"advantage attackingF2F7 fork long opening","OpeningFamily":"","Popularity":"25","NbPlays":"28","PuzzleId":"00LZf","FEN":"r1b1kb1r/pppp1ppp/2n1p3/3nN3/3P2q1/4B3/PPP1BPPP/RN1Q1RK1 b kq - 9 8","Moves":"d5e3 f2e3 g4g5 e5f7 g5e3 g1h1","Rating":"2205","RatingDeviation":"91","GameUrl":"https://lichess.org/sVgQxr8Q/black#16"},"game":{"analysis":[{"eval":0},{"eval":21},{"eval":0},{"judgment":{"name":"Blunder","comment":"Blunder. d5 was best."},"variation":"d5 Nc3 Nf6 e5 Nfd7 f4 c5 Nf3 Nc6 Be3","eval":175,"best":"d7d5"},{"judgment":{"name":"Mistake","comment":"Mistake. Bd3 was best."},"variation":"Bd3 Qd8 Nc3 Nf6 Nf3 d6 a3 Nfd7 Qe2 Be7","eval":31,"best":"f1d3"},{"eval":0},{"eval":9},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Nf6 was best."},"variation":"Nf6 Bd3 Qc6 c4 Be7 Nc3 d6 O-O O-O h3","eval":66,"best":"g8f6"},{"eval":61},{"eval":33},{"eval":33},{"variation":"d5 Nc3 Bd6 Ne2 f6 Nf4 Bxf4 h3 Bh2+ Kxh2","eval":86,"best":"d7d5"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. c4 was best."},"variation":"c4 d5","eval":-18,"best":"c2c4"},{"eval":0},{"judgment":{"name":"Blunder","comment":"Blunder. Nc3 was best."},"variation":"Nc3 Nxe3 fxe3 Qg6 Nb5 Kd8 Nc3 f6 a4 Qh6 d5 exd5 Qxd5 Nb4","eval":-239,"best":"b1c3"},{"judgment":{"name":"Blunder","comment":"Blunder. Qe4 was best."},"variation":"Qe4 Nc4","eval":391,"best":"g4e4"},{"judgment":{"name":"Blunder","comment":"Blunder. fxe3 was best."},"variation":"fxe3","eval":-392,"best":"f2e3"},{"eval":-385},{"eval":-383},{"eval":-364},{"eval":-364},{"eval":-373},{"eval":-412},{"eval":-417},{"eval":-388},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. g5 was best."},"variation":"g5","eval":-287,"best":"g7g5"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. c4 was best."},"variation":"c4","eval":-373,"best":"c2c4"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. 
Ba3 was best."},"variation":"Ba3","eval":-277,"best":"f8a3"},{"eval":-271},{"eval":-222},{"judgment":{"name":"Mistake","comment":"Mistake. Rac1 was best."},"variation":"Rac1 g5","eval":-440,"best":"a1c1"},{"eval":-427},{"eval":-519},{"eval":-537},{"eval":-536},{"eval":-529},{"eval":-559},{"eval":-561},{"eval":-724},{"eval":-737},{"eval":-729},{"eval":-754},{"eval":-759},{"eval":-675},{"eval":-692},{"eval":-716},{"eval":-784},{"eval":-769},{"eval":-768},{"eval":-761},{"eval":-792},{"eval":-762},{"eval":-772},{"eval":-670},{"eval":-745},{"eval":-700},{"eval":-738},{"eval":-685},{"eval":-692},{"eval":-681},{"eval":-793},{"eval":-738},{"eval":-730},{"eval":-700},{"eval":-714},{"eval":-660},{"eval":-661},{"eval":-581},{"eval":-580},{"eval":-580},{"eval":-678},{"eval":-661},{"eval":-706},{"eval":-648},{"eval":-736},{"eval":-691},{"eval":-700},{"eval":-641},{"eval":-715},{"eval":-722},{"eval":-710},{"eval":-681},{"eval":-720},{"judgment":{"name":"Blunder","comment":"Blunder. Kd6 was best."},"variation":"Kd6","eval":-57,"best":"d7d6"}],"createdAt":1625690189765,"variant":"standard","status":"outoftime","pgn":"[Event \"Rated Blitz game\"]\n[Site \"https://lichess.org/sVgQxr8Q\"]\n[Date \"2021.07.07\"]\n[White \"Arxiv_bb\"]\n[Black \"MangoFruit\"]\n[Result \"0-1\"]\n[UTCDate \"2021.07.07\"]\n[UTCTime \"20:36:29\"]\n[WhiteElo \"1541\"]\n[BlackElo \"1528\"]\n[WhiteRatingDiff \"-6\"]\n[BlackRatingDiff \"+6\"]\n[Variant \"Standard\"]\n[TimeControl \"300+0\"]\n[ECO \"C00\"]\n[Opening \"French Defense: Normal Variation\"]\n[Termination \"Time forfeit\"]\n[Annotator \"lichess.org\"]\n\n1. d4 { [%eval 0.0] [%clk 0:05:00] } 1... e6 { [%eval 0.21] [%clk 0:05:00] } 2. e4 { [%eval 0.0] [%clk 0:04:59] } { C00 French Defense: Normal Variation } 2... Qh4?? { (0.00 → 1.75) Blunder. d5 was best. } { [%eval 1.75] [%clk 0:04:56] } (2... d5 3. Nc3 Nf6 4. e5 Nfd7 5. f4 c5 6. Nf3 Nc6 7. Be3) 3. Nf3? { (1.75 → 0.31) Mistake. Bd3 was best. } { [%eval 0.31] [%clk 0:04:56] } (3. Bd3 Qd8 4. Nc3 Nf6 5. 
Nf3 d6 6. a3 Nfd7 7. Qe2 Be7) 3... Qxe4+ { [%eval 0.0] [%clk 0:04:45] } 4. Be3 { [%eval 0.09] [%clk 0:04:53] } 4... Nc6?! { (0.09 → 0.66) Inaccuracy. Nf6 was best. } { [%eval 0.66] [%clk 0:04:40] } (4... Nf6 5. Bd3 Qc6 6. c4 Be7 7. Nc3 d6 8. O-O O-O 9. h3) 5. Bd3 { [%eval 0.61] [%clk 0:04:51] } 5... Qg4 { [%eval 0.33] [%clk 0:04:37] } 6. O-O { [%eval 0.33] [%clk 0:04:47] } 6... Nge7 { [%eval 0.86] [%clk 0:04:28] } 7. Be2?! { (0.86 → -0.18) Inaccuracy. c4 was best. } { [%eval -0.18] [%clk 0:04:46] } (7. c4 d5) 7... Nd5 { [%eval 0.0] [%clk 0:04:24] } 8. Ne5?? { (0.00 → -2.39) Blunder. Nc3 was best. } { [%eval -2.39] [%clk 0:04:42] } (8. Nc3 Nxe3 9. fxe3 Qg6 10. Nb5 Kd8 11. Nc3 f6 12. a4 Qh6 13. d5 exd5 14. Qxd5 Nb4) 8... Nxe3?? { (-2.39 → 3.91) Blunder. Qe4 was best. } { [%eval 3.91] [%clk 0:04:21] } (8... Qe4 9. Nc4) 9. Bxg4?? { (3.91 → -3.92) Blunder. fxe3 was best. } { [%eval -3.92] [%clk 0:04:37] } (9. fxe3) 9... Nxd1 { [%eval -3.85] [%clk 0:04:18] } 10. Rxd1 { [%eval -3.83] [%clk 0:04:35] } 10... h5 { [%eval -3.64] [%clk 0:04:15] } 11. Nxc6 { [%eval -3.64] [%clk 0:04:34] } 11... bxc6 { [%eval -3.73] [%clk 0:04:13] } 12. Bf3 { [%eval -4.12] [%clk 0:04:32] } 12... Rb8 { [%eval -4.17] [%clk 0:04:04] } 13. b3 { [%eval -3.88] [%clk 0:04:30] } 13... d5?! { (-3.88 → -2.87) Inaccuracy. g5 was best. } { [%eval -2.87] [%clk 0:04:01] } (13... g5) 14. Nd2?! { (-2.87 → -3.73) Inaccuracy. c4 was best. } { [%eval -3.73] [%clk 0:04:27] } (14. c4) 14... Ba6?! { (-3.73 → -2.77) Inaccuracy. Ba3 was best. } { [%eval -2.77] [%clk 0:03:57] } (14... Ba3) 15. c4 { [%eval -2.71] [%clk 0:04:09] } 15... Be7 { [%eval -2.22] [%clk 0:03:52] } 16. cxd5? { (-2.22 → -4.40) Mistake. Rac1 was best. } { [%eval -4.4] [%clk 0:04:04] } (16. Rac1 g5) 16... cxd5 { [%eval -4.27] [%clk 0:03:52] } 17. Nf1 { [%eval -5.19] [%clk 0:03:56] } 17... Kd7 { [%eval -5.37] [%clk 0:03:48] } 18. Ng3 { [%eval -5.36] [%clk 0:03:53] } 18... h4 { [%eval -5.29] [%clk 0:03:45] } 19. 
Nh5 { [%eval -5.59] [%clk 0:03:41] } 19... h3 { [%eval -5.61] [%clk 0:03:43] } 20. gxh3 { [%eval -7.24] [%clk 0:03:18] } 20... g6 { [%eval -7.37] [%clk 0:03:40] } 21. Ng3 { [%eval -7.29] [%clk 0:03:07] } 21... Rxh3 { [%eval -7.54] [%clk 0:03:37] } 22. Bg2 { [%eval -7.59] [%clk 0:03:05] } 22... Rh7 { [%eval -6.75] [%clk 0:03:36] } 23. Nf1 { [%eval -6.92] [%clk 0:02:59] } 23... Rh4 { [%eval -7.16] [%clk 0:03:16] } 24. Ng3 { [%eval -7.84] [%clk 0:02:56] } 24... Bf6 { [%eval -7.69] [%clk 0:03:14] } 25. Rac1 { [%eval -7.68] [%clk 0:02:34] } 25... Rxd4 { [%eval -7.61] [%clk 0:03:09] } 26. Rxd4 { [%eval -7.92] [%clk 0:02:31] } 26... Bxd4 { [%eval -7.62] [%clk 0:03:09] } 27. Rd1 { [%eval -7.72] [%clk 0:02:27] } 27... Be5 { [%eval -6.7] [%clk 0:03:06] } 28. Ne4 { [%eval -7.45] [%clk 0:02:18] } 28... Be2 { [%eval -7.0] [%clk 0:02:48] } 29. Rd2 { [%eval -7.38] [%clk 0:02:16] } 29... Bg4 { [%eval -6.85] [%clk 0:02:48] } 30. h3 { [%eval -6.92] [%clk 0:02:02] } 30... Bf5 { [%eval -6.81] [%clk 0:02:45] } 31. Nc5+ { [%eval -7.93] [%clk 0:01:53] } 31... Kd6 { [%eval -7.38] [%clk 0:02:36] } 32. Ne4+ { [%eval -7.3] [%clk 0:01:41] } 32... Kc6 { [%eval -7.0] [%clk 0:02:02] } 33. Ng3 { [%eval -7.14] [%clk 0:01:14] } 33... Bc3 { [%eval -6.6] [%clk 0:01:53] } 34. Rd1 { [%eval -6.61] [%clk 0:00:49] } 34... Kd7 { [%eval -5.81] [%clk 0:01:50] } 35. Nxf5 { [%eval -5.8] [%clk 0:00:47] } 35... gxf5 { [%eval -5.8] [%clk 0:01:49] } 36. Rc1 { [%eval -6.78] [%clk 0:00:40] } 36... Bb2 { [%eval -6.61] [%clk 0:01:43] } 37. Rc2 { [%eval -7.06] [%clk 0:00:38] } 37... Be5 { [%eval -6.48] [%clk 0:01:41] } 38. Kf1 { [%eval -7.36] [%clk 0:00:31] } 38... Rh8 { [%eval -6.91] [%clk 0:01:38] } 39. Re2 { [%eval -7.0] [%clk 0:00:22] } 39... Bd6 { [%eval -6.41] [%clk 0:01:34] } 40. Rc2 { [%eval -7.15] [%clk 0:00:07] } 40... Bb4 { [%eval -7.22] [%clk 0:01:32] } 41. Re2 { [%eval -7.1] [%clk 0:00:02] } 41... Bc5 { [%eval -6.81] [%clk 0:01:31] } 42. Rc2 { [%eval -7.2] [%clk 0:00:01] } 42... Bxf2?? 
{ (-7.20 → -0.57) Blunder. Kd6 was best. } { [%eval -0.57] [%clk 0:01:31] } { Black wins on time. } (42... Kd6) 0-1\n\n\n","id":"sVgQxr8Q","clock":{"increment":0,"totalTime":300,"initial":300},"rated":true,"players":{"white":{"rating":1541,"analysis":{"mistake":2,"acpl":64,"inaccuracy":2,"accuracy":58,"blunder":2},"user":{"name":"Arxiv_bb","id":"arxiv_bb"},"ratingDiff":-6},"black":{"rating":1528,"analysis":{"mistake":0,"acpl":64,"inaccuracy":3,"accuracy":67,"blunder":3},"user":{"name":"MangoFruit","id":"mangofruit"},"ratingDiff":6}},"winner":"black","moves":"d4 e6 e4 Qh4 Nf3 Qxe4+ Be3 Nc6 Bd3 Qg4 O-O Nge7 Be2 Nd5 Ne5 Nxe3 Bxg4 Nxd1 Rxd1 h5 Nxc6 bxc6 Bf3 Rb8 b3 d5 Nd2 Ba6 c4 Be7 cxd5 cxd5 Nf1 Kd7 Ng3 h4 Nh5 h3 gxh3 g6 Ng3 Rxh3 Bg2 Rh7 Nf1 Rh4 Ng3 Bf6 Rac1 Rxd4 Rxd4 Bxd4 Rd1 Be5 Ne4 Be2 Rd2 Bg4 h3 Bf5 Nc5+ Kd6 Ne4+ Kc6 Ng3 Bc3 Rd1 Kd7 Nxf5 gxf5 Rc1 Bb2 Rc2 Be5 Kf1 Rh8 Re2 Bd6 Rc2 Bb4 Re2 Bc5 Rc2 Bxf2","opening":{"name":"French Defense: Normal Variation","eco":"C00","ply":3},"perf":"blitz","speed":"blitz","lastMoveAt":1625690725840}}
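Because the file is bz2-compressed ndjson, it can be streamed one entry at a time without decompressing it to disk. Below is a minimal Python sketch of that, using only the field names visible in the example entry above. The function names and the file name in the usage comment are placeholders of mine, not the actual name of the published file.

```python
import bz2
import json


def iter_entries(path):
    """Stream entries from a bz2-compressed ndjson file, one dict per line."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def summarize(entry):
    """Pull a few fields out of one combined puzzle/game entry."""
    puzzle, game = entry["puzzle"], entry["game"]
    return {
        "puzzle_id": puzzle["PuzzleId"],
        # Puzzle fields come from CSV headers, so values are strings.
        "puzzle_rating": int(puzzle["Rating"]),
        "solution": puzzle["Moves"].split(),
        "game_id": game["id"],
        # .get() in case a player record lacks a rating.
        "white_rating": game["players"]["white"].get("rating"),
        "black_rating": game["players"]["black"].get("rating"),
    }


# Usage (placeholder file name):
# for entry in iter_entries("puzzles_with_games.ndjson.bz2"):
#     print(summarize(entry))
```

Note that the puzzle values are all strings while the game values keep their JSON types, so numeric puzzle fields such as Rating need an explicit conversion.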

This database retains the same Creative Commons CC0 license as the source Lichess databases.

A future post will detail the Julia Lichess API wrapper and some of the projects I have been using it for.