TannerGladson/lichess-frames
The dataset consists of chess‑board states and associated move sequences extracted from games downloaded from lichess.org. Each game is parsed into multiple records; each record starts with a FEN string followed by 1‑10 SAN moves. The data are intended for training the ChessRoberta model and have not been filtered, so they may not be optimal for high‑performance chess modelling.
Description
Dataset Overview
Dataset Name
TannerGladson/lichess-frames
Source
The dataset’s game sequences are sourced from https://database.lichess.org/.
Content
The dataset contains chess board states and their associated move sequences. PGN files were downloaded from the Lichess database; each game is parsed into multiple records, each beginning with a FEN string followed by 1‑10 SAN moves.
Intended Use
The data will be used to train the ChessRoberta model, but because they are unfiltered they may not be suitable for building a high‑performance chess engine.
Structure
Each record includes the following fields:
text (Str): String containing the FEN and multiple SAN moves.
pgn_start (Int): Index of the first SAN within the text string.
num_sans (Int): Number of half‑moves present in the text string.
num_prior_moves (Int): Number of half‑moves that occurred before the FEN (one move for each side counts as two moves).
game_id (Str): Lichess identifier for the source game.
Special markers are used as delimiters in the text field:
PGN_START: "~"
MOVE_SEP: ">"
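Given these two delimiters, a text value can be split back into its FEN and its list of SAN moves. The snippet below is a minimal sketch rather than the dataset author's tooling: the helper name split_record is hypothetical, and it assumes each text value contains exactly one "~".

```python
from typing import List, Tuple

PGN_START = "~"  # separates the FEN from the move list
MOVE_SEP = ">"   # separates consecutive SAN moves


def split_record(text: str) -> Tuple[str, List[str]]:
    """Split a record's text into (fen, sans).

    Hypothetical helper; assumes exactly one PGN_START marker in the text.
    """
    fen, _, moves = text.partition(PGN_START)
    sans = moves.split(MOVE_SEP) if moves else []
    return fen, sans


fen, sans = split_record(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2"
)
print(fen)   # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
print(sans)  # ['e4', 'e6', 'd3', 'd5', 'Nd2']
```

Neither "~" nor ">" occurs in FEN or SAN notation, so plain string splitting is enough here.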
Example Record
{
text: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
pgn_start: 57,
num_sans: 5,
num_prior_moves: 0,
game_id: "https://lichess.org/PwE2cWn3"
}
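The numeric fields in this record can be cross-checked against the text value. The sketch below is an illustration under stated assumptions, not a documented validation routine: check_record is a hypothetical helper, it assumes pgn_start points at the character immediately after "~" (as the value 57 suggests for the 56-character FEN above), and it assumes every game starts from the standard initial position, so that the number of prior half-moves follows from the FEN's side-to-move and fullmove fields.

```python
def check_record(record: dict) -> bool:
    """Return True if the metadata fields agree with the text field.

    Hypothetical helper; assumes games start from the standard position.
    """
    text = record["text"]
    start = record["pgn_start"]

    # The character just before pgn_start should be the "~" delimiter.
    if start < 1 or text[start - 1] != "~":
        return False

    fen = text[: start - 1]
    sans = text[start:].split(">")
    if len(sans) != record["num_sans"]:
        return False

    # FEN fields: placement, side to move, castling, en passant,
    # halfmove clock, fullmove number.
    fields = fen.split(" ")
    side, fullmove = fields[1], int(fields[5])

    # Half-moves played before this position: two per completed full move,
    # plus one if White has already moved in the current one.
    prior = 2 * (fullmove - 1) + (1 if side == "b" else 0)
    return prior == record["num_prior_moves"]


example = {
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3",
}
print(check_record(example))  # True
```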
Source
Organization: hugging_face
Created: Unknown