Submission and Evaluation

Submission

Model Submission Requirements:

Please provide your solution to TBD. Each team may submit multiple times; only the latest submission will be used. Your models and scripts should be accessible and runnable.

Task I FinRL-DeepSeek for Stock Trading

Participants need to submit

A well-organized GitHub repository (and/or Colab) containing the code and a readme with instructions for running the solution.
A Hugging Face link to their data and trading agents.

Task II FinRL-AlphaSeek for Crypto Trading

Participants need to submit a well-organized GitHub repository containing all scripts, models, and any custom libraries used to implement the solution. The repository should include a readme with instructions for running the solution.

Task III Open FinLLM Leaderboard - Models with ReFT

Participants need to submit a Hugging Face link containing

the model that can be easily loaded,
scripts that load the model and run inference,
evaluation results for all tasks.

Task IV Open FinLLM Leaderboard - DRR

Participants need to submit a Hugging Face link containing

the model that can be easily loaded,
scripts that load the model and run inference.

Paper Submission Requirements:

Each team should submit short papers with 3 complimentary pages and up to 2 extra pages, including all figures, tables, and references. The paper submission is through the FinRLFM special track and should follow its instructions. The title should start with “FinRL Contest 2025 Task 1/2/3/4.”

Evaluation

For each task, the final ranking of participants will be determined by a weighted combination of model evaluation and paper assessment, with weights of 60% and 40% respectively.
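The 60/40 weighting can be illustrated with a minimal sketch. It assumes that the model evaluation and the paper assessment have already been normalized to the same scale, which the rules above do not specify; the function name `final_score` and the team data are hypothetical.

```python
def final_score(model_score: float, paper_score: float,
                model_weight: float = 0.6, paper_weight: float = 0.4) -> float:
    """Weighted combination of model evaluation and paper assessment.

    Assumption: both inputs are already on the same scale (for example 0-100).
    """
    if abs(model_weight + paper_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return model_weight * model_score + paper_weight * paper_score


# Hypothetical teams, each with (model score, paper score).
teams = {"team_a": (80.0, 70.0), "team_b": (75.0, 90.0)}

# Rank teams from highest to lowest combined score.
ranking = sorted(teams, key=lambda t: final_score(*teams[t]), reverse=True)
```

In this illustration a stronger paper lets `team_b` overtake `team_a` despite a lower model score.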

Model Evaluation:

Task I FinRL-DeepSeek for Stock Trading

The performance of the model will be assessed using the following metrics:

Cumulative return: the total return generated by the trading strategy over a trading period.
Rachev ratio: a reward-to-risk ratio comparing the potential for extreme positive returns with the risk of extreme losses.
Max drawdown: the portfolio’s largest percentage drop from a peak to a trough in a given time period, which provides a measure of downside risk.
Outperformance frequency: how often the strategy outperforms the baseline market index, especially during market downturns.
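The four metrics above could be computed from a series of per-period returns roughly as follows. This is a sketch, not the organizers' evaluation code: the 5% tail levels for the Rachev ratio, the zero risk-free rate, the definition of a downturn as a period with a negative benchmark return, and all function names are assumptions.

```python
import math


def cumulative_return(returns):
    """Total compounded return over the period (0.10 means +10%)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns):
    """Largest peak-to-trough drop of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def rachev_ratio(returns, alpha=0.05, beta=0.05):
    """Empirical Rachev ratio: mean of the best alpha-fraction of returns
    divided by the mean loss in the worst beta-fraction (assumed tail levels)."""
    if not returns:
        raise ValueError("returns must not be empty")
    ordered = sorted(returns)
    n_high = max(1, math.ceil(alpha * len(ordered)))
    n_low = max(1, math.ceil(beta * len(ordered)))
    tail_gain = sum(ordered[-n_high:]) / n_high
    tail_loss = -sum(ordered[:n_low]) / n_low
    if tail_loss <= 0:
        return float("inf")  # the lower tail contains no losses
    return tail_gain / tail_loss


def outperformance_frequency(returns, benchmark_returns, downturn_only=False):
    """Fraction of periods in which the strategy beats the benchmark.

    With downturn_only=True, only periods with a negative benchmark
    return are counted (an assumed definition of a market downturn).
    """
    pairs = list(zip(returns, benchmark_returns))
    if downturn_only:
        pairs = [(r, b) for r, b in pairs if b < 0]
    if not pairs:
        return float("nan")
    return sum(r > b for r, b in pairs) / len(pairs)
```

For example, `max_drawdown([0.5, -0.5])` evaluates to 0.5: the equity curve rises to 1.5 and then falls to 0.75, half of its peak.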

Task II FinRL-AlphaSeek for Crypto Trading

The performance of the model will be assessed by the following metrics:

Cumulative return: the total return generated by the trading strategy over a trading period.
Sharpe ratio: the portfolio’s average excess return per unit of return volatility, so it accounts for both the returns of the portfolio and the level of risk.
Max drawdown: the portfolio’s largest percentage drop from a peak to a trough in a given time period, which provides a measure of downside risk.
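As an illustration of the Sharpe ratio, the sketch below divides the mean excess return by the sample standard deviation of excess returns. The per-period risk-free rate of zero and the optional annualization factor `periods_per_year` are assumptions; the contest's own evaluation settings may differ.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return divided by its sample standard deviation.

    risk_free_rate is per period; periods_per_year > 1 annualizes the result.
    Requires at least two return observations.
    """
    excess = [r - risk_free_rate for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)
```

A strategy with a high cumulative return can still score a low Sharpe ratio if that return comes with large swings, which is why the two metrics are reported together.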

Task III Open FinLLM Leaderboard - Models with ReFT

The final score of the model is the average score of all tasks. The metrics are specified by the Open FinLLM Leaderboard.

Task IV Open FinLLM Leaderboard - DRR

The model evaluation score is the average score across all tasks. The metrics include:

Accuracy: mainly used for questions that require specific answers, such as full expansions of abbreviations, yes-or-no questions, and financial formulas.
FactScore: mainly used for Q&A questions, such as CDM documentation, MOF detailed Q&A and XBRL term explanation.

Details are specified on the data page.
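A rough sketch of the accuracy metric and the per-task averaging is given below. The matching rule (case-insensitive, whitespace-normalized exact match), the unweighted mean, and the task names are assumptions for illustration; the data page remains the authoritative definition, and FactScore is not covered here.

```python
def accuracy(predictions, references):
    """Share of predictions equal to the reference answer.

    Assumed matching rule: compare after lowercasing and collapsing whitespace.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")

    def normalize(text):
        return " ".join(text.lower().split())

    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)


def average_score(task_scores):
    """Unweighted mean of per-task scores (assumed aggregation)."""
    if not task_scores:
        raise ValueError("task_scores must not be empty")
    return sum(task_scores.values()) / len(task_scores)


# Hypothetical per-task scores for one submission.
scores = {
    "abbreviation_expansion": accuracy(["Yes", " no "], ["yes", "yes"]),
    "term_explanation": 1.0,  # placeholder for a FactScore-style result
}
overall = average_score(scores)
```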

Paper Assessment:

The assessment of the paper will be conducted by invited experts and professionals. The judges will independently rate the data and model analysis, robustness and generalizability, innovation and creativity, organization and readability, each accounting for 20% of the qualitative assessment.

Note that paper submission follows the timeline of the Special Track: Financial Reinforcement Learning and Foundation Models (FinRLFM), and models can be submitted later than the paper. For this reason, the discussion of results and performance will not be counted in the paper assessment.