Submission and Evaluation

Submission

Model Submission Requirements:

Please submit your solution to TBD. Each team may submit multiple times; only the latest submission will be used. Your models and scripts should be accessible and runnable.

Task I FinRL-DeepSeek for Crypto Trading

Participants need to submit a well-organized GitHub repository containing all scripts, model weights, and any custom libraries used to implement the solution. The repository should include a README with instructions for running the solution.

Task II FinGPT Agents in Real Life

Participants need to submit a Hugging Face link containing:

a model that can be easily loaded,
scripts that load the model and run inference with it (including code for calling external tools).

Task III FinRL-DeFi

Participants need to submit a well-organized GitHub repository containing:

complete code for training and inference (scripts, classes, and configurations),
trained model weights,
an evaluation script (e.g., evaluate_agent.py) that runs inference in the provided test environment on the test dataset,
a short README file with setup and usage instructions,
optionally, a Docker container with all dependencies for reproducibility.

Paper Submission Requirements:

Each team should submit a short paper with 3 complimentary pages and up to 2 extra pages, including all figures, tables, and references. Papers are submitted through the special track SecureFinAI and should follow its instructions. Please include “FinAI Contest 2025 Task 1/2/3” in your abstract.

Evaluation

For each task, the final ranking of participants will be determined by a weighted combination of model evaluation and paper assessment, with weights of 60% and 40% respectively.
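As a rough illustration of the 60/40 weighting, the sketch below combines a model score and a paper score into one number and sorts teams by it. It assumes both scores are already normalized to the same scale (for example 0 to 1), which the rules do not specify; the names `final_score` and `rank_teams` are hypothetical and are not part of any contest tooling.

```python
from typing import Dict, List, Tuple

# Weights stated in the evaluation rules.
MODEL_WEIGHT = 0.6
PAPER_WEIGHT = 0.4


def final_score(model_score: float, paper_score: float) -> float:
    """Weighted combination of the two scores.

    Assumption: both inputs are on the same scale (e.g. 0 to 1).
    """
    return MODEL_WEIGHT * model_score + PAPER_WEIGHT * paper_score


def rank_teams(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return team names ordered from highest to lowest combined score.

    `scores` maps a team name to (model_score, paper_score).
    """
    return sorted(
        scores,
        key=lambda team: final_score(*scores[team]),
        reverse=True,
    )
```

Under these assumptions, a team with a strong paper can overtake a team with a better model score, since the paper contributes 40% of the total.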

Model Evaluation:

Task I FinRL-DeepSeek for Crypto Trading

The performance of the model will be assessed using the following metrics:

Cumulative return. It is the total return generated by the trading strategy over a trading period.
Sharpe ratio. It measures risk-adjusted return: the portfolio’s average excess return divided by the standard deviation of its returns.
Max drawdown. It is the portfolio’s largest percentage drop from a peak to a trough in a certain time period, which provides a measure of downside risk.

The final model ranking will be based solely on the Sharpe ratio.
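The three metrics listed above can be sketched in a few lines of Python. This is a minimal illustration computed from a series of portfolio values, not the organizers' evaluation code. It assumes simple per-period returns, a sample standard deviation, a risk-free rate of zero by default, and no annualization unless `periods_per_year` is set; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def period_returns(values: Sequence[float]) -> List[float]:
    """Simple per-period returns from consecutive portfolio values."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def cumulative_return(values: Sequence[float]) -> float:
    """Total return over the whole period: final value / initial value - 1."""
    return values[-1] / values[0] - 1.0


def sharpe_ratio(
    values: Sequence[float],
    risk_free_per_period: float = 0.0,  # assumption: zero unless specified
    periods_per_year: int = 1,          # assumption: 1 means no annualization
) -> float:
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free_per_period for r in period_returns(values)]
    if len(excess) < 2:
        return float("nan")  # not enough data for a sample standard deviation
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")  # undefined when returns do not vary
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough drop, as a positive fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                     # highest value seen so far
        worst = max(worst, (peak - v) / peak)   # drop relative to that peak
    return worst
```

For a portfolio that moves through the values 100, 110, 99, 120, the cumulative return is about 0.2 and the max drawdown is about 0.1 (the fall from 110 to 99).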

Task II FinGPT Agents in Real Life

The final score of the model is the average score of all tasks. The metrics are specified in the starter kit.

Task III FinRL-DeFi

TBA.

Paper Assessment:

The assessment of the paper will be conducted by invited experts and professionals. The judges will independently rate the data and model analysis, robustness and generalizability, innovation and creativity, organization and readability, each accounting for 20% of the qualitative assessment.