Troubleshooting
Debugging running task in distributed training can be difficult. We provide some general suggestions in this page.
Debug Log
You can set the logging verbosity with LOG_LEVEL
environment variable when
launching PERSIA. The value of LOG_LEVEL
can be debug
, info
, warn
, or
error
. The default value is info
.
Grafana Metrics
PERSIA integrates Prometheus to report useful metrics during training phase. This includes current embedding staleness, current total embedding size, the time cost of each stage during an iteration, and more. See Monitoring for more details.