Jan 11, 2021 Perpetual swap system failure report

Published on 11 Jan 2021

1. System downtime description:

Starting at 2021-01-10 17:30:00 UTC, some users reported that they were unable to trade perpetual swaps.
After troubleshooting, we found that some algo orders had failed to trigger and execute as expected. This activated the emergency mechanism on the affected server shard, and as a result, subsequent orders on that shard could not be processed normally.

By 2021-01-10 19:25:00 UTC, OKX had fixed the code logic and suspended trading in order to recover the data.

By 2021-01-10 21:15:00 UTC, OKX had recovered the relevant data and resumed trading.

2. What are we doing to ensure the stability of the OKX platform?

OKX provides 24/7 trading services and is dedicated to keeping its trading system stable and smooth. However, given the complexity of a high-performance trading system and the possibility of unexpected abnormalities, we cannot guarantee that the system will work perfectly at all times. We continue working to improve system stability and minimize the probability of downtime in every respect, including:

1). We are strengthening engineering quality assurance and optimizing the test system. Code for new functions can be launched only after it has run stably for a period of time in demo trading.
2). We are upgrading the architecture. High availability across multiple servers in various regions is being implemented to reduce downtime caused by hardware and software problems.
3). Hot upgrades will be implemented in a stateless way, which reduces the impact of upgrades on user transactions.

3. How to get updates from OKX?

(1) Once we detect a failure, we will immediately publish a failure notification on the Status page.
(2) If a system upgrade is scheduled, we will publish a notification on the Status page and notify users via market and community channels (API user community + regular user community). API users can also receive these updates by subscribing to the System/Status channel.
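
For API users, the following is a minimal sketch of what subscribing to a status channel over WebSocket might look like. It only builds the subscription request and filters a received push message; it does not open a connection. The channel name "status", the request layout, and the "state" field with the value "ongoing" are assumptions made for illustration, and the function names are hypothetical. Please check the current API documentation for the exact channel name and message format.

```python
import json

# Assumption: the channel name and message layout below are illustrative only.
STATUS_CHANNEL = "status"


def build_subscribe_request(channel=STATUS_CHANNEL):
    """Return the JSON text of a subscription request for the given channel."""
    return json.dumps({"op": "subscribe", "args": [{"channel": channel}]})


def ongoing_events(raw_message):
    """Return the entries of a push message whose (assumed) state is 'ongoing'."""
    message = json.loads(raw_message)
    # Messages without a "data" list (e.g. a subscription acknowledgement)
    # simply yield an empty result.
    return [entry for entry in message.get("data", [])
            if entry.get("state") == "ongoing"]
```

A client could send the text returned by build_subscribe_request() once its WebSocket connection is open, then pass each received message to ongoing_events() and pause order submission while the result is non-empty.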