Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise in-game atad driver for size #1050

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

daf0x
Copy link

@daf0x daf0x commented Aug 8, 2023

Pull Request checklist

Note: these are not necessarily requirements

  • I reformatted the code with clang-format
  • I checked to make sure my submission worked:
    ▷ I tried a few games and they all still seem to work the same.
  • I am the author of submission or have permission from the original author
  • Requires update of the PS2SDK or other dependencies
  • Others (please specify below)

Pull Request description

I've played around with the in-game ATAD driver code a bit and cleaned it up a bit and optimised it a tad, focusing on making it smaller. This probably looks like a rather big change, but I've tried to keep the changes small in commits that can be assessed individually.

Overview of the biggest changes

  • Move GameStar support to a browser-time choice
    I.e. there are now three in-game drivers: hdpro_atad, gamestar_atad, ps2atad, and OPL will select the appropriate one when launching a game.

  • Drop support for SMART commands.
    So the way it seems to work is you send ATA_C_SMART as a command, and then you put the actual SMART command as sub command. But the only supported sub command is ATA_C_SMART_ENABLE, so you can enable SMART, but then you can't actually do anything with it like read the status? This seems unlikely to be needed.

  • Remove code for commands that cannot occur.
    There was some code checking for command types that are no longer in the list of supported commands (ata_cmd_table).

  • Re-use the upper bits of the type field to encode time-out and direction
    This saves on checking conditions in code, as we can encode them in the table at compile time.

  • Re-write the busy-wait to use binary operations.
    Old delay | New delay
    ----------+----------
            0 |         0
            0 |         0
            0 |         0
            0 |         0
            0 |         0
            0 |         0
            0 |         0
            0 |         0
            0 |       128
            0 |       128
          100 |       128
          100 |       128
          100 |       256
          100 |       256
          100 |       256
          100 |       256
          100 |      1024
          100 |      1024
          100 |      1024
          100 |      1024
         1000 |      2048
         1000 |      2048
         1000 |      2048
         1000 |      2048
         1000 |      6144
         1000 |      6144
         1000 |      6144
         1000 |      6144
         1000 |     12288
         1000 |     12288
        10000 |     12288
        10000 |     12288
        10000 |     32768
        10000 |     32768
        10000 |     32768
        10000 |     32768
        10000 |     65536
        10000 |     65536
        10000 |     65536
        10000 |     65536
       100000 |    163840
       100000 |    163840
       100000 |    163840
       100000 |    163840
       100000 |    327680
       100000 |    327680
       100000 |    327680
       100000 |    327680
       100000 |    786432
       100000 |    786432
      1000000 |    786432
      1000000 |    786432
      1000000 |   1572864
      1000000 |   1572864
      1000000 |   1572864
      1000000 |   1572864
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
      1000000 |
    ----------+----------
     32111000 |  11884032
    
    The main goal was to remove the `switch` and replace it with something that can be computed without branching. This new version has a little bit slower scope and gives up a bit sooner, but do we really need such a long time out? All in all this reduces the total time-out from ~30s to ~10s, which still seems plenty of time for a hard disk to do its thing.

Result

All-in-all optimising just the atad-driver file alone saves 688 bytes:
Open-PS2-Loader/modules/iopcore/cdvdman/hdd_cdvdman.irx:
Size before: 39.084 bytes
Size after: 38.396 bytes

daf0x added 29 commits August 7, 2023 12:31
We know that the PS2 only has 32MiB of ram, so we cannot read more than
65535 anyway. Therefore we can get away with just 2 bytes to store the
sector count.

Also adjust the type and dir fields to use an appropriately sized type.
This function has been re-written for simpler and faster
codegen, but with similar-ish timings to the original.

Checked that less assembly instructions were generated with
godbolt gcc mips 5.4 and 13.1.0.
the table is small enough that this optimization doesn't matter, and
now we get to save a bit of code since we don't need to check the exit
condition and jump out of the loop.
You can enable SMART, but it can't be disabled, nor can
any read/write commands be issued, so it's not useful
anyway.
…this condition:

If the command _was_ ATA_C_SMART it willl no longer be found in the table,
and we enter this check and still return ATA_RES_ERR_NOTREADY. If it was
any other command we would have returned an error, but in that case the
caller would need to error out anyway, so that can be assumed to never
happen in practice or games would likely not function properly due to
missing ATA commands.
This code is unreachable as there is a return on line 3, which
appears to have been there since b30ce3e
… false:

However we probably do want the harddisk to shutdown properly when this
function is called.
We can never reach the if() with res != 0, except from where we add
the goto in this commit.
Even some none-data commands set an alarm on ata_alarm_cb(), but in the
case of a time-out the alarm would not be properly clean up nor would the
LED be turned off.

As a happy side-effect this saves 56 bytes.
@AKuHAK
Copy link
Member

AKuHAK commented Aug 9, 2023

For formatting changes, could you please do it in accordance with clang-format? If you really need to change how clang is formatting your code you can either: change .clang-fromat file according to your vision, or add // clang-format 0ff and //clang-format on macro on blocks that needs such change

@rickgaiser
Copy link
Member

In the current messy situation we have, these changes look good. However, I think the builtin ATA driver should be deleted, and replaced with the BDM enabled ATA driver from ps2sdk:
https://github.com/ps2dev/ps2sdk/tree/master/iop/dev9/atad
https://github.com/ps2dev/ps2sdk/tree/master/iop/dev9/ata_bd

This makes cdvdman more modular, plus the GUI can use the same drivers as the ingame cdvdman. So if your HDD works in the GUI, it will also work ingame.

This change increases the number of iopcores, just like the changes from @grimdoomer to add BDM/exFat support to the HDD did. If we are going down this road, the following different iopcore versions will all have to be compiled into the OPL ELF:

  • hdd_cdvdman.irx (APA/PFS)
  • hdd_gamestar_cdvdman.irx (APA/PFS) - new daf0x
  • hdd_hdpro_cdvdman.irx (APA/PFS)
  • bdm_ata_cdvdman.irx (exFat) - new grimdoomer
  • bdm_ata_gamestar_cdvdman.irx (exFat) - new daf0x+grimdoomer
  • bdm_ata_hdpro_cdvdman.irx (exFat) - new if still supported? or drop support?
  • smb_cdvdman.irx
  • bdm_cdvdman.irx (exFat) on usb/ilink/mx4sio

Preferably I would like this to become:

  • smb_cdvdman.irx
  • bdm_cdvdman.irx (APA/PFS/exFat) on usb/ilink/mx4sio/ata/hdpro

Note that in the ps2sdk drivers there are also compile options for the ata driver. I think your size optimizations would be very welcome there, to create a new "mini" driver version (ata_bd_mini.irx).

The compatibility with "gamestar" adapters should not be in a separate file if you ask me. There also seem to be many different versions of gamestar adapters. As I understand from grimdoomer's research there are:

  • Original adapters, these work well and are officially supported
  • BitfunX clone adapters, these work well and could be officially supported (in the standard ata driver).
  • Random 'shit' clone adapters (PS2Play/FPGA/etc...), these have issues, need special drivers, are not supported.
    It is unknown to me what exactly the ps2sdk ata driver supports when compiled with "gamestar" compatibility, and the same for "gamestar" compatibility in OPL. I'm assuming at least the BitfunX adapters work, and perhaps a few others?

If you merge hdd_cdvdman and hdd_gamestar_cdvdman back into 1, then perhaps this PR can be seen as an improvement to the current messy situation?

@daf0x
Copy link
Author

daf0x commented Aug 13, 2023

For formatting changes, could you please do it in accordance with clang-format? If you really need to change how clang is formatting your code you can either: change .clang-fromat file according to your vision, or add // clang-format 0ff and //clang-format on macro on blocks that needs such change

Done.

In the current messy situation we have, these changes look good. However, I think the builtin ATA driver should be deleted, and replaced with the BDM enabled ATA driver from ps2sdk: ps2dev/ps2sdk@master/iop/dev9/atad ps2dev/ps2sdk@master/iop/dev9/ata_bd

I don't have enough context of the project's goals and direction to really comment on this, but personally it is my understanding that for maximum compatibility the in-game driver should be as small as possible. Using the full ps2sdk driver goes against that goal because that is a generic driver with many more features than is strictly required by games. Because of its additional features it is larger than it needs to be, and hence risks reduced compatibility.

This makes cdvdman more modular,

I can't comment on this.

plus the GUI can use the same drivers as the ingame cdvdman. So if your HDD works in the GUI, it will also work ingame.

This sounds like a nice benefit, but please consider the trade-off in compatibility.

This change increases the number of iopcores, just like the changes from @grimdoomer to add BDM/exFat support to the HDD did. If we are going down this road, the following different iopcore versions will all have to be compiled into the OPL ELF:

* hdd_cdvdman.irx (APA/PFS)

* hdd_gamestar_cdvdman.irx (APA/PFS) - **new** daf0x

* hdd_hdpro_cdvdman.irx (APA/PFS)

* bdm_ata_cdvdman.irx (exFat) - **new** grimdoomer

* bdm_ata_gamestar_cdvdman.irx (exFat) - **new** daf0x+grimdoomer

* bdm_ata_hdpro_cdvdman.irx (exFat) - **new** if still supported? or drop support?

* smb_cdvdman.irx

* bdm_cdvdman.irx (exFat) on usb/ilink/mx4sio

Preferably I would like this to become:

* smb_cdvdman.irx

* bdm_cdvdman.irx (**APA/PFS**/exFat) on usb/ilink/mx4sio/**ata/hdpro**

I was actually thinking of splitting-up zso support in a similar way. If the installed game isn't compressed, having the zso support code in memory is dead weight that could be spent on other features or improved compatibility.

Note that in the ps2sdk drivers there are also compile options for the ata driver. I think your size optimizations would be very welcome there, to create a new "mini" driver version (ata_bd_mini.irx).

Sorry, I don't have the resources to spend on this. Also it seems quite out-of-scope for this PR. Even if the ps2sdk's driver was modified in a similar way, OPL wouldn't use it (in game). From looking at the ps2sdk's version of atad.c it seems that that file has continued to evolve separately from this file in OPL, and the ps2sdk driver has quite some additional features.

If you do want to make a mini version of the driver in ps2sdk, you should probably look at this driver for comparison, and make sure to make everything that is not supported in the OPL version of atad.c an optional feature in the ps2sdk version. E.g. see which commands OPL needs, make everything else optional. And then from there optimise the code in the same way that I did here, continuously checking to see if your changes are actually making things smaller. Sometimes an seemingly obvious change that you expect to result in smaller code actually leads to the compiler emitting more code instead. One challenge with this that becomes immediately obvious is how are you going to keep the file readable and maintainable, I'd think you'd need quite a lot of #ifdef statements sprinkled about...

The compatibility with "gamestar" adapters should not be in a separate file if you ask me. There also seem to be many different versions of gamestar adapters. As I understand from grimdoomer's research there are:

* Original adapters, these work well and are officially supported

* BitfunX clone adapters, these work well and could be officially supported (in the standard ata driver).

* Random 'shit' clone adapters (PS2Play/FPGA/etc...), these have issues, need special drivers, are not supported.
  It is unknown to me what exactly the ps2sdk ata driver supports when compiled with "gamestar" compatibility, and the same for "gamestar" compatibility in OPL. I'm assuming at least the BitfunX adapters work, and perhaps a few others?

If you merge hdd_cdvdman and hdd_gamestar_cdvdman back into 1, then perhaps this PR can be seen as an improvement to the current messy situation?

The only differences are the ordering of ata_set_dir() vs. ata_io_start() (either before or after), and whether the SPD_R_XFR_CTRL gets written with the high bit set or not. The ordering I'm not even sure why it should make a difference really, but as it is a workaround for a hardware bug we will have to assume that it is necessary. This compatibility hack seems to be the same between ps2sdk and OPL.

As to why it should be in a separate file, this is a run-time cost (few bytes) that everyone needs to pay otherwise. By splitting it off we gain a few bytes in both cases, so both sides even have a small advantage.

@daf0x daf0x force-pushed the optimize-ingame-atad-driver branch from 9128ca4 to b746f49 Compare August 13, 2023 21:19
@rickgaiser
Copy link
Member

Even if the ps2sdk's driver was modified in a similar way, OPL wouldn't use it (in game).

The large ps2sdk ATA driver works ingame just fine. I know becouse I'm using it in neutrino. Yes, it will use more memory, and that will reduce game compatibility. But in comparison to USB, the ATA driver will still use a lot less memory.

context of the project's goals and direction

There are no official goals and directions. However this PR is a move in the opposite direction I would personally like it to go, and it's one of the reasons for creating neutrino.

I was actually thinking of splitting-up zso support in a similar way. If the installed game isn't compressed, having the zso support code in memory is dead weight that could be spent on other features or improved compatibility.

That's a good idea, but I wish this too could be done in a modular way. Not doing this modular will lead to many different iopcores for each compile option. The new ATA options will increase it to 8 iopcores. Separating zso will double the cores to 16. The next option will increase to 32... modularity is the only solution if you ask me, but opinions differ on this.

If other developers think differently about this, then please feel free to discuss and/or pull this PR.

PS: The real memory optimizations are found elsewhere. For instance here:

// DMA/reading alignment correction buffer. Used by CDVDMAN and CDVDFSV.
//The minimum size is 2, as one sector may be used for buffer alignment correction.
#define CDVDMAN_FS_SECTORS 8

Reducing this to 2 will free 12KiB of IOP RAM. It will also make the game a little slower. If you ask me this value should be configurable per game. So we can reduce it when needed. And have more speed when the game allows it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants