bsnes/bsnes/snes/ppu/ppu.cpp

154 lines
2.8 KiB
C++
Raw Normal View History

#include <snes.hpp>
#define PPU_CPP
namespace SNES {
#if defined(DEBUGGER)
#include "debugger/debugger.cpp"
PPUDebugger ppu;
#else
PPU ppu;
#endif
#include "background/background.cpp"
#include "mmio/mmio.cpp"
#include "screen/screen.cpp"
#include "sprite/sprite.cpp"
#include "window/window.cpp"
#include "serialization.cpp"
void PPU::step(unsigned clocks) {
clock += clocks;
}
void PPU::synchronize_cpu() {
Updated to v067r21 release. byuu says: This moves toward a profile-selection mode. Right now, it is incomplete. There are three binaries, one for each profile. The GUI selection doesn't actually do anything yet. There will be a launcher in a future release that loads each profile's respective binary. I reverted away from blargg's SMP library for the time being, in favor of my own. This will fix most of the csnes/bsnes-performance bugs. This causes a 10% speed hit on 64-bit platforms, and a 15% speed hit on 32-bit platforms. I hope to be able to regain that speed in the future, I may also experiment with creating my own fast-SMP core which drops bus hold delays and TEST register support (never used by anything, ever.) Save states now work in all three cores, but they are not cross-compatible. The profile name is stored in the description field of the save states, and it won't load a state if the profile name doesn't match. The debugger only works on the research target for now. Give it time and it will return for the other targets. Other than that, let's please resume testing on all three once again. See how far we get this time :) I can confirm the following games have issues on the performance profile: - Armored Police Metal Jacket (minor logo flickering, not a big deal) - Chou Aniki (won't start, so obviously unplayable) - Robocop vs The Terminator (major in-game flickering, unplayable) Anyone still have that gigantic bsnes thread archive from the ZSNES forum? Maybe I posted about how to fix those two broken games in there, heh. I really want to release this as v1.0, but my better judgment says we need to give it another week. Damn.
2010-10-20 22:22:44 +11:00
if(CPU::Threaded == true) {
if(clock >= 0 && scheduler.sync != Scheduler::SynchronizeMode::All) co_switch(cpu.thread);
} else {
while(clock >= 0) cpu.enter();
}
}
void PPU::Enter() { ppu.enter(); }
void PPU::enter() {
while(true) {
if(scheduler.sync == Scheduler::SynchronizeMode::All) {
scheduler.exit(Scheduler::ExitReason::SynchronizeEvent);
}
scanline();
Update to v068r12 release. (there was no r11 release posted to the WIP thread) byuu says: This took ten hours of mind boggling insanity to pull off. It upgrades the S-PPU dot-based renderer to fetch one tile, and then output all of its pixels before fetching again. It sounds easy enough, but it's insanely difficult. I ended up taking one small shortcut, in that rather than fetch at -7, I fetch at the first instance where a tile is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it fetches at -3. That won't work so well on hardware, if two BGs fetch at the same X offset, they won't have time. I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While I can shift and fetch just fine, what happens is that when a new tile is fetched in, that gives a new palette, priority, etc; and this ends up happening between two tiles which results in the right-most edges of the screen ending up with the wrong colors and such. Offset-per-tile is cheap as always. Although looking at it, I'm not sure how BG3 could pre-fetch, especially with the way one or two OPT modes can fetch two tiles. There's no magic in Hoffset caching yet, so the SMW1 pixel issue is still there. Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic code. After re-designing the BG mosaic, I ended up needing a separate mosaic for Mode7, and in the process I fixed that bug. The obvious change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause the pendulum to jump anymore. Windows were simplified just a tad. The range testing is shared for all modes now. Ironically, it's a bit slower, but I'll take less code over more speed for the accuracy core. Speaking of speed, because there's so much less calculations per pixel for BGs, performance for the entire emulator has gone up by 30% in the accuracy core. Pretty neat overall, I can maintain 60fps in all but, yeah you can guess can't you?
2010-09-04 13:36:03 +10:00
add_clocks(56);
if(vcounter() <= (!regs.overscan ? 224 : 239)) {
Update to v068r12 release. (there was no r11 release posted to the WIP thread) byuu says: This took ten hours of mind boggling insanity to pull off. It upgrades the S-PPU dot-based renderer to fetch one tile, and then output all of its pixels before fetching again. It sounds easy enough, but it's insanely difficult. I ended up taking one small shortcut, in that rather than fetch at -7, I fetch at the first instance where a tile is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it fetches at -3. That won't work so well on hardware, if two BGs fetch at the same X offset, they won't have time. I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While I can shift and fetch just fine, what happens is that when a new tile is fetched in, that gives a new palette, priority, etc; and this ends up happening between two tiles which results in the right-most edges of the screen ending up with the wrong colors and such. Offset-per-tile is cheap as always. Although looking at it, I'm not sure how BG3 could pre-fetch, especially with the way one or two OPT modes can fetch two tiles. There's no magic in Hoffset caching yet, so the SMW1 pixel issue is still there. Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic code. After re-designing the BG mosaic, I ended up needing a separate mosaic for Mode7, and in the process I fixed that bug. The obvious change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause the pendulum to jump anymore. Windows were simplified just a tad. The range testing is shared for all modes now. Ironically, it's a bit slower, but I'll take less code over more speed for the accuracy core. Speaking of speed, because there's so much less calculations per pixel for BGs, performance for the entire emulator has gone up by 30% in the accuracy core. Pretty neat overall, I can maintain 60fps in all but, yeah you can guess can't you?
2010-09-04 13:36:03 +10:00
add_clocks(4);
for(unsigned pixel = 1; pixel < 8 + 256; pixel++) {
bg1.run();
bg2.run();
bg3.run();
bg4.run();
add_clocks(2);
bg1.run();
bg2.run();
bg3.run();
bg4.run();
Update to v068r12 release. (there was no r11 release posted to the WIP thread) byuu says: This took ten hours of mind boggling insanity to pull off. It upgrades the S-PPU dot-based renderer to fetch one tile, and then output all of its pixels before fetching again. It sounds easy enough, but it's insanely difficult. I ended up taking one small shortcut, in that rather than fetch at -7, I fetch at the first instance where a tile is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it fetches at -3. That won't work so well on hardware, if two BGs fetch at the same X offset, they won't have time. I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While I can shift and fetch just fine, what happens is that when a new tile is fetched in, that gives a new palette, priority, etc; and this ends up happening between two tiles which results in the right-most edges of the screen ending up with the wrong colors and such. Offset-per-tile is cheap as always. Although looking at it, I'm not sure how BG3 could pre-fetch, especially with the way one or two OPT modes can fetch two tiles. There's no magic in Hoffset caching yet, so the SMW1 pixel issue is still there. Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic code. After re-designing the BG mosaic, I ended up needing a separate mosaic for Mode7, and in the process I fixed that bug. The obvious change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause the pendulum to jump anymore. Windows were simplified just a tad. The range testing is shared for all modes now. Ironically, it's a bit slower, but I'll take less code over more speed for the accuracy core. Speaking of speed, because there's so much less calculations per pixel for BGs, performance for the entire emulator has gone up by 30% in the accuracy core. Pretty neat overall, I can maintain 60fps in all but, yeah you can guess can't you?
2010-09-04 13:36:03 +10:00
if(pixel >= 8) {
oam.run();
window.run();
screen.run();
}
add_clocks(2);
}
add_clocks(22);
oam.tilefetch();
} else {
Update to v068r12 release. (there was no r11 release posted to the WIP thread) byuu says: This took ten hours of mind boggling insanity to pull off. It upgrades the S-PPU dot-based renderer to fetch one tile, and then output all of its pixels before fetching again. It sounds easy enough, but it's insanely difficult. I ended up taking one small shortcut, in that rather than fetch at -7, I fetch at the first instance where a tile is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it fetches at -3. That won't work so well on hardware, if two BGs fetch at the same X offset, they won't have time. I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While I can shift and fetch just fine, what happens is that when a new tile is fetched in, that gives a new palette, priority, etc; and this ends up happening between two tiles which results in the right-most edges of the screen ending up with the wrong colors and such. Offset-per-tile is cheap as always. Although looking at it, I'm not sure how BG3 could pre-fetch, especially with the way one or two OPT modes can fetch two tiles. There's no magic in Hoffset caching yet, so the SMW1 pixel issue is still there. Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic code. After re-designing the BG mosaic, I ended up needing a separate mosaic for Mode7, and in the process I fixed that bug. The obvious change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause the pendulum to jump anymore. Windows were simplified just a tad. The range testing is shared for all modes now. Ironically, it's a bit slower, but I'll take less code over more speed for the accuracy core. Speaking of speed, because there's so much less calculations per pixel for BGs, performance for the entire emulator has gone up by 30% in the accuracy core. Pretty neat overall, I can maintain 60fps in all but, yeah you can guess can't you?
2010-09-04 13:36:03 +10:00
add_clocks(1056 + 22 + 136);
}
Update to v068r12 release. (there was no r11 release posted to the WIP thread) byuu says: This took ten hours of mind boggling insanity to pull off. It upgrades the S-PPU dot-based renderer to fetch one tile, and then output all of its pixels before fetching again. It sounds easy enough, but it's insanely difficult. I ended up taking one small shortcut, in that rather than fetch at -7, I fetch at the first instance where a tile is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it fetches at -3. That won't work so well on hardware, if two BGs fetch at the same X offset, they won't have time. I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While I can shift and fetch just fine, what happens is that when a new tile is fetched in, that gives a new palette, priority, etc; and this ends up happening between two tiles which results in the right-most edges of the screen ending up with the wrong colors and such. Offset-per-tile is cheap as always. Although looking at it, I'm not sure how BG3 could pre-fetch, especially with the way one or two OPT modes can fetch two tiles. There's no magic in Hoffset caching yet, so the SMW1 pixel issue is still there. Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic code. After re-designing the BG mosaic, I ended up needing a separate mosaic for Mode7, and in the process I fixed that bug. The obvious change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause the pendulum to jump anymore. Windows were simplified just a tad. The range testing is shared for all modes now. Ironically, it's a bit slower, but I'll take less code over more speed for the accuracy core. Speaking of speed, because there's so much less calculations per pixel for BGs, performance for the entire emulator has gone up by 30% in the accuracy core. Pretty neat overall, I can maintain 60fps in all but, yeah you can guess can't you?
2010-09-04 13:36:03 +10:00
add_clocks(lineclocks() - 56 - 1056 - 22 - 136);
}
}
void PPU::add_clocks(unsigned clocks) {
clocks >>= 1;
while(clocks--) {
tick(2);
step(2);
synchronize_cpu();
}
}
void PPU::power() {
ppu1_version = config.ppu1.version;
ppu2_version = config.ppu2.version;
memset(memory::vram.data(), 0x00, memory::vram.size());
memset(memory::oam.data(), 0x00, memory::oam.size());
memset(memory::cgram.data(), 0x00, memory::cgram.size());
reset();
}
void PPU::reset() {
create(Enter, system.cpu_frequency());
PPUcounter::reset();
memset(surface, 0, 512 * 512 * sizeof(uint16));
mmio_reset();
bg1.reset();
bg2.reset();
bg3.reset();
bg4.reset();
oam.reset();
window.reset();
screen.reset();
frame();
}
void PPU::scanline() {
if(vcounter() == 0) {
frame();
bg1.frame();
bg2.frame();
bg3.frame();
bg4.frame();
}
bg1.scanline();
bg2.scanline();
bg3.scanline();
bg4.scanline();
oam.scanline();
window.scanline();
screen.scanline();
}
void PPU::frame() {
system.frame();
oam.frame();
display.interlace = regs.interlace;
display.overscan = regs.overscan;
}
PPU::PPU() :
bg1(*this, Background::ID::BG1),
bg2(*this, Background::ID::BG2),
bg3(*this, Background::ID::BG3),
bg4(*this, Background::ID::BG4),
oam(*this),
window(*this),
screen(*this) {
surface = new uint16[512 * 512];
output = surface + 16 * 512;
}
PPU::~PPU() {
delete[] surface;
}
}