Well, it really depends. Having dumped a few cards without prior knowledge of their core (micro) or memory map, I can say it is doable. In the case of the P4, however, the point is moot, since the P4 is impervious to clock and VCC glitching.
The way to approach this problem is with simple logic and a systematic plan of attack. The first order of business is to build H/W that can communicate with the target card and count card clocks. The purpose of this device is to map tangible events to card clock numbers; the general idea is to develop a road map from card clock cycle number to ATR data bits. The process starts by resetting the card and counting how many card clock cycles elapse from reset to the first start bit of the first byte of the ATR. Next, using the start bit of every subsequent ATR data byte, you time each complete byte. At the end of this cumbersome process you will have the time it takes the card to fetch a byte from memory and serially transmit it.
The key to this is that you know the bit timing of every data bit and any wait time inserted between bytes, because the card sends that information as part of the ATR. Using this info you can calculate the actual number of clocks it takes the card to serially send one ATR data byte. If you subtract this value from the number of clocks counted for each byte, what is left is the absolute overhead consumed by the card to fetch and/or preprocess the byte for transmission. The idea is to compare all the accumulated byte clock counts and find a sequence of bytes that have the same or very similar (varying only by 1 or 2 cycles) clock counts. That usually points to a section of the ATR send routine that is a loop, where the loop has a counter preloaded with the byte count to send, or in other words how many times the loop will fetch and send.
Once that has been determined, the next thing to do is to try to change 1 bit of a byte that is about to be transmitted out the serial port. If you take a look at most of the card code out there you will notice that the very first byte is usually stored in ROM code space, and it is either 3F or 3B depending on the communications convention used by the card.
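To make the overhead arithmetic above concrete, here is a rough sketch in Python. It assumes the ISO 7816-3 defaults that normally apply during the ATR (372 card clocks per bit time, and a 10-bit character frame followed by 2 bit times of guard time); a real card may differ, so treat those numbers as parameters. The function names and the measured values are made up for illustration only.

```python
def nominal_byte_clocks(clocks_per_etu=372, guard_etu=2):
    """Clocks needed just to shift one character out serially.

    10 bit times = start bit + 8 data bits + parity bit.
    clocks_per_etu and guard_etu are assumed defaults, not measured values.
    """
    return (10 + guard_etu) * clocks_per_etu


def overheads(measured, clocks_per_etu=372, guard_etu=2):
    """Subtract the pure transmit time from each measured byte time."""
    nominal = nominal_byte_clocks(clocks_per_etu, guard_etu)
    return [m - nominal for m in measured]


def similar_runs(values, tolerance=2, min_len=3):
    """Return (first, last) index pairs of runs whose values stay within
    `tolerance` of the first value in the run."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or abs(values[i] - values[start]) > tolerance:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i
    return runs


# Made-up clock counts for six ATR bytes, purely illustrative.
measured = [4700, 4520, 4521, 4520, 4522, 4610]
per_byte = overheads(measured)
candidate_loops = similar_runs(per_byte)
```

With these invented numbers, bytes 1 through 4 have nearly identical overhead, which is the kind of pattern that hints at a fetch-and-send loop.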
For argument's sake, let's use 3F in this theoretical example. If you take the clock count you logged for 3F when it was transmitted and divide it by 2, it should put you somewhere in the middle of the byte. 3F in binary is 0011 1111, so we have a very good chance of hitting a 1 bit if we apply a glitch somewhere in the middle of the byte.
Our systematic glitch search would be something like this:
Divide 3F clock count by 2
Add clock count from reset to first start bit
Wait that amount of time and receive a bit (just to verify what it is)
start:
    Reset card
    Receive 3-4 bits
    Waste 1/2 bit
    Apply glitch
    Get rest of bits in byte
    Did card crash?
        Yes: add 1 clock cycle (or subtract 1, I like to alternate), goto start
        No: Did byte change?
            No: +1 or -1 clock, goto start
            Yes: record glitch params
End glitch search.
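The control flow of that search, including the "alternate +1 and -1" idea, can be sketched in Python roughly as follows. This only models the bookkeeping: `try_glitch` is a hypothetical callback standing in for whatever the H/W does on one attempt, assumed to return "crash", "unchanged" or "changed", and all other names are invented for the example.

```python
def alternating_offsets(max_offset):
    """Yield 0, +1, -1, +2, -2, ... up to +/- max_offset."""
    yield 0
    for step in range(1, max_offset + 1):
        yield step
        yield -step


def midpoint_clock(reset_to_start_clocks, byte_clocks):
    """Clock number roughly in the middle of the first byte."""
    return reset_to_start_clocks + byte_clocks // 2


def glitch_search(try_glitch, center_clock, max_offset):
    """Walk outward from center_clock until an attempt changes the byte.

    A crash and an unchanged byte are handled the same way, as in the
    steps above: move to the next clock and start over.
    Returns the clock that worked, or None if the window is exhausted.
    """
    for offset in alternating_offsets(max_offset):
        clock = center_clock + offset
        if try_glitch(clock) == "changed":
            return clock
    return None


# Stand-in for the hardware, for illustration only.
def fake_card(clock):
    if clock == 1003:
        return "changed"
    return "crash" if clock < 1000 else "unchanged"


found = glitch_search(fake_card, center_clock=1000, max_offset=5)
```

Walking outward from the centre in both directions keeps the attempts close to the estimated midpoint of the byte instead of drifting off to one side.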
Now we are ready to attack what we have deduced may be a loop. Since now we have glitch params that the card likes, our glitch is more likely to succeed.
Thinking from a coder's perspective, how would we code such a loop? Well, one way would be to load the byte count into a register and use that as the loop counter, decrementing it after every byte has been sent. Assuming that this unknown loop is in fact coded similarly, we subtract 3 bit times (the parity bit, the stop bit and 1 for good luck) from our logged clock count and target our glitch at that clock count using our previously logged glitch params, increasing the clock count by 1 on each subsequent try. When we hit the loop control opcode (djnz, bne, breq, dec, inc or something equivalent), 256 bytes will be output from the card. Now we want to analyze these bytes and hopefully identify the micro from that sequence. Once the micro is identified the rest is history, since you already know what kind of glitches the card likes and what loop can be attacked. This loop can potentially dump the whole card with two kinds of glitch: one at the load of the address pointer to make it zero, and then repeated glitches on the loop control instruction at every FF byte received, to make the counter roll over to zero without exiting the loop.
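Why 256 bytes? A toy model in Python shows the arithmetic, assuming the loop really is a decrement-and-branch loop on an 8-bit counter as guessed above. This is a simulation only; `send_loop` and its `skip_exit_at` parameter are invented for the example and model the fault as "the exit test is ignored once".

```python
def send_loop(memory, start, count, skip_exit_at=None):
    """Model of a DJNZ-style send loop with an 8-bit counter.

    skip_exit_at: iteration number (1-based) at which the exit test is
    ignored, standing in for a disturbed loop control instruction.
    """
    sent = []
    pointer = start
    counter = count & 0xFF
    iteration = 0
    while True:
        sent.append(memory[pointer % len(memory)])
        pointer += 1
        counter = (counter - 1) & 0xFF  # 8-bit decrement, wraps 0 -> 0xFF
        iteration += 1
        exit_taken = counter == 0
        if iteration == skip_exit_at:
            exit_taken = False          # modelled fault: exit is missed
        if exit_taken:
            return sent


memory = list(range(256)) * 4           # dummy 1 KB memory image
normal = send_loop(memory, start=0, count=4)
faulted = send_loop(memory, start=0, count=4, skip_exit_at=4)
```

In the normal run the loop sends 4 bytes. In the faulted run the counter passes through zero, wraps to FF and has to count all the way down again, so 256 more bytes come out after the original 4.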