Well, I come to you with a more general programming problem ;)
I'm studying computer science, and this year I have to prepare a project with a couple of friends. The project's subject is "Mechanisms of binary translation". The whole project is divided into two parts. In the theoretical part, we have to write a longer text about the idea of binary translation, an overview of the most popular implementations, etc. For the practical part, we have to write some kind of virtual machine that will translate and run, "just in time", a program compiled for Atmel AVR on an x86 processor.
I know it's quite a nasty problem, but maybe some of you can help me. I'm looking for any interesting books, articles and other things (sample implementations of virtual machines?) that may help me.
Thanks in advance!
You might want to take a look at Qemu...
But in your case you should focus on just binary translating AVR to x86 code,
and hopefully forget about the machine emulation thing.
Considering you probably don't have that much time, I'd keep it simple and
add complexity when needed. What you probably need to make is:
- AVR decoder which decodes AVR instructions. You probably want to
translate it to a simple representation which fits in well with the rest of your
code. Then translate that to x86. This is the easy part. The tricky bit is that
different registers can behave differently, and to keep proper track of ALU
http://en.wikipedia.org/wiki/Atmel_AVR_instruction_set + links
is a good start, but you probably got that info already.
- Probably need a memory model. Not hard either, considering how little RAM
they have. Just a direct mapping for stack, RAM, instruction code and EEPROM.
The nasty part is handling IO or device memories where memory operations
have side effects, but I've the feeling you don't have to handle those.
- If you're unlucky you also need to make an AVR CPU model, with interrupts
and virtual IO and clock and everything. Basically the machine emulation
part I said you should hope you don't have to make. If you have to do the
latter it might have to be cycle accurate too. Add a good timer system...
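To make the decoder bullet a bit more concrete, here is a minimal sketch of
decoding a few AVR opcodes into a simple internal representation. The names
OpKind, Insn and avr_decode are made up for illustration; only the bit layouts
(ADD, SUB, MOV and LDI) come from the AVR instruction set. It assumes the
16-bit instruction word has already been read from flash as a host integer.

```c
#include <stdint.h>

/* Hypothetical internal representation of one decoded instruction. */
typedef enum { OP_ADD, OP_SUB, OP_MOV, OP_LDI, OP_UNKNOWN } OpKind;

typedef struct {
    OpKind  kind;
    uint8_t rd;   /* destination register, 0..31 */
    uint8_t rr;   /* source register, 0..31 (unused for LDI) */
    uint8_t imm;  /* 8-bit immediate (LDI only) */
} Insn;

/* Decode one 16-bit AVR instruction word; covers only four opcodes. */
Insn avr_decode(uint16_t w)
{
    Insn in = { OP_UNKNOWN, 0, 0, 0 };

    /* LDI Rd,K: 1110 KKKK dddd KKKK, Rd is limited to r16..r31 */
    if ((w & 0xF000) == 0xE000) {
        in.kind = OP_LDI;
        in.rd   = (uint8_t)(16 + ((w >> 4) & 0x0F));
        in.imm  = (uint8_t)(((w >> 4) & 0xF0) | (w & 0x0F));
        return in;
    }

    /* Two-register form: oooo oord dddd rrrr */
    switch (w & 0xFC00) {
    case 0x0C00: in.kind = OP_ADD; break;  /* 0000 11rd dddd rrrr */
    case 0x1800: in.kind = OP_SUB; break;  /* 0001 10rd dddd rrrr */
    case 0x2C00: in.kind = OP_MOV; break;  /* 0010 11rd dddd rrrr */
    default:     return in;                /* not handled in this sketch */
    }
    in.rd = (uint8_t)((w >> 4) & 0x1F);
    in.rr = (uint8_t)(((w >> 5) & 0x10) | (w & 0x0F));
    return in;
}
```

For example, avr_decode(0x0C12) gives ADD r1, r2 and avr_decode(0xE20A) gives
LDI r16, 0x2A. The x86 backend would then work from Insn values instead of raw
opcode bits.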
The question is whether the focus is on speed or on emulation completeness.
If it's speed then you probably need to focus on the fastest x86 instructions
which achieve the same result, and probably forget about the whole timing
thing (it will be variable). If it's emulation accuracy, it's more hairy. On the
upside, you would probably get very specific information about what to
But considering the subject is "Mechanisms of binary translation" I think you
only have to do the fun part.
Sounds like a good project, enjoy!
The project is great, it's "true" computer science ;)
My teacher was talking about something even simpler, because the AVR program we will translate should be very simple: no dynamic memory allocation, no I/O, only static variables and basic mathematical operations, moving data between registers, etc. Maybe some operations on strings too.
How about the theoretical part? Can you recommend any documents related to binary translation? Some descriptions of the Java JIT compiler, the Mono runtime environment or maybe WINE?
I'm searching the Internet for such things, looking at the Mono forums, etc. I found some potentially interesting documents published by an organization similar to IEEE, but they aren't free. Maybe one of you is a member of such an organization and can download and share some documents with me? For educational purposes only, of course :)
Memory allocation, strings and static variables are all programming
language constructs; you can't really recognize them in the AVR code.
Usually the university has a membership for all its members and students,
try those pages from the uni network. Otherwise there's this thing called a
Wine Is Not an Emulator. It does no binary translation, so it's not interesting
for your purpose.
Virtual machines are a much wider topic and don't necessarily involve binary
translation, so I'd focus on strict binary translation, things like JIT and such.
I don't know any literature on binary translation, it seems straightforward
enough to just go for it. The devil is in the details when implementing it. If
you go the theoretical way I'd go the more abstract and more mathematical
direction. If you have a mathematical model of the instruction sets you want
to translate between, then the translation looks a lot like a homomorphism.
How good is your algebra?
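One way to write down that homomorphism idea, purely as a sketch with notation
I'm making up here: let S_A and S_B be the sets of guest and host machine
states, phi : S_A -> S_B the mapping of guest state (registers, flags, memory)
onto host state, delta_A^i the effect of executing guest instruction i, and
tau(i) the host instruction sequence the translator emits for i. A correct
translation then has to satisfy:

```latex
\varphi\bigl(\delta_A^{\,i}(s)\bigr) \;=\; \delta_B^{\,\tau(i)}\bigl(\varphi(s)\bigr)
\qquad \text{for every guest state } s \in S_A \text{ and every guest instruction } i
```

In words: executing on the guest and then mapping the state gives the same
result as mapping the state first and then running the translated code.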
Anyway, it seems you more or less have to write a summary of the theory
you're going to get in the lectures, and pick one example of each possible
approach or something like that.
For searching stuff Google Scholar is good. Try:
http://scholar.google.co.uk/scholar?q=b … s_sdt=2001
I want to reopen this topic.
I have to run an executable file compiled for PowerPC on x86. For now it can be something simple like a "Hello world" program.
I'm new to the ELF executable file format, so I'm asking you for help. I would like to know how to parse ELF files, extract static data (like strings), find the application's first instruction (and the following instructions) and understand addressing inside the binary file (e.g. how to compute the address of a static variable when an instruction tries to access it).
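A rough sketch of the first steps, assuming a 32-bit big-endian PowerPC ELF
file that has already been read into a buffer. The function names are
hypothetical; the field offsets follow the ELF32 header and program header
layout. The entry point (e_entry) is the virtual address of the first
instruction, and a virtual address is turned into a file offset by finding the
PT_LOAD segment that contains it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PowerPC ELF files are big-endian, so read fields byte by byte. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store the entry address and return 0 if buf is a 32-bit big-endian
 * PowerPC ELF image, otherwise return -1. */
int ppc_elf_entry(const uint8_t *buf, size_t len, uint32_t *entry)
{
    if (len < 52) return -1;                        /* ELF32 header size */
    if (memcmp(buf, "\177ELF", 4) != 0) return -1;  /* magic */
    if (buf[4] != 1) return -1;                     /* EI_CLASS: ELFCLASS32 */
    if (buf[5] != 2) return -1;                     /* EI_DATA: ELFDATA2MSB */
    if (be16(buf + 18) != 20) return -1;            /* e_machine: EM_PPC */
    *entry = be32(buf + 24);                        /* e_entry */
    return 0;
}

/* Translate a virtual address into a file offset via the PT_LOAD
 * segments. Returns -1 if no segment in the file covers the address. */
long ppc_elf_vaddr_to_offset(const uint8_t *buf, size_t len, uint32_t vaddr)
{
    uint32_t entry;
    if (ppc_elf_entry(buf, len, &entry) != 0) return -1;

    uint32_t phoff     = be32(buf + 28);  /* e_phoff */
    uint16_t phentsize = be16(buf + 42);  /* e_phentsize */
    uint16_t phnum     = be16(buf + 44);  /* e_phnum */
    if (phentsize < 32) return -1;        /* ELF32 program header size */

    for (uint16_t i = 0; i < phnum; i++) {
        size_t off = (size_t)phoff + (size_t)i * phentsize;
        if (off + 32 > len) return -1;
        const uint8_t *ph = buf + off;
        uint32_t p_type   = be32(ph + 0);
        uint32_t p_offset = be32(ph + 4);
        uint32_t p_vaddr  = be32(ph + 8);
        uint32_t p_filesz = be32(ph + 16);
        if (p_type == 1 /* PT_LOAD */ &&
            vaddr >= p_vaddr && vaddr - p_vaddr < p_filesz)
            return (long)(p_offset + (vaddr - p_vaddr));
    }
    return -1;
}
```

The same address calculation applies to static data: when an instruction
refers to a virtual address, ppc_elf_vaddr_to_offset() tells you where those
bytes (e.g. a string) sit in the file.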
I'm back :) I made a lot of progress: I analyse the ELF information from the PowerPC executable file, decode instructions, translate them and compose x86 machine code. Now I'm trying to execute this code at runtime. Here is how I intended to do this:
I allocate some memory using malloc() and copy the generated machine code there. Next I try to execute it using this code:
char *machine_code = malloc (1234);
int return_address;

// ... move machine code to memory pointed by 'machine_code' ...

__asm__ (
    "pushal\n"
    "movl %1,%%eax\n"
    "call * %%eax\n"
    "movl %%eax,%0\n"
    "popal\n"
    : "=m" (return_address)
    : "m" (machine_code)
);
But of course it doesn't work... It fails with a segfault. Any idea how to make it work? I guess that Linux allows executing instructions only from particular locations in memory.
Ok, I figured out that I can use mprotect() to make the memory with the generated code executable by making the whole memory page executable. Now I don't know how to properly use mprotect()...
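For what it's worth, a minimal sketch of the mprotect() call, assuming Linux.
mprotect() works on whole pages and wants a page-aligned start address, so the
range has to be rounded outwards to page boundaries first. The helper name
make_executable is made up. Note that POSIX only specifies mprotect() for
memory obtained from mmap(); Linux also accepts other addresses.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Make every page that overlaps [addr, addr + len) readable, writable
 * and executable. Returns 0 on success, -1 on error (errno is set). */
int make_executable(void *addr, size_t len)
{
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);            /* round down */
    uintptr_t end   = ((uintptr_t)addr + len + page - 1) & ~(page - 1); /* round up */

    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Keep in mind that this changes the permissions of everything else that
happens to live on the same pages.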
You could give Valgrind a try, but in this case gdb might be more useful:
When it crashes, do a backtrace and use the disassemble command to see
where it goes wrong.
If I had to guess, you mess up the stack pointer(s) or something along that line.
Or the function you call is main() and it expects different arguments.
I don't think mprotect will solve this problem, because on x86 readable pages
are also executable. It's only needed with x86_64. But I would allocate the
memory with mmap(2) instead of malloc(), then you get the right permissions
and avoid the malloc overhead.
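A minimal sketch of the mmap() route, assuming Linux on an x86 or x86_64
host. The function name run_generated_code is made up, and the generated code
is assumed to follow the C calling convention (result in EAX, ends with RET),
so it can be called through a function pointer instead of inline asm.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy 'len' bytes of machine code into a fresh executable mapping,
 * call it as int f(void) and store the return value in *result.
 * Returns 0 on success, -1 if the mapping could not be created. */
int run_generated_code(const unsigned char *code, size_t len, int *result)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Converting an object pointer to a function pointer is not strict
     * ISO C, but it is the usual idiom on POSIX systems. */
    int (*fn)(void) = (int (*)(void))mem;
    *result = fn();

    munmap(mem, len);
    return 0;
}
```

For example, the six bytes B8 2A 00 00 00 C3 (movl $42,%eax; ret) make
run_generated_code() store 42 in *result. Some hardened systems refuse
mappings that are writable and executable at the same time; there you would
map read/write first and switch to read/execute with mprotect() afterwards.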
mmap() solved the problem. It was the last blocking issue and now my translation *works* pretty nicely ;) I'm able to execute a simple PowerPC binary on x86.
Now it's time to optimize it. The main bottleneck is registers: I have to map the PowerPC registers to memory when I execute the program on x86.
I use a 32-bit Linux distribution on an Intel Core 2 Duo, which supports 64-bit code, so it has 8 additional 64-bit registers. I was wondering if it's possible to access these registers in a 32-bit operating system (I suppose not, but I can always ask)?
Ok, according to the Intel manuals I cannot access the 8 additional registers R8-R15 in 32-bit mode.
Ah, I forgot that the NX bit is also used by new enough 32 bits kernels running
on X86_64 CPUs.
16 registers still isn't enough to hold all PPC registers, so you need to use the
stack now and then anyway. The keyword is "now and then", instead of all the
time. Basically you have to translate the PPC instructions to some abstract
internal thing your program understands and can optimize.
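As a very rough illustration of "now and then": once a block of PPC code is in
some internal form, you can count how often each guest register is used and
keep only the busiest ones in host registers. Everything here (the IrOp
layout, assign_host_regs, the IN_MEMORY marker) is a made-up sketch, not a
real register allocator.

```c
#include <stddef.h>
#include <stdint.h>

#define GUEST_REGS 32     /* PowerPC has 32 general-purpose registers */
#define IN_MEMORY  (-1)   /* guest register stays in the in-memory register file */

/* Hypothetical three-operand internal instruction: dst = src1 op src2 */
typedef struct { uint8_t dst, src1, src2; } IrOp;

/* For one translated block, give the 'host_regs' most frequently used
 * guest registers a host register index (0..host_regs-1); all others
 * are marked IN_MEMORY. */
void assign_host_regs(const IrOp *ops, size_t n, int host_regs,
                      int8_t map[GUEST_REGS])
{
    unsigned uses[GUEST_REGS] = {0};

    for (size_t i = 0; i < n; i++) {
        uses[ops[i].dst  % GUEST_REGS]++;
        uses[ops[i].src1 % GUEST_REGS]++;
        uses[ops[i].src2 % GUEST_REGS]++;
    }
    for (int g = 0; g < GUEST_REGS; g++)
        map[g] = IN_MEMORY;

    for (int h = 0; h < host_regs; h++) {
        int best = -1;
        for (int g = 0; g < GUEST_REGS; g++)
            if (map[g] == IN_MEMORY && uses[g] > 0 &&
                (best < 0 || uses[g] > uses[best]))
                best = g;
        if (best < 0)
            break;              /* fewer live guest registers than host registers */
        map[best] = (int8_t)h;
    }
}
```

The code generator would then load the chosen registers at block entry, write
them back at block exit, and go through memory for everything marked
IN_MEMORY.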
Which registers are used is encoded in the opcode. Adding more bits to access
additional registers would change the opcode size, making it incompatible and
effectively a different CPU instruction set. That's what they had to do when
adding 64 bits support. Adding one is bad enough.
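To illustrate the encoding point: on x86 the register operands sit in 3-bit
fields of the ModRM byte, which is why only 8 registers are addressable. In
64-bit mode a REX prefix byte (0100WRXB) supplies a fourth bit for each field;
in 32-bit mode those same byte values (0x40-0x4F) are the one-byte INC/DEC
instructions. A small sketch, with made-up helper names:

```c
#include <stdint.h>

/* Register number from the ModRM "reg" field. 'rex' is the REX prefix
 * byte, or 0 if the instruction has none (always 0 in 32-bit mode). */
unsigned modrm_reg(uint8_t rex, uint8_t modrm)
{
    unsigned rex_r = (rex >> 2) & 1u;          /* REX.R extends ModRM.reg */
    return (rex_r << 3) | ((modrm >> 3) & 7u);
}

/* Register number from the ModRM "rm" field; only meaningful when the
 * mod field is 3 (register-direct operand). */
unsigned modrm_rm(uint8_t rex, uint8_t modrm)
{
    unsigned rex_b = rex & 1u;                 /* REX.B extends ModRM.rm */
    return (rex_b << 3) | (modrm & 7u);
}
```

For example, 89 C8 is movl %ecx,%eax (reg field = 1), while 44 89 C8 in
64-bit mode is movl %r9d,%eax (reg field = 9).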
There are architectures where you can freely mix 32 and 64 bits code, but
x86_64 isn't one of them. There you need to switch between 64 bits and 32 bits
execution mode by changing the CS register. I'm not sure if you're allowed to do
that from user space though.
Have fun managing all endianness cases correctly. ;-)