3077
99f6db3255aa
10-20% faster fastmemcpy :) on my p3 at least but the algo is mostly from "amd athlon processor x86 code optimization guide" so it should be faster for amd chips too, but i fear it might be slower for mem->vram copies (someone should check that, i cant) ... there are 2 #defines to finetune it (BLOCK_SIZE & CONFUSION_FACTOR)
michael
/*
  aclib - advanced C library ;)
  This file contains functions which improve and expand standard C-library
*/

#ifndef HAVE_SSE2
/*
   The P3 processor has only one SSE decoder, so it can execute only one SSE
   instruction per CPU clock, but it has three MMX decoders (including the
   load/store unit) and executes three MMX instructions per CPU clock.
   The P4 processor has some chance of doing better, but after reading
   http://www.emulators.com/pentium4.htm
   I have doubts. In any case, an SSE2 version of this code could be written better.
*/
#undef HAVE_SSE
#endif


/*
 This part of the code was taken by me from Linux-2.4.3 and slightly modified
for the MMX, MMX2 and SSE instruction sets. I did it because Linux uses
page-aligned blocks, but MPlayer uses weakly ordered data and the original
sources cannot speed them up. Only using PREFETCHNTA and MOVNTQ together has
an effect!

From the IA-32 Intel Architecture Software Developer's Manual Volume 1,

Order Number 245470:
"10.4.6. Cacheability Control, Prefetch, and Memory Ordering Instructions"

Data referenced by a program can be temporal (data will be used again) or
non-temporal (data will be referenced once and not reused in the immediate
future). To make efficient use of the processor's caches, it is generally
desirable to cache temporal data and not cache non-temporal data. Overloading
the processor's caches with non-temporal data is sometimes referred to as
"polluting the caches".
The non-temporal data is written to memory with Write-Combining semantics.

The PREFETCHh instructions permit a program to load data into the processor
at a suggested cache level, so that it is closer to the processor's load and
store unit when it is needed. If the data is already present in a level of
the cache hierarchy that is closer to the processor, the PREFETCHh instruction
will not result in any data movement.
But we should use PREFETCHNTA: it fetches non-temporal data into a location
close to the processor, minimizing cache pollution.

The MOVNTQ (store quadword using non-temporal hint) instruction stores
packed integer data from an MMX register to memory, using a non-temporal hint.
The MOVNTPS (store packed single-precision floating-point values using
non-temporal hint) instruction stores packed floating-point data from an
XMM register to memory, using a non-temporal hint.

The SFENCE (Store Fence) instruction controls write ordering by creating a
fence for memory store operations. This instruction guarantees that the results
of every store instruction that precedes the store fence in program order is
globally visible before any store instruction that follows the fence. The
SFENCE instruction provides an efficient way of ensuring ordering between
procedures that produce weakly-ordered data and procedures that consume that
data.

If you have questions, please contact me: Nick Kurshev: nickols_k@mail.ru.
*/
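The combination described in the comment above (prefetch with a non-temporal hint, store with a non-temporal hint, then fence) can be sketched with compiler intrinsics instead of inline assembly. This is only an illustration under stated assumptions, not the code from this file: the function name `nt_copy_sketch` and the constant `NT_PREFETCH_AHEAD` are hypothetical, the SSE path (MOVNTPS via `_mm_stream_ps`) is used rather than MOVNTQ, and an x86 compiler that provides `<xmmintrin.h>` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h> /* SSE: _mm_prefetch, _mm_loadu_ps, _mm_stream_ps, _mm_sfence */

/* Assumed prefetch distance in bytes; a tuning guess for illustration only. */
#define NT_PREFETCH_AHEAD 320

/* Hypothetical helper: copy len bytes using non-temporal stores. */
static void *nt_copy_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *to = dst;
    const unsigned char *from = src;

    /* Copy leading bytes until the destination is 16-byte aligned,
       because MOVNTPS (_mm_stream_ps) requires an aligned address. */
    size_t head = (size_t)(-(uintptr_t)to & 15);
    if (head > len)
        head = len;
    memcpy(to, from, head);
    to += head;
    from += head;
    len -= head;

    /* Main loop: 64 bytes per iteration, unaligned loads, streaming stores. */
    while (len >= 64) {
        /* Only prefetch addresses that still lie inside the source buffer. */
        if (len >= NT_PREFETCH_AHEAD + 64)
            _mm_prefetch((const char *)from + NT_PREFETCH_AHEAD, _MM_HINT_NTA);

        assert(((uintptr_t)to & 15) == 0);
        __m128 a = _mm_loadu_ps((const float *)from);
        __m128 b = _mm_loadu_ps((const float *)(from + 16));
        __m128 c = _mm_loadu_ps((const float *)(from + 32));
        __m128 d = _mm_loadu_ps((const float *)(from + 48));
        _mm_stream_ps((float *)to, a);
        _mm_stream_ps((float *)(to + 16), b);
        _mm_stream_ps((float *)(to + 32), c);
        _mm_stream_ps((float *)(to + 48), d);

        to += 64;
        from += 64;
        len -= 64;
    }

    /* Streaming stores are weakly ordered: fence before anyone reads dst. */
    _mm_sfence();

    /* Copy the remaining tail (fewer than 64 bytes) the ordinary way. */
    memcpy(to, from, len);
    return dst;
}
```

A call such as `nt_copy_sketch(frame_out, frame_in, frame_bytes)` (buffer names hypothetical) behaves like `memcpy` for non-overlapping buffers. Whether it is actually faster depends on the CPU and on the destination memory type, so it should be measured rather than assumed.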

// 3dnow memcpy support from kernel 2.4.2
//  by Pontscho/fresh!mindworkz


#undef HAVE_MMX1
#if defined(HAVE_MMX) && !defined(HAVE_MMX2) && !defined(HAVE_3DNOW) && !defined(HAVE_SSE)
/*  means: mmx v.1. Note: since we added alignment of the destination, it speeds
    up memory copying on PentMMX, Celeron-1 and P2 by up to 12% versus the
    standard (non MMX-optimized) version.
    Note: on K6-2+ it speeds up memory copying by up to 25%, and
          on K7 and P3 by about 500% (5 times). */
#define HAVE_MMX1
#endif


#undef HAVE_K6_2PLUS
#if !defined( HAVE_MMX2) && defined( HAVE_3DNOW)
#define HAVE_K6_2PLUS
#endif

/* for small memory blocks (<256 bytes) this version is faster */
#define small_memcpy(to,from,n)\
{\
register unsigned long int dummy;\
__asm__ __volatile__(\
	"rep; movsb"\
	:"=&D"(to), "=&S"(from), "=&c"(dummy)\
/* It's the most portable way to notify the compiler */\
/* that edi, esi and ecx are clobbered in the asm block. */\
/* Thanks to A'rpi for the hint!!! */\
        :"0" (to), "1" (from),"2" (n)\
	: "memory");\
}
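Because `to` and `from` are listed as asm outputs, `small_memcpy` overwrites the variables passed to it: after `rep; movsb` both pointers have advanced by `n` bytes. The sketch below shows one way to use the macro without disturbing the caller's pointers. The wrapper name `copy_small_end` is hypothetical, and the example assumes GCC or Clang on x86/x86-64 with `unsigned long` and `size_t` of equal width (ILP32 or LP64).

```c
#include <stddef.h>

/* Same macro as in the listing (x86 only, GCC-style inline asm). */
#define small_memcpy(to,from,n)\
{\
register unsigned long int dummy;\
__asm__ __volatile__(\
	"rep; movsb"\
	:"=&D"(to), "=&S"(from), "=&c"(dummy)\
	:"0" (to), "1" (from),"2" (n)\
	: "memory");\
}

/* Hypothetical wrapper: copies len bytes and returns the end of the
   destination, which makes the macro's pointer side effect visible. */
static void *copy_small_end(void *dst, const void *src, size_t len)
{
    /* Work on local copies; the macro must be given modifiable lvalues. */
    unsigned char *to = dst;
    const unsigned char *from = src;

    small_memcpy(to, from, len);

    /* to now points one past the last byte written. */
    return to;
}
```

Keeping the pointers in locals means the caller's `dst` and `src` are unchanged, while code inside the wrapper can continue from `to` and `from` if more data follows.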

#undef MMREG_SIZE
#ifdef HAVE_SSE
#define MMREG_SIZE 16
#else
#define MMREG_SIZE 64 //8
#endif

#undef PREFETCH
#undef EMMS
5660
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 106
5662
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 107 #ifdef HAVE_MMX2
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 108 #define PREFETCH "prefetchnta"
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 109 #elif defined ( HAVE_3DNOW )
5660
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 110 #define PREFETCH "prefetch"
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 111 #else
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 112 #define PREFETCH "/nop"
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 113 #endif
+ 鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃醇�鐃緒申��申鐃醇柑鐃緒��鰹申薜合�渇��膩��渇����鐃緒申鐃緒申��膩��鰹申鐃順�渇����鐃初��膩��鰹申鐃順�渇����紮�鐃醇�鐃緒申��申鐃醇�鐃緒申鐃緒申��膩��渇��膩��鰹申鐃処��申鐃初姐�渇����鐃醇�鐃緒申鐃緒申膣�申����鐃初��膩��渇��膩��鰹申鐃緒申鐃初����鐃緒申��申鐃初姐�渇����鐃醇�鐃緒申��申鐃緒申 114
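The PREFETCH macro above is a string literal, so it can be pasted in front of an operand string inside an inline-asm template. A minimal portable sketch of that string-literal concatenation, assuming the HAVE_MMX2 value of the macro; the function name `prefetch_line` is hypothetical and only exists for illustration:

```c
#include <string.h>

/* Value the source selects when HAVE_MMX2 is defined. */
#define PREFETCH "prefetchnta"

/* Hypothetical helper: returns the text the compiler sees for one
   asm template line. Adjacent string literals are joined at compile
   time, so no runtime concatenation takes place. */
static const char *prefetch_line(void)
{
    return PREFETCH " 64(%0)\n";
}
```

Choosing the mnemonic through a macro keeps one asm template usable for every CPU variant the file is compiled for.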
+/* On K6 femms is faster than emms. On K7 femms is directly mapped to emms. */
+#ifdef HAVE_3DNOW
+#define EMMS "femms"
+#else
+#define EMMS "emms"
+#endif
+
+#undef MOVNTQ
+#ifdef HAVE_MMX2
+#define MOVNTQ "movntq"
+#else
+#define MOVNTQ "movq"
+#endif
+
+#undef MIN_LEN
+#ifdef HAVE_MMX1
+#define MIN_LEN 0x800  /* 2K blocks */
+#else
+#define MIN_LEN 0x40  /* 64-byte blocks */
+#endif
+
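For copies of at least MIN_LEN bytes, fast_memcpy (defined next) first copies a few bytes to align the destination, then moves 64-byte blocks, and leaves a short tail. The arithmetic can be sketched in portable C as follows. The struct and the function `plan_copy` are hypothetical names introduced for illustration, and the sketch assumes `len` is at least `mmreg_size` and that `mmreg_size` is a power of two:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one copy is carved up. */
struct copy_plan {
    size_t head;    /* bytes copied first so the destination is aligned */
    size_t blocks;  /* number of 64-byte blocks for the unrolled loop */
    size_t tail;    /* leftover bytes (fewer than 64) */
};

/* Assumes len >= mmreg_size and mmreg_size is a power of two. */
static struct copy_plan plan_copy(uintptr_t to, size_t len, size_t mmreg_size)
{
    struct copy_plan p = {0, 0, 0};
    size_t delta = (size_t)(to & (mmreg_size - 1)); /* misalignment */

    if (delta) {
        p.head = mmreg_size - delta; /* distance to the next boundary */
        len -= p.head;
    }
    p.blocks = len >> 6;  /* len / 64 */
    p.tail = len & 63;    /* len % 64 */
    return p;
}
```

For example, a 200-byte copy to address 0x1003 with 8-byte registers gives a 5-byte head, three 64-byte blocks and a 3-byte tail.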
+static void * RENAME(fast_memcpy)(void * to, const void * from, size_t len)
+{
+    void *retval;
+    size_t i;
+    retval = to;
+#ifdef STATISTICS
+    {
+        static int freq[33];
+        static int t=0;
+        int i;
+        for(i=0; len>(1<<i); i++);
+        freq[i]++;
+        t++;
+        if(1024*1024*1024 % t == 0)
+            for(i=0; i<32; i++)
+                printf("freq < %8d %4d\n", 1<<i, freq[i]);
+    }
+#endif
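The STATISTICS block above sorts each copy length into a power-of-two bucket and prints the histogram whenever the call counter divides 2^30, which happens only when the counter is itself a power of two. A standalone sketch of those two checks; `size_bucket` and `is_report_call` are hypothetical names, and the shift is widened here (an assumption, not the source's code) so that large lengths do not overflow an `int`:

```c
#include <stddef.h>

/* Hypothetical helper: smallest i with len <= 2^i, capped at 32. */
static int size_bucket(size_t len)
{
    int i = 0;
    while (i < 32 && len > ((unsigned long long)1 << i))
        i++;
    return i;
}

/* Hypothetical helper: nonzero when call number t (t >= 1) divides
   2^30, which is true exactly for powers of two up to 2^30. */
static int is_report_call(long t)
{
    return (1L << 30) % t == 0;
}
```

Reporting at power-of-two call counts keeps the output short even when the copy routine is called millions of times.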
+#ifndef HAVE_MMX1
+    /* PREFETCH has an effect even for the MOVSB instruction ;) */
+    __asm__ __volatile__ (
+        PREFETCH" (%0)\n"
+        PREFETCH" 64(%0)\n"
+        PREFETCH" 128(%0)\n"
+        PREFETCH" 192(%0)\n"
+        PREFETCH" 256(%0)\n"
+        : : "r" (from) );
+#endif
+    if(len >= MIN_LEN)
+    {
+        register unsigned long int delta;
+        /* Align destination to MMREG_SIZE-boundary */
+        delta = ((unsigned long int)to)&(MMREG_SIZE-1);
+        if(delta)
+        {
+            delta=MMREG_SIZE-delta;
+            len -= delta;
+            small_memcpy(to, from, delta);
+        }
+        i = len >> 6; /* len/64 */
+        len&=63;
+        /*
+           This algorithm is most effective when the code consecutively
+           reads and writes blocks the size of a cache line.
+           The cache-line size is processor-dependent.
+           It will, however, be a minimum of 32 bytes on any processor.
+           It would be better for the number of read and write
+           instructions to be a multiple of the number of the
+           processor's decoders, but that is not always possible.
+        */
+#ifdef HAVE_SSE /* Only P3 (maybe Cyrix3) */
+        if(((unsigned long)from) & 15)
+        /* if SRC is misaligned */
+        for(; i>0; i--)
+        {
+            __asm__ __volatile__ (
+            PREFETCH" 320(%0)\n"
+            "movups (%0), %%xmm0\n"
+            "movups 16(%0), %%xmm1\n"
+            "movups 32(%0), %%xmm2\n"
+            "movups 48(%0), %%xmm3\n"
+            "movntps %%xmm0, (%1)\n"
+            "movntps %%xmm1, 16(%1)\n"
+            "movntps %%xmm2, 32(%1)\n"
+            "movntps %%xmm3, 48(%1)\n"
+            :: "r" (from), "r" (to) : "memory");
+            from=((const unsigned char *) from)+64;
+            to=((unsigned char *)to)+64;
+        }
3077
99f6db3255aa
10-20% faster fastmemcpy :) on my p3 at least but the algo is mostly from "amd athlon processor x86 code optimization guide" so it should be faster for amd chips too, but i fear it might be slower for mem->vram copies (someone should check that, i cant) ... there are 2 #defines to finetune it (BLOCK_SIZE & CONFUSION_FACTOR)
michael
diff
changeset
    else
    /*
       Only if SRC is aligned on a 16-byte boundary.
       This allows movaps to be used instead of movups; movaps requires the
       data to be aligned, otherwise a general-protection exception (#GP)
       is generated.
    */
    for(; i>0; i--)
    {
        __asm__ __volatile__ (
        PREFETCH" 320(%0)\n"
        "movaps (%0), %%xmm0\n"
        "movaps 16(%0), %%xmm1\n"
        "movaps 32(%0), %%xmm2\n"
        "movaps 48(%0), %%xmm3\n"
        "movntps %%xmm0, (%1)\n"
        "movntps %%xmm1, 16(%1)\n"
        "movntps %%xmm2, 32(%1)\n"
        "movntps %%xmm3, 48(%1)\n"
        :: "r" (from), "r" (to) : "memory");
        from=((const unsigned char *)from)+64;
        to=((unsigned char *)to)+64;
    }
#else
    // Align destination at BLOCK_SIZE boundary
    for(; ((int)to & (BLOCK_SIZE-1)) && i>0; i--)
    {
        __asm__ __volatile__ (
#ifndef HAVE_MMX1
        PREFETCH" 320(%0)\n"
#endif
        "movq (%0), %%mm0\n"
        "movq 8(%0), %%mm1\n"
        "movq 16(%0), %%mm2\n"
        "movq 24(%0), %%mm3\n"
        "movq 32(%0), %%mm4\n"
        "movq 40(%0), %%mm5\n"
        "movq 48(%0), %%mm6\n"
        "movq 56(%0), %%mm7\n"
        MOVNTQ" %%mm0, (%1)\n"
        MOVNTQ" %%mm1, 8(%1)\n"
        MOVNTQ" %%mm2, 16(%1)\n"
        MOVNTQ" %%mm3, 24(%1)\n"
        MOVNTQ" %%mm4, 32(%1)\n"
        MOVNTQ" %%mm5, 40(%1)\n"
        MOVNTQ" %%mm6, 48(%1)\n"
        MOVNTQ" %%mm7, 56(%1)\n"
        :: "r" (from), "r" (to) : "memory");
        from=((const unsigned char *)from)+64;
        to=((unsigned char *)to)+64;
    }

//  printf(" %d %d\n", (int)from&1023, (int)to&1023);
    // Pure Assembly cuz gcc is a bit unpredictable ;)
    if(i>=BLOCK_SIZE/64)
        asm volatile(
            "xor %%"REG_a", %%"REG_a" \n\t"
            ".balign 16 \n\t"
            "1: \n\t"
                /* touch one dword every 32 bytes so the source block is pulled into the cache */
                "movl (%0, %%"REG_a"), %%ebx \n\t"
                "movl 32(%0, %%"REG_a"), %%ebx \n\t"
                "movl 64(%0, %%"REG_a"), %%ebx \n\t"
                "movl 96(%0, %%"REG_a"), %%ebx \n\t"
                "add $128, %%"REG_a" \n\t"
                "cmp %3, %%"REG_a" \n\t"
                " jb 1b \n\t"

            "xor %%"REG_a", %%"REG_a" \n\t"

            ".balign 16 \n\t"
99f6db3255aa
10-20% faster fastmemcpy :) on my p3 at least but the algo is mostly from "amd athlon processor x86 code optimization guide" so it should be faster for amd chips too, but i fear it might be slower for mem->vram copies (someone should check that, i cant) ... there are 2 #defines to finetune it (BLOCK_SIZE & CONFUSION_FACTOR)
michael
diff
changeset
+        "2: \n\t"
+        "movq (%0, %%"REG_a"), %%mm0\n"
+        "movq 8(%0, %%"REG_a"), %%mm1\n"
+        "movq 16(%0, %%"REG_a"), %%mm2\n"
+        "movq 24(%0, %%"REG_a"), %%mm3\n"
+        "movq 32(%0, %%"REG_a"), %%mm4\n"
+        "movq 40(%0, %%"REG_a"), %%mm5\n"
+        "movq 48(%0, %%"REG_a"), %%mm6\n"
+        "movq 56(%0, %%"REG_a"), %%mm7\n"
+        MOVNTQ" %%mm0, (%1, %%"REG_a")\n"
+        MOVNTQ" %%mm1, 8(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm2, 16(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm3, 24(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm4, 32(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm5, 40(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm6, 48(%1, %%"REG_a")\n"
+        MOVNTQ" %%mm7, 56(%1, %%"REG_a")\n"
+        "add $64, %%"REG_a" \n\t"
+        "cmp %3, %%"REG_a" \n\t"
+        "jb 2b \n\t"
+
+#if CONFUSION_FACTOR > 0
+        // a few percent speedup on out-of-order executing CPUs
+        "mov %5, %%"REG_a" \n\t"
+        "2: \n\t"
+        "movl (%0), %%ebx \n\t"
+        "movl (%0), %%ebx \n\t"
+        "movl (%0), %%ebx \n\t"
+        "movl (%0), %%ebx \n\t"
+        "dec %%"REG_a" \n\t"
+        " jnz 2b \n\t"
+#endif
+
+        "xor %%"REG_a", %%"REG_a" \n\t"
+        "add %3, %0 \n\t"
+        "add %3, %1 \n\t"
+        "sub %4, %2 \n\t"
+        "cmp %4, %2 \n\t"
+        " jae 1b \n\t"
+        : "+r" (from), "+r" (to), "+r" (i)
+        : "r" ((long)BLOCK_SIZE), "i" (BLOCK_SIZE/64), "i" ((long)CONFUSION_FACTOR)
+        : "%"REG_a, "%ebx"
+    );
+
+    for(; i>0; i--)
+    {
+        __asm__ __volatile__ (
+#ifndef HAVE_MMX1
+        PREFETCH" 320(%0)\n"
+#endif
+        "movq (%0), %%mm0\n"
+        "movq 8(%0), %%mm1\n"
+        "movq 16(%0), %%mm2\n"
+        "movq 24(%0), %%mm3\n"
+        "movq 32(%0), %%mm4\n"
+        "movq 40(%0), %%mm5\n"
+        "movq 48(%0), %%mm6\n"
+        "movq 56(%0), %%mm7\n"
+        MOVNTQ" %%mm0, (%1)\n"
+        MOVNTQ" %%mm1, 8(%1)\n"
+        MOVNTQ" %%mm2, 16(%1)\n"
+        MOVNTQ" %%mm3, 24(%1)\n"
+        MOVNTQ" %%mm4, 32(%1)\n"
+        MOVNTQ" %%mm5, 40(%1)\n"
+        MOVNTQ" %%mm6, 48(%1)\n"
+        MOVNTQ" %%mm7, 56(%1)\n"
+        :: "r" (from), "r" (to) : "memory");
+        from=((const unsigned char *)from)+64;
+        to=((unsigned char *)to)+64;
+    }
+
+#endif /* Have SSE */
+#ifdef HAVE_MMX2
+    /* since movntq is weakly-ordered, a "sfence"
+     * is needed to become ordered again. */
+    __asm__ __volatile__ ("sfence":::"memory");
+#endif
+#ifndef HAVE_SSE
+    /* EMMS clears the MMX state so the FPU can be used again */
+    __asm__ __volatile__ (EMMS:::"memory");
+#endif
+    }
+    /*
+     * Now do the tail of the block
+     */
+    if(len) small_memcpy(to, from, len);
+    return retval;
+}
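The block-wise loop above can be hard to follow in inline assembly, so here is a minimal portable sketch of the same idea in plain C. It assumes that the pass at label `1:` (outside this section) pulls the next source block into cache before the `movq`/`MOVNTQ` loop copies it. The names `block_prefetch_copy`, `SKETCH_BLOCK_SIZE` and `SKETCH_LINE_SIZE` are hypothetical and not part of aclib. It uses `memcpy` in place of the MMX loads and non-temporal stores, and leaves out the `CONFUSION_FACTOR` loop, `sfence` and `EMMS`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning values for this sketch only; the real BLOCK_SIZE
 * is defined elsewhere in the file and may differ. */
#define SKETCH_BLOCK_SIZE 4096
#define SKETCH_LINE_SIZE  64

/* Copy len bytes one block at a time. For each full block, first read one
 * byte per assumed cache line so the block is pulled into cache, then copy
 * it. Anything shorter than a block is copied as a tail at the end. */
static void *block_prefetch_copy(void *to, const void *from, size_t len)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    volatile unsigned char sink = 0; /* keeps the touch loads from being optimized away */

    assert(len == 0 || (to != NULL && from != NULL));

    while (len >= SKETCH_BLOCK_SIZE) {
        /* Touch pass: one load per cache line of the source block. */
        for (size_t off = 0; off < SKETCH_BLOCK_SIZE; off += SKETCH_LINE_SIZE)
            sink = s[off];

        /* Copy pass: stands in for the movq/MOVNTQ loop. */
        memcpy(d, s, SKETCH_BLOCK_SIZE);

        d += SKETCH_BLOCK_SIZE;
        s += SKETCH_BLOCK_SIZE;
        len -= SKETCH_BLOCK_SIZE;
    }

    /* Tail: the part that small_memcpy handles in the original. */
    if (len)
        memcpy(d, s, len);

    (void)sink;
    return to;
}
```

Whether a separate touch pass helps depends on the CPU and on the destination memory type, which is why the original exposes `BLOCK_SIZE` and `CONFUSION_FACTOR` for tuning.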
4681
+
+/**
+ * special copy routine for mem -> agp/pci copy (based upon fast_memcpy)
+ */
+static void * RENAME(mem2agpcpy)(void * to, const void * from, size_t len)
+{
+    void *retval;
+    size_t i;
+    retval = to;
+#ifdef STATISTICS
+    {
+        static int freq[33];
+        static int t=0;
+        int i;
+        for(i=0; len>(1<<i); i++);
+        freq[i]++;
+        t++;
+        if(1024*1024*1024 % t == 0)
+            for(i=0; i<32; i++)
+                printf("mem2agp freq < %8d %4d\n", 1<<i, freq[i]);
+    }
+#endif
+    if(len >= MIN_LEN)
+    {
+        register unsigned long int delta;
+        /* Align destination to an MMREG_SIZE boundary */
+        delta = ((unsigned long int)to)&7;
+        if(delta)
+        {
+            delta=8-delta;
+            len -= delta;
+            small_memcpy(to, from, delta);
+        }
+        i = len >> 6; /* len/64 */
+        len &= 63;
+        /*
+           This algorithm is most effective when the code consecutively
+           reads and writes blocks the size of a cache line.
+           The cache line size is processor-dependent.
+           It is, however, at least 32 bytes on any processor.
+           It would be better if the number of instructions that
+           perform the reading and writing were a multiple of the number
+           of the processor's decoders, but that is not always possible.
+        */
+        for(; i>0; i--)
+        {
+            __asm__ __volatile__ (
+            PREFETCH" 320(%0)\n"
+            "movq (%0), %%mm0\n"
+            "movq 8(%0), %%mm1\n"
+            "movq 16(%0), %%mm2\n"
+            "movq 24(%0), %%mm3\n"
+            "movq 32(%0), %%mm4\n"
+            "movq 40(%0), %%mm5\n"
+            "movq 48(%0), %%mm6\n"
+            "movq 56(%0), %%mm7\n"
+            MOVNTQ" %%mm0, (%1)\n"
+            MOVNTQ" %%mm1, 8(%1)\n"
+            MOVNTQ" %%mm2, 16(%1)\n"
+            MOVNTQ" %%mm3, 24(%1)\n"
+            MOVNTQ" %%mm4, 32(%1)\n"
+            MOVNTQ" %%mm5, 40(%1)\n"
+            MOVNTQ" %%mm6, 48(%1)\n"
+            MOVNTQ" %%mm7, 56(%1)\n"
+            :: "r" (from), "r" (to) : "memory");
+            from=((const unsigned char *)from)+64;
+            to=((unsigned char *)to)+64;
+        }
+#ifdef HAVE_MMX2
+        /* since movntq is weakly-ordered, a "sfence"
+         * is needed to become ordered again. */
+        __asm__ __volatile__ ("sfence":::"memory");
+#endif
+        /* re-enable use of the FPU */
+        __asm__ __volatile__ (EMMS:::"memory");
+    }
+    /*
+     * Now do the tail of the block
+     */
+    if(len) small_memcpy(to, from, len);
+    return retval;
+}
+
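Note (not part of the changeset): the head / 64-byte body / tail split used by mem2agpcpy can be sketched in portable C, which may help when checking the length arithmetic without MMX hardware. This is only an illustration under stated assumptions: `sketch_block_copy` and `SKETCH_MIN_LEN` are hypothetical names, the threshold value is a guess rather than the file's real `MIN_LEN`, and plain `memcpy` stands in for the `movq`/`MOVNTQ` sequence, so it says nothing about the speed of the real routine. The sketch advances its pointers explicitly instead of relying on any side effects of `small_memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold; the real MIN_LEN is defined elsewhere in the file. */
#define SKETCH_MIN_LEN 64

/* Portable illustration of the head / 64-byte body / tail split.
 * Plain memcpy stands in for the MMX movq/MOVNTQ block copy. */
static void *sketch_block_copy(void *to, const void *from, size_t len)
{
    void *retval = to;
    unsigned char *d = to;
    const unsigned char *s = from;

    if (len >= SKETCH_MIN_LEN) {
        /* Head: copy up to 7 bytes so the destination is 8-byte aligned. */
        size_t delta = (size_t)((uintptr_t)d & 7);
        if (delta) {
            delta = 8 - delta;
            memcpy(d, s, delta);
            d += delta;
            s += delta;
            len -= delta;
        }
        /* Body: whole 64-byte blocks. */
        size_t blocks = len >> 6;   /* len / 64 */
        len &= 63;                  /* bytes left over for the tail */
        for (; blocks > 0; blocks--) {
            memcpy(d, s, 64);
            d += 64;
            s += 64;
        }
    }
    /* Tail: whatever is left (or the whole buffer if it was short). */
    if (len)
        memcpy(d, s, len);
    return retval;
}
```

As in the routine above, the return value is the original destination pointer, and buffers shorter than the threshold are handled entirely by the tail copy.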