Sun, 05 Sep 2010 10:10:16 +0000 |
reimar |
Use "d" suffix for general-purpose registers used with movd.
libavcodec
|
Tue, 24 Aug 2010 16:52:27 +0000 |
rbultje |
Mark xmm registers as clobbered in simple loopfilter. Should fix the last
libavcodec
|
Mon, 23 Aug 2010 02:41:22 +0000 |
rbultje |
Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).
libavcodec
|
Mon, 02 Aug 2010 20:18:09 +0000 |
darkshikari |
VP8: move zeroing of luma DC block into the WHT
libavcodec
|
Sat, 31 Jul 2010 23:13:15 +0000 |
rbultje |
Use word-writing instead of dword-writing (with two cached but otherwise
libavcodec
|
Mon, 26 Jul 2010 21:18:19 +0000 |
rbultje |
Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.
libavcodec
|
Mon, 26 Jul 2010 19:34:00 +0000 |
darkshikari |
VP8: Much faster SSE2 MC
libavcodec
|
Mon, 26 Jul 2010 14:07:57 +0000 |
rbultje |
Enable no-loop memory/register saving for ssse3/sse4 also.
libavcodec
|
Mon, 26 Jul 2010 14:00:15 +0000 |
rbultje |
Save a register (or regsize of stackspace for x86-32) for the no-loop
libavcodec
|
Mon, 26 Jul 2010 13:56:51 +0000 |
rbultje |
Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this
libavcodec
|
Mon, 26 Jul 2010 13:50:59 +0000 |
rbultje |
Split pextrw macro-spaghetti into several opt-specific macros, this will make
libavcodec
|
Sun, 25 Jul 2010 02:42:40 +0000 |
rbultje |
Fix obvious bug in assignment. Somehow, the test vectors don't test this...
libavcodec
|
Sat, 24 Jul 2010 19:33:05 +0000 |
rbultje |
Fix SPLATB_REG mess. Used to be an if/elseif/elseif/elseif spaghetti, so this
libavcodec
|
Fri, 23 Jul 2010 06:02:52 +0000 |
darkshikari |
VP8: optimize DC-only chroma case in the same way as luma.
libavcodec
|
Fri, 23 Jul 2010 03:02:56 +0000 |
darkshikari |
VP8 asm: cosmetics (spacing)
libavcodec
|
Fri, 23 Jul 2010 02:58:27 +0000 |
darkshikari |
VP8: 30% faster idct_mb
libavcodec
|
Fri, 23 Jul 2010 00:07:16 +0000 |
darkshikari |
VP8: clear DCT blocks in iDCT instead of using clear_blocks.
libavcodec
|
Thu, 22 Jul 2010 19:59:34 +0000 |
rbultje |
Use pextrw for SSE4 mbedge filter result writing, speedup 5-10 cycles on
libavcodec
|
Thu, 22 Jul 2010 01:35:26 +0000 |
rbultje |
Fix and enable horizontal >=SSE2 mbedge loopfilter.
libavcodec
|
Wed, 21 Jul 2010 22:41:37 +0000 |
darkshikari |
Eliminate one instruction in VP8 dc_add_sse4
libavcodec
|
Wed, 21 Jul 2010 22:11:03 +0000 |
darkshikari |
Various VP8 x86 deblocking speedups
libavcodec
|
Wed, 21 Jul 2010 20:51:01 +0000 |
darkshikari |
Make mmx VP8 WHT faster
libavcodec
|
Tue, 20 Jul 2010 22:58:56 +0000 |
rbultje |
VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
libavcodec
|
Tue, 20 Jul 2010 22:04:18 +0000 |
rbultje |
Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
libavcodec
|
Mon, 19 Jul 2010 23:57:09 +0000 |
rbultje |
Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
libavcodec
|
Mon, 19 Jul 2010 21:53:28 +0000 |
rbultje |
Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.
libavcodec
|
Mon, 19 Jul 2010 21:45:36 +0000 |
rbultje |
Be more efficient with registers or stack memory. Saves 8/16 bytes stack
libavcodec
|
Mon, 19 Jul 2010 21:18:04 +0000 |
rbultje |
Change function prototypes for width=8 inner and mbedge loopfilter functions
libavcodec
|
Fri, 16 Jul 2010 21:35:30 +0000 |
rbultje |
Attempt to fix x86-64 testsuite on fate.
libavcodec
|
Fri, 16 Jul 2010 19:54:47 +0000 |
rbultje |
Remove duplicate define.
libavcodec
|
Fri, 16 Jul 2010 19:54:25 +0000 |
rbultje |
Revert 24270, it contained some stuff that shouldn't have been in there.
libavcodec
|
Fri, 16 Jul 2010 19:42:32 +0000 |
rbultje |
Remove duplicate define.
libavcodec
|
Fri, 16 Jul 2010 19:38:10 +0000 |
rbultje |
Give x86 r%d registers names, this will simplify implementation of the chroma
libavcodec
|
Fri, 16 Jul 2010 18:29:14 +0000 |
rbultje |
Change return statement, the REP_RET is a mistake since the else case (x86-64,
libavcodec
|
Thu, 15 Jul 2010 23:02:34 +0000 |
rbultje |
VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
libavcodec
|
Sat, 03 Jul 2010 19:26:30 +0000 |
rbultje |
Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
libavcodec
|
Sat, 03 Jul 2010 00:48:12 +0000 |
darkshikari |
SSSE3 versions of vp8 width4 bilinear MC functions
libavcodec
|
Fri, 02 Jul 2010 05:27:41 +0000 |
darkshikari |
SSSE3 versions of width4 VP8 6-tap MC functions
libavcodec
|
Tue, 29 Jun 2010 17:23:17 +0000 |
darkshikari |
Use add instead of lshift in mmxext vp8 idct
libavcodec
|
Tue, 29 Jun 2010 17:04:29 +0000 |
rbultje |
Remove unused macros (duplicates from the now-LGPL x86util.asm).
libavcodec
|
Tue, 29 Jun 2010 14:43:11 +0000 |
rbultje |
MMX idct_add for VP8.
libavcodec
|
Tue, 29 Jun 2010 01:41:59 +0000 |
darkshikari |
Add mmxext version of VP8 DC Hadamard transform
libavcodec
|
Mon, 28 Jun 2010 22:13:14 +0000 |
darkshikari |
Fix VP8 bilinear mc on x86_64
libavcodec
|
Mon, 28 Jun 2010 19:14:40 +0000 |
darkshikari |
Add x86 asm functions for VP8 put_pixels
libavcodec
|
Mon, 28 Jun 2010 18:56:24 +0000 |
darkshikari |
Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC
libavcodec
|
Sun, 27 Jun 2010 02:01:45 +0000 |
rbultje |
First shot at VP8 optimizations:
libavcodec
|