Source Activity

libroot/x86_64: new memset implementation

Sun, 2014-09-14 17:16

This patch introduces a new memset() implementation that improves
performance when the buffer is small. It was written for processors that
support ERMSB, but performs reasonably well on older CPUs too.

The following benchmarks were run on a Haswell i7 running Debian Jessie
with Linux 3.16.1. In each iteration a 64MB buffer was memset()ed; the
parameter "size" is the size of the buffer passed in a single call (i.e.
for "size: 2" memset() was called ~32 million times to memset the whole
64MB).
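
The benchmark program itself is not part of this message. A minimal sketch
of the measurement loop described above, assuming a hypothetical helper
named time_memset and a std::vector as the buffer (which does not by itself
guarantee the 16 byte alignment used in the aligned runs), might look like
this:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Signature shared by the memset() implementations under test.
using MemsetFn = void* (*)(void*, int, std::size_t);

// Hypothetical helper: fill `buffer` by calling `fn` on consecutive chunks
// of `size` bytes and return the elapsed time in microseconds. Any tail
// shorter than `size` is left untouched.
std::int64_t time_memset(MemsetFn fn, std::vector<unsigned char>& buffer,
                         std::size_t size, int value)
{
    assert(size > 0);  // a zero chunk size would never advance
    auto start = std::chrono::steady_clock::now();
    for (std::size_t offset = 0; offset + size <= buffer.size(); offset += size)
        fn(buffer.data() + offset, value, size);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}
```

Calling it once per implementation with the same "size" on a 64MB buffer
would give an f and g pair comparable to the rows below.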

f - original implementation, g - new implementation, all buffers 16 byte
aligned

set, size:        8, f:    66885 µs, g:    17768 µs, ∆:   73.44%
set, size:       32, f:    17123 µs, g:     9163 µs, ∆:   46.49%
set, size:      128, f:     6677 µs, g:     6919 µs, ∆:   -3.62%
set, size:      512, f:    11656 µs, g:     7715 µs, ∆:   33.81%
set, size:     1024, f:     9156 µs, g:     7359 µs, ∆:   19.63%
set, size:     4096, f:     4936 µs, g:     5159 µs, ∆:   -4.52%
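
The ∆ column is not defined in the message, but the figures are consistent
with the relative time saved by g compared to f, so a positive value means
the new implementation is faster. A small sketch of that calculation (the
function name is only illustrative):

```cpp
#include <cassert>

// Relative improvement of g over f in percent, (f - g) / f * 100.
// Positive means g took less time than f.
double improvement_percent(double f_us, double g_us)
{
    assert(f_us > 0.0);  // the baseline time must be non-zero
    return (f_us - g_us) / f_us * 100.0;
}
```

For the "size: 8" row, improvement_percent(66885, 17768) comes out at roughly
73.44, matching the table.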

f - glibc 2.19 implementation, g - new implementation, all buffers 16 byte
aligned

set, size:        8, f:    19631 µs, g:    17828 µs, ∆:    9.18%
set, size:       32, f:     8545 µs, g:     9047 µs, ∆:   -5.87%
set, size:      128, f:     8304 µs, g:     6874 µs, ∆:   17.22%
set, size:      512, f:     7373 µs, g:     7486 µs, ∆:   -1.53%
set, size:     1024, f:     9007 µs, g:     7344 µs, ∆:   18.46%
set, size:     4096, f:     8169 µs, g:     5146 µs, ∆:   37.01%

Apparently, glibc uses SSE even for large buffers and therefore does not
take advantage of ERMSB:

set, size:    16384, f:     7007 µs, g:     3223 µs, ∆:   54.00%
set, size:    32768, f:     6979 µs, g:     2930 µs, ∆:   58.02%
set, size:    65536, f:     6907 µs, g:     2826 µs, ∆:   59.08%
set, size:   131072, f:     6919 µs, g:     2752 µs, ∆:   60.23%

The new implementation handles unaligned buffers quite well:

f - glibc 2.19 implementation, g - new implementation, all buffers unaligned

set, size:       16, f:    10045 µs, g:    10498 µs, ∆:   -4.51%
set, size:       32, f:     8590 µs, g:     9358 µs, ∆:   -8.94%
set, size:       64, f:     8618 µs, g:     8585 µs, ∆:    0.38%
set, size:      128, f:     8393 µs, g:     6893 µs, ∆:   17.87%
set, size:      256, f:     8042 µs, g:     7621 µs, ∆:    5.24%
set, size:      512, f:     9661 µs, g:     7738 µs, ∆:   19.90%

Signed-off-by: Paweł Dziepak 
Categories: Development

kernel/x86_64: clear xmm0-15 registers on syscall exit

Sun, 2014-09-14 17:16

As Alex pointed out, we can leak possibly sensitive data in xmm registers
when returning from the kernel. To prevent that, xmm0-15 are zeroed
before sysret or iret. The cost is negligible.

Signed-off-by: Paweł Dziepak 
Categories: Development

kernel/x86_64: save fpu state at interrupts

Sun, 2014-09-14 17:16

The kernel is allowed to use the fpu anywhere, so we must save the fpu
state at interrupt entry to make sure that user state is not clobbered.
There is no need to do that for system calls, since all fpu data
registers are caller saved.

We do not, however, need to save the whole fpu state at task switch
(again, thanks to the calling convention). Only the status and control
registers are preserved. This patch actually adds the xmm0-15 registers
to the clobber list of the task switch code, but the only reason for that
is to make sure that nothing bad happens inside the function that
executes the task switch. Inspection of the generated code shows
that no xmm registers are actually saved.

Signed-off-by: Paweł Dziepak 
Categories: Development

boot/x86_64: enable sse early

Sun, 2014-09-14 17:16

Enable SSE as a part of the "preparation of the environment to run any
C or C++ code" in the entry points of the stage2 bootloader.

SSE2 is going to be used by memset() and memcpy().

Signed-off-by: Paweł Dziepak 
Categories: Development

kernel/x86_64: make sure stack is properly aligned in syscalls

Sun, 2014-09-14 17:16

Just following the path of least resistance and adding andq $~15, %rsp
where appropriate. That should also make things harder to break
when changing the amount of stuff placed on the stack before calling the
actual syscall routine.
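
For readers less used to the idiom: andq $~15, %rsp clears the low four
bits of the stack pointer, rounding it down to a multiple of 16. Because the
stack grows downwards on x86_64, rounding down can only reserve up to 15
extra bytes and never releases any. The same arithmetic as a small sketch
(the function name is illustrative, not kernel code):

```cpp
#include <cstdint>

// Round an address down to the nearest multiple of 16, which is what
// `andq $~15, %rsp` does to the stack pointer.
constexpr std::uint64_t align_down_16(std::uint64_t address)
{
    return address & ~std::uint64_t{15};
}

// A misaligned address moves down to the previous 16 byte boundary...
static_assert(align_down_16(0x7fffffffe038) == 0x7fffffffe030,
              "misaligned address is rounded down");
// ...and an already aligned address is left unchanged.
static_assert(align_down_16(0x7fffffffe040) == 0x7fffffffe040,
              "aligned address is unchanged");
```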

Signed-off-by: Paweł Dziepak 
Categories: Development

kernel/x86_64: remove memset and memcpy from commpage

Sun, 2014-09-14 17:16

There is absolutely no reason for these functions to be in commpage;
they don't do anything that involves the kernel in any way.

Additionally, this patch rewrites memset and memcpy in C++. The current
implementation is quite simple (though it may perform surprisingly
well when dealing with large buffers on CPUs with ERMSB). Better
versions are coming soon.
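
The code itself is not quoted in this message. As a rough illustration of
what a "quite simple" memset that benefits from ERMSB could look like, here
is a sketch built on the rep stosb instruction. This is an assumption, not
the committed implementation, and simple_memset is a hypothetical name. The
inline assembly uses GCC/Clang extended asm syntax, and a plain byte loop is
used on other targets:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the libroot symbol.
void* simple_memset(void* dest, int value, std::size_t count)
{
    assert(dest != nullptr || count == 0);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void* cursor = dest;
    // rep stosb stores AL to [RDI] RCX times, advancing RDI each time.
    // Both RDI and RCX are modified, hence the "+" constraints.
    asm volatile("rep stosb"
                 : "+D"(cursor), "+c"(count)
                 : "a"(value)
                 : "memory");
#else
    // Portable fallback: one byte at a time.
    unsigned char* bytes = static_cast<unsigned char*>(dest);
    for (std::size_t i = 0; i < count; ++i)
        bytes[i] = static_cast<unsigned char>(value);
#endif
    return dest;
}
```

On CPUs with ERMSB the microcode behind rep stosb is tuned for long runs,
which would explain good results on large buffers despite the simplicity.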

Signed-off-by: Paweł Dziepak 
Categories: Development

kernel/x86[_64]: remove get_optimized_functions from cpu modules

Sun, 2014-09-14 17:16

The possibility to specify custom memcpy and memset implementations
in cpu modules is currently unused and there is generally no point
in such a feature.

There are only 2 x86 vendors that really matter, and there isn't a
very big difference in performance of the generic optimized versions
of these functions across different models. Even if we wanted different
versions of memset and memcpy depending on the processor model or
features, a much better solution would be to use STT_GNU_IFUNC and save
one indirect call.

Long story short, we don't really benefit in any way from
get_optimized_functions and the feature it implements and it only adds
unnecessary complexity to the code.

Signed-off-by: Paweł Dziepak 
Categories: Development

Fix (hopefully) bootstrap build with HAIKU_NO_DOWNLOADS=1

Sun, 2014-09-14 12:30

* With HAIKU_NO_DOWNLOADS=1, the check against existing package files
  in the download folder should only be done in the phase that is
  adding packages to be put onto the resulting target image, not in the 
  phase that is adding the bootstrap packages (as here those packages 
  will be *built*, not downloaded).
Categories: Development

Installer: don't replace system/settings.

Sun, 2014-09-14 12:11

* Instead of removing "system" in the target completely, only
  replace all of its subfolders.
* The downside of the current solution is that extra files and
  directories in "system" will not be removed. Improvements
  welcome.
Categories: Development

libroot: sethostname() now uses ftruncate().

Sun, 2014-09-14 12:11

* Before, it would just overwrite the previous name, leaving extra
  bytes from the previous name (they wouldn't become part of the
  host name, but it just didn't look that nice).
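
The pattern described above can be sketched with plain POSIX calls. The
helper name store_name and its arguments are hypothetical, and the real
sethostname() in libroot may be structured differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: store `name` in the file at `path`.
// Returns 0 on success, -1 on failure.
int store_name(const char* path, const char* name)
{
    assert(path != nullptr && name != nullptr);
    std::size_t length = std::strlen(name);

    // No O_TRUNC: old contents survive until ftruncate() below.
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    // Overwrite from the start, then cut the file to the new length so no
    // bytes of a longer previous name are left behind.
    bool ok = pwrite(fd, name, length, 0) == static_cast<ssize_t>(length)
        && ftruncate(fd, static_cast<off_t>(length)) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Without the ftruncate() call, writing "shredder" over a longer name would
leave the tail of the old name in the file.
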
Categories: Development

HaikuDepot: Fix x86_64 build failure

Sun, 2014-09-14 08:27

Fix the following errors:

  PackageInfo.cpp:85:23: error: comparison between signed and unsigned integer expressions
  PackageInfo.cpp:93:21: error: comparison between signed and unsigned integer expressions

Signed-off-by: Alex Smith 
Categories: Development

Add get_name and fix get_next_object.

Sat, 2014-09-13 22:01

TODO: Need to add defines or enum for nameType.
Categories: Development

HaikuDepot: Fixed reading bigger screenshots

Sat, 2014-09-13 21:03

 * Increase the RAM limit to 128K per screenshot
 * Reduce retrieved size to 320 pixels wide
 * Don't expect to be able to read the stream in one call,
   read it in 4K chunks.
 * Print some errors in this code-path to stderr.
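
The chunked read with a size cap could be sketched as follows. This uses
std::istream as a stand-in for whatever stream class HaikuDepot actually
reads from, and the names read_limited and fits_limit are hypothetical; the
128K default and the 4K chunk size are taken from the list above:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read `input` to its end in chunks of at most 4 KiB. Returns false if more
// than `limit` bytes arrive or the stream fails before reaching its end.
bool read_limited(std::istream& input, std::vector<char>& out,
                  std::size_t limit = 128 * 1024)
{
    constexpr std::size_t kChunkSize = 4096;
    char chunk[kChunkSize];
    out.clear();
    while (input) {
        // A single read may return fewer bytes than requested.
        input.read(chunk, kChunkSize);
        std::size_t got = static_cast<std::size_t>(input.gcount());
        assert(got <= kChunkSize);
        if (out.size() + got > limit)
            return false;  // over the per-screenshot memory limit
        out.insert(out.end(), chunk, chunk + got);
    }
    return input.eof();
}

// Usage example on an in-memory stream.
bool fits_limit(const std::string& data, std::size_t limit)
{
    std::istringstream stream(data);
    std::vector<char> sink;
    return read_limited(stream, sink, limit);
}
```
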
Categories: Development

HaikuDepot: Improve filter UI, fix pkg translations

Sat, 2014-09-13 21:03

 * The categories and other filter options are orthogonal. Don't
   force the user to choose between real categories and for example