| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add documentation for LANDLOCK_RESTRICT_SELF_TSYNC. It does not need to go
into the main example, but it has a section in the ABI compatibility notes.
In the HTML rendering, the main reference is the system call documentation,
which is included from the landlock.h header file.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-4-gnoack@google.com
[mic: Update date]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Exercise various scenarios where Landlock domains are enforced across
all of a processes' threads.
Test coverage for security/landlock is 91.6% of 2130 lines according to
LLVM 21.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-3-gnoack@google.com
[mic: Fix subject, use EXPECT_EQ(close()), make helpers static, add test
coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
In line with the previous patch, the __weak arch_sdt_arg_parse_op()
function is removed.
Architectural-specific implementations in the arch/ directory are now
converted into sub-functions within the util/perf-regs-arch/ directory.
The perf_sdt_arg_parse_op() function will call these sub-functions based
on the EM_HOST.
This change enables cross-architecture calls to arch_sdt_arg_parse_op().
No functional changes are intended.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xudong Hao <xudong.hao@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
[ Fixed up somme fuzz with powerpc and x86 Build files wrt removing perf_regs.o ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently, some architecture-specific perf-regs functions, such as
arch__intr_reg_mask() and arch__user_reg_mask(), are defined with the
__weak attribute.
This approach ensures that only functions matching the architecture of
the build/run host are compiled and executed, reducing build time and
binary size.
However, this __weak attribute restricts these functions to be called
only on the same architecture, preventing cross-architecture
functionality.
For example, a perf.data file captured on x86 cannot be parsed on an ARM
platform.
To address this limitation, this patch removes the __weak attribute from
these perf-regs functions.
The architecture-specific code is moved from the arch/ directory to the
util/perf-regs-arch/ directory.
The appropriate architectural functions are then called based on the
EM_HOST.
No functional changes are intended.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xudong Hao <xudong.hao@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
[ Fixed up somme fuzz with s390 and riscv Build files wrt removing perf_regs.o ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The architectural specific headers perf_regs.h currently rely on the
host architecture's 'asm/perf_regs.h'.
This can lead to compilation inconsistencies or failures when including
and building perf for a target architecture that differs from the host's
architecture.
Explicitly point to the UAPI headers within the tools source tree using
relative paths.
This ensures that perf is always built against the intended
architecture.
No functional changes are intended.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xudong Hao <xudong.hao@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Fix an issue where the `perf` tool aborts unexpectedly when running the
following command:
```
perf record -e cycles -I -- true
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-I, --intr-regs[=<any register>]
sample selected machine registers on interrupt, use '-I?' to list register names
```
The usage of the `-I` or `--user-regs` options without specifying any
registers should default to sampling all general-purpose registers.
However, this currently causes an abnormal termination.
The issue was introduced by commit 3d06db9bad1a ("perf regs: Refactor
use of arch__sample_reg_masks() to perf_reg_name()").
This patch resolves the problem, ensuring that the `-I` or `--user-regs`
options work as intended without causing an abort.
Fixes: 3d06db9bad1ad8e6 ("perf regs: Refactor use of arch__sample_reg_masks() to perf_reg_name()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xudong Hao <xudong.hao@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This reverts commit 62eb557580eb2177cf16c3fd2b6efadff297b29a.
The revocable implementation uses two separate abstractions, struct
revocable_provider and struct revocable, in order to store the SRCU read
lock index which must be passed unaltered to srcu_read_unlock() in the
same context when a resource is no longer needed.
With the merged revocable API, multiple threads could however share the
same struct revocable and therefore potentially overwrite the SRCU index
of another thread which can cause the SRCU synchronisation in
revocable_provider_revoke() to never complete. [1]
An example revocable conversion of the gpiolib code also turned out to
be fundamentally flawed and could lead to use-after-free. [2]
An attempt to address both issues was quickly put together and merged,
but revocable is still fundamentally broken. [3]
Specifically, the latest design relies on RCU for storing a pointer to
the revocable provider, but since the resource can be shared by value
(e.g. as in the now reverted selftests) this does not work at all and
can also lead to use-after-free:
static void revocable_provider_release(struct kref *kref)
{
struct revocable_provider *rp = container_of(kref,
struct revocable_provider, kref);
cleanup_srcu_struct(&rp->srcu);
kfree_rcu(rp, rcu);
}
void revocable_provider_revoke(struct revocable_provider __rcu **rp_ptr)
{
struct revocable_provider *rp;
rp = rcu_replace_pointer(*rp_ptr, NULL, 1);
...
kref_put(&rp->kref, revocable_provider_release);
}
int revocable_init(struct revocable_provider __rcu *_rp,
struct revocable *rev)
{
struct revocable_provider *rp;
...
scoped_guard(rcu) {
rp = rcu_dereference(_rp);
if (!rp)
return -ENODEV;
if (!kref_get_unless_zero(&rp->kref))
return -ENODEV;
}
...
}
producer:
priv->rp = revocable_provider_alloc(&priv->res);
// pass priv->rp by value to consumer
revocable_provider_revoke(&priv->rp);
consumer:
struct revocable_provider __rcu *rp = filp->private_data;
struct revocable *rev;
revocable_init(rp, &rev);
as _rp would still be non-NULL in revocable_init() regardless of whether
the producer has revoked the resource and set its pointer to NULL.
Essentially revocable still relies on having a pointer to reference
counted driver data which holds the revocable provider, which makes all
the RCU protection unnecessary along with most of the current revocable
design and implementation.
As the above shows, and as has been pointed out repeatedly elsewhere,
these kind of issues are not something that should be addressed
incrementally. [4]
Revert the revocable implementation until a redesign has been proposed
and evaluated properly.
Link: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ [1]
Link: https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ [2]
Link: https://lore.kernel.org/all/20260129143733.45618-1-tzungbi@kernel.org/ [3]
Link: https://lore.kernel.org/all/aXobzoeooJqxMkEj@hovoldconsulting.com/ [4]
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit cd7693419bb5abd91ad2f407dab69c480e417a61.
The new revocable functionality is fundamentally broken and at a minimum
needs to be redesigned.
Drop the revocable Kunit tests to allow the implementation to be reverted.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 9d4502fef00fa7a798d3c0806d4da4466a7ffc6f.
The new revocable functionality is fundamentally broken and at a minimum
needs to be redesigned.
Drop the revocable selftests to allow the implementation to be reverted.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The failure to find a table of metrics with a CPUID shouldn't early
exit as the metric code will now also consider the default table.
When searching for a metric or metric group,
pmu_metrics_table__for_each_metric() considers all tables and so the
caller doesn't need to switch the table to do this.
Fixes: c7adeb0974f18da4 ("perf jevents: Add set of common metrics based on default ones")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Leo reported 'perf stat' being broken and this highlighted that the
'make NO_JEVENTS=1' variant is missing from 'make -C tools/perf
build-test', add it.
Closes: https://lore.kernel.org/linux-perf-users/20260205175250.GC3529712@e132581.arm.com/
Reported-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Recently 'perf stat' regressed in per CPU mode [1].
Let's expand test coverage to catch the same breakage again as well as
to test the repeat, pid, detailed and no aggregation options.
[1] https://lore.kernel.org/linux-perf-users/cgja46br2smmznxs7kbeabs6zgv3b4olfqgh2fdp5mxk2yom4v@w6jjgov6hdi6/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since commit ee27476fa3004f83 ("perf record: Skip don't fail for events
that don't open"), if a user does not have permission to access a PMU
event, perf reports:
perf record -e cs_etm// -C 3 -- ls
Error:
Failure to open event 'cs_etm//u' on PMU 'cs_etm' which will be removed.
No fallback found for 'cs_etm//u' for error 13
Error:
Failure to open event 'dummy:u' on PMU 'software' which will be removed.
No fallback found for 'dummy:u' for error 13
Error:
Failure to open any events for recording.
The log is not very helpful, as no clear indication of what "error 13"
means or how to address the issue.
This commit restores evsel__open_strerror() to generate a readable error
message and print it out:
perf record -e cs_etm// -C 3 -- ls
Error:
Failure to open event 'cs_etm//' on PMU 'cs_etm' which will be removed.
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 1:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
Error:
Failure to open event 'dummy:u' on PMU 'software' which will be removed.
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 1:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
Error:
Failure to open any events for recording.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Introduce support for Thunderbolt (TBT) alternate mode to the UCSI driver.
This allows the driver to manage the entry and exit of TBT altmode by
providing the necessary typec_altmode_ops.
ucsi_altmode_update_active() is invoked when the Connector Partner Changed
bit is set in the GET_CONNECTOR_STATUS data. This ensures that the
alternate mode's active state is synchronized with the current mode the
connector is operating in.
Signed-off-by: Andrei Kuchynski <akuchynski@chromium.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260206115754.828954-1-akuchynski@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently io_uring supports restricting operations on a per-ring basis.
To use those, the ring must be setup in a disabled state by setting
IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and
the ring can then be enabled.
This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd
== -1, like the other "blind" register opcodes which work on the task
rather than a specific ring. This allows registration of the same kind
of restrictions as can been done on a specific ring, but with the task
itself. Once done, any ring created will inherit these restrictions.
If a restriction filter is registered with a task, then it's inherited
on fork for its children. Children may only further restrict operations,
not extend them.
Inheriting restrictions include both the classic
IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF
filters that have been registered with the task via
IORING_REGISTER_BPF_FILTER.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Called when copy_process() is called to copy state to a new child.
Right now this is just a stub, but will be used shortly to properly
handle fork'ing of task based io_uring restrictions.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This was converted to a _scoped() loop but this of_node_put() was
accidentally left behind which is a double free.
Fixes: aa8cb72c2018 ("mtd: spi-nor: hisi-sfc: Simplify with scoped for each OF child loop")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
The commit 1b35a57b1c178 ("sparc32: Kill off software 32-bit multiply/divide
routines") removed the last usage of strtab in funtion module_frob_arch_sections
Therefore, it can be removed now.
Reported-by: kernel test robot <lkp@intel.com>
Cc: sparclinux@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
arch/sparc/mm/init_64.c: In function 'arch_hugetlb_valid_size':
arch/sparc/mm/init_64.c:361:24: warning: variable 'hv_pgsz_idx'
set but not used [-Wunused-but-set-variable]
361 | unsigned short hv_pgsz_idx;
| ^~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Cc: sparclinux@vger.kernel.org
CC: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Alex Shi <alexs@kernel.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Similar in nature to commit ab107276607a ("powerpc: Fix struct termio related ioctl macros").
glibc-2.42 drops the legacy termio struct, but the ioctls.h header still
defines some TC* constants in terms of termio (via sizeof). Hardcode the
values instead.
This fixes building Python for example, which falls over like:
./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Replace snprintf("%s", ...) with the faster and more direct strscpy().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Add support for the clone3 system call to the SPARC architectures.
The implementation follows the pattern of the original clone syscall.
However, instead of explicitly calling kernel_clone, the clone3
handler calls the generic sys_clone3 handler in kernel/fork.
In case no stack is provided, the parents stack is reused.
The return value convention for clone3 follows the regular kernel return
value convention (in contrast to the original clone/fork on SPARC).
Closes: https://github.com/sparclinux/issues/issues/10
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20260119144753.27945-3-ludwig.rydberg@gaisler.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Flush all uncommitted user windows before calling the generic syscall
handlers for clone, fork, and vfork.
Prior to entering the arch common handlers sparc_{clone|fork|vfork}, the
arch-specific syscall wrappers for these syscalls will attempt to flush
all windows (including user windows).
In the window overflow trap handlers on both SPARC{32|64},
if the window can't be stored (i.e due to MMU related faults) the routine
backups the user window and increments a thread counter (wsaved).
By adding a synchronization point after the flush attempt, when fault
handling is enabled, any uncommitted user windows will be flushed.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
Closes: https://lore.kernel.org/sparclinux/fe5cc47167430007560501aabb28ba154985b661.camel@physik.fu-berlin.de/
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20260119144753.27945-2-ludwig.rydberg@gaisler.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
No need to assign "uctl" to itself. Delete it.
Fixes: 55f98ece9939 ("ALSA: oss: Relax __free() variable declarations")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aYXvm2YoV2yRimhk@stanley.mountain
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Currently, KVM assumes the minimum of implemented HGEIE bits and
"BIT(gc->guest_index_bits) - 1" as the number of guest files available
across all CPUs. This will not work when CPUs have different number
of guest files because KVM may incorrectly allocate a guest file on a
CPU with fewer guest files.
To address above, during initialization, calculate the number of
available guest interrupt files according to MMIO resources and
constrain the number of guest interrupt files that can be allocated
by KVM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/r/20260104133457.57742-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Use block mapping if backed by a THP, as implemented in architectures
like ARM and x86_64.
Signed-off-by: Jessica Liu <liu.xuemei1@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251127165137780QbUOVPKPAfWSGAFl5qtRy@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM RISC-V allows Zalasr extensions for Guest/VM so add this
extension to get-reg-list test.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251020042904.32096-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Extend the KVM ISA extension ONE_REG interface to allow KVM user space
to detect and enable Zalasr extensions for Guest/VM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251020042457.30915-5-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Current vm modes cannot represent riscv guest modes precisely, here add
all 9 combinations of P(56,40,41) x V(57,48,39). Also the default vm
mode is detected on runtime instead of hardcoded one, which might not be
supported on specific machine.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251105151442.28767-1-wu.fei9@sanechips.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM RISC-V allows Zilsd and Zclsd extensions for Guest/VM so add
this extension to get-reg-list test.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250826162939.1494021-6-pincheng.plct@isrc.iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Extend the KVM ISA extension ONE_REG interface to allow KVM user space
to detect and enable Zilsd and Zclsd extensions for Guest/VM.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250826162939.1494021-5-pincheng.plct@isrc.iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
kvm_riscv_vcpu_aia_imsic_update() assumes that the vCPU IMSIC state has
already been initialized and unconditionally accesses imsic->vsfile_lock.
However, in fuzzed ioctl sequences, the AIA device may be initialized at
the VM level while the per-vCPU IMSIC state is still NULL.
This leads to invalid access when entering the vCPU run loop before
IMSIC initialization has completed.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000006
...
kvm_riscv_vcpu_aia_imsic_update arch/riscv/kvm/aia_imsic.c:801
kvm_riscv_vcpu_aia_update arch/riscv/kvm/aia_device.c:493
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:927
...
Add a guard to skip the IMSIC update path when imsic_state is NULL. This
allows the vCPU run loop to continue safely.
This issue was discovered during fuzzing of RISC-V KVM code.
Fixes: db8b7e97d6137a ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260127084313.3496485-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_rw_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000006
...
kvm_riscv_aia_imsic_rw_attr+0x2d8/0x854 arch/riscv/kvm/aia_imsic.c:958
aia_set_attr+0x2ee/0x1726 arch/riscv/kvm/aia_device.c:354
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4744 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4761
vfs_ioctl fs/ioctl.c:51 [inline]
...
The fix adds a check to return -ENODEV if imsic_state is NULL and moves
isel assignment after imsic_state NULL check.
Fixes: 5463091a51cfaa ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260127072219.3366607-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_has_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
This issue was discovered during fuzzing of RISC-V KVM code. The
crash occurs when userspace calls KVM_HAS_DEVICE_ATTR ioctl on an
AIA IMSIC device before the IMSIC state has been fully initialized
for a vcpu.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000001
...
epc : kvm_riscv_aia_imsic_has_attr+0x464/0x50e
arch/riscv/kvm/aia_imsic.c:998
...
kvm_riscv_aia_imsic_has_attr+0x464/0x50e arch/riscv/kvm/aia_imsic.c:998
aia_has_attr+0x128/0x2bc arch/riscv/kvm/aia_device.c:471
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4722 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4739
...
The fix adds a check to return -ENODEV if imsic_state is NULL, which
is consistent with other error handling in the function and prevents
the null pointer dereference.
Fixes: 5463091a51cf ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260125143344.2515451-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
If execution reaches "ret = 0" assignment in
kvm_riscv_vcpu_pmu_event_info() then it means
kvm_vcpu_write_guest() returned 0 hence ret is
already zero and does not need to be assigned 0.
Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function")
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251229072530.3075496-1-maqianga@uniontech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The generic CC_CAN_LINK detection does not handle different byte orders.
This may lead to userprogs which are not actually runnable on the target
kernel.
Use architecture-specific logic supporting byte orders instead.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
uuid=off,index=on required that all upper/lower directories are on the
same filesystem.
Relax the requirement so that only all the lower directories need to be
on the same filesystem.
Reported-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20260114-tonyk-get_disk_uuid-v1-3-e6a319e25d57@igalia.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Open intervals do not have an end element, in particular an open
interval at the end of the set is hard to validate because of it is
lacking the end element, and interval validation relies on such end
element to perform the checks.
This patch adds a new flag field to struct nft_set_elem, this is not an
issue because this is a temporary object that is allocated in the stack
from the insert/deactivate path. This flag field is used to specify that
this is the last element in this add/delete command.
The last flag is used, in combination with the start element cookie, to
check if there is a partial overlap, eg.
Already exists: 255.255.255.0-255.255.255.254
Add interval: 255.255.255.0-255.255.255.255
~~~~~~~~~~~~~
start element overlap
Basically, the idea is to check for an existing end element in the set
if there is an overlap with an existing start element.
However, the last open interval can come in any position in the add
command, the corner case can get a bit more complicated:
Already exists: 255.255.255.0-255.255.255.254
Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254
~~~~~~~~~~~~~
start element overlap
To catch this overlap, annotate that the new start element is a possible
overlap, then report the overlap if the next element is another start
element that confirms that previous element in an open interval at the
end of the set.
For deletions, do not update the start cookie when deleting an open
interval, otherwise this can trigger spurious EEXIST when adding new
elements.
Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would
make easier to detect open interval overlaps.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The existing partial overlap detection does not check if the elements
belong to the interval, eg.
add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5 }
add element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT
Similar situation occurs with deletions:
add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5}
delete element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT
This currently works via mitigation by nft in userspace, which is
performing the overlap detection before sending the elements to the
kernel. This requires a previous netlink dump of the set content which
slows down incremental updates on interval sets, because a netlink set
content dump is needed.
This patch extends the existing overlap detection to track the most
recent start element that already exists. The pointer to the existing
start element is stored as a cookie (no pointer dereference is ever
possible). If the end element is added and it already exists, then
check that the existing end element is adjacent to the already existing
start element. Similar logic applies to element deactivation.
This patch also annotates the timestamp to identify if start cookie
comes from an older batch, in such case reset it. Otherwise, a failing
create element command leaves the start cookie in place, resulting in
bogus error reporting.
There is still a few more corner cases of overlap detection related to
the open interval that are addressed in follow up patches.
This is address an early design mistake where an interval is expressed
as two elements, using the NFT_SET_ELEM_INTERVAL_END flag, instead of
the more recent NFTA_SET_ELEM_KEY_END attribute that pipapo already
uses.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Userspace provides an optimized representation in case intervals are
adjacent, where the end element is omitted.
The existing partial overlap detection logic skips anonymous set checks
on start elements for this reason.
However, it is possible to add intervals that overlap to this anonymous
where two start elements with the same, eg. A-B, A-C where C < B.
start end
A B
start end
A C
Restore the check on overlapping start elements to report an overlap.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Userspace adds a non-matching null element to the kernel for historical
reasons. This null element is added when the set is populated with
elements. Inclusion of this element is conditional, therefore,
userspace needs to dump the set content to check for its presence.
If the NLM_F_CREATE flag is turned on, this becomes an issue because
kernel bogusly reports EEXIST.
Add special case to ignore NLM_F_CREATE in this case, therefore,
re-adding the nul-element never fails.
Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
nft_counter_reset() calls u64_stats_add() with a negative value to reset
the counter. This will work on 64bit archs, hence the negative value
added will wrap as a 64bit value which then can wrap the stat counter as
well.
On 32bit archs, the added negative value will wrap as a 32bit value and
_not_ wrapping the stat counter properly. In most cases, this would just
lead to a very large 32bit value being added to the stat counter.
Fix by introducing u64_stats_sub().
Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.")
Signed-off-by: Anders Grahn <anders.grahn@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
tests/shell/testcases/packetpath/set_match_nomatch_hash_fast
fails on big endian with:
Error: Could not process rule: No such file or directory
reset element ip test s { 244.147.90.126 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fatal: Cannot fetch element "244.147.90.126"
... because the wrong bucket is searched, jhash() and jhash1_word are
not interchangeable on big endian.
Fixes: 3b02b0adc242 ("netfilter: nft_set_hash: fix lookups with fixed size hash on big endian")
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The script now requires IPV6 tunnel support, enable this.
This should have caught by CI, but as the config option is missin |