C++: Add some test cases for cpp/wrong-type-format-argument#21421
C++: Add some test cases for cpp/wrong-type-format-argument#21421geoffw0 wants to merge 3 commits intogithub:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds new test coverage for cpp/wrong-type-format-argument to reproduce/track a build-mode-none-style failure mode where size_t can be misinterpreted (e.g., as a function pointer type) and yields nonsensical alert messages.
Changes:
- Adds a buildless-style repro involving
size_tandprintfformat specifiers. - Updates the expected test output for
WrongTypeFormatArgumentsaccordingly. - Adds supporting buildless test inputs to exercise the extractor/query behavior under errors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/ql/test/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Buildless/second.cpp | New/updated repro that exercises the size_t mis-typing scenario in a buildless/error-tolerant extraction context. |
| cpp/ql/test/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Buildless/first.cpp | Provides a plausible size_t typedef for the multi-TU scenario. |
| cpp/ql/test/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Buildless/tests.c | Additional buildless/error-handling coverage for format argument typing. |
| cpp/ql/test/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Buildless/tests2.c | Additional wrapper/multiple-definition scenario coverage. |
| cpp/ql/test/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Buildless/WrongTypeFormatArguments.expected | Updates expected alerts to capture the reproduced behavior. |
| // defines type size_t plausibly | ||
| typedef unsigned long size_t; |
There was a problem hiding this comment.
Do we need this file at all?
There was a problem hiding this comment.
Without this line we lose two of the results, and they're the two closest to the real world cases I was looking at:
-| second.cpp:13:19:13:19 | s | This format specifier for type 'size_t' does not match the argument type '..(*)(..)'. |
-| second.cpp:14:19:14:19 | s | This format specifier for type 'size_t' does not match the argument type '..(*)(..)'. |
Though we would retain the other results involving function pointer types, for example:
| second.cpp:15:18:15:18 | s | This format specifier for type 'int' does not match the argument type '..(*)(..)'. |
What we can't do is move the definition of size_t into second.cpp. That makes the entire phenomena disappear.
TL;DR: I'd prefer to keep this as it better models real world cases; but it's not strictly necessary to reproduce an issue.
| typedef size_t (*myFunctionPointerType) (); | ||
|
|
||
| void test_size_t() { | ||
| size_t s = 0; |
There was a problem hiding this comment.
Maybe we should add a comment on what our frontend thinks size_t is at this point?
There was a problem hiding this comment.
There are two size_t types in the database, both are typedefs but only one is a typedef to an integral type as we would expect. I think this fits the theory that the definition of myFunctionPointerType is somehow creating an alternative size_t typedef type.
But which type does the variable "s" have? ... it looks like there is no Variable "s" extracted in the database. I only see variables "buffer" and "format" in this file. Thus, there are also no VariableAccesses to "s" in the printf calls, I see FunctionAccesses instead, understandably with function types. This seems to be the direct cause of the spurious results I'm seeing.
If I take the definition of myFunctionPointerType away, the variable "s" then exists with an error type, and it looks like we then don't extract the access to "s" at all (which means we don't get erroneous query results).
So I think we have two paths available to fixing this:
- fix the extraction of
myFunctionPointerTypeso that it doesn't spuriously generate asize_ttypedef, perhaps by simply erroring on this line. This approach has the upside of potentially resolving other related issues. - patch the query to ignore arguments that are accesses to functions with an erroneous type (or just: that are function accesses). This approach has the advantage of being simple and immediate.
| printf("%ld", &buffer[1023] - buffer); // BAD [NOT DETECTED] | ||
| printf("%lld", &buffer[1023] - buffer); // BAD [NOT DETECTED] |
There was a problem hiding this comment.
Because the pointer type is ssize_t not long int, and hose types may or may-not have the same size depending on platform. Though ... in the interests of filtering out results considered noisy, we've been increasingly lenient on this kind of thing in this query. The user may know more about the platform than we do.
There was a problem hiding this comment.
Because the pointer type is ssize_t not long int, and hose types may or may-not have the same size depending on platform.
Ok. I think I misunderstand why we're running into problems with the query. There seem to be at least three reasons going by the tests you're adding here: size_t not defined, size_t defined elsewhere, some other reason related to pointers that I don't yet understand. Is that correct?
There was a problem hiding this comment.
The primary issue is the results where the query claims the argument has a function type '..(*)(..)', when it doesn't. These messages are incorrect and confusing.
Add some test cases for
cpp/wrong-type-format-argument, showing how we can apparently mistakesize_tfor a function pointer type in some circumstances. This is based on an issue I've seen in the wild (in a build-mode-none database).The typedef for
myFunctionPointerTypeis a compiler error as written, but we're simulating a situation where the definition ofsize_tis present, but invisible to the extractor due to BMN. I think the underlying issue may be something to do with how we handle this error case - though there are a few ways the problem could be addressed in the query if necessary.Note that the query is very permissive, amongst other things it already allows integer signed-ness mismatches - so I'm of the opinion that using
%zufor a signedsize_tshould not be flagged. Regardless, the message claiming thesize_tis type..(*)(..)is a problem, unless I'm missing something here.