##############################################################################
@RELEASE: 6.3.6
##############################################################################
==== CL 10514 ====
@FIX: another patch for the out-of-order issue. Fixed unexpected short-circuit evaluation that was happening in the startResources() routine.
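The kind of bug named in this fix can be illustrated with a small sketch. It is not the supervisor's code: `start_cpu` and `start_memory` are hypothetical stand-ins for steps that all need to run. When results are combined with `and`, evaluation stops at the first false value, so later steps are silently skipped; running every step first and combining the results afterwards avoids that.

```python
def run_all_short_circuit(steps):
    """Buggy pattern: once ok is False, step() is no longer called."""
    ok = True
    for step in steps:
        ok = ok and step()   # right-hand side is skipped when ok is False
    return ok


def run_all(steps):
    """Call every step, then report whether all of them succeeded."""
    results = [step() for step in steps]   # every step runs
    return all(results)


calls = []


def start_cpu():          # hypothetical step that fails
    calls.append("cpu")
    return False


def start_memory():       # hypothetical step that must still run
    calls.append("memory")
    return True
```

With `run_all_short_circuit([start_cpu, start_memory])` only `start_cpu` is invoked, whereas `run_all` invokes both before returning `False`.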
==== CL 10513 ====
@FIX: another patch for the out-of-order issue. Fixed unexpected short-circuit evaluation that was happening in the startHost() routine.
==== CL 10512 ====
@INTERNAL: QbJob object's _subjobswaiting data was not being initialized or copied correctly, causing some job comparisons based on subjobs-waiting counts to unexpectedly fail.
==== CL 10504 ====
@INTERNAL: added more log output for debugging builds, and added more comments while working on the out-of-order issue.
ZD: 8198
==== CL 10477 ====
@FIX: Another out-of-order fix. Jobs at the same numerical and cluster priority should now dispatch in the correct FIFO order.
The FIFO enforcement should work most of the time, but there will still be occasional out-of-order behavior due to the multi-threaded nature of the supervisor. ("qbshove"-ing the older job should correct it when it's seen.)
ZD: 8198
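The intended ordering can be sketched as a sort key. This is only an illustration: the `Job` record and its field names are hypothetical, smaller numbers are assumed to mean higher priority, and the job id is assumed to increase with submission time so that it can serve as the FIFO tie-breaker.

```python
from collections import namedtuple

# Hypothetical job record; the field names are illustrative only.
Job = namedtuple("Job", ["job_id", "priority", "cluster_priority"])


def dispatch_order(jobs):
    """Sort by (priority, cluster_priority); break ties by job_id (FIFO).

    Assumes smaller numbers mean higher priority and that job_id grows
    with submission time.
    """
    return sorted(jobs, key=lambda j: (j.priority, j.cluster_priority, j.job_id))


jobs = [Job(12, 5, 1), Job(10, 5, 1), Job(11, 1, 1)]
ordered = [j.job_id for j in dispatch_order(jobs)]
```

Jobs 10 and 12 share both priorities, so the older job 10 is placed ahead of job 12, and job 11 goes first because of its higher priority.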
==== CL 10462 ====
@FIX: yet yet another fix for out-of-order dispatch behavior -- eliminate race condition that would allow lower-priority jobs that were just preempted to get workers before higher-priority jobs.
See also CL 10440, 10452
ZD: 8198
==== CL 10461 ====
@CHANGE: modified/compacted the multi-line "found a duty to replace" logging to be a single line.
==== CL 10452 ====
@FIX: yet another fix for out-of-order dispatch behavior -- eliminate race condition that would allow lower-priority jobs that were just preempted to get workers before higher-priority jobs.
See also CL 10440
ZD: 8198
==== CL 10441 ====
@FIX: fixed issue where killing an already finished (complete, failed, killed) job left the job in the "dying" state.
==== CL 10440 ====
@FIX: another fix for out-of-order dispatch behavior -- eliminate race condition that would allow lower-priority jobs that were just preempted to get workers before higher-priority jobs.
ZD: 8198
==== CL 10429 ====
@FIX: out-of-order job dispatching issue with jobs using the "+" sign with the "host.processors" reservations.
ZD: 8198 8261 8229 8233 8228
==== CL 10189 ====
@FIX: timing issue where some worker resources (host.xyz) would disappear after the worker received a remote config.
@FIX: issue where supervisor tries to dispatch a subjob to a worker with insufficient resources (reduced the likelihood of that happening).
@FIX: the above 2 fixes combined should now prevent some of the out-of-priority-order dispatch issues, especially in environments where worker resources are deployed.
ZD: 7885
==== CL 10118 ====
@FIX: fixed issue where agenda timeouts don't work properly on the first agenda item processed by a subjob, on Unix (Linux/OSX) workers
==== CL 10117 ====
@FIX: fixed issue where agenda items that fail because of timeout don't get automatically retried via retrywork
ZD: 7763
==== CL 10022 ====
@FIX: modified the worker to only report its host status to the supe when subjobs are completely done and removed, and NOT when they are only marked/scheduled for removal.
This was causing jobs to sometimes run out-of-order, especially when there are many subjobs to each job (such as one subjob per frame), since that situation tends to increase the chance of the supervisor dispatching the same subjob to the same worker. The subjob will be dispatched to the same worker, but rejected since the worker thinks it's a duplicate assignment of a subjob that's being removed (and consequently a lower-priority job will get the worker's slot, causing out-of-order job execution).
ZD: 7601
==== CL 9903 ====
@FIX: better message from worker when it rejects a dispatched subjob because it's a duplicate (being preempted or migrated on the same worker)
==== CL 9838 ====
@CHANGE: upped the default value for supervisor_max_threads to 100, and worker_max_threads to 32
##############################################################################
@RELEASE: 6.3.5
##############################################################################
==== CL 9785 ====
@FIX: worker issue where desktop worker would randomly crash.
ZD: 6778
==== CL 9730 ====
@TWEAK: modified so that worker name and IP print when job is accepted by worker, in assignJob()
==== CL 9729 ====
@INTERNAL: changed all calls to qbvcout to qbout in the QbDaemon, QbPreforkDaemon and QbDatabaseMysql code, so that the timestamp, hostname and pid are always printed.
==== CL 9698 ====
@FIX: fixed false-negative warning message pertaining to "select() in checkpoint()" seen in supelog.
Examples of these messages:
select() in checkpoint(): Operation timed out
select() in checkpoint(): Interrupted system call
==== CL 9694 ====
@FIX: fixed issue with the supe threads getting tied up on the "subjob X seems to be already assigned" message.
On a farm with busy workers, the time between the supe dispatching a subjob to the worker via assignJob() and the worker reporting that the "subjob is running" can be several seconds to sometimes even several minutes, which was causing many supe threads to attempt dispatching the same subjob over and over. All of those threads end up hitting the "subjob X seems to be already assigned... retrying" message, and get tied up for 3 seconds while they retry.
BUGZID:
ZD: 6760 7125
==== CL 9689 ====
@FIX: fixed bug in clustering algorithm where it incorrectly gave more weight to a job when the only difference was the last letter in the cluster specification.
For example, if:
host cluster: /3D/projA
job1 cluster: /3D/projB
job2 cluster: /3D
job1 was getting more weight than job2, which is incorrect.
BUGZID: 63740
ZD: 7043
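One way to avoid this class of error is to compare whole path components instead of characters. The supervisor's actual weighting is not described in these notes, so the function below is an illustrative assumption only: it counts how many leading cluster levels a job shares with the host, which gives /3D/projB and /3D the same score against a /3D/projA host.

```python
def cluster_components(path):
    """Split a cluster path such as '/3D/projA' into ['3D', 'projA']."""
    return [part for part in path.split("/") if part]


def shared_levels(host_cluster, job_cluster):
    """Count leading path components that match exactly."""
    count = 0
    for host_part, job_part in zip(cluster_components(host_cluster),
                                   cluster_components(job_cluster)):
        if host_part != job_part:
            break
        count += 1
    return count


host = "/3D/projA"
job1 = shared_levels(host, "/3D/projB")   # shares only the '3D' level
job2 = shared_levels(host, "/3D")         # shares only the '3D' level
```

A character-by-character comparison would have scored `/3D/projB` higher because of the common `proj` prefix; comparing components treats `projA` and `projB` as simply different.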
==== CL 9686 ====
@FIX: using deprecated "waitfor" attribute with Python API causes qb.submit() to raise a KeyError
@FIX: properly convert "waitfor" value (jobid integer) to proper "dependency" string of "link-done-job-<id>"
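The conversion described above can be sketched as follows. Only the "link-done-job-<id>" string format comes from the note; the function name and the plain-dict job shape are hypothetical, and this is not the code inside the qb module.

```python
def waitfor_to_dependency(job):
    """Return a copy of job with 'waitfor' replaced by a 'dependency' string."""
    converted = dict(job)                      # leave the caller's dict untouched
    job_id = converted.pop("waitfor", None)    # a missing key is not an error
    if job_id is not None:
        converted["dependency"] = "link-done-job-%d" % int(job_id)
    return converted


example = waitfor_to_dependency({"name": "render", "waitfor": 1234})
```

Using `pop` with a default is what keeps a job without "waitfor" from raising a KeyError in this sketch.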
==== CL 9676 ====
@FIX: update documentation and GUI help text to show correct "||" syntax for job restrictions list.
==== CL 9662 ====
@FIX: supervisor was failing postflight upgrade scripts on OSX Server; explicitly set the mysql socket to /tmp/mysql.sock in /etc/my.cnf and /etc/qb.conf to avoid conflicting with the factory-installed default of /var/lib/mysql/mysql.sock
==== CL 9615 ====
@FIX: Added code to properly log frames (to supelog and job log) when they go back to "pending" after the processing subjob/worker is found dead.
@FIX: Added code in the supervisor to retry a failed worker connection after a random 5-10 sec sleep/delay, to alleviate network hiccups during network commands (kill, preempt, etc. of running subjobs).
ZD: 6760
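The retry behavior described in this entry can be sketched as below. The function name, the use of `ConnectionError`, and the single-retry default are assumptions made for illustration; only the random 5-10 second delay comes from the note. The `sleep` and `rng` parameters exist so the sketch can be exercised without actually waiting.

```python
import random
import time


def call_with_retry(send, retries=1, sleep=time.sleep, rng=random.uniform):
    """Call send(); on ConnectionError wait a random 5-10 s and try again."""
    attempt = 0
    while True:
        try:
            return send()
        except ConnectionError:
            if attempt >= retries:
                raise                      # out of retries: let the caller see it
            attempt += 1
            sleep(rng(5.0, 10.0))          # randomized delay spreads out retries
```

Randomizing the delay keeps many threads that failed at the same moment from all reconnecting at the same instant.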
==== CL 9614 ====
@INTERNAL: fixed a small cosmetic bug introduced in CL 9606
==== CL 9607 ====
@INTERNAL: added converseWorkerWithRetries() and also fixed small bug in the retry loop of converseSubSupervisorWithRetries()
==== CL 9585 ====
@FIX: issue where some jobs get stuck in the "dying" state when a kill is attempted
ZD: 6616
==== CL 9570 ====
@FIX: improvements to the handling of GET_LOCK (aka "reserveJob()") timeout situations.
ZD: 6617
==== CL 9500 ====
@FIX: Windows Vista/7/2008-R2 installer - don't error out when installing the worker or supervisor as an Admin-equivalent account during creation of scheduled tasks. Properly remove scheduled tasks during uninstall.
##############################################################################
@RELEASE: 6.3.4
##############################################################################
==== CL 9550 ====
@FIX: qbwrk.conf files that had any commented lines before the first valid template was encountered would cause an exception to be raised; QubeGUI->worker->RMB->Configure (which uses qb.updateworkerconfig()) would fail silently
==== CL 9535 ====
@NEW: add submit-agenda-timeout-job.py example python script, to demonstrate submission of a job with frame-level timeouts.
ZD: 6099
==== CL 9530 ====
@FIX: Submitting paths to shotgun no longer depends on the visibility of output paths to the supervisor.
@FIX: Shotgun submission script fails gracefully & logs a reason as to why it can't generate a thumbnail when thumbnail creation fails.
==== CL 9523 ====
@FIX: fixed issue where the supervisor fails to correctly track the host assignment for subjobs.
Symptoms for this included seeing, in the supelog, messages like "statusJob(): aberrant report from worker...", followed by "subjob[xxxx] is assinged to worker[] with mac address[00:00:00:00:00:00]".
These subjobs would then be in the "running" state, but not assigned to a worker.
==== CL 9522 ====
@FIX: removed code that, for jobs with host.processors set to > 1, skipped the supe's local test for resource reservations; skipping it delegated the decision-making to the workers and resulted in more network traffic and latency.
ZD: 6141
==== CL 9507 ====
@FIX: added more robust code that talks to the SMTP server when sending out email, to support some email servers with non-standard response behavior.
ZD: 6209
==== CL 9504 ====
@FIX: catch case where sg_path_to_frames is part of the Shotgun versionName, but the job has no outputPaths for the first frame; fall back to naming the version "job id: 123 jobName: ..."
==== CL 9496 ====
@FIX: catch case when inserting a new cluster into cluster_dim when more than 1 worker exists in the new cluster; occurs during run of regular_slotcount.sql, doesn't prevent new record from being added, just generates line noise and error emails from cron...
==== CL 9494 ====
@CHANGE: make explanation of "+ | *" in job/host restrictions less ambiguous
==== CL 9484 ====
@FIX: calculate cpu-seconds for agenda-based jobs by summing up work times, not subjobs. Better support for resetting of the start times for retried work.
==== CL 9467 ====
@NEW: add a random offset to the startup so that all workers don't report at the same time if they've started up at the same time.
@CHANGE: don't retrieve job name, it's extraneous and not reported; cuts down the query count by one.
@CHANGE: set workname for subjob to job.subid, not subid; easier to detect case where an agenda-based job falsely reports not having an agenda, so subjob id won't conflict with a frame number
==== CL 9463 ====
@FIX: don't report memory usage in the case where MySQL fails to return a valid agenda name, usually caused by timeouts or maxed-out connections.
==== CL 9456 ====
@FIX: moved the location of QbTableVersion29.cpp (rel-6.3) inside the upgrade_supervisor.vcproj file from the incorrect "Resource Files" folder to the proper "Source Files" folder.
It appeared as though the file was missing from the build.
(Probably mostly cosmetic, but it was also confusing.)
==== CL 9449 ====
@FIX: fixed issue with removal of workers using the mac address (i.e. "qbadmin -worker remove <macaddr>") not working properly.
BUGZID: 63447
==== CL 9446 ====
@FIX: added "pgrp" modifying support to the supervisor code and the qbmodify() C++ API, qb.modify() Python API, and qb::modify() Perl API routines, and added a "-mpgrp <int>" option to the qbmodify command-line tool.
BUGZID: 63680
==== CL 9442 ====
@FIX: modified to raise exception when parameter "fields" is not of type list.
BUGZID: 63627
ZD: 3998
==== CL 9440 ====
@FIX: variables such as $qb::jobid not working in callbacks on Windows
BUGZID: 63686
ZD: 5240
==== CL 9427 ====
@FIX: added code to make sure all end-of-line in email data are CRLF (not just LF) in accordance with RFC2822.
Bare LFs were causing notification emails to not work with some email servers, as the servers would not respond, and the communicating supe thread would just stall.
ZD: 5752
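A minimal sketch of the line-ending normalization described in this entry, assuming the message body is available as a single string. The function name is hypothetical, and this is not the supervisor's implementation.

```python
import re


def to_crlf(text):
    """Convert every line ending (CRLF, lone CR, or lone LF) to CRLF."""
    # The alternation tries "\r\n" first, so existing CRLF pairs are kept
    # as a single line ending rather than being doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


body = to_crlf("Subject: job done\n\nAll frames complete.\n")
```

Matching the existing CRLF pair first is what makes the function safe to apply to text that is already partly normalized.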
==== CL 9411 ====
@FIX: added code to chmod and open up the file permission of .out and .err files in the job log folder.
Restrictive permissions were causing subjobs to fail on systems with a "mounted" job log path, as the supervisor will initially create these files when a subjob that previously never started is retried (the supe writes "qube! - retry/requeue on blahblah...") under the "root" user's ownership with mode 644, and the workers who get the subjobs can't write to them.
ZD: 5965
==== CL 9402 ====
@FIX: adding "qbhash" command to Windows.
==== CL 9395 ====
@FIX: fixed issue causing the supervisor to crash at initialization, right after "finding other supes..." was printed in the supelog.
The fix was in one of the base communication library routines, QbConnection::receiveUdp().
Sometimes, unknown/malformed data would be received on the UDP socket, and was causing the code to attempt to access beyond the buffer array (index out-of-bounds error).
ZD: 5638
BUGZID: 63305
##############################################################################
@RELEASE: 6.3.3
##############################################################################
==== CL 9370 ====
@FIX: recreate the pfx_dw stored procedures and functions on Windows, as the MSI installer wipes them out during an upgrade.
==== CL 9342 ====
@FIX: fixed a supe thread crashing issue, when global_host or license_host resource tracking is used.
ZD: 5749
==== CL 9334 ====
@FIX: add error handler for MySQL error 1146 "Table 'x' doesn't exist" for work and cpu time calculations for job data collector script
@NEW: increment datawarehouse version to 10 to allow for installing this patch into existing databases
==== CL 9325 ====
@FIX: add qbhash program to be included in qube-core RPM package.
BUGZID: 63693
ZD: 5744
==== CL 9318 ====
@FIX: fixed crash bugs that were introduced when the "dying" state was implemented for 6.3.1.
ZD: 5794
==== CL 9311 ====
@FIX: add mail template for auto-wrangling emails to the installers
##############################################################################
@RELEASE: 6.3.2
##############################################################################
==== CL 9265 ====
@FIX: fixed job-level history not being recorded into .hst file.
(Bug was introduced in CL 9145, 9146)
ZD: 5609
==== CL 9261 ====
@CHANGE: cut down on the cmdline & cmdrange jobtypes' stdout; don't print 'LOG: ...' lines, make regex summaries much clearer, change printing of regexes to stderr to make it clearer that they're not actual errors, but rather things being searched for in the stderr stream.
==== CL 9252 ====
@FIX: properly find qb.conf on Windows versions Vista and later when unable to contact the supervisor directly.
==== CL 9245 ====
@FIX: GUI changes to be able to handle when supervisor host goes down, and both supervisor and MySQL server are unavailable. Also fix jobList not refreshing on down supervisor.
==== CL 9241 ====
@FIX: fix GUI crash bug in MySQLConnect when supervisor does not answer a qb.ping
==== CL 9239 ====
@FIX: global resource tables were not getting created in new instances of the datawarehouse db, only on upgrades.
==== CL 9234 ====
@FIX: disable permission check of worker_logpath, as it was creating false alarms and putting the worker in panic mode unnecessarily.
ZD: 5445 5236
BUGZID: 63683
==== CL 9232 ====
@FIX: fixed example python code (jobSubmit06.py) to work on Windows too.
==== CL 9211 ====
@FIX: added code to prevent the QbQueue::getSubjobReadyfindReady() routine from returning the same subjob to be dispatched over and over.
This was causing the findSubjobAndReserveJob() and startJob() routines to hit the "subjob [N] seems to be already assigned" situation, and cause threads to enter a long, sometimes semi-infinite, sleep-and-retry loop.
Fixed by adding code in the startJob() routine to quickly update the subjob status when assignJob() returns QB_ASSIGN_OK (i.e., worker says it has accepted the subjob), instead of waiting until the worker later reports that the subjob is "running" via the STATUS_JOB message, which can take more than several seconds on a busy farm.
Also reduced the maximum number of retries to 3 (MAX_ATTEMPTS), in the situations where a subjob "seems to be already assigned" or when a worker host says it's busy (QB_ASSIGN_BUSY). This prevents the threads from getting stuck for 10 or more seconds in a sleep-retry loop, and allows them to give up quickly and move on.
ZD: 5449
==== CL 9198 ====
@FIX: fixed issue with non-node-locked licenses ("FF:FF:...") not working (since 6.3.0)
==== CL 9173 ====
@FIX: ensure that mail sent by qbadmin --emailtest is RFC2822-compliant (no bare LF's, only CRLF)
##############################################################################
@RELEASE: 6.3.1
##############################################################################
=================
@NEW: Add CentOS/RHEL 6 x64 support
==== CL 9150 ====
@INTERNAL: QbDebug::filename(QbString) took if statement out, so resetting _filename is allowed
==== CL 9145 ====
@FIX: disabled logging to /var/spool/qube/{host,user}, as it was creating large log files and causing sluggish performance.
An option to enable these logs may be made available in the future.
==== CL 9142 ====
@FIX: fixed issue where global resources tracking drifts and more subjobs than can be accommodated by the actual global resource count are dispatched.
ZD: 5074
==== CL 9133 ====
@INTERNAL: CentOS support for "buildpyc" in rpm/quberpm.pm
==== CL 9105 ====
@NEW: A new transitional "dying" state for jobs that have been ordered to be "killed", but are still being processed by the system
==== CL 9085 ====
@INTEG: main -> rel-6.[0,1,2,3] CL 9083, 9084
-----
@CHANGE: increase MySQL wait_timeout value from default of 8 hours to 36 hours to decrease frequency of "MySQL server has gone away (2006)" error messages.
@CHANGE: increase MySQL max_allowed_packet value from default of 1MB to 64MB to decrease frequency of "MySQL server has gone away (2006)" error messages.
==== CL 9084 ====
@CHANGE: increase MySQL max_allowed_packet value from default of 1MB to 64MB to decrease frequency of "MySQL server has gone away (2006)" error messages.
==== CL 9066 ====
@FIX: fixed "cpus" (subjob) count inaccuracy when a job's "cpus" was modified down and then up.
For example, if a job with initially 10 "cpus" was reduced to 5, then subsequently increased to 6, the system had inaccurately recomputed the subjob count to be 10.
==== CL 9058 ====
@FIX: renaming logs during rotation would fail on Windows
==== CL 8939 ====
@FIX: fixed another small "hole" that could cause race conditions to dispatch a single subjob more than once
ZD: 4783
BUGZID: 63657
==== CL 8937 ====
@FIX: supe issue where the same subjob can be dispatched more than once to worker(s).
ZD: 4783
BUGZID: 63657
##############################################################################
@RELEASE: 6.3.0
##############################################################################
==== CL 9013 ====
@NEW: added description of supervisor_job_flags in the qb.conf.template file
==== CL 9010 ====
@FIX: fixed memory bloat issue in supervisor threads on start up, on farms with many jobs.
In some cases, it had been reported that each supe thread was taking up 500+ MB.
==== CL 8975 ====
@NEW: add section (8.7) for "externally updatable worker resources and properties" to Administration.doc
==== CL 8957 ====
@NEW: add user name to print to supelog when a worker lock is updated
BUGZID: 63661
ZD: 4860
==== CL 8949 ====
@FIX: fix datawarehouse crontab so that 7-day tables are rebuilt twice a day
==== CL 8948 ====
@NEW: add global_resource tracking to the datawarehouse
==== CL 8935 ====
@FIX: update qb.conf templates to show the correct default value for supervisor_default_security
@INTERNAL: previous setting was a hex value, which seems to be unsupported now.
==== CL 8910 ====
@NEW: add C++ examples for using the qbupdateworkerresource(), qbupdateworkerproperties(), qbdeleteworkerresources(), and qbdeleteworkerproperties() routines
==== CL 8909 ====
@NEW: add Perl API routines for externally updated worker resources/properties
* add bindings to perl
** add qb::updateworkerresources() and updateworkerproperties() to perl api
qb::updateworkerresources("shinyambp.local", "host.ooga=2/3,host.extern=0/10")
qb::updateworkerproperties("shinyambp.local", "host.oogaprop=3,host.oogaextprop2=11")
** add deleteworkerresources() and deleteworkerproperties() to perl
qb::deleteworkerresources($host, @resources);
qb::deleteworkerresources("shinyambp.local", "host.extenres", "host.ooga");
==== CL 8901 ====
@FIX: fixed bug where subjobs will be retried indefinitely when retrysubjob is set.
BUGZID: 63517
ZD: 2950 4661
==== CL 8889 ====
@FIX: fixed issue where the supervisor kept adding duplicate auto-wrangling and mail callbacks when jobs are resubmitted
BUGZID: 63655
ZD: 4661
==== CL 8886 ====
@INTEG: rel-6.2 -> main
----
@FIX: properly remove datawarehouse scheduled tasks for round-robin tables
==== CL 8885 ====
@FIX: properly remove datawarehouse scheduled tasks for round-robin tables
==== CL 8872 ====
@FIX: issue introduced in 6.2.1 that broke callbacks (not being triggered)
==== CL 8859 ====
@FIX: add bookmarks (TOC) to Admin docs, update section for qblock to refer to "Users guide" instead of non-existent "Command Reference"
==== CL 8857 ====
@NEW: add externally-updatable worker resources and properties
BUGZID:
==== CL 8847 ====
@CHANGE: upgrade_config tool no longer comments out some of the customized paths in qb.conf
ZD: 4470
==== CL 8846 ====
@FIX: supe and worker RPMs now correctly "require" specific qube-core version (like "6.2-1")
BUGZID: 63644
ZD: 4470
==== CL 8841 ====
@FIX: issue with supervisor threads stalling, waiting for NFS I/O on the "mounted" job logs, when NFS latency is large.
==== CL 8840 ====
@UPDATE: "Use" doc with p-agenda documentation
@UPDATE: also added/updated some qbsub examples
BUGZID: 63636
==== CL 8837 ====
@NEW: add example scripts to demonstrate submission of p-agenda jobs in perl and python
BUGZID: 63636
==== CL 8836 ====
@NEW: adding docs for retryworkdelay (qbsub option)
==== CL 8811 ====
@FIX: fixed worker installer to start the worker service iff the system has not already turned it OFF via chkconfig.
ZD: 4286
==== CL 8798 ====
@NEW: optimization when submitting big groups of jobs via qbsubmit() loaded with callbacks and dependencies
Fixed reported issue where submission performance would degrade linearly in proportion to the number of jobs in the queue.
==== CL 8795 ====
@UPDATE: added descriptions of new/missing qb.conf parameters to the qb.conf.template file, which is used to build the default qb.conf.
* added p-agenda params (supe and client)
* added auto-wrangling params (supe)
* added per-user/pgrp subjob limit params (supe)
* added mail setup params (supe)
* added database setup params (supe)
==== CL 8794 ====
@NEW: add p-agenda submission options to qbsub (p_agenda, p_priority, and p_cpus), and updated online help text.
==== CL 8790 ====
@CHANGE: Python API qb.reportjob() now takes a subjob object (dict). It can still take just the status (string).
This should enable the custom jobtype back-end programmer to pass back subjob-level "resultpackage" data to the supe, for example.
==== CL 8783 ====
@NEW: add supervisor_p_agenda_max qb.conf parameter, for the site-admin to control the maximum number of p-agenda items any job can have.
==== CL 8782 ====
@NEW: add p_agenda_cpus to enable control of the number of "cpus" used for the p-agenda jobs. Defaults to number of p-agenda items.
@CHANGE: removed code that automatically makes a job become a p-agenda job when p_agenda_priority() is set. The "p_agenda" list or the "p_agenda" job flag must be explicitly set for a job to be a p-agenda job.
==== CL 8781 ====
@CHANGE: if an agenda-based job specifies the p_agenda_priority, then automatically add the p_agenda flag.
@CHANGE: added code to check that the job being submitted is an agenda-based one, before doing the p-agenda magic
==== CL 8775 ====
@UPDATE: doc update w/ "qbhash" and encrypted DB password descriptions
@UPDATE: Added section for qbhash, and updated section for qblogin.
@UPDATE: section for database_password
BUGZID: 63383 63628 39741
==== CL 8769 ====
@NEW: add "qbhash" tool, used to generate/display encrypted passwords
@NEW: add "-password" option to qblogin, to specify password in a command-line option instead of on the stdin
BUGZID: 63383
==== CL 8767 ====
@FIX: install datawarehouse plists on OSX (missing from installer package)
==== CL 8764 ====
@NEW: add p-agenda (p-frames, "p" stands for Priority/Preview/Poster) support, where a select few agenda items of a job can be sent at a higher priority for quicker turnaround for previewing purposes.
To use in API: set the "p_agenda" job flag when submitting an agenda-based job.
Optionally attach a list, job['p_agenda'] in python API, to the job on submission to explicitly specify the p-agenda items. If not set explicitly, the system will automatically choose the 1st, last, and middle items to be rendered at a higher priority.
The priority of the p-agenda items may also be specified on submission, by setting the job's p_agenda_priority parameter.
p-agenda job support for the standard submission tools (GUI, qbsub) coming shortly.
@NEW: qb.conf parameters: client_p_agenda_priority, supervisor_default_p_agenda_priority (default 1)
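The default selection described for CL 8764 (first, middle, and last agenda items) can be sketched as follows. The function name is hypothetical, and which item counts as the "middle" of an even-length agenda is an assumption; the note does not say how the supervisor resolves that case.

```python
def default_p_agenda(items):
    """Pick the first, middle, and last items, without duplicates, in order."""
    if not items:
        return []
    last = len(items) - 1
    # A set removes duplicates for agendas with fewer than three items.
    indices = sorted({0, last // 2, last})
    return [items[i] for i in indices]


frames = list(range(1, 11))            # frames 1..10
preview = default_p_agenda(frames)     # frames chosen for higher priority
```

For a ten-frame agenda this sketch selects frames 1, 5, and 10; a job that sets job['p_agenda'] explicitly would bypass this default.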
==== CL 8760 ====
@UPDATE: Administration.doc with details about the new worker_boot_diagnostics_retries and worker_boot_diagnostics_retry_interval parameters
BUGZID: 63600
==== CL 8755 ====
@FIX: Added worker_boot_diagnostics_retries and worker_boot_diagnostics_retry_interval
These new configuration parameters tell the worker to automatically retry the boot-time diagnostic routines "worker_boot_diagnostics_retries" times, with "worker_boot_diagnostics_retry_interval" seconds of sleep time in between the retries.
By default, they are set to 1 and 30 (seconds) respectively. These values may be set in the local qb.conf file, or in the qbwrk.conf file.
@FIX: issue where worker will "panic" when proxy settings are set in the remote qbwrk.conf file.
BUGZID: 63600 63422 63407
ZD: 3650 1638 2035
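The retry behavior controlled by these two parameters can be sketched as a simple loop. This is an illustration, not the worker's code: the function and the `check` callable are hypothetical, and the retry count is assumed to be in addition to the first attempt. The defaults mirror the documented values of 1 and 30.

```python
import time


def run_boot_diagnostics(check, retries=1, retry_interval=30, sleep=time.sleep):
    """Run check(); if it fails, retry up to `retries` more times.

    Sleeps retry_interval seconds between attempts and returns True as soon
    as one attempt succeeds, or False when all attempts have failed.
    """
    for attempt in range(retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(retry_interval)      # wait before the next attempt
    return False
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.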
==== CL 8743 ====
@NEW: add qb.frontend package, will serve as base class for constructing jobs for new python jobtypes
==== CL 8727 ====
@CHANGE: database_password is now expected to be encrypted.
Plain text password still works, but if a password has been set up to access the MySQL db, it is recommended, but not required, that site administrators use "qblogin -display" to generate the encrypted password, and set database_password in qb.conf to the encrypted string for more security.
BUGZID: 63628
==== CL 8722 ====
@NEW: add optional artificial delay before auto-retry of agenda items via "retrywork"
When a failed frame is automatically retried via "retrywork", an artificial delay may be inserted before the subjob starts processing it. Requested by customers to work around issues with, for example, application license contentions.
Submission APIs (C++, Perl, Python) and clients (qbsub, QubeGUI) modified to allow specifying "retrywork_delay" when submitting jobs.
==== CL 8717 ====
@FIX: logs written into a "hidden" file, in "log/user/.hst", which grows very large
Actions initiated by the supe (as opposed to a particular user), such as "starting a subjob on worker", were logged into this hidden ".hst" file. Fixed it so the file has a special folder/name, "__QUBE_SYSTEM__/__QUBE_SYSTEM__.hst".
Also modified code so that if the "user" flag was omitted from the "supervisor_log_flags", then this user action logging is disabled altogether.
BUGZID: 62030
==== CL 8713 ====
@FIX: turned off worker debug-level logging that accidentally made it into the 6.2.0 release.
==== CL 8712 ====
@FIX: issue where worker processes will stall when a config field, such as "worker_description", has quotes in it.
==== CL 8704 ====
@FIX: support bash exported function definitions, which are saved as multi-line environment variable values
BUGZID: 63624
ZD: 4100
==== CL 8702 ====
@NEW: add perl 5.12 and 5.14 support for windows x64 and 32-bit.
BUGZID: 63631
==== CL 8695 ====
@FIX: export_environment now works properly with built-in cmd* jobtypes
@FIX: cmd* jobtype backends will run jobs in a non-login shell if export_environment flag is set on the job, to avoid overriding of environment variables set by the job's submission environment.
@NEW: QbApi::qbsystem() now optionally takes a boolean to specify commands to be run in a login shell.
@CHANGE: By default now, QbApi::qbsystem() will run the given command in a non-login shell.
@NEW: added optional "shell" parameter to QbEnv::setToEnv(user, [shell]) method, so the user environment for a non-default shell can be fetched.
This new method is called from QbWorker::QbUnix.cpp now.
BUGZID: 63625
ZD: 4100
=3D=3D=3D=3D CL 8693 =3D=3D=3D=3D
@FIX: fix "ERROR 1290 (HY000=
) at line 31 in file: '.\create_stored_programs.sql'" on new Windows i=
nstallations
@FIX: fix Windows 5.15-beta version specific SQL syntax e=
rror (does not exhibit in later versions of MySQL)
=3D=3D=3D=3D CL 8676 =3D=3D=3D=3D
@FIX: add code to license check routine to validate hostid against all MAC addresses on the host, as opposed to just the primary one.
Note: this involves changes to the base library (utils/QbList, utils/QbServer)
BUGZID: 63621
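A minimal sketch, in Python, of checking a hostid against every MAC address on a host rather than only the primary one. The hostid format and the helper names `normalize_mac` and `hostid_matches` are assumptions for illustration, not the actual base-library code; the address list would come from whatever interface enumeration the platform provides.

```python
def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits, ignoring separators."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


def hostid_matches(hostid, macs):
    """Return True if hostid equals any of the host's MAC addresses.

    Hypothetical helper: assumes the license hostid is itself a MAC address.
    """
    wanted = normalize_mac(hostid)
    # An empty hostid never matches, even against an empty address string.
    return bool(wanted) and any(normalize_mac(m) == wanted for m in macs)
```

With this shape, a license keyed to a secondary interface still validates, at the cost of one comparison per interface on each check.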
--
@CHANGE: modify license verification code to only run when the license file has been changed, or a new day has arrived, or on boot.
The code still checks the modification time of the license file every time that a license access is required, but most of the logic is now short-circuited if no modification was made to the file.
It turns out to be rather tricky to, say, add a "reread" option to "qbadmin" to only read the license on demand, since all supe threads must be told to read the file (for quick access, license data is kept in memory of each thread/proc), and such a "broadcast" type of instruction to go out to all threads is not supported at the moment.
The optimization being checked in, however, should significantly reduce the overhead in license-checking nonetheless, especially with the new code where each license key's hostid is checked against all MAC addresses for validation.
BUGZID: 63622
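The short-circuit described in this CL can be sketched as follows, in Python. This is an illustration under assumptions, not the supervisor's implementation: the class name `LicenseCache`, the `parse` callable, and the `full_reads` counter are hypothetical, and each thread or process would hold its own instance, as the note above implies.

```python
import datetime
import os


class LicenseCache:
    """Re-run full license verification only when the file changed or a new day began."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse      # hypothetical callable: path -> verified license data
        self._mtime = None      # None forces a full read on first access ("on boot")
        self._day = None
        self._data = None
        self.full_reads = 0     # counts how often the expensive path ran

    def get(self, today=None):
        today = today or datetime.date.today()
        # Cheap check performed on every license access.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime or today != self._day:
            # Expensive path: parse and validate the whole file again.
            self._data = self.parse(self.path)
            self._mtime = mtime
            self._day = today
            self.full_reads += 1
        return self._data
```

Because every instance notices a changed modification time on its own, no broadcast to other threads is needed; the trade-off is one `stat` call per license access.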
=3D=3D=3D=3D CL 8668 =3D=3D=3D=3D
@FIX: fix "/etc/rc.d/init.d/supervisor: line 139: [: /var/spool/qube/user/jburk/jburk.hst: binary operator expected" error message in supervisor startup
BUGZID: 63618
=3D=3D=3D=3D CL 8662 =3D=3D=3D=3D
@UPDATE: update doc with mail_from parameter description.
BUGZID: 63591
=3D=3D=3D=3D CL 8654 =3D=3D=3D=3D
@FIX: Made the qube-core RPM "obsolete" the "qube" package, to
accommodate the change in RPM package name.
BUGZID: 63611
ZD: 3950
=3D=3D=3D=3D CL 8641 =3D=3D=3D=3D
@FIX: added more details to default qb.conf template's description of proxy_nice_value, and also included explanation for Windows.
Also corrected the commented-out default proxy_account to "qubeproxy" (from "proxyuser") in the same qb.conf.template.
@DOC: update proxy_nice_value doc accordingly.
=3D=3D=3D=3D CL 8610 =3D=3D=3D=3D
@FIX: issue where supe will install but not run, due to missing python25.dll file.
=3D=3D=3D=3D CL 8606 =3D=3D=3D=3D
@FIX: The "Start Time" parameter for SCHTASKS.EXE (/ST option) must be in hh:mm:ss format for earlier versions of Windows (notably winxp 32).
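A small Python sketch of producing an /ST value that always includes seconds. The helper names and the surrounding argument list are hypothetical and only illustrate the hh:mm:ss format this CL requires; they are not taken from the Qube source.

```python
import datetime


def schtasks_start_time(t):
    """Format a time as hh:mm:ss, always including the seconds field."""
    return t.strftime("%H:%M:%S")


def schtasks_once_args(task_name, command, start):
    """Build an argument list for a one-time task (hypothetical helper)."""
    return ["SCHTASKS.EXE", "/Create", "/TN", task_name, "/TR", command,
            "/SC", "ONCE", "/ST", schtasks_start_time(start)]
```

Formatting with an explicit seconds field keeps one code path for all Windows versions instead of branching on the OS release.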
=3D=3D=3D=3D CL 8598 =3D=3D=3D=3D
@NEW: add sample perl-based submit script that submits jobs with per-work email notification callbacks.
ZD: 3854
=3D=3D=3D=3D CL 8550 =3D=3D=3D=3D
@FIX: rolling back to linking supe against python 2.5 for its embedded interpreter instead of 2.7 to avoid runtime linkage issues with 2.7
BUGZID:
=3D=3D=3D=3D CL 8547 =3D=3D=3D=3D
@FIX: added "post" as a possible supervisor_language_flags value
@FIX: default for supervisor_manifest_flags should be empty
=3D=3D=3D=3D CL 8535 =3D=3D=3D=3D
@CHANGE: Enhanced Shotgun integration in job submission.
=3D=3D=3D=3D CL 8503 =3D=3D=3D=3D
@CHANGE: grant access to the PFX_*QBTIME* functions to MySQL user "qube_readonly"
@CHANGE: grant the pfx_dw user all rights to the pfx_stats DB
=3D=3D=3D=3D CL 8483 =3D=3D=3D=3D
@CHANGE: add support for non-cmdrange type backends, don't require qbTokens
=3D=3D=3D=3D CL 8482 =3D=3D=3D=3D
@NEW: a framework for python-based jobtype backends, as well as a base class for jobtypes which use an application's embedded python terminal prompt
* for use by the Nuke python jobtype (dynamic allocation)
* used by the intra-frame progress in pycmdrange
* can be used for Houdini jobtype dynamic allocation (not mantra cmd-line renderer though)
=3D=3D=3D=3D CL 8480 =3D=3D=3D=3D
@FIX: fixed issue with perl API where the system won't respect the "retrywork" specified in jobs processed with a perl-based custom jobtype back-end.
@CHANGE: added some useful logging messages to print to supelog when retrywork is being considered
=3D=3D=3D=3D CL 8472 =3D=3D=3D=3D
@FIX: issue where perl-based custom policy didn't work on some systems.
Embedded perl interpreter had to be initialized much earlier than it was, before the supervisor goes into multi-proc, and
before initializing customizable modules (algorithm, policy) that rely on it.
ZD: 3718
BUGZID: 63603
=3D=3D=3D=3D CL 8468 =3D=3D=3D=3D
@FIX: (Windows) modified worker memory tracking to store values in KB instead of bytes, to avoid buffer overflow.
ZD: 3308
=3D=3D=3D=3D CL 8466 =3D=3D=3D=3D
@FIX: (OSX) modified worker memory tracking to store values in KB instead of bytes, to avoid buffer overflow.
ZD: 3308
=3D=3D=3D=3D CL 8465 =3D=3D=3D=3D
@FIX: (Linux) modified worker memory tracking to store values in KB instead of bytes, to avoid buffer overflow.
BUGZID: 3308
ZD:
=3D=3D=3D=3D CL 8458 =3D=3D=3D=3D
@NEW: add documentation for the "Get Next n Jobs" jobList pagination, document the new behavior of the User filterCtrl, since it now serves as both a display and request filter.
=3D=3D=3D=3D CL 8453 =3D=3D=3D=3D
@NEW: implement the ability to parse the logs on the fly to determine intra-chunk progress
@INTERNAL: clean up backend base class and backendUtils in preparation for more widespread use
=3D=3D=3D=3D CL 8449 =3D=3D=3D=3D
@NEW: a pure-python implementation of the cmdrange jobtype; it implements intra-chunk progress by parsing the output stream from the command as it's being written to disk during the course of the job, not after the job completes. Progress calculation works on both single- and multiple-item agenda jobs.
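The idea of deriving progress from a command's output while it is still running can be sketched as below. This is not the jobtype's code: the function name `iter_progress` and the percentage pattern are assumptions, and a real backend would use whatever progress marker the rendered application prints.

```python
import re

# Hypothetical pattern: assumes the command prints progress as "NN%".
PROGRESS_RE = re.compile(r"(\d{1,3})%")


def iter_progress(lines, pattern=PROGRESS_RE):
    """Yield a progress fraction each time a line reports a new, higher percentage."""
    best = 0
    for line in lines:
        match = pattern.search(line)
        if not match:
            continue  # ordinary log output, no progress marker
        pct = min(int(match.group(1)), 100)
        if pct > best:
            best = pct
            yield pct / 100.0
```

Because the function consumes any iterable of lines, it can be fed the `stdout` of a `subprocess.Popen` object opened in text mode, so values are produced as the command writes them rather than after it exits.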
=3D=3D=3D=3D CL 8436 =3D=3D=3D=3D
@NEW: add doc for per-user/pgrp subjobs limits